From owner-freebsd-hackers@FreeBSD.ORG Sun Aug 8 23:58:11 2010
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 04DF21065672;
	Sun, 8 Aug 2010 23:58:11 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id B70168FC15;
	Sun, 8 Aug 2010 23:58:10 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ar8FADXhXkyDaFvO/2dsb2JhbACDFZA9jWOwe5BShEdzBIk7
X-IronPort-AV: E=Sophos;i="4.55,339,1278302400"; d="scan'208";a="89847814"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 08 Aug 2010 19:58:09 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id B03CAB3E96;
	Sun, 8 Aug 2010 19:58:09 -0400 (EDT)
Date: Sun, 8 Aug 2010 19:58:09 -0400 (EDT)
From: Rick Macklem
To: rhfb@akira.stdio.com
Message-ID: <282423324.419135.1281311889612.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20100729094046.AD3F3C2@akira.stdio.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----=_Part_419134_272684746.1281311889610"
X-Originating-IP: [24.65.230.102]
X-Mailer: Zimbra 6.0.7_GA_2476.RHEL4 (ZimbraWebClient - SAF3 (Mac)/6.0.7_GA_2473.RHEL4_64)
Cc: freebsd-hackers@freebsd.org, dfr@freebsd.org
Subject: Re: NFS server hangs (was no subject)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
X-List-Received-Date: Sun, 08 Aug 2010 23:58:11 -0000

------=_Part_419134_272684746.1281311889610
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

> I have a similar problem.
>
> I have a NFS server (8.0 upgraded a couple times since Feb 2010) that
> locks up and requires a reboot.
>
> The clients are busy vm's from VMWare ESXi using the NFS server for
> vmdk virtual disk storage.
>
> The ESXi reports nfs server inactive and all the vm's post disk write
> errors when trying to write to their disk.
>
> /etc/rc.d/nfsd restart fails to work (it can not kill the nfsd
> process)
>
> The nfsd process runs at 100% cpu at rc_lo state in top.
>
> reboot is the only fix.
>
> It has only happened under two circumstances.
> 1) Installation of a VM using Windows 2008.
> 2) Migrating 16 million mail messages from a physical server to a VM
> running FreeBSD with ZFS file system as a VM on the ESXi box that uses
> NFS to store the VM's ZFS disk.
>
> The NFS server uses ZFS also.

I don't think what you are seeing is the same as what others have
reported. (I have a hunch that your problem might be a replay cache
problem.)

Please try the attached patch and make sure that your sys/rpc/svc.c
is at r205562 (upgrade if it isn't).

If this patch doesn't help, you could try using the experimental nfs
server (which doesn't use the generic replay cache), by adding "-e" to
mountd and nfsd.

Please let me know if the patch or switching to the experimental nfs
server helps,
rick

------=_Part_419134_272684746.1281311889610
Content-Type: text/x-patch; name=replay.patch
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=replay.patch

--- rpc/replay.c.sav	2010-08-08 18:05:50.000000000 -0400
+++ rpc/replay.c	2010-08-08 18:16:43.000000000 -0400
@@ -90,8 +90,10 @@
 replay_setsize(struct replay_cache *rc, size_t newmaxsize)
 {
 
+	mtx_lock(&rc->rc_lock);
 	rc->rc_maxsize = newmaxsize;
 	replay_prune(rc);
+	mtx_unlock(&rc->rc_lock);
 }
 
 void
@@ -144,8 +146,8 @@
 	bool_t freed_one;
 
 	if (rc->rc_count >= REPLAY_MAX || rc->rc_size > rc->rc_maxsize) {
-		freed_one = FALSE;
 		do {
+			freed_one = FALSE;
 			/*
 			 * Try to free an entry. Don't free in-progress entries
 			 */

------=_Part_419134_272684746.1281311889610--
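
The attached replay.patch moves the `freed_one = FALSE` reset from before
the `do` loop to inside it. The user-space C sketch below illustrates why
that placement can matter. It is not the kernel code: it assumes the loop
keeps going while `freed_one` is set and the cache is still over its limit
(the loop condition is outside the patch context), and every name in it
(`struct cache`, `free_one`, `prune`, `make_cache`, `CACHE_MAX`, the pass
cap) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_MAX 4	/* hypothetical limit, stands in for REPLAY_MAX */
#define NENTRIES  8	/* hypothetical fixed number of slots */

struct entry {
	bool present;		/* slot holds a cached reply */
	bool in_progress;	/* still being served; must not be freed */
};

struct cache {
	struct entry slots[NENTRIES];
	size_t count;
};

/* Free one completed entry; returns true if something was freed. */
static bool
free_one(struct cache *c)
{
	for (size_t i = 0; i < NENTRIES; i++) {
		if (c->slots[i].present && !c->slots[i].in_progress) {
			c->slots[i].present = false;
			c->count--;
			return true;
		}
	}
	return false;
}

/*
 * Prune while the cache is at or over CACHE_MAX.  reset_inside selects
 * the patched placement (flag cleared on every pass) or the pre-patch
 * one (cleared once, before the loop).  Returns the number of passes,
 * capped at max_passes so the pre-patch variant cannot hang this demo.
 */
static int
prune(struct cache *c, bool reset_inside, int max_passes)
{
	bool freed_one = false;
	int passes = 0;

	assert(c != NULL);
	if (c->count < CACHE_MAX)
		return 0;
	do {
		if (reset_inside)
			freed_one = false;
		if (free_one(c))
			freed_one = true;
		passes++;
	} while (freed_one && c->count >= CACHE_MAX && passes < max_passes);
	return passes;
}

/* Build a cache with n_done completed and n_busy in-progress entries. */
static struct cache
make_cache(size_t n_done, size_t n_busy)
{
	struct cache c = { .count = 0 };

	assert(n_done + n_busy <= NENTRIES);
	for (size_t i = 0; i < n_done + n_busy; i++) {
		c.slots[i].present = true;
		c.slots[i].in_progress = (i >= n_done);
		c.count++;
	}
	return c;
}
```

In this sketch, a cache built with `make_cache(2, 5)` stays over the limit
after both completed entries are freed. With the reset inside the loop,
`prune` stops on the first pass that frees nothing. With the reset outside,
the flag stays set after the first successful free and only the pass cap
ends the loop, which would be consistent with an nfsd thread spinning at
100% CPU.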