From: Brian Somers
To: rhfb@akira.stdio.com
Cc: freebsd-hackers@FreeBSD.org
Date: Wed, 9 Jun 2010 14:41:34 -0700
Subject: Re: NFSD lockup running ESXi 4
Message-ID: <20100609144134.40c7393d@dev.lan.Awfulhak.org>
In-Reply-To: <20100609175244.9769650815@akira.stdio.com>
List-Id: Technical Discussions relating to FreeBSD

On Wed, 9 Jun 2010 13:52:40 -0400 (EDT) rhfb@akira.stdio.com wrote:
> I have an AMD64 FreeBSD 8.0 running 8-Stable from around 2010/04/25 19:13:08.
>
> ZFS disk, Nfsd flags "-t -n 16", private network exclusive for nfs network,
> not using jumbo frames, HZ=1000, Device_Polling, Zero_Copy_Sockets, and the
> following sysctl options:
> net.inet.tcp.recvspace=232140
> net.inet.tcp.sendspace=232140
> net.inet.tcp.slowstart_flightsize=159
> net.inet.tcp.mssdflt=1460
>
> FreeBSD 6 TB zpool, nfs from Three ESXi 4 (newest patch level 193498)
> working reliably for months.
>
> Added a new ESXi, patched to the newest (Post Update 1) patch level 256968.
> Added a bunch of VM's, booted them all into the 2008 R2 Server install DVD.
> Then when attempting to do the installs (in parallel/simultaneously) I started
> getting the NFS server locking up.  NFSD would wedge at 100% CPU in "rc_lo"
> which I presume is rc_lock?  Once wedged, /etc/rc.d/nfsd restart can't kill
> nfsd.  So a reboot is required.  A Reboot causes all my active VM's with
> pending disk writes to have disk errors in the VM (10 second default timeout
> for disk writes in the VM.)  This was very reproducible.
>
> Has anyone noticed this problem?  Is this an ESXi problem with the newest
> updates?  Is this a problem with NFS on FreeBSD 8?

I don't know if it's relevant, but I've been having nfs issues on -current.
I believe they were caused by gam_server, a gnome program running on an NFS
client machine that had /usr/ports nfs mounted and was doing a portupgrade.
Nothing gnomeish should have been anywhere near /usr/ports, but analysis
showed huge numbers of NFS stats against /usr/ports/distfiles/*, restat'ing
the same files over and over.  nfsd was going crazy on the server and
gam_server was clocking up wads of CPU time on the client.

FreeBSD-9 kernels prior to around June 6 were freezing on me.  It may have
been because of the nfsd activity, but I didn't investigate the freeze...

Perhaps looking for changes that might affect nfsd stability in the week
prior to June 6 would turn up a fix?

-- 
Brian Somers
Don't _EVER_ lose your sense of humour !