Date: Wed, 9 Jun 2010 14:41:34 -0700
From: Brian Somers <brian@FreeBSD.org>
To: rhfb@akira.stdio.com
Cc: freebsd-hackers@FreeBSD.org
Subject: Re: NFSD lockup running ESXi 4
Message-ID: <20100609144134.40c7393d@dev.lan.Awfulhak.org>
In-Reply-To: <20100609175244.9769650815@akira.stdio.com>
References: <20100609175244.9769650815@akira.stdio.com>

On Wed, 9 Jun 2010 13:52:40 -0400 (EDT) rhfb@akira.stdio.com wrote:
> I have an AMD64 FreeBSD 8.0 running 8-Stable from around 2010/04/25 19:13:08.
>
> ZFS disk, Nfsd flags "-t -n 16", private network exclusive for nfs network,
> not using jumbo frames, HZ=1000, Device_Polling, Zero_Copy_Sockets, and the
> following sysctl options:
> net.inet.tcp.recvspace=232140
> net.inet.tcp.sendspace=232140
> net.inet.tcp.slowstart_flightsize=159
> net.inet.tcp.mssdflt=1460
>
> FreeBSD 6 TB zpool, nfs from Three ESXi 4 (newest patch level 193498)
> working reliably for months.
>
> Added a new ESXi, patched to the newest (Post Update 1) patch level 256968.
> Added a bunch of VM's, booted them all into the 2008 R2 Server install DVD.
> Then when attempting to do the installs (in parallel/simultaneously) I started
> getting the NFS server locking up.  NFSD would wedge at 100% CPU in "rc_lo"
> which I presume is rc_lock?  Once wedged, /etc/rc.d/nfsd restart can't kill
> nfsd.  So a reboot is required.  A Reboot causes all my active VM's with
> pending disk writes to have disk errors in the VM (10 second default timeout
> for disk writes in the VM.)  This was very reproducible.
>
> Has anyone noticed this problem?  Is this an ESXi problem with the newest
> updates?  Is this a problem with NFS on FreeBSD 8?

I don't know if it's relevant, but I've been having nfs issues on -current.
I believe they were caused by gam_server, a gnome program running on an
NFS client machine that had /usr/ports nfs mounted and was doing a portupgrade.
Nothing gnomeish should have been anywhere near /usr/ports, but analysis
showed huge numbers of NFS stats against /usr/ports/distfiles/*, restat'ing
the same files over and over.  nfsd was going crazy on the server and
gam_server was clocking up wads of CPU time on the client.
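
For anyone wanting to check for the same restat pattern, a rough way is to
tally paths from whatever trace you have.  This is only a minimal sketch: it
assumes a plain-text trace with one stat'ed path per line, which is a made-up
format and not the output of any particular tool.  The function name
top_restat_paths and the sample file names are hypothetical as well.

```python
from collections import Counter


def top_restat_paths(lines, limit=5):
    """Return the `limit` most frequently seen paths as (path, count) pairs.

    Assumes each non-blank line holds exactly one path (hypothetical format).
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common(limit)


if __name__ == "__main__":
    # Hypothetical sample; real input would come from your own trace file.
    sample = [
        "/usr/ports/distfiles/foo.tar.gz",
        "/usr/ports/distfiles/bar.tar.gz",
        "/usr/ports/distfiles/foo.tar.gz",
        "/usr/ports/distfiles/foo.tar.gz",
    ]
    for path, count in top_restat_paths(sample):
        print(count, path)
```

A path whose count keeps climbing while nothing on disk changes is the sort
of thing I mean by restat'ing the same files over and over.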

FreeBSD-9 kernels prior to around June 6 were freezing on me.  It may have
been because of the nfsd activity, but I didn't investigate the freeze...
Perhaps looking for changes that might affect nfsd stability in the week
prior to June 6 might discover a fix?

-- 
Brian Somers                                          <brian@Awfulhak.org>
Don't _EVER_ lose your sense of humour !               <brian@FreeBSD.org>