Date:      Wed, 9 Jun 2010 14:41:34 -0700
From:      Brian Somers <brian@FreeBSD.org>
To:        rhfb@akira.stdio.com
Cc:        freebsd-hackers@FreeBSD.org
Subject:   Re: NFSD lockup running ESXi 4
Message-ID:  <20100609144134.40c7393d@dev.lan.Awfulhak.org>
In-Reply-To: <20100609175244.9769650815@akira.stdio.com>
References:  <20100609175244.9769650815@akira.stdio.com>

On Wed,  9 Jun 2010 13:52:40 -0400 (EDT) rhfb@akira.stdio.com wrote:
> I have an AMD64 FreeBSD 8.0 running 8-Stable from around 2010/04/25 19:13:08.
>
> ZFS disk, Nfsd flags "-t -n 16", private network exclusive for nfs network,
> not using jumbo frames, HZ=1000, Device_Polling, Zero_Copy_Sockets, and the
> following sysctl options:
> net.inet.tcp.recvspace=232140
> net.inet.tcp.sendspace=232140
> net.inet.tcp.slowstart_flightsize=159
> net.inet.tcp.mssdflt=1460
>
> FreeBSD 6 TB zpool, nfs from Three ESXi 4 (newest patch level 193498)
> working reliably for months.
>
> Added a new ESXi, patched to the newest (Post Update 1) patch level 256968.
> Added a bunch of VM's, booted them all into the 2008 R2 Server install DVD.
> Then when attempting to do the installs (in parallel/simultaneously) I started
> getting the NFS server locking up.  NFSD would wedge at 100% CPU in "rc_lo"
> which I presume is rc_lock?  Once wedged, /etc/rc.d/nfsd restart can't kill
> nfsd.  So a reboot is required.  A Reboot causes all my active VM's with
> pending disk writes to have disk errors in the VM (10 second default timeout
> for disk writes in the VM.)  This was very reproducable.
>
> Has anyone noticed this problem?  Is this an ESXi problem with the newest
> updates?  Is this a problem with NFS on FreeBSD 8?

I don't know if it's relevant, but I've been having nfs issues on -current.
I believe they were caused by gam_server, a gnome program running on an
NFS client machine that had /usr/ports nfs mounted and was doing a portupgrade.
Nothing gnomeish should have been anywhere near /usr/ports, but analysis
showed huge numbers of NFS stats against /usr/ports/distfiles/*, restat'ing
the same files over and over.  nfsd was going crazy on the server and
gam_server was clocking up wads of CPU time on the client.

FreeBSD-9 kernels prior to around June 6 were freezing on me.  It may have
been because of the nfsd activity, but I didn't investigate the freeze...

Perhaps looking for changes that might affect nfsd stability in the week
prior to June 6 might discover a fix?

-- 
Brian Somers                                          <brian@Awfulhak.org>
Don't _EVER_ lose your sense of humour !               <brian@FreeBSD.org>



