Date: Thu, 14 Mar 2013 19:22:39 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: Rick Macklem <rmacklem@freebsd.org>, fs@freebsd.org
Subject: Re: Deadlock in the NFS client
Message-ID: <20130314172239.GL3794@kib.kiev.ua>
In-Reply-To: <201303141057.13609.jhb@freebsd.org>
References: <201303131356.37919.jhb@freebsd.org> <492562517.3880600.1363217615412.JavaMail.root@erie.cs.uoguelph.ca> <20130314092728.GI3794@kib.kiev.ua> <201303141057.13609.jhb@freebsd.org>
On Thu, Mar 14, 2013 at 10:57:13AM -0400, John Baldwin wrote:
> On Thursday, March 14, 2013 5:27:28 am Konstantin Belousov wrote:
> > On Wed, Mar 13, 2013 at 07:33:35PM -0400, Rick Macklem wrote:
> > > John Baldwin wrote:
> > > > I ran into a machine that had a deadlock among certain files on a
> > > > given NFS mount today. I'm not sure how best to resolve it, though
> > > > it seems like perhaps there is a bug with how the pool of nfsiod
> > > > threads is managed. Anyway, more details on the actual hang below.
> > > > This was on 8.x with the old NFS client, but I don't see anything
> > > > in HEAD that would fix this.
> > > >
> > > > First note that the system was idle so it had dropped down to only
> > > > one nfsiod thread.
> > > >
> > > Hmm, I see the problem and I'm a bit surprised it doesn't bite more
> > > often. It seems to me that this snippet of code from nfs_asyncio()
> > > makes too weak an assumption:
> > > 	/*
> > > 	 * If none are free, we may already have an iod working on this mount
> > > 	 * point. If so, it will process our request.
> > > 	 */
> > > 	if (!gotiod) {
> > > 		if (nmp->nm_bufqiods > 0) {
> > > 			NFS_DPF(ASYNCIO,
> > > 			    ("nfs_asyncio: %d iods are already processing mount %p\n",
> > > 			    nmp->nm_bufqiods, nmp));
> > > 			gotiod = TRUE;
> > > 		}
> > > 	}
> > > It assumes that, since an nfsiod thread is processing some buffer for
> > > the mount, it will become available to do this one, which isn't true
> > > for your deadlock.
> > >
> > > I think the simple fix would be to recode nfs_asyncio() so that
> > > it only returns 0 if it finds an AVAILABLE nfsiod thread that it
> > > has assigned to do the I/O, getting rid of the above. The problem
> > > with doing this is that it may result in a lot more synchronous I/O
> > > (nfs_asyncio() returns EIO, so the caller does the I/O).
> > > Maybe more
> > > synchronous I/O could be avoided by allowing nfs_asyncio() to create
> > > a new thread even if the total is above nfs_iodmax. (I think this
> > > would require the fixed array to be replaced with a linked list and
> > > might result in a large number of nfsiod threads.) Maybe just having
> > > a large nfs_iodmax would be an adequate compromise?
> > >
> > > Does having a large # of nfsiod threads cause any serious problem for
> > > most systems these days?
> > >
> > > I'd be tempted to recode nfs_asyncio() as above and then, instead
> > > of nfs_iodmin and nfs_iodmax, I'd simply have:
> > > - a fixed number of nfsiod threads (this could be a tunable, with the
> > >   understanding that it should be large for good performance)
> > >
> > I do not see how this would solve the deadlock itself. The proposal
> > would only allow the system to survive slightly longer after the
> > deadlock appeared. And I think that allowing an unbounded number of
> > nfsiod threads is also fatal.
> >
> > The issue there is the LOR between the buffer lock and the vnode lock.
> > The buffer lock must always come after the vnode lock. The problematic
> > nfsiod thread, which locks the vnode, violates this rule, because
> > despite the LK_KERNPROC ownership of the buffer lock, it is the thread
> > which de facto owns the buffer (only that thread can unlock it).
> >
> > A possible solution would be to pass LK_NOWAIT to nfs_nget() from
> > nfs_readdirplusrpc(). From my reading of the code, nfs_nget() should
> > be capable of correctly handling the lock failure. And EBUSY would
> > result in doit = 0, which should be fine too.
> >
> > It is possible that EBUSY should be reset to 0, though.
>
> Yes, thinking about this more, I do think the right answer is for
> readdirplus to do this. The only question I have is if it should do
> this always, or if it should do this only from the nfsiod thread. I
> believe you can't get this in the non-nfsiod case.
I agree that it looks as if the workaround is only needed for the nfsiod
thread. On the other hand, it is not immediately obvious how to detect
that the current thread is an nfsiod daemon. Probably a thread flag
should be set.