From: Konstantin Belousov <kostikbel@gmail.com>
To: Rick Macklem
Cc: Rick Macklem, fs@freebsd.org
Subject: Re: Deadlock in the NFS client
Date: Thu, 14 Mar 2013 11:27:28 +0200
Message-ID: <20130314092728.GI3794@kib.kiev.ua>
In-Reply-To: <492562517.3880600.1363217615412.JavaMail.root@erie.cs.uoguelph.ca>

On Wed, Mar 13, 2013 at 07:33:35PM -0400, Rick Macklem wrote:
> John Baldwin wrote:
> > I ran into a machine that had a deadlock among certain files on a
> > given NFS mount today.  I'm not sure how best to resolve it, though
> > it seems like perhaps there is a bug with how the pool of nfsiod
> > threads is managed.  Anyway, more details on the actual hang below.
> > This was on 8.x with the old NFS client, but I don't see anything
> > in HEAD that would fix this.
> >
> > First note that the system was idle, so it had dropped down to only
> > one nfsiod thread.
> >
> Hmm, I see the problem and I'm a bit surprised it doesn't bite more
> often.  It seems to me that this snippet of code from nfs_asyncio()
> makes too weak an assumption:
>
> 	/*
> 	 * If none are free, we may already have an iod working on this mount
> 	 * point.  If so, it will process our request.
> 	 */
> 	if (!gotiod) {
> 		if (nmp->nm_bufqiods > 0) {
> 			NFS_DPF(ASYNCIO,
> 			    ("nfs_asyncio: %d iods are already processing mount %p\n",
> 			    nmp->nm_bufqiods, nmp));
> 			gotiod = TRUE;
> 		}
> 	}
>
> It assumes that, since an nfsiod thread is processing some buffer for
> the mount, it will become available to do this one, which isn't true
> for your deadlock.
>
> I think the simple fix would be to recode nfs_asyncio() so that it
> only returns 0 if it finds an AVAILABLE nfsiod thread that it has
> assigned to do the I/O, getting rid of the above.
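For reference, with that fallback block dropped, the tail of the nfsiod
search in nfs_asyncio() would effectively become something like the
following (an untested sketch only, reusing the names from the quoted
snippet; the caller then does the I/O synchronously):

	if (!gotiod) {
		/*
		 * No nfsiod could be reserved for this buffer; do not
		 * count on a busy iod picking it up later.  Report
		 * failure so that the caller falls back to synchronous
		 * I/O instead of waiting on an iod that may never come.
		 */
		NFS_DPF(ASYNCIO, ("nfs_asyncio: no free iods, sync I/O\n"));
		return (EIO);
	}
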
> The problem with doing this is that it may result in a lot more
> synchronous I/O (nfs_asyncio() returns EIO, so the caller does the
> I/O).  Maybe more synchronous I/O could be avoided by allowing
> nfs_asyncio() to create a new thread even if the total is above
> nfs_iodmax.  (I think this would require the fixed array to be
> replaced with a linked list and might result in a large number of
> nfsiod threads.)  Maybe just having a large nfs_iodmax would be an
> adequate compromise?
>
> Does having a large # of nfsiod threads cause any serious problem for
> most systems these days?
>
> I'd be tempted to recode nfs_asyncio() as above and then, instead of
> nfs_iodmin and nfs_iodmax, I'd simply have:
> - a fixed number of nfsiod threads (this could be a tunable, with the
>   understanding that it should be large for good performance)
>
I do not see how this would solve the deadlock itself.  The proposal
would only allow the system to survive slightly longer after the
deadlock appeared.  And I think that allowing an unbounded number of
nfsiod threads is also fatal.

The issue there is the LOR between the buffer lock and the vnode lock.
The buffer lock must always come after the vnode lock.  The problematic
nfsiod thread, which locks the vnode, violates this rule because,
despite the LK_KERNPROC ownership of the buffer lock, it is the thread
which de facto owns the buffer (only that thread can unlock it).

A possible solution would be to pass LK_NOWAIT to nfs_nget() from
nfs_readdirplusrpc().  From my reading of the code, nfs_nget() should
be capable of correctly handling the lock failure, and EBUSY would
result in doit = 0, which should be fine too.  It is possible that
EBUSY should be reset to 0, though.
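A rough, untested sketch of what I mean, assuming the nfs_nget() call
site in nfs_readdirplusrpc() still takes the lock flags as its last
argument as in the 8.x client (the surrounding variable names here are
illustrative, from memory):

	error = nfs_nget(vp->v_mount, fhp, fhsize, &np,
	    LK_EXCLUSIVE | LK_NOWAIT);
	if (error == EBUSY) {
		/*
		 * The vnode is locked by someone else.  Do not sleep
		 * waiting for it, since this nfsiod effectively owns a
		 * buffer lock and sleeping here recreates the LOR; just
		 * skip loading the attributes for this entry.
		 */
		doit = 0;
		error = 0;	/* possibly needed, as noted above */
	} else if (error == 0)
		newvp = NFSTOV(np);

The point is only that the lookup must not sleep on the vnode lock
while this thread de facto owns the buffer lock.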