From owner-freebsd-fs@FreeBSD.ORG Thu Mar 14 14:57:29 2013
From: John Baldwin <jhb@freebsd.org>
To: Konstantin Belousov
Cc: Rick Macklem, fs@freebsd.org
Subject: Re: Deadlock in the NFS client
Date: Thu, 14 Mar 2013 10:57:13 -0400
Message-Id: <201303141057.13609.jhb@freebsd.org>
In-Reply-To: <20130314092728.GI3794@kib.kiev.ua>
References: <201303131356.37919.jhb@freebsd.org> <492562517.3880600.1363217615412.JavaMail.root@erie.cs.uoguelph.ca> <20130314092728.GI3794@kib.kiev.ua>

On Thursday, March 14, 2013 5:27:28 am Konstantin Belousov wrote:
> On Wed, Mar 13, 2013 at 07:33:35PM -0400, Rick Macklem wrote:
> > John Baldwin wrote:
> > > I ran into a machine that had a deadlock among certain files on a
> > > given NFS mount today.  I'm not sure how best to resolve it, though
> > > it seems like perhaps there is a bug with how the pool of nfsiod
> > > threads is managed.  Anyway, more details on the actual hang below.
> > > This was on 8.x with the old NFS client, but I don't see anything
> > > in HEAD that would fix this.
> > >
> > > First note that the system was idle so it had dropped down to only
> > > one nfsiod thread.
> > >
> > Hmm, I see the problem and I'm a bit surprised it doesn't bite more often.
> > It seems to me that this snippet of code from nfs_asyncio() makes too
> > weak an assumption:
> >
> >         /*
> >          * If none are free, we may already have an iod working on this mount
> >          * point.  If so, it will process our request.
> >          */
> >         if (!gotiod) {
> >                 if (nmp->nm_bufqiods > 0) {
> >                         NFS_DPF(ASYNCIO,
> >                             ("nfs_asyncio: %d iods are already processing mount %p\n",
> >                             nmp->nm_bufqiods, nmp));
> >                         gotiod = TRUE;
> >                 }
> >         }
> >
> > It assumes that, since an nfsiod thread is processing some buffer for the
> > mount, it will become available to do this one, which isn't true for your
> > deadlock.
> >
> > I think the simple fix would be to recode nfs_asyncio() so that
> > it only returns 0 if it finds an AVAILABLE nfsiod thread that it
> > has assigned to do the I/O, getting rid of the above.  The problem
> > with doing this is that it may result in a lot more synchronous I/O
> > (nfs_asyncio() returns EIO, so the caller does the I/O).  Maybe more
> > synchronous I/O could be avoided by allowing nfs_asyncio() to create a
> > new thread even if the total is above nfs_iodmax.
> > (I think this would require the fixed array to be replaced with a
> > linked list and might result in a large number of nfsiod threads.)
> > Maybe just having a large nfs_iodmax would be an adequate compromise?
> >
> > Does having a large number of nfsiod threads cause any serious problem
> > for most systems these days?
> >
> > I'd be tempted to recode nfs_asyncio() as above and then, instead
> > of nfs_iodmin and nfs_iodmax, I'd simply have:
> > - a fixed number of nfsiod threads (this could be a tunable, with the
> >   understanding that it should be large for good performance)
> >
> I do not see how this would solve the deadlock itself.  The proposal would
> only allow the system to survive slightly longer after the deadlock
> appeared.  And I think that allowing an unbounded number of nfsiod
> threads is also fatal.
>
> The issue there is the LOR between the buffer lock and the vnode lock.
> The buffer lock must always come after the vnode lock.  The problematic
> nfsiod thread, which locks the vnode, violates this rule, because despite
> the LK_KERNPROC ownership of the buffer lock, it is the thread which de
> facto owns the buffer (only that thread can unlock it).
>
> A possible solution would be to pass LK_NOWAIT to nfs_nget() from
> nfs_readdirplusrpc().  From my reading of the code, nfs_nget() should
> be capable of correctly handling the lock failure.  And EBUSY would
> result in doit = 0, which should be fine too.
>
> It is possible that EBUSY should be reset to 0, though.

Yes, thinking about this more, I do think the right answer is for
readdirplus to do this.  The only question I have is whether it should do
this always, or only from an nfsiod thread.  I believe you can't get this
deadlock in the non-nfsiod case.

--
John Baldwin
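
For readers following along in the code, a rough sketch of the nfs_asyncio()
tightening Rick describes above might look like the fragment below: success
is reported only when an idle nfsiod has actually been handed the mount, and
the "an iod is already working on this mount" shortcut is dropped so the
caller falls back to synchronous I/O instead.  The nfs_iodwant[] and
nfs_iodmount[] arrays, nfs_numasync, and the wakeup() hand-off are recalled
from the 8.x-era client and should be treated as assumptions, not as an
actual patch from this thread.

        /*
         * Sketch only: look for an idle nfsiod and assign it this mount.
         * nfs_iodwant[], nfs_iodmount[] and nfs_numasync are assumed from
         * the 8.x-era client; the real function has more bookkeeping.
         */
        gotiod = FALSE;
        for (i = 0; i < nfs_numasync; i++) {
                if (nfs_iodwant[i] != NULL) {
                        /* Hand the idle thread this mount and wake it up. */
                        nfs_iodwant[i] = NULL;
                        nfs_iodmount[i] = nmp;
                        nmp->nm_bufqiods++;
                        wakeup(&nfs_iodwant[i]);
                        gotiod = TRUE;
                        break;
                }
        }

        /*
         * No "some iod already serves this mount, it will get to our
         * buffer eventually" fallback: if no idle thread was found,
         * return EIO and let the caller do the I/O synchronously.
         */
        if (!gotiod)
                return (EIO);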
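
A similarly rough sketch of the readdirplus change Kostik suggests above
(and that the reply agrees is the right answer): pass LK_NOWAIT to
nfs_nget() and treat EBUSY as "skip caching this entry" rather than as an
error.  The nfs_nget() arguments, the doit flag, and the surrounding decode
step are paraphrased from memory of the old client's nfs_readdirplusrpc()
and are assumptions; whether the flag should be added unconditionally or
only when running in an nfsiod is the open question above.

        /*
         * Sketch only: try to get the entry's vnode without sleeping on
         * its lock.  Sleeping here from an nfsiod can deadlock against a
         * thread that holds the vnode lock while it waits for the buffer
         * this nfsiod effectively owns (LK_KERNPROC).
         */
        if (doit) {
                error = nfs_nget(vp->v_mount, fhp, fhsize, &np,
                    LK_EXCLUSIVE | LK_NOWAIT);
                if (error == EBUSY) {
                        /*
                         * Vnode lock is busy right now: skip filling in
                         * this entry's attributes instead of sleeping,
                         * and do not report the failure to the caller.
                         */
                        doit = 0;
                        error = 0;
                }
        }
        if (doit && error == 0) {
                nvp = NFSTOV(np);
                /* ... decode the readdirplus attributes into the nfsnode ... */
        }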