Date: Mon, 18 Mar 2013 14:45:57 -0400
From: John Baldwin <jhb@freebsd.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Rick Macklem <rmacklem@freebsd.org>, fs@freebsd.org
Subject: Re: Deadlock in the NFS client
Message-ID: <201303181445.57714.jhb@freebsd.org>
In-Reply-To: <88927360.3963361.1363399419023.JavaMail.root@erie.cs.uoguelph.ca>
References: <88927360.3963361.1363399419023.JavaMail.root@erie.cs.uoguelph.ca>
On Friday, March 15, 2013 10:03:39 pm Rick Macklem wrote:
> John Baldwin wrote:
> > On Thursday, March 14, 2013 1:22:39 pm Konstantin Belousov wrote:
> > > On Thu, Mar 14, 2013 at 10:57:13AM -0400, John Baldwin wrote:
> > > > On Thursday, March 14, 2013 5:27:28 am Konstantin Belousov wrote:
> > > > > On Wed, Mar 13, 2013 at 07:33:35PM -0400, Rick Macklem wrote:
> > > > > > John Baldwin wrote:
> > > > > > > I ran into a machine that had a deadlock among certain files on a
> > > > > > > given NFS mount today.  I'm not sure how best to resolve it,
> > > > > > > though it seems like perhaps there is a bug with how the pool of
> > > > > > > nfsiod threads is managed.  Anyway, more details on the actual
> > > > > > > hang below.  This was on 8.x with the old NFS client, but I don't
> > > > > > > see anything in HEAD that would fix this.
> > > > > > >
> > > > > > > First note that the system was idle so it had dropped down to
> > > > > > > only one nfsiod thread.
> > > > > > >
> > > > > > Hmm, I see the problem and I'm a bit surprised it doesn't bite more
> > > > > > often.  It seems to me that this snippet of code from nfs_asyncio()
> > > > > > makes too weak an assumption:
> > > > > >
> > > > > > 	/*
> > > > > > 	 * If none are free, we may already have an iod working on
> > > > > > 	 * this mount point.  If so, it will process our request.
> > > > > > 	 */
> > > > > > 	if (!gotiod) {
> > > > > > 		if (nmp->nm_bufqiods > 0) {
> > > > > > 			NFS_DPF(ASYNCIO,
> > > > > > 			    ("nfs_asyncio: %d iods are already processing mount %p\n",
> > > > > > 			     nmp->nm_bufqiods, nmp));
> > > > > > 			gotiod = TRUE;
> > > > > > 		}
> > > > > > 	}
> > > > > >
> > > > > > It assumes that, since an nfsiod thread is processing some buffer
> > > > > > for the mount, it will become available to do this one, which isn't
> > > > > > true for your deadlock.
> > > > > >
> > > > > > I think the simple fix would be to recode nfs_asyncio() so that it
> > > > > > only returns 0 if it finds an AVAILABLE nfsiod thread that it has
> > > > > > assigned to do the I/O, getting rid of the above.  The problem with
> > > > > > doing this is that it may result in a lot more synchronous I/O
> > > > > > (nfs_asyncio() returns EIO, so the caller does the I/O).  Maybe
> > > > > > more synchronous I/O could be avoided by allowing nfs_asyncio() to
> > > > > > create a new thread even if the total is above nfs_iodmax.  (I
> > > > > > think this would require the fixed array to be replaced with a
> > > > > > linked list and might result in a large number of nfsiod threads.)
> > > > > > Maybe just having a large nfs_iodmax would be an adequate
> > > > > > compromise?
> > > > > >
> > > > > > Does having a large # of nfsiod threads cause any serious problem
> > > > > > for most systems these days?
> > > > > >
> > > > > > I'd be tempted to recode nfs_asyncio() as above and then, instead
> > > > > > of nfs_iodmin and nfs_iodmax, I'd simply have:
> > > > > > - a fixed number of nfsiod threads (this could be a tunable, with
> > > > > >   the understanding that it should be large for good performance)
> > > > > >
> > > > > I do not see how this would solve the deadlock itself.  The proposal
> > > > > would only allow the system to survive slightly longer after the
> > > > > deadlock appeared.
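
To make the trade-off in the proposal above concrete, here is a small
standalone C model of the two nfs_asyncio() admission policies being
debated.  It is an illustration only: the function names and the
idle_iods/iods_busy_on_mount parameters are invented for the example and
are not the identifiers used in sys/nfsclient/nfs_bio.c.

/*
 * Standalone model of the two nfs_asyncio() policies discussed in this
 * thread.  Illustration only; the names below do not appear in the
 * FreeBSD sources.
 */
#include <errno.h>
#include <stdio.h>

/* Old policy: accept the async request if any iod is tied to the mount. */
static int
asyncio_old(int idle_iods, int iods_busy_on_mount)
{
	if (idle_iods > 0 || iods_busy_on_mount > 0)
		return (0);	/* caller believes the I/O will be done async */
	return (EIO);		/* caller falls back to synchronous I/O */
}

/* Proposed policy: only accept if an idle iod can be assigned right now. */
static int
asyncio_proposed(int idle_iods, int iods_busy_on_mount)
{
	(void)iods_busy_on_mount;
	if (idle_iods > 0)
		return (0);
	return (EIO);
}

int
main(void)
{
	/* The deadlock scenario: one iod, and it is stuck on this mount. */
	printf("old policy:      %d\n", asyncio_old(0, 1));      /* 0: queued behind the stuck iod */
	printf("proposed policy: %d\n", asyncio_proposed(0, 1)); /* EIO: caller does the I/O itself */
	return (0);
}

Under the old policy the single wedged nfsiod counts as "someone will
process our request", so the caller never falls back to synchronous I/O;
under the proposed policy the caller gets EIO and does the I/O itself, at
the cost of more synchronous I/O whenever the iod pool is saturated.
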
> > > > > And, I think that allowing an unbounded number of nfsiod threads
> > > > > is also fatal.
> > > > >
> > > > > The issue there is the LOR between the buffer lock and the vnode
> > > > > lock.  The buffer lock must always come after the vnode lock.  The
> > > > > problematic nfsiod thread, which locks the vnode, violates this
> > > > > rule, because despite the LK_KERNPROC ownership of the buffer lock,
> > > > > it is the thread which de facto owns the buffer (only that thread
> > > > > can unlock it).
> > > > >
> > > > > A possible solution would be to pass LK_NOWAIT to nfs_nget() from
> > > > > nfs_readdirplusrpc().  From my reading of the code, nfs_nget()
> > > > > should be capable of correctly handling the lock failure.  And
> > > > > EBUSY would result in doit = 0, which should be fine too.
> > > > >
> > > > > It is possible that EBUSY should be reset to 0, though.
> > > >
> > > > Yes, thinking about this more, I do think the right answer is for
> > > > readdirplus to do this.  The only question I have is if it should do
> > > > this always, or if it should do this only from the nfsiod thread.  I
> > > > believe you can't get this in the non-nfsiod case.
> > >
> > > I agree that it looks as if the workaround is only needed for the
> > > nfsiod thread.  On the other hand, it is not immediately obvious how
> > > to detect that the current thread is an nfsiod.  Probably a thread
> > > flag should be set.
> >
> > OTOH, updating the attributes from readdir+ is only an optimization
> > anyway, so just having it always do LK_NOWAIT is probably ok (and
> > simple).  Currently I'm trying to develop a test case to provoke this
> > so I can test the fix, but no luck on that yet.
> >
> > --
> > John Baldwin
>
> Just fyi, ignore my comment about the second version of the patch (the
> one that disables the nfsiod threads from doing readdirplus) running
> faster.  It was just that when I tested the 2nd patch, the server's
> caches were primed.  Oops.
>
> However, so far the minimal testing I've done has been essentially
> performance neutral between the unpatched and patched versions.
>
> Hopefully John has a convenient way to do some performance testing,
> since I won't be able to do much until the end of April.

This does fix the deadlock, but I can't really speak to performance.

--
John Baldwin
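
For readers who want to see the shape of the workaround being discussed,
a minimal sketch follows.  It assumes the 8.x old client's
nfs_nget(mp, fhp, fhsize, &np, flags) interface and the doit/newvp/np
variables of nfs_readdirplusrpc(); it is not the patch that was actually
committed, so check the real sources before relying on any of these names.

	/*
	 * Sketch only: instantiate the entry's vnode with LK_NOWAIT so a
	 * busy vnode lock cannot wedge the nfsiod; on EBUSY just skip the
	 * attribute-cache update, which is only an optimization.
	 */
	error = nfs_nget(vp->v_mount, fhp, fhsize, &np,
	    LK_EXCLUSIVE | LK_NOWAIT);
	if (error == EBUSY) {
		doit = 0;	/* lose the optimization, avoid the LOR */
		error = 0;	/* not a failure of the readdir RPC itself */
	} else if (error == 0) {
		newvp = NFSTOV(np);
		/* ... fill the attribute cache as before ... */
	}

Since readdirplus only uses this vnode to prime its attribute cache,
losing the update when the lock is contended is harmless, and mapping
EBUSY to success keeps the directory read itself from being reported as
failed.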