From owner-freebsd-fs@FreeBSD.ORG Wed Mar 13 23:33:37 2013
Date: Wed, 13 Mar 2013 19:33:35 -0400 (EDT)
From: Rick Macklem
To: John Baldwin
Cc: Rick Macklem, fs@freebsd.org
Subject: Re: Deadlock in the NFS client
Message-ID: <492562517.3880600.1363217615412.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201303131356.37919.jhb@freebsd.org>

John Baldwin wrote:
> I ran into a machine that had a deadlock among certain files on a
> given NFS mount today. I'm not sure how best to resolve it, though it
> seems like perhaps there is a bug in how the pool of nfsiod threads is
> managed. Anyway, more details on the actual hang below. This was on
> 8.x with the old NFS client, but I don't see anything in HEAD that
> would fix this.
>
> First, note that the system was idle, so it had dropped down to only
> one nfsiod thread.
>
Hmm, I see the problem and I'm a bit surprised it doesn't bite more
often. It seems to me that this snippet of code from nfs_asyncio()
makes too weak an assumption:

	/*
	 * If none are free, we may already have an iod working on this mount
	 * point.  If so, it will process our request.
	 */
	if (!gotiod) {
		if (nmp->nm_bufqiods > 0) {
			NFS_DPF(ASYNCIO,
			    ("nfs_asyncio: %d iods are already processing mount %p\n",
			    nmp->nm_bufqiods, nmp));
			gotiod = TRUE;
		}
	}

It assumes that, because an nfsiod thread is already processing some
buffer for the mount, that thread will eventually become available to
handle this request as well, which isn't true for your deadlock.

I think the simple fix would be to recode nfs_asyncio() so that it only
returns 0 if it finds an AVAILABLE nfsiod thread that it has assigned to
do the I/O, getting rid of the snippet above. The problem with doing
this is that it may result in a lot more synchronous I/O (nfs_asyncio()
returns EIO, so the caller does the I/O itself). Maybe more synchronous
I/O could be avoided by allowing nfs_asyncio() to create a new thread
even if the total is above nfs_iodmax. (I think this would require the
fixed array to be replaced with a linked list and might result in a
large number of nfsiod threads.) Maybe just having a large nfs_iodmax
would be an adequate compromise?
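To make that concrete, here is a rough, untested sketch of how the tail
of nfs_asyncio() might look once that shortcut is removed. The details
(nfs_iodwant[], nfs_iodmount[], nfs_numasync, NFSIOD_AVAILABLE, and an
nfs_nfsiodnew() assumed to return the new thread's index or -1) are used
loosely for illustration only, not as a patch against the tree:

	int iod, gotiod = FALSE;

	/*
	 * Only an idle nfsiod counts.  A busy one may be the very thread
	 * that is blocked behind our caller (the readdirplus case in the
	 * trace below), so "some iod is already working on this mount" is
	 * no longer treated as success.
	 */
	for (iod = 0; iod < nfs_numasync; iod++) {
		if (nfs_iodwant[iod] == NFSIOD_AVAILABLE) {
			gotiod = TRUE;
			break;
		}
	}

	/*
	 * None idle: try to create another nfsiod (assumed here to return
	 * the new thread's index, or -1 on failure).
	 */
	if (!gotiod && (iod = nfs_nfsiodnew()) >= 0)
		gotiod = TRUE;

	/*
	 * Still nothing: return EIO so the caller does the I/O itself
	 * instead of queueing the buffer behind a possibly wedged nfsiod.
	 */
	if (!gotiod)
		return (EIO);

	/* Assign the mount to the chosen nfsiod, queue the buffer, wake it. */
	nfs_iodmount[iod] = nmp;
	nmp->nm_bufqiods++;
	TAILQ_INSERT_TAIL(&nmp->nm_bufq, bp, b_freelist);
	nmp->nm_bufqlen++;
	wakeup(&nfs_iodwant[iod]);
	return (0);

The important property is that nfs_asyncio() never reports success
unless a specific idle thread has actually been handed the buffer.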
Does having a large # of nfsiod threads cause any serious problem for
most systems these days? I'd be tempted to recode nfs_asyncio() as
above and then, instead of nfs_iodmin and nfs_iodmax, I'd simply have:
- a fixed number of nfsiod threads (this could be a tunable, with the
  understanding that it should be large for good performance)
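Purely as an illustration (the vfs.nfs.iodthreads name and the default
below are invented for this sketch, not an existing knob), such a fixed
pool size could be exposed as a loader tunable plus a read-only sysctl:

	#include <sys/param.h>
	#include <sys/kernel.h>
	#include <sys/sysctl.h>

	SYSCTL_DECL(_vfs_nfs);		/* existing vfs.nfs sysctl node */

	/* Invented name and default, shown only to illustrate the idea. */
	static int nfs_iodthreads = 32;
	TUNABLE_INT("vfs.nfs.iodthreads", &nfs_iodthreads);
	SYSCTL_INT(_vfs_nfs, OID_AUTO, iodthreads, CTLFLAG_RD,
	    &nfs_iodthreads, 0, "Fixed number of nfsiod kthreads");

The pool would then be created once at that size and never grown or
shrunk, which would also do away with the idle shrinking that left you
with a single nfsiod in the first place.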
rick

> The nfsiod thread is hung on a vnode lock:
>
> (kgdb) proc 36927
> [Switching to thread 150 (Thread 100679)]
> #0  sched_switch (td=0xffffff0320de88c0, newtd=0xffffff0003521460,
>     flags=Variable "flags" is not available.)
>     at /usr/src/sys/kern/sched_ule.c:1898
> 1898            cpuid = PCPU_GET(cpuid);
> (kgdb) where
> #0  sched_switch (td=0xffffff0320de88c0, newtd=0xffffff0003521460,
>     flags=Variable "flags" is not available.)
>     at /usr/src/sys/kern/sched_ule.c:1898
> #1  0xffffffff80407953 in mi_switch (flags=260, newtd=0x0)
>     at /usr/src/sys/kern/kern_synch.c:449
> #2  0xffffffff8043e342 in sleepq_wait (wchan=0xffffff0358bbb7f8, pri=96)
>     at /usr/src/sys/kern/subr_sleepqueue.c:629
> #3  0xffffffff803e5755 in __lockmgr_args (lk=0xffffff0358bbb7f8,
>     flags=524544, ilk=0xffffff0358bbb820,
>     wmesg=Variable "wmesg" is not available.)
>     at /usr/src/sys/kern/kern_lock.c:220
> #4  0xffffffff80489219 in vop_stdlock (ap=Variable "ap" is not available.)
>     at lockmgr.h:94
> #5  0xffffffff80697322 in VOP_LOCK1_APV (vop=0xffffffff80892b00,
>     a=0xffffff847ac10600) at vnode_if.c:1988
> #6  0xffffffff804a8bb7 in _vn_lock (vp=0xffffff0358bbb760, flags=524288,
>     file=0xffffffff806fa421 "/usr/src/sys/kern/vfs_subr.c", line=2138)
>     at vnode_if.h:859
> #7  0xffffffff8049b680 in vget (vp=0xffffff0358bbb760, flags=524544,
>     td=0xffffff0320de88c0) at /usr/src/sys/kern/vfs_subr.c:2138
> #8  0xffffffff8048d4aa in vfs_hash_get (mp=0xffffff004a3a0000,
>     hash=227722108, flags=Variable "flags" is not available.)
>     at /usr/src/sys/kern/vfs_hash.c:81
> #9  0xffffffff805631f6 in nfs_nget (mntp=0xffffff004a3a0000,
>     fhp=0xffffff03771eed56, fhsize=32, npp=0xffffff847ac10a40,
>     flags=524288) at /usr/src/sys/nfsclient/nfs_node.c:120
> #10 0xffffffff80570229 in nfs_readdirplusrpc (vp=0xffffff0179902760,
>     uiop=0xffffff847ac10ad0, cred=0xffffff005587c300)
>     at /usr/src/sys/nfsclient/nfs_vnops.c:2636
> #11 0xffffffff8055f144 in nfs_doio (vp=0xffffff0179902760,
>     bp=0xffffff83e05c5860, cr=0xffffff005587c300,
>     td=Variable "td" is not available.)
>     at /usr/src/sys/nfsclient/nfs_bio.c:1600
> #12 0xffffffff8056770a in nfssvc_iod (instance=Variable "instance" is
>     not available.) at /usr/src/sys/nfsclient/nfs_nfsiod.c:303
> #13 0xffffffff803d0c2f in fork_exit (callout=0xffffffff805674b0 <nfssvc_iod>,
>     arg=0xffffffff809266e0, frame=0xffffff847ac10c40)
>     at /usr/src/sys/kern/kern_fork.c:861
>
> Thread stuck in getblk for that vnode (holds shared lock on this vnode):
>
> (kgdb) proc 36902
> [Switching to thread 149 (Thread 101543)]
> #0  sched_switch (td=0xffffff0378d8a8c0, newtd=0xffffff0003521460,
>     flags=Variable "flags" is not available.)
>     at /usr/src/sys/kern/sched_ule.c:1898
> 1898            cpuid = PCPU_GET(cpuid);
> (kgdb) where
> #0  sched_switch (td=0xffffff0378d8a8c0, newtd=0xffffff0003521460,
>     flags=Variable "flags" is not available.)
>     at /usr/src/sys/kern/sched_ule.c:1898
> #1  0xffffffff80407953 in mi_switch (flags=260, newtd=0x0)
>     at /usr/src/sys/kern/kern_synch.c:449
> #2  0xffffffff8043e342 in sleepq_wait (wchan=0xffffff83e11bc1c0, pri=96)
>     at /usr/src/sys/kern/subr_sleepqueue.c:629
> #3  0xffffffff803e5755 in __lockmgr_args (lk=0xffffff83e11bc1c0,
>     flags=530688, ilk=0xffffff0358bbb878,
>     wmesg=Variable "wmesg" is not available.)
>     at /usr/src/sys/kern/kern_lock.c:220
> #4  0xffffffff80483aeb in getblk (vp=0xffffff0358bbb760, blkno=3,
>     size=32768, slpflag=0, slptimeo=0, flags=0) at lockmgr.h:94
> #5  0xffffffff8055e963 in nfs_getcacheblk (vp=0xffffff0358bbb760, bn=3,
>     size=32768, td=0xffffff0378d8a8c0)
>     at /usr/src/sys/nfsclient/nfs_bio.c:1259
> #6  0xffffffff805627d9 in nfs_bioread (vp=0xffffff0358bbb760,
>     uio=0xffffff847bcf0ad0, ioflag=Variable "ioflag" is not available.)
>     at /usr/src/sys/nfsclient/nfs_bio.c:530
> #7  0xffffffff806956a4 in VOP_READ_APV (vop=0xffffffff808a29a0,
>     a=0xffffff847bcf09c0) at vnode_if.c:887
> #8  0xffffffff804a9e27 in vn_read (fp=0xffffff03b6a506e0,
>     uio=0xffffff847bcf0ad0,
>     active_cred=Variable "active_cred" is not available.)
>     at vnode_if.h:384
> #9  0xffffffff80444fb1 in dofileread (td=0xffffff0378d8a8c0, fd=3,
>     fp=0xffffff03b6a506e0, auio=0xffffff847bcf0ad0,
>     offset=Variable "offset" is not available.) at file.h:242
>
> The buffer is locked by LK_KERNPROC:
>
> (kgdb) bprint bp
> 0xffffff83e11bc128: BIO_READ flags (ASYNC|VMIO)
>     error = 0, bufsize = 32768, bcount = 32768, b_resid = 0
>     bufobj = 0xffffff0358bbb878, data = 0xffffff8435a05000, blkno = d,
>     dep = 0x0
>     lock type bufwait: EXCL by LK_KERNPROC with exclusive waiters pending
>
> And this buffer is queued as the first pending buffer on the mount,
> waiting for service by an nfsiod:
>
> (kgdb) set $nmp = (struct nfsmount *)vp->v_mount->mnt_data
> (kgdb) p $nmp->nm_bufq.tqh_first
> $24 = (struct buf *) 0xffffff83e11bc128
> (kgdb) p bp
> $25 = (struct buf *) 0xffffff83e11bc128
>
> So, the first process is waiting for a block from an NFS directory.
> That block is in queue to be completed as an async I/O by the nfsiod
> thread pool. However, the lone nfsiod thread in the pool is waiting to
> exclusively lock the original NFS directory to update its attributes,
> so it cannot service the async I/O request.
>
> --
> John Baldwin