From owner-freebsd-net@FreeBSD.ORG Thu Jul 21 20:37:36 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 292CB1065672; Thu, 21 Jul 2011 20:37:36 +0000 (UTC) (envelope-from jlott@averesystems.com) Received: from zimbra.averesystems.com (75-149-8-245-Pennsylvania.hfc.comcastbusiness.net [75.149.8.245]) by mx1.freebsd.org (Postfix) with ESMTP id D87648FC08; Thu, 21 Jul 2011 20:37:35 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by zimbra.averesystems.com (Postfix) with ESMTP id C39018BC020; Thu, 21 Jul 2011 16:22:35 -0400 (EDT) X-Virus-Scanned: amavisd-new at averesystems.com Received: from zimbra.averesystems.com ([127.0.0.1]) by localhost (zimbra.averesystems.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YDGr3wCaIfrT; Thu, 21 Jul 2011 16:22:34 -0400 (EDT) Received: from jlott-mac.arriad.com (fw.arriad.com [10.0.0.16]) by zimbra.averesystems.com (Postfix) with ESMTPSA id 619468BC003; Thu, 21 Jul 2011 16:22:34 -0400 (EDT) From: Jeremiah Lott Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Date: Thu, 21 Jul 2011 16:19:59 -0400 Message-Id: <44626428-CF14-4B20-AB57-6D4E8F4678AE@averesystems.com> To: freebsd-net@freebsd.org Mime-Version: 1.0 (Apple Message framework v1084) X-Mailer: Apple Mail (2.1084) Cc: rmacklem@FreeBSD.org Subject: LOR with nfsclient "sillyrename" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Jul 2011 20:37:36 -0000 We're seeing nfsclient deadlocks with what looks like lock order = reversal after removing a "silly rename". It is fairly rare, but we've = seen it happen a few times. I included relevant back traces from an = occurrence. =46rom what I can see, nfs_inactive() is called with the = vnode locked. If there is a silly-rename, it will call vrele() on its = parent directory, which can potentially try to lock the parent = directory. Since this is the opposite order of the lock acquisition in = lookup, it can deadlock. This happened in a FreeBSD7 build, but I = looked through freebsd head and didn't see any change that addressed = this. Anyone seen this before? Jeremiah Lott Avere Systems (kgdb) thread 132 [Switching to thread 132 (Thread 100437)]#10 0xffffffff804e794e in vget = ( vp=3D0xffffff019bafe5e8, flags=3D8194, td=3D0xffffff00096e16e0) at ../../../kern/vfs_subr.c:2070 2070 if ((error =3D vn_lock(vp, flags | LK_INTERLOCK, td)) !=3D= 0) { (kgdb) bt #0 sched_switch (td=3D0xffffff00096e16e0, newtd=3D0xffffff00021f46e0, = flags=3D1) at ../../../kern/sched_4bsd.c:928 #1 0xffffffff80477a1c in mi_switch (flags=3D1, newtd=3D0x0) at ../../../kern/kern_synch.c:444 #2 0xffffffff804a032b in sleepq_switch (wchan=3DVariable "wchan" is not = available. ) at ../../../kern/subr_sleepqueue.c:535 #3 0xffffffff804a05f1 in sleepq_catch_signals = (wchan=3D0xffffff019bafe680) at ../../../kern/subr_sleepqueue.c:455 #4 0xffffffff804a0ba6 in sleepq_wait_sig (wchan=3DVariable "wchan" is = not available. ) at ../../../kern/subr_sleepqueue.c:650 #5 0xffffffff80477f15 in _sleep (ident=3D0xffffff019bafe680,=20 lock=3D0xffffffff80f58020, priority=3D336, wmesg=3D0xffffffff8077df5c = "nfs",=20 timo=3D0) at ../../../kern/kern_synch.c:224 #6 0xffffffff8045eef0 in acquire (lkpp=3D0xffffffffd1475420, = extflags=3DVariable "extflags" is not available. ) at ../../../kern/kern_lock.c:151 #7 0xffffffff8045f228 in _lockmgr (lkp=3D0xffffff019bafe680, = flags=3DVariable "flags" is not available. ) at ../../../kern/kern_lock.c:384 #8 0xffffffff806fefda in VOP_LOCK1_APV (vop=3D0xffffffff8095f360,=20 a=3D0xffffffffd14754c0) at vnode_if.c:1618 #9 0xffffffff804f106f in _vn_lock (vp=3D0xffffff019bafe5e8, flags=3D2,=20= td=3D0xffffff00096e16e0, file=3D0xffffffff80772758 = "../../../kern/vfs_subr.c",=20 line=3D2070) at vnode_if.h:851 #10 0xffffffff804e794e in vget (vp=3D0xffffff019bafe5e8, flags=3D8194,=20= td=3D0xffffff00096e16e0) at ../../../kern/vfs_subr.c:2070 ---Type to continue, or q to quit--- #11 0xffffffff804db125 in vfs_hash_get (mp=3D0xffffff00049e1000,=20 hash=3D2562259366, flags=3DVariable "flags" is not available. ) at ../../../kern/vfs_hash.c:81 #12 0xffffffff80552198 in nfs_nget (mntp=3D0xffffff00049e1000,=20 fhp=3D0xffffff0d97160578, fhsize=3D48, npp=3D0xffffffffd14757d0, = flags=3D2) at ../../../nfsclient/nfs_node.c:120 #13 0xffffffff8056084b in nfs_lookup (ap=3DVariable "ap" is not = available. ) at ../../../nfsclient/nfs_vnops.c:950 #14 0xffffffff806ffa84 in VOP_LOOKUP_APV (vop=3D0xffffffff8096a980,=20 a=3D0xffffffffd14758e0) at vnode_if.c:99 #15 0xffffffff804dc16d in lookup (ndp=3D0xffffffffd1475a00) at = vnode_if.h:57 #16 0xffffffff804dce94 in namei (ndp=3D0xffffffffd1475a00) at ../../../kern/vfs_lookup.c:215 #17 0xffffffff804ebf4e in kern_unlink (td=3D0xffffff00096e16e0,=20 path=3D0x80e815178
,=20 pathseg=3DUIO_USERSPACE) at ../../../kern/vfs_syscalls.c:1670 #18 0xffffffff8066d756 in syscall (frame=3D0xffffffffd1475c80) at ../../../amd64/amd64/trap.c:939 #19 0xffffffff8065409b in Xfast_syscall () at ../../../amd64/amd64/exception.S:339 #20 0x00000008014524bc in ?? () (kgdb) up 6 #6 0xffffffff8045eef0 in acquire (lkpp=3D0xffffffffd1475420, = extflags=3DVariable "extflags" is not available. ) at ../../../kern/kern_lock.c:151 151 error =3D msleep(lkp, lkp->lk_interlock, = lkp->lk_prio, (kgdb) print *lkp =20 $7 =3D {lk_object =3D {lo_name =3D 0xffffffff8077df5c "nfs",=20 lo_type =3D 0xffffffff8077df5c "nfs", lo_flags =3D 70844416,=20 lo_witness_data =3D {lod_list =3D {stqe_next =3D 0x0}, lod_witness =3D= 0x0}},=20 lk_interlock =3D 0xffffffff80f58020, lk_flags =3D 33816640, = lk_sharecount =3D 0,=20 lk_waitcount =3D 1, lk_exclusivecount =3D 1, lk_prio =3D 336, lk_timo = =3D 51,=20 lk_lockholder =3D 0xffffff0180998000, lk_newlock =3D 0x0} (kgdb) print td No symbol "td" in current context. (kgdb) thread 134 [Switching to thread 134 (Thread 100765)]#0 sched_switch ( td=3D0xffffff0180998000, newtd=3D0xffffff00021f4370, flags=3D1) at ../../../kern/sched_4bsd.c:928 928 if (PMC_PROC_IS_USING_PMCS(td->td_proc)) (kgdb) bt #0 sched_switch (td=3D0xffffff0180998000, newtd=3D0xffffff00021f4370, = flags=3D1) at ../../../kern/sched_4bsd.c:928 #1 0xffffffff80477a1c in mi_switch (flags=3D1, newtd=3D0x0) at ../../../kern/kern_synch.c:444 #2 0xffffffff804a032b in sleepq_switch (wchan=3DVariable "wchan" is not = available. ) at ../../../kern/subr_sleepqueue.c:535 #3 0xffffffff804a05f1 in sleepq_catch_signals = (wchan=3D0xffffff0009b90a70) at ../../../kern/subr_sleepqueue.c:455 #4 0xffffffff804a0ba6 in sleepq_wait_sig (wchan=3DVariable "wchan" is = not available. ) at ../../../kern/subr_sleepqueue.c:650 #5 0xffffffff80477f15 in _sleep (ident=3D0xffffff0009b90a70,=20 lock=3D0xffffffff80f57c60, priority=3D336, wmesg=3D0xffffffff8077df5c = "nfs",=20 timo=3D0) at ../../../kern/kern_synch.c:224 #6 0xffffffff8045eef0 in acquire (lkpp=3D0xffffffffd1ade690, = extflags=3DVariable "extflags" is not available. ) at ../../../kern/kern_lock.c:151 #7 0xffffffff8045f68e in _lockmgr (lkp=3D0xffffff0009b90a70, = flags=3DVariable "flags" is not available. ) at ../../../kern/kern_lock.c:391 #8 0xffffffff806fefda in VOP_LOCK1_APV (vop=3D0xffffffff8095f360,=20 a=3D0xffffffffd1ade730) at vnode_if.c:1618 #9 0xffffffff804f106f in _vn_lock (vp=3D0xffffff0009b909d8, flags=3D2,=20= td=3D0xffffff0180998000, file=3D0xffffffff80772758 = "../../../kern/vfs_subr.c",=20 line=3D2165) at vnode_if.h:851 #10 0xffffffff804e72a7 in vrele (vp=3D0xffffff0009b909d8) at ../../../kern/vfs_subr.c:2165 ---Type to continue, or q to quit--- #11 0xffffffff80551f11 in nfs_inactive (ap=3D0xffffffffd1ade860) at ../../../nfsclient/nfs_node.c:216 #12 0xffffffff806fdfa8 in VOP_INACTIVE_APV (vop=3DVariable "vop" is not = available. ) at vnode_if.c:1513 #13 0xffffffff804e26b8 in vinactive (vp=3D0xffffff019bafe5e8,=20 td=3D0xffffff0180998000) at vnode_if.h:796 #14 0xffffffff804e713e in vput (vp=3D0xffffff019bafe5e8) at ../../../kern/vfs_subr.c:2232 #15 0xffffffff804f197b in vn_close (vp=3D0xffffff019bafe5e8, = flags=3DVariable "flags" is not available. ) at ../../../kern/vfs_vnops.c:295 #16 0xffffffff804f1a20 in vn_closefile (fp=3D0xffffff0180dab000,=20 td=3D0xffffff0180998000) at ../../../kern/vfs_vnops.c:867 #17 0xffffffff80446838 in fdrop (fp=3D0xffffff0180dab000, = td=3D0xffffff0180998000) at file.h:299 #18 0xffffffff804480f9 in closef (fp=3D0xffffff0180dab000, = td=3D0xffffff0180998000) at ../../../kern/kern_descrip.c:2033 #19 0xffffffff804486e0 in kern_close (td=3D0xffffff0180998000, fd=3D53) at ../../../kern/kern_descrip.c:1125 #20 0xffffffff8066d756 in syscall (frame=3D0xffffffffd1adec80) at ../../../amd64/amd64/trap.c:939 #21 0xffffffff8065409b in Xfast_syscall () at ../../../amd64/amd64/exception.S:339 #22 0x00000008014631fc in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) up 6 #6 0xffffffff8045eef0 in acquire (lkpp=3D0xffffffffd1ade690, = extflags=3DVariable "extflags" is not available. ) at ../../../kern/kern_lock.c:151 151 error =3D msleep(lkp, lkp->lk_interlock, = lkp->lk_prio, (kgdb) print *lkp $8 =3D {lk_object =3D {lo_name =3D 0xffffffff8077df5c "nfs",=20 lo_type =3D 0xffffffff8077df5c "nfs", lo_flags =3D 70844416,=20 lo_witness_data =3D {lod_list =3D {stqe_next =3D 0x0}, lod_witness =3D= 0x0}},=20 lk_interlock =3D 0xffffffff80f57c60, lk_flags =3D 33947712, = lk_sharecount =3D 0,=20 lk_waitcount =3D 2, lk_exclusivecount =3D 1, lk_prio =3D 336, lk_timo = =3D 51,=20 lk_lockholder =3D 0xffffff00096e16e0, lk_newlock =3D 0x0}=