From owner-freebsd-fs@FreeBSD.ORG Wed Jul 17 11:04:31 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EE5816A1; Wed, 17 Jul 2013 11:04:30 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 0040196E; Wed, 17 Jul 2013 11:04:29 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA02116; Wed, 17 Jul 2013 14:04:21 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1UzPWu-00005Q-V1; Wed, 17 Jul 2013 14:04:21 +0300 Message-ID: <51E679FD.3040306@FreeBSD.org> Date: Wed, 17 Jul 2013 14:03:25 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130708 Thunderbird/17.0.7 MIME-Version: 1.0 To: zfs-devel@FreeBSD.org, freebsd-fs@FreeBSD.org Subject: zfs_rename: another zfs+vfs deadlock X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Jul 2013 11:04:31 -0000 I received a report about what looked like a deadlock involving ZFS. Interesting bits from the report are: Thread 1156 (Thread 2038380): #0 sched_switch (td=0xfffffe01a9e56460, newtd=0xfffffe001c3ff000, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1860 #1 0xffffffff808ab51a in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:466 #2 0xffffffff808e3e63 in sleepq_switch (wchan=0xfffffe06d9df2310, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:538 #3 0xffffffff808e4a9d in sleepq_wait (wchan=0xfffffe06d9df2310, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:617 #4 0xffffffff8088aebb in __lockmgr_args (lk=0xfffffe06d9df2310, flags=524544, ilk=0xfffffe06d9df23d8, wmesg=Variable "wmesg" is not available. ) at /usr/src/sys/kern/kern_lock.c:214 #5 0xffffffff8092d349 in vop_stdlock (ap=Variable "ap" is not available. ) at lockmgr.h:97 #6 0xffffffff80bd62ab in VOP_LOCK1_APV (vop=0xffffffff8111cc80, a=0xffffff90729ee6e0) at vnode_if.c:1988 #7 0xffffffff8094cfa7 in _vn_lock (vp=0xfffffe06d9df2278, flags=524288, file=Variable "file" is not available. ) at vnode_if.h:859 #8 0xffffffff80942220 in vputx (vp=0xfffffe06d9df2278, func=1) at /usr/src/sys/kern/vfs_subr.c:2279 #9 0xffffffff816e75a4 in zfs_rename_unlock (zlpp=0xffffff90729ee878) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3501 #10 0xffffffff816e8df4 in zfs_freebsd_rename (ap=Variable "ap" is not available. ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3927 #11 0xffffffff80bd67cb in VOP_RENAME_APV (vop=0xffffffff8175b900, a=0xffffff90729eeaa0) at vnode_if.c:1474 #12 0xffffffff80947844 in kern_renameat (td=Variable "td" is not available. ) at vnode_if.h:636 #13 0xffffffff80b4eff2 in amd64_syscall (td=0xfffffe01a9e56460, traced=0) at subr_syscall.c:135 #14 0xffffffff80b39b97 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 fr 11 p *a $1 = {a_gen = {a_desc = 0xffffffff811594c0}, a_fdvp = 0xfffffe05fb094278, a_fvp = 0xfffffe04b7b62278, a_fcnp = 0xffffff90729eea58, a_tdvp = 0xfffffe0514137768, a_tvp = 0x0, a_tcnp = 0xffffff90729ee9a8} Thread 1158 (Thread 4174978): #0 sched_switch (td=0xfffffe088cbef000, newtd=0xfffffe001c40e000, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1860 #1 0xffffffff808ab51a in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:466 #2 0xffffffff808e3e63 in sleepq_switch (wchan=0xfffffe0514137800, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:538 #3 0xffffffff808e4a9d in sleepq_wait (wchan=0xfffffe0514137800, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:617 #4 0xffffffff8088b4e0 in __lockmgr_args (lk=0xfffffe0514137800, flags=2097152, ilk=0xfffffe05141378c8, wmesg=Variable "wmesg" is not available. ) at /usr/src/sys/kern/kern_lock.c:214 #5 0xffffffff8092d349 in vop_stdlock (ap=Variable "ap" is not available. ) at lockmgr.h:97 #6 0xffffffff80bd62ab in VOP_LOCK1_APV (vop=0xffffffff8111cc80, a=0xffffff9072813470) at vnode_if.c:1988 #7 0xffffffff8094cfa7 in _vn_lock (vp=0xfffffe0514137768, flags=2097152, file=Variable "file" is not available. ) at vnode_if.h:859 #8 0xffffffff816e5bdd in zfs_vnode_lock (vp=0xfffffe0514137768, flags=Variable "flags" is not available. ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1704 #9 0xffffffff816e6d70 in zfs_lookup (dvp=0xfffffe06d9df2278, nm=0xffffff90728135b0 "toBeDeleted", vpp=0xffffff9072813930, cnp=0xffffff9072813958, nameiop=0, cr=0xfffffe0ba89b0a00, td=0xfffffe088cbef000, flags=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1433 #10 0xffffffff816e7511 in zfs_freebsd_lookup (ap=0xffffff9072813710) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:5758 #11 0xffffffff80bd5d3f in VOP_CACHEDLOOKUP_APV (vop=0xffffffff8175b900, a=0xffffff9072813710) at vnode_if.c:187 #12 0xffffffff8092b103 in vfs_cache_lookup (ap=Variable "ap" is not available. ) at vnode_if.h:80 #13 0xffffffff80bd7187 in VOP_LOOKUP_APV (vop=0xffffffff8175b900, a=0xffffff90728137d0) at vnode_if.c:123 #14 0xffffffff8093260a in lookup (ndp=0xffffff90728138f0) at vnode_if.h:54 #15 0xffffffff8093354e in namei (ndp=0xffffff90728138f0) at /usr/src/sys/kern/vfs_lookup.c:297 #16 0xffffffff80944213 in kern_statat_vnhook (td=0xfffffe088cbef000, flag=Variable "flag" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2432 #17 0xffffffff809443b5 in kern_statat (td=Variable "td" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2413 #18 0xffffffff8094455a in sys_stat (td=Variable "td" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2374 #19 0xffffffff80b4eff2 in amd64_syscall (td=0xfffffe088cbef000, traced=0) at subr_syscall.c:135 #20 0xffffffff80b39b97 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 As far as I understand the code, I think that zfs_rename_lock (called from zfs_rename) iterates up ancestor chain of target directory (tdzp) and obtains a reference on each of the ancestors via zfs_zget. zfs_rename_unlock does the opposite - it iterates in the reverse order and VN_RELE-s the ancestor znodes. As you can see above, on FreeBSD VN_RELE translates to vputx, which internally needs to obtain a vnode lock. The problem seems to be is that VOP_RENAME -> zfs_freebsd_rename is called with locked tdvp (and perhaps non-NULL and thus locked tvp). tdvp's vnode lock is released at the very end of zfs_freebsd_rename and so it is held over zfs_rename_unlock. And that means that vnode locks of tvp's ancestors can be acquired while tdvp's vnode lock is held. That violates the VFS lock ordering where a descendant's lock must always be acquired after an ancestor's lock. So that could lead to a deadlock with another VFS operation that acquires locks in the proper order. In the above snippet 0xfffffe06d9df2278 is a directory/ancestor of tdvp and 0xfffffe0514137768 is tdvp. VOP_LOOKUP -> zfs_lookup acquires the locks in the correct order (dvp is the ancestor while vp is the tdvp) while zfs_rename does it in the opposite order. A scenario to reproduce this bug could be like this. mkdir a mkdir a/b mv some-file a/b/ (in parallel with) stat a/b Of course it would have to be repeated many times to hit the right timing window. Also, namecache could interfere with this scenario, but I am not sure. -- Andriy Gapon