Date: Wed, 17 Jul 2013 14:03:25 +0300 From: Andriy Gapon <avg@FreeBSD.org> To: zfs-devel@FreeBSD.org, freebsd-fs@FreeBSD.org Subject: zfs_rename: another zfs+vfs deadlock Message-ID: <51E679FD.3040306@FreeBSD.org>
next in thread | raw e-mail | index | archive | help
I received a report about what looked like a deadlock involving ZFS. Interesting bits from the report are: Thread 1156 (Thread 2038380): #0 sched_switch (td=0xfffffe01a9e56460, newtd=0xfffffe001c3ff000, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1860 #1 0xffffffff808ab51a in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:466 #2 0xffffffff808e3e63 in sleepq_switch (wchan=0xfffffe06d9df2310, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:538 #3 0xffffffff808e4a9d in sleepq_wait (wchan=0xfffffe06d9df2310, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:617 #4 0xffffffff8088aebb in __lockmgr_args (lk=0xfffffe06d9df2310, flags=524544, ilk=0xfffffe06d9df23d8, wmesg=Variable "wmesg" is not available. ) at /usr/src/sys/kern/kern_lock.c:214 #5 0xffffffff8092d349 in vop_stdlock (ap=Variable "ap" is not available. ) at lockmgr.h:97 #6 0xffffffff80bd62ab in VOP_LOCK1_APV (vop=0xffffffff8111cc80, a=0xffffff90729ee6e0) at vnode_if.c:1988 #7 0xffffffff8094cfa7 in _vn_lock (vp=0xfffffe06d9df2278, flags=524288, file=Variable "file" is not available. ) at vnode_if.h:859 #8 0xffffffff80942220 in vputx (vp=0xfffffe06d9df2278, func=1) at /usr/src/sys/kern/vfs_subr.c:2279 #9 0xffffffff816e75a4 in zfs_rename_unlock (zlpp=0xffffff90729ee878) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3501 #10 0xffffffff816e8df4 in zfs_freebsd_rename (ap=Variable "ap" is not available. ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3927 #11 0xffffffff80bd67cb in VOP_RENAME_APV (vop=0xffffffff8175b900, a=0xffffff90729eeaa0) at vnode_if.c:1474 #12 0xffffffff80947844 in kern_renameat (td=Variable "td" is not available. ) at vnode_if.h:636 #13 0xffffffff80b4eff2 in amd64_syscall (td=0xfffffe01a9e56460, traced=0) at subr_syscall.c:135 #14 0xffffffff80b39b97 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 fr 11 p *a $1 = {a_gen = {a_desc = 0xffffffff811594c0}, a_fdvp = 0xfffffe05fb094278, a_fvp = 0xfffffe04b7b62278, a_fcnp = 0xffffff90729eea58, a_tdvp = 0xfffffe0514137768, a_tvp = 0x0, a_tcnp = 0xffffff90729ee9a8} Thread 1158 (Thread 4174978): #0 sched_switch (td=0xfffffe088cbef000, newtd=0xfffffe001c40e000, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1860 #1 0xffffffff808ab51a in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:466 #2 0xffffffff808e3e63 in sleepq_switch (wchan=0xfffffe0514137800, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:538 #3 0xffffffff808e4a9d in sleepq_wait (wchan=0xfffffe0514137800, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:617 #4 0xffffffff8088b4e0 in __lockmgr_args (lk=0xfffffe0514137800, flags=2097152, ilk=0xfffffe05141378c8, wmesg=Variable "wmesg" is not available. ) at /usr/src/sys/kern/kern_lock.c:214 #5 0xffffffff8092d349 in vop_stdlock (ap=Variable "ap" is not available. ) at lockmgr.h:97 #6 0xffffffff80bd62ab in VOP_LOCK1_APV (vop=0xffffffff8111cc80, a=0xffffff9072813470) at vnode_if.c:1988 #7 0xffffffff8094cfa7 in _vn_lock (vp=0xfffffe0514137768, flags=2097152, file=Variable "file" is not available. ) at vnode_if.h:859 #8 0xffffffff816e5bdd in zfs_vnode_lock (vp=0xfffffe0514137768, flags=Variable "flags" is not available. ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1704 #9 0xffffffff816e6d70 in zfs_lookup (dvp=0xfffffe06d9df2278, nm=0xffffff90728135b0 "toBeDeleted", vpp=0xffffff9072813930, cnp=0xffffff9072813958, nameiop=0, cr=0xfffffe0ba89b0a00, td=0xfffffe088cbef000, flags=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1433 #10 0xffffffff816e7511 in zfs_freebsd_lookup (ap=0xffffff9072813710) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:5758 #11 0xffffffff80bd5d3f in VOP_CACHEDLOOKUP_APV (vop=0xffffffff8175b900, a=0xffffff9072813710) at vnode_if.c:187 #12 0xffffffff8092b103 in vfs_cache_lookup (ap=Variable "ap" is not available. ) at vnode_if.h:80 #13 0xffffffff80bd7187 in VOP_LOOKUP_APV (vop=0xffffffff8175b900, a=0xffffff90728137d0) at vnode_if.c:123 #14 0xffffffff8093260a in lookup (ndp=0xffffff90728138f0) at vnode_if.h:54 #15 0xffffffff8093354e in namei (ndp=0xffffff90728138f0) at /usr/src/sys/kern/vfs_lookup.c:297 #16 0xffffffff80944213 in kern_statat_vnhook (td=0xfffffe088cbef000, flag=Variable "flag" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2432 #17 0xffffffff809443b5 in kern_statat (td=Variable "td" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2413 #18 0xffffffff8094455a in sys_stat (td=Variable "td" is not available. ) at /usr/src/sys/kern/vfs_syscalls.c:2374 #19 0xffffffff80b4eff2 in amd64_syscall (td=0xfffffe088cbef000, traced=0) at subr_syscall.c:135 #20 0xffffffff80b39b97 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 As far as I understand the code, I think that zfs_rename_lock (called from zfs_rename) iterates up ancestor chain of target directory (tdzp) and obtains a reference on each of the ancestors via zfs_zget. zfs_rename_unlock does the opposite - it iterates in the reverse order and VN_RELE-s the ancestor znodes. As you can see above, on FreeBSD VN_RELE translates to vputx, which internally needs to obtain a vnode lock. The problem seems to be is that VOP_RENAME -> zfs_freebsd_rename is called with locked tdvp (and perhaps non-NULL and thus locked tvp). tdvp's vnode lock is released at the very end of zfs_freebsd_rename and so it is held over zfs_rename_unlock. And that means that vnode locks of tvp's ancestors can be acquired while tdvp's vnode lock is held. That violates the VFS lock ordering where a descendant's lock must always be acquired after an ancestor's lock. So that could lead to a deadlock with another VFS operation that acquires locks in the proper order. In the above snippet 0xfffffe06d9df2278 is a directory/ancestor of tdvp and 0xfffffe0514137768 is tdvp. VOP_LOOKUP -> zfs_lookup acquires the locks in the correct order (dvp is the ancestor while vp is the tdvp) while zfs_rename does it in the opposite order. A scenario to reproduce this bug could be like this. mkdir a mkdir a/b mv some-file a/b/ (in parallel with) stat a/b Of course it would have to be repeated many times to hit the right timing window. Also, namecache could interfere with this scenario, but I am not sure. -- Andriy Gapon
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?51E679FD.3040306>