Date: Wed, 24 Sep 2014 20:43:15 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Peter Holm <peter@holm.cc>
Cc: FreeBSD FS <freebsd-fs@freebsd.org>
Subject: Re: Deadlock with umount -f involving tmpfs on top of ZFS on r271170
Message-ID: <20140924174315.GO8870@kib.kiev.ua>
In-Reply-To: <20140924171509.GA18965@x2.osted.lan>
References: <5420D5FC.4030600@FreeBSD.org>
 <20140923131244.GC8870@kib.kiev.ua>
 <5422240F.4080003@FreeBSD.org>
 <20140924102758.GH8870@kib.kiev.ua>
 <20140924132605.GA11772@x2.osted.lan>
 <20140924134725.GI8870@kib.kiev.ua>
 <20140924153045.GA15685@x2.osted.lan>
 <20140924155728.GN8870@kib.kiev.ua>
 <20140924171509.GA18965@x2.osted.lan>

On Wed, Sep 24, 2014 at 07:15:09PM +0200, Peter Holm wrote:
> On Wed, Sep 24, 2014 at 06:57:28PM +0300, Konstantin Belousov wrote:
> > On Wed, Sep 24, 2014 at 05:30:45PM +0200, Peter Holm wrote:
> > > On Wed, Sep 24, 2014 at 04:47:25PM +0300, Konstantin Belousov wrote:
> > > > On Wed, Sep 24, 2014 at 03:26:05PM +0200, Peter Holm wrote:
> > > > > The patch is an improvement, but:
> > > > >
> > > > > http://people.freebsd.org/~pho/stress/log/kostik718.txt
> > > >
> > > > Does your load include both rename and link, or only one of those
> > > > syscalls? I see a bug in the rename part of the patch, below is
> > > > the update.
> > > >
> > >
> > > Both. I have split the tests in two now. Uptime is by now one hour.
> > > I'll let that run for a few hours more, before switching to random
> > > tests.
> > >
> > > I did get this page fault once:
> > > http://people.freebsd.org/~pho/stress/log/kostik719.txt
> > > but I guess it's unrelated? I have recompiled uma_core.c and
> > > vm_pageout with "-O0" in case it shows up again.
> >
> > This looks unrelated. But, in the log, I see user-controllable LOR
> > caused by my patch. Please use the following update instead.
> >
> > diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
> > index b3b7ed5..a4aa19e 100644
> > --- a/sys/kern/vfs_syscalls.c
>
> Seems unchanged to me?
>
> 20140924 19:08:31 all (1/2): link.sh
> lock order reversal:
> 1st 0xfffff800b06ce068 ufs (ufs) @ kern/vfs_subr.c:2137
> 2nd 0xfffffe0785edfeb8 bufwait (bufwait) @ ufs/ffs/ffs_vnops.c:261
> 3rd 0xfffff800b06a6548 ufs (ufs) @ kern/vfs_subr.c:2137
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe081db19150
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe081db19200
> witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe081db19290
> __lockmgr_args() at __lockmgr_args+0x9d2/frame 0xfffffe081db193c0
> ffs_lock() at ffs_lock+0x92/frame 0xfffffe081db19410
> VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfc/frame 0xfffffe081db19440
> _vn_lock() at _vn_lock+0xd2/frame 0xfffffe081db194b0
> vget() at vget+0x67/frame 0xfffffe081db194f0
> vfs_hash_get() at vfs_hash_get+0xe1/frame 0xfffffe081db19540
> ffs_vgetf() at ffs_vgetf+0x40/frame 0xfffffe081db195d0
> softdep_sync_buf() at softdep_sync_buf+0xac0/frame 0xfffffe081db196b0
> ffs_syncvnode() at ffs_syncvnode+0x286/frame 0xfffffe081db19730
> ffs_sync() at ffs_sync+0x20f/frame 0xfffffe081db197f0
> dounmount() at dounmount+0x3da/frame 0xfffffe081db19870
> sys_unmount() at sys_unmount+0x2ec/frame 0xfffffe081db199a0
> amd64_syscall() at amd64_syscall+0x278/frame 0xfffffe081db19ab0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe081db19ab0
> --- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x800891bca, rsp = 0x7fffffffdf08, rbp = 0x7fffffffe020 ---
> lock order reversal:
> 1st 0xfffff800290fa068 ufs (ufs) @ kern/vfs_mount.c:1223
> 2nd 0xfffff800b0214068 devfs (devfs) @ ufs/ffs/ffs_vfsops.c:1375
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe081db19370
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe081db19420
> witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe081db194b0
> __lockmgr_args() at __lockmgr_args+0x9d2/frame 0xfffffe081db195e0
> vop_stdlock() at vop_stdlock+0x3c/frame 0xfffffe081db19600
> VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfc/frame 0xfffffe081db19630
> _vn_lock() at _vn_lock+0xd2/frame 0xfffffe081db196a0
> ffs_flushfiles() at ffs_flushfiles+0x120/frame 0xfffffe081db19710
> softdep_flushfiles() at softdep_flushfiles+0x232/frame 0xfffffe081db19780
> ffs_unmount() at ffs_unmount+0xe5/frame 0xfffffe081db197f0
> dounmount() at dounmount+0x424/frame 0xfffffe081db19870
> sys_unmount() at sys_unmount+0x2ec/frame 0xfffffe081db199a0
> amd64_syscall() at amd64_syscall+0x278/frame 0xfffffe081db19ab0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe081db19ab0
> --- syscall (22, FreeBSD ELF64, sys_unmount), rip = 0x800891bca, rsp = 0x7fffffffdf08, rbp = 0x7fffffffe020 ---
> 20140924 19:10:34 all (2/2): link2.sh
>
> with
> FreeBSD 11.0-CURRENT (PHO) #0 r272060M: Wed Sep 24 19:00:20 CEST 2014

No, these two are known and harmless. The patch added a new LOR, which
shows up when you link between two different mount points. The code
first locked the vnodes, and only then checked for EXDEV. The log shows
some of these, e.g. ufs/tmpfs.
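
To illustrate the ordering problem described above (locking the vnodes
first and checking for EXDEV only afterwards), here is a minimal
user-space sketch of the safer order: compare the mount points before
taking any lock. This is not the patch from this thread and not kernel
code. Every name in it (toy_mount, toy_vnode, toy_vn_lock, toy_link) is
hypothetical, and the "lock" is a plain single-threaded flag chosen only
so the sketch compiles on its own.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a mount point and a vnode (not kernel types). */
struct toy_mount {
	int id;
};

struct toy_vnode {
	const struct toy_mount *mp;	/* mount point the vnode belongs to */
	bool locked;			/* toy lock flag, single-threaded only */
	unsigned times_locked;		/* how often the toy lock was taken */
};

static void
toy_vn_lock(struct toy_vnode *vp)
{
	vp->locked = true;
	vp->times_locked++;
}

static void
toy_vn_unlock(struct toy_vnode *vp)
{
	vp->locked = false;
}

/*
 * Toy link operation: enter src into directory dir.
 * The cross-mount check runs before any lock is taken, so a caller
 * passing vnodes from two different file systems (say ufs and tmpfs)
 * cannot make this function acquire locks across mount points.
 * Assumes dir and src are distinct vnodes.
 */
int
toy_link(struct toy_vnode *dir, struct toy_vnode *src)
{
	if (dir->mp != src->mp)
		return (EXDEV);		/* reject early, nothing locked yet */

	/* Same mount: lock in one fixed order (directory first in this sketch). */
	toy_vn_lock(dir);
	toy_vn_lock(src);
	/* ... the directory entry would be created here ... */
	toy_vn_unlock(src);
	toy_vn_unlock(dir);
	return (0);
}
```

With the check placed first, a link between two mount points fails with
EXDEV while both vnodes are still unlocked, so no lock order between
unrelated file systems is ever established by that path.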