Date: Fri, 2 Nov 2012 14:21:18 +0000 From: Attilio Rao <attilio@freebsd.org> To: Harald Schmalzbauer <h.schmalzbauer@omnilan.de> Cc: stable@freebsd.org, daichi@freebsd.org, Pavel Polyakov <bsd@kobyla.org> Subject: Re: lock violation in unionfs (9.0-STABLE r230270) Message-ID: <CAJ-FndDmV%2BZa%2BFjGcmPPmEW0yMZGj90tp3U%2Btj8K_DmqjSjxfw@mail.gmail.com> In-Reply-To: <50910751.9030303@omnilan.de> References: <op.v9l1byf89gyv16@pp> <CAJ-FndAFMV2iHcMKvMruCP%2BHRzwQuY1Jcd_o6ZEnTCiPV8_8oA@mail.gmail.com> <op.waqux6rr9gyv16@cel.home> <5022840B.3060708@omnilan.de> <CAJ-FndDkuXksyFD2Nd-S7Ty3N8boSk37=a2nYagMkguRYd1r%2Bg@mail.gmail.com> <5048C6D1.8020007@omnilan.de> <CAJ-FndAjQ-w9vLFziQKpkauyRkQnAEeYOh6nXzTR6w1gx7hsEg@mail.gmail.com> <CAJ-FndDdV3ZthE66Z7vqnM5=-=FRzrnNTogisuTS0Fmo%2Bb_0NQ@mail.gmail.com> <CAJ-FndACf6rO2CzhK=WrbQmXMNZtHsfMJ1mdPg4wgajiyZzt9A@mail.gmail.com> <508EDB2F.3010608@omnilan.de> <50910751.9030303@omnilan.de>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Oct 31, 2012 at 11:11 AM, Harald Schmalzbauer <h.schmalzbauer@omnilan.de> wrote: > schrieb Attilio Rao am 29.10.2012 23:02 (localtime): >> On Mon, Oct 29, 2012 at 7:37 PM, Harald Schmalzbauer >> <h.schmalzbauer@omnilan.de> wrote: >>> schrieb Attilio Rao am 27.10.2012 23:07 (localtime): >>>> On Sat, Oct 27, 2012 at 9:46 PM, Attilio Rao <attilio@freebsd.org> wrote: >>>>> On Sat, Sep 8, 2012 at 12:48 AM, Attilio Rao <attilio@freebsd.org> wrote: >>>>>> On Thu, Sep 6, 2012 at 4:52 PM, Harald Schmalzbauer >>>>>> <h.schmalzbauer@omnilan.de> wrote: >>>>>>> schrieb Attilio Rao am 09.08.2012 20:26 (localtime): >>>>>>>> On 8/8/12, Harald Schmalzbauer <h.schmalzbauer@omnilan.de> wrote: >>>>>>>>> schrieb Pavel Polyakov am 06.03.2012 11:20 (localtime): >>>>>>>>>>>> mount -t unionfs -o noatime /usr /mnt >>>>>>>>>>>> >>>>>>>>>>>> insmntque: mp-safe fs and non-locked vp: 0xfffffe01d96704f0 is not >>>>>>>>>>>> exclusive locked but should be >>>>>>>>>>>> KDB: enter: lock violation >>>>>>>>>>> Pavel, >>>>>>>>>>> can you give a spin to this patch?: >>>>>>>>>>> http://www.freebsd.org/~attilio/unionfs_missing_insmntque_lock.patch >>>>>>>>>>> >>>>>>>>>>> I think that the unlocking is due at that point as the vnode lock can >>>>>>>>>>> be switch later on. >>>>>>>>>>> >>>>>>>>>>> Let me know what you think about it and what the test does. >>>>>>>>>> Thanks! >>>>>>>>>> This patch fixes the problem with lock violation. Sorry I've tested it so >>>>>>>>>> late. >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> this patch still applies cleanly to RELENG_9_1. Was there another fix >>>>>>>>> for the issue or has it just not been PR-sent and thus forgotten? >>>>>>>> Can you and Pavel try the attached patch? Unfortunately I had no time >>>>>>>> to test it, I just made in 5 free mins from a non-FreeBSD workstation, >>>>>>> Sorry, couldn't test earlier, but now I did: >>>>>>> With this patch applied the machine hangs without debug kernel and the >>>>>>> latter gives the following panic: >>>>>>> System call nmount returning with the following locks held: >>>>>>> exclusive lockmgr ufs (ufs) r = 0 (0xc5438278) locked @ >>>>>>> src/sys/fs/unionfs/union_vnops.c:1938 >>>>>>> panic: witness_warn >>>>>>> cpuid = 0 >>>>>>> KDB: stack backtrace: >>>>>>> db_trace_self_wrapper(c0a04f7f,c0c112c4,d1de3bb4,c097aa8c,fc,...) at >>>>>>> db_trace_self_wrapper+0x26 >>>>>>> kdb_backtrace(c0a4965f,0,c09c2ede3c1c,0,...) at kdb_backtrace+0x2a >>>>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4 >>>>>>> syscall(d1de3d08) ar syscall+0x415 >>>>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21 >>>>>>> --- syscall (0, FreeBSD ELF32, nosys), eip = 0x280b883f,esp = >>>>>>> 0xbfbfe46c, ebp = 0xbfbfede8 --- >>>>>>> KDB: enter: panic >>>>>>> [ thread pid 86 tid 100054 ] >>>>>>> Stopped ad kdb_enter+0x3a: movl $0,kdb_why >>>>>>> db> bt >>>>>>> Tracing pid 86 tid 100054 td 0xc541b000 >>>>>>> kdb_enter(c0a00d16,c0a09130,0,0,0,...) at panix+0x190 >>>>>>> witness_warn(2,0,x0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4 >>>>>>> syscall(d1de3d08) at syscall+0x415 >>>>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21 >>>>>>> >>>>>>> Hmm, I guess I forgot to install kernel debug symbols... >>>>>>> Coming back if I have more >>>>>> Unfortunately unionfs does very wrong things with the insmntque() locking. >>>>>> It basically expects the vnode to return locked in the same way >>>>>> requested by the precedent namei() (when that happens) but when you do >>>>>> insmntque() you can only have an LK_EXCLUSIVE lock on the vnode. >>>>> Hello, >>>>> the following patch should workout the issues around unionfs_nodeget() a bit: >>>>> http://www.freebsd.org/~attilio/unionfs_nodeget2.patch >>>>> >>>>> Unfortunately unionfs code is rather messy in the lookup path about >>>>> locking requirements so follow what it needs to be done there is a bit >>>>> difficult. >>>>> I have no way to test this patch, so it is just test-compiled at the >>>>> moment, but I would need that you also test lookup path (so directory >>>>> "ls", find(1) on the whole unionfs volume, etc.) to validate it >>>>> someway. >>>> On a second thought, I think that locking in lookup (and also other >>>> operations) is so fragile and difficult to follow that it makes all >>>> vnops real locking landmines. >>>> I think that the following patch fixes the insmntque insertion and >>>> follows the old approach well enough to be committed separately: >>>> http://www.freebsd.org/~attilio/unionfs_nodeget3.patch >>>> >>> Unfortunately I have no idea about all those locking strategies and >>> implementations. >>> Applying unionfs_nodeget3.patch results in: >>> sys/fs/unionfs/union_subr.c: In function 'unionfs_nodeget': >>> sys/fs/unionfs/union_subr.c:332: error: expected statement >>> before ')' token >>> *** [union_subr.o] Error code 1 >>> >>> I guess there is a typo in this chunk: >>> @@ -317,11 +328,11 @@ unionfs_nodeget(struct mount *mp, struct vnode *up >>> >>> vref(vp); >>> } else >>> *vpp = vp; >>> - >>> -unionfs_nodeget_out: >>> - if (lkflags & LK_TYPE_MASK) >>> - vn_lock(vp, lkflags | LK_RETRY); >>> - >>> + if (lkflags & LK_TYPE_MASK) { >>> + if (lkflags == LK_SHARED)) >>> ---------------------------------------- ^ >>> + vn_lock(vp, LK_DOWNGRADE | LK_RETRY); >>> + } else >>> + VOP_UNLOCK(vp, LK_RELEASE); >>> return (0); >>> } >>> >>> After removing the second right parenthesis kernel compiles. >>> But it still crashes: >>> panic: Lock (lockmgr) ufs not locked @ sys/kern/vfs_default.c:512 >>> cpuid = 1 >>> KDB: stack backtrace: >>> ... >>> If you can use the bt info I'll transcribe - no serial console available :-( >>> >>> Am I right that I should only apply _one_ unionfs-patchX.patch >>> (unionfs_nodeget3.patch in that case)? >> Yes, only that one. >> Can you please do "bt" from DDB and take a picture of you screen with a camera? > > Ok, now I had a reason to take some time finding out how ESXi handles > serial ports ;-) It's quiet easy and very flexible, so no problem > setting up a debug console. > Please find attached the backtrace. > Do I have to load any symbols? It's not very informative what I see, right? Hi Harry, well done. Can you please backout the prior patch and try this one instead?: http://www.freebsd.org/~attilio/unionfs_nodeget4.patch Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndDmV%2BZa%2BFjGcmPPmEW0yMZGj90tp3U%2Btj8K_DmqjSjxfw>