Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 27 Oct 2012 22:07:49 +0100
From:      Attilio Rao <attilio@freebsd.org>
To:        Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
Cc:        stable@freebsd.org, daichi@freebsd.org, Pavel Polyakov <bsd@kobyla.org>
Subject:   Re: lock violation in unionfs (9.0-STABLE r230270)
Message-ID:  <CAJ-FndACf6rO2CzhK=WrbQmXMNZtHsfMJ1mdPg4wgajiyZzt9A@mail.gmail.com>
In-Reply-To: <CAJ-FndDdV3ZthE66Z7vqnM5=-=FRzrnNTogisuTS0Fmo%2Bb_0NQ@mail.gmail.com>
References:  <op.v9l1byf89gyv16@pp> <CAJ-FndAFMV2iHcMKvMruCP%2BHRzwQuY1Jcd_o6ZEnTCiPV8_8oA@mail.gmail.com> <op.waqux6rr9gyv16@cel.home> <5022840B.3060708@omnilan.de> <CAJ-FndDkuXksyFD2Nd-S7Ty3N8boSk37=a2nYagMkguRYd1r%2Bg@mail.gmail.com> <5048C6D1.8020007@omnilan.de> <CAJ-FndAjQ-w9vLFziQKpkauyRkQnAEeYOh6nXzTR6w1gx7hsEg@mail.gmail.com> <CAJ-FndDdV3ZthE66Z7vqnM5=-=FRzrnNTogisuTS0Fmo%2Bb_0NQ@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Oct 27, 2012 at 9:46 PM, Attilio Rao <attilio@freebsd.org> wrote:
> On Sat, Sep 8, 2012 at 12:48 AM, Attilio Rao <attilio@freebsd.org> wrote:
>> On Thu, Sep 6, 2012 at 4:52 PM, Harald Schmalzbauer
>> <h.schmalzbauer@omnilan.de> wrote:
>>>  schrieb Attilio Rao am 09.08.2012 20:26 (localtime):
>>>> On 8/8/12, Harald Schmalzbauer <h.schmalzbauer@omnilan.de> wrote:
>>>>>  schrieb Pavel Polyakov am 06.03.2012 11:20 (localtime):
>>>>>>>> mount -t unionfs -o noatime /usr /mnt
>>>>>>>>
>>>>>>>> insmntque: mp-safe fs and non-locked vp: 0xfffffe01d96704f0 is not
>>>>>>>> exclusive locked but should be
>>>>>>>> KDB: enter: lock violation
>>>>>>> Pavel,
>>>>>>> can you give a spin to this patch?:
>>>>>>> http://www.freebsd.org/~attilio/unionfs_missing_insmntque_lock.patch
>>>>>>>
>>>>>>> I think that the unlocking is due at that point as the vnode lock can
>>>>>>> be switch later on.
>>>>>>>
>>>>>>> Let me know what you think about it and what the test does.
>>>>>> Thanks!
>>>>>> This patch fixes the problem with lock violation. Sorry I've tested it so
>>>>>> late.
>>>>> Hello,
>>>>>
>>>>> this patch still applies cleanly to RELENG_9_1. Was there another fix
>>>>> for the issue or has it just not been PR-sent and thus forgotten?
>>>> Can you and Pavel try the attached patch? Unfortunately I had no time
>>>> to test it, I just made in 5 free mins from a non-FreeBSD workstation,
>>>
>>> Sorry, couldn't test earlier, but now I did:
>>> With this patch applied the machine hangs without debug kernel and the
>>> latter gives the following panic:
>>> System call nmount returning with the following locks held:
>>> exclusive lockmgr ufs (ufs) r = 0 (0xc5438278) locked @
>>> src/sys/fs/unionfs/union_vnops.c:1938
>>> panic: witness_warn
>>> cpuid = 0
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper(c0a04f7f,c0c112c4,d1de3bb4,c097aa8c,fc,...) at
>>> db_trace_self_wrapper+0x26
>>> kdb_backtrace(c0a4965f,0,c09c2ede3c1c,0,...) at kdb_backtrace+0x2a
>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>> syscall(d1de3d08) ar syscall+0x415
>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>> --- syscall (0, FreeBSD ELF32, nosys), eip = 0x280b883f,esp =
>>> 0xbfbfe46c, ebp = 0xbfbfede8 ---
>>> KDB: enter: panic
>>> [ thread pid 86 tid 100054 ]
>>> Stopped ad    kdb_enter+0x3a: movl $0,kdb_why
>>> db> bt
>>> Tracing pid 86 tid 100054 td 0xc541b000
>>> kdb_enter(c0a00d16,c0a09130,0,0,0,...) at panix+0x190
>>> witness_warn(2,0,x0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>> syscall(d1de3d08) at syscall+0x415
>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>
>>> Hmm, I guess I forgot to install kernel debug symbols...
>>> Coming back if I have more
>>
>> Unfortunately unionfs does very wrong things with the insmntque() locking.
>> It basically expects the vnode to return locked in the same way
>> requested by the precedent namei() (when that happens) but when you do
>> insmntque() you can only have an LK_EXCLUSIVE lock on the vnode.
>
> Hello,
> the following patch should workout the issues around unionfs_nodeget() a bit:
> http://www.freebsd.org/~attilio/unionfs_nodeget2.patch
>
> Unfortunately unionfs code is rather messy in the lookup path about
> locking requirements so follow what it needs to be done there is a bit
> difficult.
> I have no way to test this patch, so it is just test-compiled at the
> moment, but I would need that you also test lookup path (so directory
> "ls", find(1) on the whole unionfs volume, etc.) to validate it
> someway.

On a second thought, I think that locking in lookup (and also other
operations) is so fragile and difficult to follow that it makes all
vnops real locking landmines.
I think that the following patch fixes the insmntque insertion and
follows the old approach well enough to be committed separately:
http://www.freebsd.org/~attilio/unionfs_nodeget3.patch

However I strongly suggest that someone does review & sweep all the
locking from nodeget and  related functions removing the tedious
lkflags conditional, reinforcing and expliciting locking rules within
functions, checking out for races (which I'm sure are quite a few by
the fact that vn lock gets dropped indiscriminately in many points)
and possibly review the highly proficient usage of LK_RETRY that I'm
sure is not always safe.

All these steps should really be carried out separately.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndACf6rO2CzhK=WrbQmXMNZtHsfMJ1mdPg4wgajiyZzt9A>