Date:      Mon, 29 Oct 2012 20:38:23 +0100
From:      Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To:        stable@FreeBSD.org
Cc:        daichi@FreeBSD.org, Pavel Polyakov <bsd@kobyla.org>
Subject:   Re: lock violation in unionfs (9.0-STABLE r230270)
Message-ID:  <508EDB2F.3010608@omnilan.de>
In-Reply-To: <CAJ-FndACf6rO2CzhK=WrbQmXMNZtHsfMJ1mdPg4wgajiyZzt9A@mail.gmail.com>
References:  <op.v9l1byf89gyv16@pp> <CAJ-FndAFMV2iHcMKvMruCP%2BHRzwQuY1Jcd_o6ZEnTCiPV8_8oA@mail.gmail.com> <op.waqux6rr9gyv16@cel.home> <5022840B.3060708@omnilan.de> <CAJ-FndDkuXksyFD2Nd-S7Ty3N8boSk37=a2nYagMkguRYd1r%2Bg@mail.gmail.com> <5048C6D1.8020007@omnilan.de> <CAJ-FndAjQ-w9vLFziQKpkauyRkQnAEeYOh6nXzTR6w1gx7hsEg@mail.gmail.com> <CAJ-FndDdV3ZthE66Z7vqnM5=-=FRzrnNTogisuTS0Fmo%2Bb_0NQ@mail.gmail.com> <CAJ-FndACf6rO2CzhK=WrbQmXMNZtHsfMJ1mdPg4wgajiyZzt9A@mail.gmail.com>

Attilio Rao wrote on 27.10.2012 23:07 (localtime):
> On Sat, Oct 27, 2012 at 9:46 PM, Attilio Rao <attilio@freebsd.org> wrote:
>> On Sat, Sep 8, 2012 at 12:48 AM, Attilio Rao <attilio@freebsd.org> wrote:
>>> On Thu, Sep 6, 2012 at 4:52 PM, Harald Schmalzbauer
>>> <h.schmalzbauer@omnilan.de> wrote:
>>>> Attilio Rao wrote on 09.08.2012 20:26 (localtime):
>>>>> On 8/8/12, Harald Schmalzbauer <h.schmalzbauer@omnilan.de> wrote:
>>>>>> Pavel Polyakov wrote on 06.03.2012 11:20 (localtime):
>>>>>>>>> mount -t unionfs -o noatime /usr /mnt
>>>>>>>>>
>>>>>>>>> insmntque: mp-safe fs and non-locked vp: 0xfffffe01d96704f0 is not
>>>>>>>>> exclusive locked but should be
>>>>>>>>> KDB: enter: lock violation
>>>>>>>> Pavel,
>>>>>>>> can you give a spin to this patch?:
>>>>>>>> http://www.freebsd.org/~attilio/unionfs_missing_insmntque_lock.patch
>>>>>>>>
>>>>>>>> I think that the unlocking is due at that point as the vnode lock can
>>>>>>>> be switched later on.
>>>>>>>>
>>>>>>>> Let me know what you think about it and what the test does.
>>>>>>> Thanks!
>>>>>>> This patch fixes the problem with lock violation. Sorry I've tested it so
>>>>>>> late.
>>>>>> Hello,
>>>>>>
>>>>>> this patch still applies cleanly to RELENG_9_1. Was there another fix
>>>>>> for the issue or has it just not been PR-sent and thus forgotten?
>>>>> Can you and Pavel try the attached patch? Unfortunately I had no time
>>>>> to test it, I just made it in 5 free mins from a non-FreeBSD workstation,
>>>> Sorry, couldn't test earlier, but now I did:
>>>> With this patch applied the machine hangs without debug kernel and the
>>>> latter gives the following panic:
>>>> System call nmount returning with the following locks held:
>>>> exclusive lockmgr ufs (ufs) r = 0 (0xc5438278) locked @
>>>> src/sys/fs/unionfs/union_vnops.c:1938
>>>> panic: witness_warn
>>>> cpuid = 0
>>>> KDB: stack backtrace:
>>>> db_trace_self_wrapper(c0a04f7f,c0c112c4,d1de3bb4,c097aa8c,fc,...) at
>>>> db_trace_self_wrapper+0x26
>>>> kdb_backtrace(c0a4965f,0,c09c2ede3c1c,0,...) at kdb_backtrace+0x2a
>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>> syscall(d1de3d08) at syscall+0x415
>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>> --- syscall (0, FreeBSD ELF32, nosys), eip = 0x280b883f, esp =
>>>> 0xbfbfe46c, ebp = 0xbfbfede8 ---
>>>> KDB: enter: panic
>>>> [ thread pid 86 tid 100054 ]
>>>> Stopped at    kdb_enter+0x3a: movl $0,kdb_why
>>>> db> bt
>>>> Tracing pid 86 tid 100054 td 0xc541b000
>>>> kdb_enter(c0a00d16,c0a09130,0,0,0,...) at panic+0x190
>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>> syscall(d1de3d08) at syscall+0x415
>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>>
>>>> Hmm, I guess I forgot to install kernel debug symbols...
>>>> Coming back if I have more
>>> Unfortunately unionfs does very wrong things with the insmntque() locking.
>>> It basically expects the vnode to return locked in the same way
>>> requested by the preceding namei() (when that happens) but when you do
>>> insmntque() you can only have an LK_EXCLUSIVE lock on the vnode.
>> Hello,
>> the following patch should work out the issues around unionfs_nodeget() a bit:
>> http://www.freebsd.org/~attilio/unionfs_nodeget2.patch
>>
>> Unfortunately unionfs code is rather messy in the lookup path about
>> locking requirements, so following what needs to be done there is a bit
>> difficult.
>> I have no way to test this patch, so it is only compile-tested at the
>> moment. I would also need you to test the lookup path (directory
>> "ls", find(1) on the whole unionfs volume, etc.) to validate it
>> somehow.
> On second thought, I think that locking in lookup (and also other
> operations) is so fragile and difficult to follow that it makes all
> vnops real locking landmines.
> I think that the following patch fixes the insmntque insertion and
> follows the old approach well enough to be committed separately:
> http://www.freebsd.org/~attilio/unionfs_nodeget3.patch
>

Unfortunately I have no idea about all those locking strategies and
implementations.
Applying unionfs_nodeget3.patch results in:
        sys/fs/unionfs/union_subr.c: In function 'unionfs_nodeget':
        sys/fs/unionfs/union_subr.c:332: error: expected statement before ')' token
        *** [union_subr.o] Error code 1

I guess there is a typo in this chunk:
@@ -317,11 +328,11 @@ unionfs_nodeget(struct mount *mp, struct vnode *up

 		vref(vp);
 	} else
 		*vpp = vp;
-
-unionfs_nodeget_out:
-	if (lkflags & LK_TYPE_MASK)
-		vn_lock(vp, lkflags | LK_RETRY);
-
+	if (lkflags & LK_TYPE_MASK) {
+		if (lkflags == LK_SHARED))
---------------------------------------- ^
+			vn_lock(vp, LK_DOWNGRADE | LK_RETRY);
+	} else
+		VOP_UNLOCK(vp, LK_RELEASE);
 	return (0);
 }
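
For reference, here is a minimal userland sketch of what I understand this chunk is meant to do once the extra parenthesis is dropped: the vnode comes back from insmntque() exclusively locked and is then left in whatever state the caller asked for through lkflags. This is only a model, not kernel code. All MODEL_* names, their values and model_nodeget_tail() are hypothetical and do not come from <sys/lockmgr.h> or from the patch. The model also compares only the lock-type bits, whereas the patch compares lkflags as a whole.

```c
#include <assert.h>

/* Hypothetical lock states for the userland model (not kernel definitions). */
enum model_lock_state { MODEL_UNLOCKED, MODEL_SHARED, MODEL_EXCLUSIVE };

/* Illustrative flag values only; the real LK_* values live in <sys/lockmgr.h>. */
#define MODEL_LK_SHARED		0x01
#define MODEL_LK_EXCLUSIVE	0x02
#define MODEL_LK_TYPE_MASK	0x03

/*
 * Models the tail of the patched unionfs_nodeget(): on entry the vnode is
 * assumed to be exclusively locked, because that is what insmntque()
 * requires. The return value is the lock state handed back to the caller.
 */
static enum model_lock_state
model_nodeget_tail(enum model_lock_state state, int lkflags)
{
	/* Precondition taken from the thread: only LK_EXCLUSIVE is allowed here. */
	assert(state == MODEL_EXCLUSIVE);

	if (lkflags & MODEL_LK_TYPE_MASK) {
		/* Stands in for vn_lock(vp, LK_DOWNGRADE | LK_RETRY). */
		if ((lkflags & MODEL_LK_TYPE_MASK) == MODEL_LK_SHARED)
			state = MODEL_SHARED;
		/* An exclusive request keeps the lock as it is. */
	} else {
		/* Stands in for VOP_UNLOCK(): the caller asked for no lock. */
		state = MODEL_UNLOCKED;
	}
	return (state);
}
```

Calling model_nodeget_tail(MODEL_EXCLUSIVE, 0) yields MODEL_UNLOCKED, a shared request yields MODEL_SHARED, and an exclusive request leaves the state untouched. It says nothing about why the real kernel still panics.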

After removing the second right parenthesis kernel compiles.
But it still crashes:
panic: Lock (lockmgr) ufs not locked @ sys/kern/vfs_default.c:512
cpuid = 1
KDB: stack backtrace:
...
If you can use the bt info I'll transcribe - no serial console available :-(

Am I right that I should only apply _one_ unionfs-patchX.patch
(unionfs_nodeget3.patch in that case)?

Thanks,

-Harry