Date:      Sat, 03 Nov 2012 13:50:53 +0100
From:      Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To:        attilio@FreeBSD.org
Cc:        stable@FreeBSD.org, daichi@FreeBSD.org, Pavel Polyakov <bsd@kobyla.org>
Subject:   Re: lock violation in unionfs (9.0-STABLE r230270)
Message-ID:  <5095132D.8000007@omnilan.de>
In-Reply-To: <CAJ-FndDmV%2BZa%2BFjGcmPPmEW0yMZGj90tp3U%2Btj8K_DmqjSjxfw@mail.gmail.com>
References:  <op.v9l1byf89gyv16@pp>	<CAJ-FndAFMV2iHcMKvMruCP%2BHRzwQuY1Jcd_o6ZEnTCiPV8_8oA@mail.gmail.com>	<op.waqux6rr9gyv16@cel.home>	<5022840B.3060708@omnilan.de>	<CAJ-FndDkuXksyFD2Nd-S7Ty3N8boSk37=a2nYagMkguRYd1r%2Bg@mail.gmail.com>	<5048C6D1.8020007@omnilan.de>	<CAJ-FndAjQ-w9vLFziQKpkauyRkQnAEeYOh6nXzTR6w1gx7hsEg@mail.gmail.com>	<CAJ-FndDdV3ZthE66Z7vqnM5=-=FRzrnNTogisuTS0Fmo%2Bb_0NQ@mail.gmail.com>	<CAJ-FndACf6rO2CzhK=WrbQmXMNZtHsfMJ1mdPg4wgajiyZzt9A@mail.gmail.com>	<508EDB2F.3010608@omnilan.de>	<50910751.9030303@omnilan.de> <CAJ-FndDmV%2BZa%2BFjGcmPPmEW0yMZGj90tp3U%2Btj8K_DmqjSjxfw@mail.gmail.com>

Attilio Rao wrote on 02.11.2012 15:21 (localtime):
> On Wed, Oct 31, 2012 at 11:11 AM, Harald Schmalzbauer
> <h.schmalzbauer@omnilan.de> wrote:
>> Attilio Rao wrote on 29.10.2012 23:02 (localtime):
>>> On Mon, Oct 29, 2012 at 7:37 PM, Harald Schmalzbauer
>>> <h.schmalzbauer@omnilan.de> wrote:
>>>> Attilio Rao wrote on 27.10.2012 23:07 (localtime):
>>>>> On Sat, Oct 27, 2012 at 9:46 PM, Attilio Rao <attilio@freebsd.org> wrote:
>>>>>> On Sat, Sep 8, 2012 at 12:48 AM, Attilio Rao <attilio@freebsd.org> wrote:
>>>>>>> On Thu, Sep 6, 2012 at 4:52 PM, Harald Schmalzbauer
>>>>>>> <h.schmalzbauer@omnilan.de> wrote:
>>>>>>>> Attilio Rao wrote on 09.08.2012 20:26 (localtime):
>>>>>>>>> On 8/8/12, Harald Schmalzbauer <h.schmalzbauer@omnilan.de> wrote:
>>>>>>>>>> Pavel Polyakov wrote on 06.03.2012 11:20 (localtime):
>>>>>>>>>>>>> mount -t unionfs -o noatime /usr /mnt
>>>>>>>>>>>>>
>>>>>>>>>>>>> insmntque: mp-safe fs and non-locked vp: 0xfffffe01d96704f0 is not
>>>>>>>>>>>>> exclusive locked but should be
>>>>>>>>>>>>> KDB: enter: lock violation
>>>>>>>>>>>> Pavel,
>>>>>>>>>>>> can you give a spin to this patch?:
>>>>>>>>>>>> http://www.freebsd.org/~attilio/unionfs_missing_insmntque_lock.patch
>>>>>>>>>>>>
>>>>>>>>>>>> I think that the unlocking is due at that point as the vnode lock can
>>>>>>>>>>>> be switched later on.
>>>>>>>>>>>>
>>>>>>>>>>>> Let me know what you think about it and what the test does.
>>>>>>>>>>> Thanks!
>>>>>>>>>>> This patch fixes the problem with lock violation. Sorry I've tested it so
>>>>>>>>>>> late.
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> this patch still applies cleanly to RELENG_9_1. Was there another fix
>>>>>>>>>> for the issue or has it just not been PR-sent and thus forgotten?
>>>>>>>>> Can you and Pavel try the attached patch? Unfortunately I had no time
>>>>>>>>> to test it, I just made in 5 free mins from a non-FreeBSD workstation,
>>>>>>>> Sorry, couldn't test earlier, but now I did:
>>>>>>>> With this patch applied the machine hangs without debug kernel and the
>>>>>>>> latter gives the following panic:
>>>>>>>> System call nmount returning with the following locks held:
>>>>>>>> exclusive lockmgr ufs (ufs) r = 0 (0xc5438278) locked @
>>>>>>>> src/sys/fs/unionfs/union_vnops.c:1938
>>>>>>>> panic: witness_warn
>>>>>>>> cpuid = 0
>>>>>>>> KDB: stack backtrace:
>>>>>>>> db_trace_self_wrapper(c0a04f7f,c0c112c4,d1de3bb4,c097aa8c,fc,...) at
>>>>>>>> db_trace_self_wrapper+0x26
>>>>>>>> kdb_backtrace(c0a4965f,0,c09c2ede3c1c,0,...) at kdb_backtrace+0x2a
>>>>>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>>>>>> syscall(d1de3d08) at syscall+0x415
>>>>>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>>>>>> --- syscall (0, FreeBSD ELF32, nosys), eip = 0x280b883f, esp =
>>>>>>>> 0xbfbfe46c, ebp = 0xbfbfede8 ---
>>>>>>>> KDB: enter: panic
>>>>>>>> [ thread pid 86 tid 100054 ]
>>>>>>>> Stopped at    kdb_enter+0x3a: movl $0,kdb_why
>>>>>>>> db> bt
>>>>>>>> Tracing pid 86 tid 100054 td 0xc541b000
>>>>>>>> kdb_enter(c0a00d16,c0a09130,0,0,0,...) at panic+0x190
>>>>>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>>>>>> syscall(d1de3d08) at syscall+0x415
>>>>>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>>>>>>
>>>>>>>> Hmm, I guess I forgot to install kernel debug symbols...
>>>>>>>> Coming back if I have more
>>>>>>> Unfortunately unionfs does very wrong things with the insmntque() locking.
>>>>>>> It basically expects the vnode to return locked in the same way
>>>>>>> requested by the preceding namei() (when that happens) but when you do
>>>>>>> insmntque() you can only have an LK_EXCLUSIVE lock on the vnode.
>>>>>> Hello,
>>>>>> the following patch should work out the issues around unionfs_nodeget() a bit:
>>>>>> http://www.freebsd.org/~attilio/unionfs_nodeget2.patch
>>>>>>
>>>>>> Unfortunately unionfs code is rather messy in the lookup path about
>>>>>> locking requirements, so following what needs to be done there is a bit
>>>>>> difficult.
>>>>>> I have no way to test this patch, so it is just test-compiled at the
>>>>>> moment, but I would need you to also test the lookup path (so directory
>>>>>> "ls", find(1) on the whole unionfs volume, etc.) to validate it
>>>>>> somehow.
>>>>> On a second thought, I think that locking in lookup (and also other
>>>>> operations) is so fragile and difficult to follow that it makes all
>>>>> vnops real locking landmines.
>>>>> I think that the following patch fixes the insmntque insertion and
>>>>> follows the old approach well enough to be committed separately:
>>>>> http://www.freebsd.org/~attilio/unionfs_nodeget3.patch
>>>>>
>>>> Unfortunately I have no idea about all those locking strategies and
>>>> implementations.
>>>> Applying unionfs_nodeget3.patch results in:
>>>>         sys/fs/unionfs/union_subr.c: In function 'unionfs_nodeget':
>>>>         sys/fs/unionfs/union_subr.c:332: error: expected statement
>>>> before ')' token
>>>>         *** [union_subr.o] Error code 1
>>>>
>>>> I guess there is a typo in this chunk:
>>>> @@ -317,11 +328,11 @@ unionfs_nodeget(struct mount *mp, struct vnode *up
>>>>
>>>>                 vref(vp);
>>>>         } else
>>>>                 *vpp = vp;
>>>> -
>>>> -unionfs_nodeget_out:
>>>> -       if (lkflags & LK_TYPE_MASK)
>>>> -               vn_lock(vp, lkflags | LK_RETRY);
>>>> -
>>>> +       if (lkflags & LK_TYPE_MASK) {
>>>> +               if (lkflags == LK_SHARED))
>>>> ---------------------------------------- ^
>>>> +                       vn_lock(vp, LK_DOWNGRADE | LK_RETRY);
>>>> +       } else
>>>> +               VOP_UNLOCK(vp, LK_RELEASE);
>>>>         return (0);
>>>>  }
>>>>
>>>> After removing the second right parenthesis kernel compiles.
>>>> But it still crashes:
>>>> panic: Lock (lockmgr) ufs not locked @ sys/kern/vfs_default.c:512
>>>> cpuid = 1
>>>> KDB: stack backtrace:
>>>> ...
>>>> If you can use the bt info I'll transcribe - no serial console available :-(
>>>>
>>>> Am I right that I should only apply _one_ unionfs-patchX.patch
>>>> (unionfs_nodeget3.patch in that case)?
>>> Yes, only that one.
>>> Can you please do "bt" from DDB and take a picture of your screen with a camera?
>> Ok, now I had a reason to take some time finding out how ESXi handles
>> serial ports ;-) It's quite easy and very flexible, so no problem
>> setting up a debug console.
>> Please find attached the backtrace.
>> Do I have to load any symbols? It's not very informative what I see, right?
> Hi Harry,
> well done.
>
> Can you please back out the prior patch and try this one instead?:
> http://www.freebsd.org/~attilio/unionfs_nodeget4.patch

Unfortunately it still panics, but only with a debug kernel.
I accidentally built a non-debug kernel first and did some tests -> no
crash with regular usage.
I also took the example "killer-app" from a completely unrelated PR and ran
it on an upper_union mounted directory for some minutes - no crash:
http://www.freebsd.org/cgi/query-pr.cgi?pr=159971

Since I also saw no LOR, I checked whether I really had a debug kernel...
Now with a debug kernel I get this panic at boot (where /.safe/etc gets
mounted over /etc):

panic: Lock (lockmgr) ufs not locked @
/usr/local/share/deploy-tools/RELENG_9_1/src/sys/kern/vfs_default.c:512.
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1cd
witness_assert() at witness_assert+0x225
__lockmgr_args() at __lockmgr_args+0xb65
vop_stdunlock() at vop_stdunlock+0x43
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_unlock() at unionfs_unlock+0xe1
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_nodeget() at unionfs_nodeget+0x615
unionfs_domount() at unionfs_domount+0x4ab
vfs_donmount() at vfs_donmount+0x960
sys_nmount() at sys_nmount+0x66
amd64_syscall() at amd64_syscall+0x2fa
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80087798c, rsp =
0x7fffffffd328, rbp = 0x7fffffffd750 ---
KDB: enter: panic
[ thread pid 72 tid 100083 ]
Stopped at      kdb_enter+0x3b: movq    $0,0x64cd42(%rip)
db> bt
Tracing pid 72 tid 100083 td 0xfffffe00074c78e0
kdb_enter() at kdb_enter+0x3b
panic() at panic+0x1c6
witness_assert() at witness_assert+0x225
__lockmgr_args() at __lockmgr_args+0xb65
vop_stdunlock() at vop_stdunlock+0x43
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_unlock() at unionfs_unlock+0xe1
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_nodeget() at unionfs_nodeget+0x615
unionfs_domount() at unionfs_domount+0x4ab
vfs_donmount() at vfs_donmount+0x960
sys_nmount() at sys_nmount+0x66
amd64_syscall() at amd64_syscall+0x2fa
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80087798c, rsp =
0x7fffffffd328, rbp = 0x7fffffffd750 ---
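
In the trace above the assertion fires in vop_stdunlock(), reached from
unionfs_nodeget() via unionfs_unlock(), so an unlock is apparently issued
for a ufs lock that is not held at that point. As a reading aid only, here
is a user-space toy model of the rule quoted earlier in the thread:
insmntque() wants the vnode exclusively locked, and afterwards the lock is
converted to whatever the caller requested. All toy_* and REQ_* names are
hypothetical. Nothing here is kernel API, and it does not reproduce any of
the patches; it only sketches why a release of a lock that is not held gets
reported.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock states; illustrative only, not the kernel's lockmgr. */
enum toy_lock { TOY_UNLOCKED, TOY_SHARED, TOY_EXCLUSIVE };

/* What the (hypothetical) caller wants the vnode to look like on return. */
enum toy_request { REQ_NONE, REQ_SHARED, REQ_EXCLUSIVE };

struct toy_vnode {
	enum toy_lock lock;
};

/* Releasing a lock that is not held is the violation a debug kernel flags. */
static bool
toy_unlock(struct toy_vnode *vp)
{
	if (vp->lock == TOY_UNLOCKED)
		return (false);
	vp->lock = TOY_UNLOCKED;
	return (true);
}

/* Only an exclusive lock can be downgraded to a shared one. */
static bool
toy_downgrade(struct toy_vnode *vp)
{
	if (vp->lock != TOY_EXCLUSIVE)
		return (false);
	vp->lock = TOY_SHARED;
	return (true);
}

/* Models the requirement from the thread: exclusive lock at insertion. */
static bool
toy_insmntque(const struct toy_vnode *vp)
{
	return (vp->lock == TOY_EXCLUSIVE);
}

/* Lock exclusively, insert, then convert to the requested state. */
static bool
toy_nodeget(struct toy_vnode *vp, enum toy_request req)
{
	vp->lock = TOY_EXCLUSIVE;
	if (!toy_insmntque(vp))
		return (false);
	switch (req) {
	case REQ_EXCLUSIVE:
		return (true);
	case REQ_SHARED:
		return (toy_downgrade(vp));
	case REQ_NONE:
		return (toy_unlock(vp));
	}
	return (false);
}

/* Returns 0 if the toy model behaves as described, non-zero otherwise. */
int
toy_self_check(void)
{
	struct toy_vnode a = { TOY_UNLOCKED };
	struct toy_vnode b = { TOY_UNLOCKED };

	if (!toy_nodeget(&a, REQ_SHARED))
		return (1);
	assert(a.lock == TOY_SHARED);

	if (!toy_nodeget(&b, REQ_NONE))
		return (2);
	assert(b.lock == TOY_UNLOCKED);

	/* A second release on b is the "not locked" case. */
	if (toy_unlock(&b))
		return (3);
	return (0);
}
```

Calling toy_self_check() from a small main() exercises the three paths. The
last step, a second release of an already unlocked vnode, is the toy
counterpart of the "not locked" assertion.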

Thanks for your help,

-Harry

