Date: Wed, 31 Oct 2012 12:11:13 +0100
From: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To: stable@FreeBSD.org
Cc: daichi@FreeBSD.org, Pavel Polyakov <bsd@kobyla.org>
Subject: Re: lock violation in unionfs (9.0-STABLE r230270)
Message-ID: <50910751.9030303@omnilan.de>
In-Reply-To: <508EDB2F.3010608@omnilan.de>
References: <op.v9l1byf89gyv16@pp>
 <CAJ-FndAFMV2iHcMKvMruCP%2BHRzwQuY1Jcd_o6ZEnTCiPV8_8oA@mail.gmail.com>
 <op.waqux6rr9gyv16@cel.home>
 <5022840B.3060708@omnilan.de>
 <CAJ-FndDkuXksyFD2Nd-S7Ty3N8boSk37=a2nYagMkguRYd1r%2Bg@mail.gmail.com>
 <5048C6D1.8020007@omnilan.de>
 <CAJ-FndAjQ-w9vLFziQKpkauyRkQnAEeYOh6nXzTR6w1gx7hsEg@mail.gmail.com>
 <CAJ-FndDdV3ZthE66Z7vqnM5=-=FRzrnNTogisuTS0Fmo%2Bb_0NQ@mail.gmail.com>
 <CAJ-FndACf6rO2CzhK=WrbQmXMNZtHsfMJ1mdPg4wgajiyZzt9A@mail.gmail.com>
 <508EDB2F.3010608@omnilan.de>

Attilio Rao wrote on 29.10.2012 23:02 (localtime):
> On Mon, Oct 29, 2012 at 7:37 PM, Harald Schmalzbauer
> <h.schmalzbauer@omnilan.de> wrote:
>> Attilio Rao wrote on 27.10.2012 23:07 (localtime):
>>> On Sat, Oct 27, 2012 at 9:46 PM, Attilio Rao <attilio@freebsd.org> wrote:
>>>> On Sat, Sep 8, 2012 at 12:48 AM, Attilio Rao <attilio@freebsd.org> wrote:
>>>>> On Thu, Sep 6, 2012 at 4:52 PM, Harald Schmalzbauer
>>>>> <h.schmalzbauer@omnilan.de> wrote:
>>>>>> Attilio Rao wrote on 09.08.2012 20:26 (localtime):
>>>>>>> On 8/8/12, Harald Schmalzbauer <h.schmalzbauer@omnilan.de> wrote:
>>>>>>>> Pavel Polyakov wrote on 06.03.2012 11:20 (localtime):
>>>>>>>>>>> mount -t unionfs -o noatime /usr /mnt
>>>>>>>>>>>
>>>>>>>>>>> insmntque: mp-safe fs and non-locked vp: 0xfffffe01d96704f0 is not
>>>>>>>>>>> exclusive locked but should be
>>>>>>>>>>> KDB: enter: lock violation
>>>>>>>>>> Pavel,
>>>>>>>>>> can you give a spin to this patch?:
>>>>>>>>>> http://www.freebsd.org/~attilio/unionfs_missing_insmntque_lock.patch
>>>>>>>>>>
>>>>>>>>>> I think that the unlocking is due at that point as the vnode lock can
>>>>>>>>>> be switched later on.
>>>>>>>>>>
>>>>>>>>>> Let me know what you think about it and what the test does.
>>>>>>>>> Thanks!
>>>>>>>>> This patch fixes the problem with lock violation. Sorry I've tested it so
>>>>>>>>> late.
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> this patch still applies cleanly to RELENG_9_1. Was there another fix
>>>>>>>> for the issue or has it just not been PR-sent and thus forgotten?
>>>>>>> Can you and Pavel try the attached patch?
>>>>>>> Unfortunately I had no time
>>>>>>> to test it, I just made it in 5 free mins from a non-FreeBSD workstation,
>>>>>> Sorry, couldn't test earlier, but now I did:
>>>>>> With this patch applied the machine hangs without debug kernel and the
>>>>>> latter gives the following panic:
>>>>>> System call nmount returning with the following locks held:
>>>>>> exclusive lockmgr ufs (ufs) r = 0 (0xc5438278) locked @
>>>>>> src/sys/fs/unionfs/union_vnops.c:1938
>>>>>> panic: witness_warn
>>>>>> cpuid = 0
>>>>>> KDB: stack backtrace:
>>>>>> db_trace_self_wrapper(c0a04f7f,c0c112c4,d1de3bb4,c097aa8c,fc,...) at
>>>>>> db_trace_self_wrapper+0x26
>>>>>> kdb_backtrace(c0a4965f,0,c09c2ede3c1c,0,...) at kdb_backtrace+0x2a
>>>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>>>> syscall(d1de3d08) at syscall+0x415
>>>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>>>> --- syscall (0, FreeBSD ELF32, nosys), eip = 0x280b883f, esp =
>>>>>> 0xbfbfe46c, ebp = 0xbfbfede8 ---
>>>>>> KDB: enter: panic
>>>>>> [ thread pid 86 tid 100054 ]
>>>>>> Stopped at kdb_enter+0x3a: movl $0,kdb_why
>>>>>> db> bt
>>>>>> Tracing pid 86 tid 100054 td 0xc541b000
>>>>>> kdb_enter(c0a00d16,c0a09130,0,0,0,...) at panic+0x190
>>>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>>>> syscall(d1de3d08) at syscall+0x415
>>>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>>>>
>>>>>> Hmm, I guess I forgot to install kernel debug symbols...
>>>>>> Coming back if I have more
>>>>> Unfortunately unionfs does very wrong things with the insmntque() locking.
>>>>> It basically expects the vnode to return locked in the same way
>>>>> requested by the preceding namei() (when that happens) but when you do
>>>>> insmntque() you can only have an LK_EXCLUSIVE lock on the vnode.
>>>> Hello,
>>>> the following patch should work out the issues around unionfs_nodeget() a bit:
>>>> http://www.freebsd.org/~attilio/unionfs_nodeget2.patch
>>>>
>>>> Unfortunately unionfs code is rather messy in the lookup path about
>>>> locking requirements, so following what needs to be done there is a bit
>>>> difficult.
>>>> I have no way to test this patch, so it is just test-compiled at the
>>>> moment, but I would need you to also test the lookup path (so directory
>>>> "ls", find(1) on the whole unionfs volume, etc.) to validate it
>>>> in some way.
>>> On second thought, I think that locking in lookup (and also other
>>> operations) is so fragile and difficult to follow that it makes all
>>> vnops real locking landmines.
>>> I think that the following patch fixes the insmntque insertion and
>>> follows the old approach well enough to be committed separately:
>>> http://www.freebsd.org/~attilio/unionfs_nodeget3.patch
>>>
>> Unfortunately I have no idea about all those locking strategies and
>> implementations.
>> Applying unionfs_nodeget3.patch results in:
>> sys/fs/unionfs/union_subr.c: In function 'unionfs_nodeget':
>> sys/fs/unionfs/union_subr.c:332: error: expected statement
>> before ')' token
>> *** [union_subr.o] Error code 1
>>
>> I guess there is a typo in this chunk:
>> @@ -317,11 +328,11 @@ unionfs_nodeget(struct mount *mp, struct vnode *up
>>
>>                 vref(vp);
>>         } else
>>                 *vpp = vp;
>> -
>> -unionfs_nodeget_out:
>> -       if (lkflags & LK_TYPE_MASK)
>> -               vn_lock(vp, lkflags | LK_RETRY);
>> -
>> +       if (lkflags & LK_TYPE_MASK) {
>> +               if (lkflags == LK_SHARED))
>> ---------------------------------------- ^
>> +                       vn_lock(vp, LK_DOWNGRADE | LK_RETRY);
>> +       } else
>> +               VOP_UNLOCK(vp, LK_RELEASE);
>>         return (0);
>>  }
>>
>> After removing the second right parenthesis the kernel compiles.
>> But it still crashes:
>> panic: Lock (lockmgr) ufs not locked @ sys/kern/vfs_default.c:512
>> cpuid = 1
>> KDB: stack backtrace:
>> ...
>> If you can use the bt info I'll transcribe - no serial console available :-(
>>
>> Am I right that I should only apply _one_ unionfs-patchX.patch
>> (unionfs_nodeget3.patch in that case)?
> Yes, only that one.
> Can you please do "bt" from DDB and take a picture of your screen with a camera?

Ok, now I had a reason to take some time finding out how ESXi handles
serial ports ;-) It's quite easy and very flexible, so no problem
setting up a debug console.

Please find attached the backtrace.
Do I have to load any symbols? What I see is not very informative, right?

Thanks,

-Harry

Attachment: unionfs-nodeget3_backtrace.txt

panic: Lock (lockmgr) ufs not locked @ /usr/local/share/deploy-tools/RELENG_9_1/src/sys/kern/vfs_default.c:512.
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1cd
witness_assert() at witness_assert+0x225
__lockmgr_args() at __lockmgr_args+0xb65
vop_stdunlock() at vop_stdunlock+0x43
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_unlock() at unionfs_unlock+0xe1
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_nodeget() at unionfs_nodeget+0x5a9
unionfs_domount() at unionfs_domount+0x4ab
vfs_donmount() at vfs_donmount+0x960
sys_nmount() at sys_nmount+0x66
amd64_syscall() at amd64_syscall+0x2fa
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80087798c, rsp = 0x7fffffffd328, rbp = 0x7fffffffd750 ---
KDB: enter: panic
[ thread pid 72 tid 100072 ]
Stopped at kdb_enter+0x3b: movq $0,0x64cd52(%rip)
db> bt
Tracing pid 72 tid 100072 td 0xfffffe0007344470
kdb_enter() at kdb_enter+0x3b
panic() at panic+0x1c6
witness_assert() at witness_assert+0x225
__lockmgr_args() at __lockmgr_args+0xb65
vop_stdunlock() at vop_stdunlock+0x43
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_unlock() at unionfs_unlock+0xe1
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_nodeget() at unionfs_nodeget+0x5a9
unionfs_domount() at unionfs_domount+0x4ab
vfs_donmount() at vfs_donmount+0x960
sys_nmount() at sys_nmount+0x66
amd64_syscall() at amd64_syscall+0x2fa
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80087798c, rsp = 0x7fffffffd328, rbp = 0x7fffffffd750 ---
db>
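
The locking rule discussed in the quoted messages (the vnode has to be held with an exclusive lock for insmntque(), and afterwards the lock type the caller asked for has to be restored) can be modelled in a few lines of userland C. This is only a toy sketch under stated assumptions, not the unionfs code and not any of the patches in this thread: every toy_* name and every flag value is hypothetical, and unlike the quoted hunk it compares the masked lock type instead of the whole lkflags word. The assert() in toy_unlock() merely stands in for the kind of check that reports "not locked" in the backtrace.

```c
#include <assert.h>

/* Hypothetical stand-ins for lock-request flags; the values are illustrative only. */
enum {
	TOY_LK_SHARED    = 0x1,
	TOY_LK_EXCLUSIVE = 0x2,
	TOY_LK_TYPE_MASK = 0x3
};

/* Lock state of the toy vnode. */
enum toy_state { TOY_UNLOCKED, TOY_HELD_SHARED, TOY_HELD_EXCLUSIVE };

struct toy_vnode {
	enum toy_state state;
};

/* Take the exclusive lock; the toy vnode must not be locked yet. */
static void
toy_lock_exclusive(struct toy_vnode *vp)
{
	assert(vp->state == TOY_UNLOCKED);
	vp->state = TOY_HELD_EXCLUSIVE;
}

/* Downgrade exclusive to shared; only valid while held exclusively. */
static void
toy_downgrade(struct toy_vnode *vp)
{
	assert(vp->state == TOY_HELD_EXCLUSIVE);
	vp->state = TOY_HELD_SHARED;
}

/* Release the lock; releasing a lock that is not held trips the assert. */
static void
toy_unlock(struct toy_vnode *vp)
{
	assert(vp->state != TOY_UNLOCKED);
	vp->state = TOY_UNLOCKED;
}

/*
 * Hypothetical helper: after the insertion step, which needs the
 * exclusive lock, leave the vnode in the state the caller requested.
 */
static enum toy_state
toy_settle_lock(struct toy_vnode *vp, int lkflags)
{
	int type = lkflags & TOY_LK_TYPE_MASK;

	/* Precondition: the insertion step ran with the exclusive lock held. */
	assert(vp->state == TOY_HELD_EXCLUSIVE);

	if (type == TOY_LK_SHARED)
		toy_downgrade(vp);	/* caller asked for a shared lock */
	else if (type == 0)
		toy_unlock(vp);		/* caller asked for no lock at all */
	/* Otherwise an exclusive lock was requested: keep it as it is. */

	return (vp->state);
}
```

Calling toy_lock_exclusive() on a fresh toy_vnode and then toy_settle_lock() with TOY_LK_SHARED, TOY_LK_EXCLUSIVE or 0 leaves it shared, exclusive or unlocked respectively; calling toy_unlock() a second time would abort, which is the toy counterpart of unlocking a lock that is not held.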