Date: Sat, 03 Nov 2012 13:50:53 +0100
From: Harald Schmalzbauer
To: attilio@FreeBSD.org
Cc: stable@FreeBSD.org, daichi@FreeBSD.org, Pavel Polyakov
Subject: Re: lock violation in unionfs (9.0-STABLE r230270)

Attilio Rao wrote on 02.11.2012 15:21 (localtime):
> On Wed, Oct 31, 2012 at 11:11 AM, Harald Schmalzbauer wrote:
>> Attilio Rao wrote on 29.10.2012 23:02 (localtime):
>>> On Mon, Oct 29, 2012 at 7:37 PM, Harald Schmalzbauer wrote:
>>>> Attilio Rao wrote on 27.10.2012 23:07 (localtime):
>>>>> On Sat, Oct 27, 2012 at 9:46 PM, Attilio Rao wrote:
>>>>>> On Sat, Sep 8, 2012 at 12:48 AM, Attilio Rao wrote:
>>>>>>> On Thu, Sep 6, 2012 at 4:52 PM, Harald Schmalzbauer wrote:
>>>>>>>> Attilio Rao wrote on 09.08.2012 20:26 (localtime):
>>>>>>>>> On 8/8/12, Harald Schmalzbauer wrote:
>>>>>>>>>> Pavel Polyakov wrote on 06.03.2012 11:20 (localtime):
>>>>>>>>>>>>> mount -t unionfs -o noatime /usr /mnt
>>>>>>>>>>>>>
>>>>>>>>>>>>> insmntque: mp-safe fs and non-locked vp: 0xfffffe01d96704f0 is not
>>>>>>>>>>>>> exclusive locked but should be
>>>>>>>>>>>>> KDB: enter: lock violation
>>>>>>>>>>>> Pavel,
>>>>>>>>>>>> can you give a spin to this patch?:
>>>>>>>>>>>> http://www.freebsd.org/~attilio/unionfs_missing_insmntque_lock.patch
>>>>>>>>>>>>
>>>>>>>>>>>> I think that the unlocking is due at that point as the vnode lock can
>>>>>>>>>>>> be switched later on.
>>>>>>>>>>>>
>>>>>>>>>>>> Let me know what you think about it and what the test does.
>>>>>>>>>>> Thanks!
>>>>>>>>>>> This patch fixes the problem with lock violation. Sorry I've tested it so
>>>>>>>>>>> late.
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> this patch still applies cleanly to RELENG_9_1. Was there another fix
>>>>>>>>>> for the issue or has it just not been PR-sent and thus forgotten?
>>>>>>>>> Can you and Pavel try the attached patch?
>>>>>>>>> Unfortunately I had no time
>>>>>>>>> to test it, I just made it in 5 free mins from a non-FreeBSD workstation,
>>>>>>>> Sorry, couldn't test earlier, but now I did:
>>>>>>>> With this patch applied the machine hangs without debug kernel and the
>>>>>>>> latter gives the following panic:
>>>>>>>> System call nmount returning with the following locks held:
>>>>>>>> exclusive lockmgr ufs (ufs) r = 0 (0xc5438278) locked @
>>>>>>>> src/sys/fs/unionfs/union_vnops.c:1938
>>>>>>>> panic: witness_warn
>>>>>>>> cpuid = 0
>>>>>>>> KDB: stack backtrace:
>>>>>>>> db_trace_self_wrapper(c0a04f7f,c0c112c4,d1de3bb4,c097aa8c,fc,...) at
>>>>>>>> db_trace_self_wrapper+0x26
>>>>>>>> kdb_backtrace(c0a4965f,0,c09c2ede3c1c,0,...) at kdb_backtrace+0x2a
>>>>>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>>>>>> syscall(d1de3d08) at syscall+0x415
>>>>>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>>>>>> --- syscall (0, FreeBSD ELF32, nosys), eip = 0x280b883f, esp =
>>>>>>>> 0xbfbfe46c, ebp = 0xbfbfede8 ---
>>>>>>>> KDB: enter: panic
>>>>>>>> [ thread pid 86 tid 100054 ]
>>>>>>>> Stopped at kdb_enter+0x3a: movl $0,kdb_why
>>>>>>>> db> bt
>>>>>>>> Tracing pid 86 tid 100054 td 0xc541b000
>>>>>>>> kdb_enter(c0a00d16,c0a09130,0,0,0,...) at panic+0x190
>>>>>>>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4
>>>>>>>> syscall(d1de3d08) at syscall+0x415
>>>>>>>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>>>>>>>
>>>>>>>> Hmm, I guess I forgot to install kernel debug symbols...
>>>>>>>> Coming back if I have more
>>>>>>> Unfortunately unionfs does very wrong things with the insmntque() locking.
>>>>>>> It basically expects the vnode to return locked in the same way
>>>>>>> requested by the preceding namei() (when that happens) but when you do
>>>>>>> insmntque() you can only have an LK_EXCLUSIVE lock on the vnode.
>>>>>> Hello,
>>>>>> the following patch should work out the issues around unionfs_nodeget() a bit:
>>>>>> http://www.freebsd.org/~attilio/unionfs_nodeget2.patch
>>>>>>
>>>>>> Unfortunately unionfs code is rather messy in the lookup path about
>>>>>> locking requirements, so following what needs to be done there is a bit
>>>>>> difficult.
>>>>>> I have no way to test this patch, so it is just test-compiled at the
>>>>>> moment, but I would need you to also test the lookup path (so directory
>>>>>> "ls", find(1) on the whole unionfs volume, etc.) to validate it
>>>>>> somehow.
>>>>> On second thought, I think that locking in lookup (and also other
>>>>> operations) is so fragile and difficult to follow that it makes all
>>>>> vnops real locking landmines.
>>>>> I think that the following patch fixes the insmntque insertion and
>>>>> follows the old approach well enough to be committed separately:
>>>>> http://www.freebsd.org/~attilio/unionfs_nodeget3.patch
>>>>>
>>>> Unfortunately I have no idea about all those locking strategies and
>>>> implementations.
>>>> Applying unionfs_nodeget3.patch results in:
>>>> sys/fs/unionfs/union_subr.c: In function 'unionfs_nodeget':
>>>> sys/fs/unionfs/union_subr.c:332: error: expected statement
>>>> before ')' token
>>>> *** [union_subr.o] Error code 1
>>>>
>>>> I guess there is a typo in this chunk:
>>>> @@ -317,11 +328,11 @@ unionfs_nodeget(struct mount *mp, struct vnode *up
>>>>
>>>>                  vref(vp);
>>>>          } else
>>>>                  *vpp = vp;
>>>> -
>>>> -unionfs_nodeget_out:
>>>> -        if (lkflags & LK_TYPE_MASK)
>>>> -                vn_lock(vp, lkflags | LK_RETRY);
>>>> -
>>>> +        if (lkflags & LK_TYPE_MASK) {
>>>> +                if (lkflags == LK_SHARED))
>>>> ---------------------------------------- ^
>>>> +                        vn_lock(vp, LK_DOWNGRADE | LK_RETRY);
>>>> +        } else
>>>> +                VOP_UNLOCK(vp, LK_RELEASE);
>>>>          return (0);
>>>>  }
>>>>
>>>> After removing the second right parenthesis the kernel compiles.
>>>> But it still crashes:
>>>> panic: Lock (lockmgr) ufs not locked @ sys/kern/vfs_default.c:512
>>>> cpuid = 1
>>>> KDB: stack backtrace:
>>>> ...
>>>> If you can use the bt info I'll transcribe - no serial console available :-(
>>>>
>>>> Am I right that I should only apply _one_ unionfs-patchX.patch
>>>> (unionfs_nodeget3.patch in that case)?
>>> Yes, only that one.
>>> Can you please do "bt" from DDB and take a picture of your screen with a camera?
>> Ok, now I had a reason to take some time finding out how ESXi handles
>> serial ports ;-) It's quite easy and very flexible, so no problem
>> setting up a debug console.
>> Please find attached the backtrace.
>> Do I have to load any symbols? It's not very informative what I see, right?
> Hi Harry,
> well done.
>
> Can you please back out the prior patch and try this one instead?:
> http://www.freebsd.org/~attilio/unionfs_nodeget4.patch

Unfortunately it still panics, but only with a debug kernel.
I accidentally built a non-debug kernel first and did some tests -> no crash
with regular usage.
I also took a completely unrelated PR to run the example "killer-app" on
an upper_union mounted directory and ran the test for some minutes - no crash:
http://www.freebsd.org/cgi/query-pr.cgi?pr=159971
Since I also saw no LOR, I checked whether I really had a debug kernel...

Now with a debug kernel I get this panic at boot (where /.safe/etc gets
mounted over /etc):
panic: Lock (lockmgr) ufs not locked @
/usr/local/share/deploy-tools/RELENG_9_1/src/sys/kern/vfs_default.c:512.
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1cd
witness_assert() at witness_assert+0x225
__lockmgr_args() at __lockmgr_args+0xb65
vop_stdunlock() at vop_stdunlock+0x43
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_unlock() at unionfs_unlock+0xe1
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_nodeget() at unionfs_nodeget+0x615
unionfs_domount() at unionfs_domount+0x4ab
vfs_donmount() at vfs_donmount+0x960
sys_nmount() at sys_nmount+0x66
amd64_syscall() at amd64_syscall+0x2fa
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80087798c, rsp = 0x7fffffffd328, rbp = 0x7fffffffd750 ---
KDB: enter: panic
[ thread pid 72 tid 100083 ]
Stopped at kdb_enter+0x3b: movq $0,0x64cd42(%rip)
db> bt
Tracing pid 72 tid 100083 td 0xfffffe00074c78e0
kdb_enter() at kdb_enter+0x3b
panic() at panic+0x1c6
witness_assert() at witness_assert+0x225
__lockmgr_args() at __lockmgr_args+0xb65
vop_stdunlock() at vop_stdunlock+0x43
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_unlock() at unionfs_unlock+0xe1
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x9b
unionfs_nodeget() at unionfs_nodeget+0x615
unionfs_domount() at unionfs_domount+0x4ab
vfs_donmount() at vfs_donmount+0x960
sys_nmount() at sys_nmount+0x66
amd64_syscall() at amd64_syscall+0x2fa
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80087798c, rsp = 0x7fffffffd328, rbp = 0x7fffffffd750 ---

Thanks for your help,

-Harry
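Note on the locking rule discussed in this thread: the quoted explanation says that insmntque() only accepts a vnode held with LK_EXCLUSIVE, while the caller of unionfs_nodeget() may have asked for a shared lock or for no lock at all. The following is a minimal user-space sketch of that hand-off, not FreeBSD kernel code. Every name in it (toy_lock, toy_vnode, toy_insmntque, toy_nodeget, toy_unlock) is a hypothetical stand-in invented for illustration, and it makes no claim about what the actual patches do or where the real bug lies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for "no lock", LK_SHARED and LK_EXCLUSIVE. */
enum toy_lock { TOY_UNLOCKED, TOY_SHARED, TOY_EXCLUSIVE };

/* Hypothetical, heavily simplified vnode: only the lock state matters. */
struct toy_vnode {
	enum toy_lock state;	/* how the vnode is currently locked */
	bool on_mount_list;	/* set once the toy insertion succeeded */
};

/*
 * Models the rule quoted in the thread: insertion is refused unless
 * the vnode is exclusively locked. Returns 0 on success, -1 on a
 * lock violation.
 */
int
toy_insmntque(struct toy_vnode *vp)
{
	assert(vp != NULL);
	if (vp->state != TOY_EXCLUSIVE)
		return (-1);
	vp->on_mount_list = true;
	return (0);
}

/*
 * Models the hand-off: lock exclusively, insert, then convert the
 * lock to what the caller asked for (keep, downgrade, or release).
 */
int
toy_nodeget(struct toy_vnode *vp, enum toy_lock wanted)
{
	assert(vp != NULL);
	vp->state = TOY_EXCLUSIVE;
	if (toy_insmntque(vp) != 0)
		return (-1);
	switch (wanted) {
	case TOY_EXCLUSIVE:
		break;				/* keep the lock as is */
	case TOY_SHARED:
		vp->state = TOY_SHARED;		/* downgrade */
		break;
	case TOY_UNLOCKED:
		vp->state = TOY_UNLOCKED;	/* release */
		break;
	}
	return (0);
}

/*
 * Releasing a lock that is not held is reported as an error (-1),
 * loosely mirroring a "not locked" assertion in a debug kernel.
 */
int
toy_unlock(struct toy_vnode *vp)
{
	assert(vp != NULL);
	if (vp->state == TOY_UNLOCKED)
		return (-1);
	vp->state = TOY_UNLOCKED;
	return (0);
}
```

In this toy model, calling toy_insmntque() on an unlocked vnode fails, toy_nodeget(vp, TOY_SHARED) leaves the vnode share-locked and inserted, and calling toy_unlock() after toy_nodeget(vp, TOY_UNLOCKED) returns -1 because the lock was already released. That last case is the general class of error (an unlock of a lock that is not held) that the "not locked" panic message above appears to describe; whether the real code path fails for this reason cannot be determined from the model.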