Date:      Tue, 07 Mar 2017 19:44:19 +0100
From:      Harry Schmalzbauer <freebsd@omnilan.de>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, "kib@FreeBSD.org" <kib@freebsd.org>, Mark Johnston <markj@freebsd.org>, FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: unionfs bugs, a partial patch and some comments [Was: Re: 1-BETA3 Panic: __lockmgr_args: downgrade a recursed lockmgr nfs @ /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905]
Message-ID:  <58BEFF83.9010906@omnilan.de>
In-Reply-To: <58BEAAAC.4090303@omnilan.de>
References:  <57A79E24.8000100@omnilan.de> <YQBPR01MB0401201977AEA8A803F27B23DD1A0@YQBPR01MB0401.CANPRD01.PROD.OUTLOOK.COM> <57A83C78.1070403@omnilan.de> <20160809060213.GA67664@raichu> <57A9A6C0.9060609@omnilan.de> <YTOPR01MB0412B2A08F1A3C1A3B2EB160DD1E0@YTOPR01MB0412.CANPRD01.PROD.OUTLOOK.COM>, <20160812123950.GO83214@kib.kiev.ua> <YTXPR01MB018919BE87B12E458144E218DD140@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM>, <57B8793E.4070004@omnilan.de> <YTXPR01MB018948D19A0A9BB5FAC3D5BADDE60@YTXPR01MB0189.CANPRD01.PROD.OUTLOOK.COM> <58BEAAAC.4090303@omnilan.de>

 Regarding Harry Schmalzbauer's message of 07.03.2017 13:42 (localtime):
…
> Something ufs related seems to have tightened the unionfs locking
> problem in stable/11.  Now the machine instantaneously panics during
> boot after mounting root with Rick's latest patch.
>
> Unfortunately I don't have SWAP available on that machine (yet), but
> maybe this is a hint for anybody.
>
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfffffe00982220e0
> vpanic() at vpanic+0x186/frame 0xfffffe0098222160
> kassert_panic() at kassert_panic+0x126/frame 0xfffffe00982221d0
> witness_assert() at witness_assert+0x35a/frame 0xfffffe0098222230
> __lockmgr_args() at __lockmgr_args+0x517/frame 0xfffffe00982222d0
> vop_stdunlock() at vop_stdunlock+0x3b/frame 0xfffffe00982222f0
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe0098222320
> unionfs_unlock() at unionfs_unlock+0x112/frame 0xfffffe0098222390
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe00982223c0
> unionfs_nodeget() at unionfs_nodeget+0x3ef/frame 0xfffffe0098222470
> unionfs_domount() at unionfs_domount+0x518/frame 0xfffffe00982226b0
> vfs_donmount() at vfs_donmount+0xe37/frame 0xfffffe00982228f0
> sys_nmount() at sys_nmount+0x72/frame 0xfffffe0098222930
> amd64_syscall() at amd64_syscall+0x2f9/frame 0xfffffe0098222ab0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0098222ab0
> --- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80086ecea, rsp =
> 0x7fffffffe318, rbp = 0x7fffffffeca0 ---

New discovery:
Rick's latest patch causes the panic only with KDB. If I compile a kernel
without WITNESS and KDB, the machine boots fine!
Also, the deadlock is at least no longer so easy to trigger :-) . I
need to do more testing, but so far Rick's approach seems very
promising :-) . Unfortunately I can't offer a fix or an explanation of
why the KDB kernel panics and the non-KDB kernel doesn't, just the vague
guess that additional locking checks (KASSERT?), which would prevent
further damage, are not in place in the non-debug kernel. So I guess I'm
in dangerous waters, but it definitely is a highly appreciated
improvement for me and my very best bet for now (neither eliminating
unionfs nor holding off 11 updates was a real option for me, especially
because unionfs doesn't really work well on 10.3 either; it just doesn't
lead to deadlocks in as many environments)!
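
To illustrate the guess above (and only the guess, this is not the
unionfs or lockmgr code), here is a minimal userland sketch in C. The
names toy_lock and toy_unlock and the checks_enabled flag are made up
for illustration; the flag stands in for a kernel built with the
debugging checks. It shows how the same bad unlock can be caught as a
"panic" when checks are on, and go through silently, leaving bad lock
state, when they are off.

```c
#include <stdbool.h>

/* Toy lock: hypothetical, NOT the kernel's struct lock. */
struct toy_lock {
	int holders;	/* how many times the lock is currently held */
};

/* Outcome of one unlock attempt in this model. */
enum unlock_result {
	UNLOCK_OK,			/* lock was held and is released */
	UNLOCK_PANIC,			/* check fired, state left untouched */
	UNLOCK_SILENT_CORRUPTION	/* no check, state is now invalid */
};

/*
 * Release the lock once.  With checks_enabled (modelling a debugging
 * kernel) releasing an unheld lock is reported as a "panic".  Without
 * checks the count is decremented anyway, so the lock ends up in a
 * state it should never be in.
 */
static enum unlock_result
toy_unlock(struct toy_lock *lk, bool checks_enabled)
{
	if (lk->holders <= 0) {
		if (checks_enabled)
			return (UNLOCK_PANIC);
		lk->holders--;
		return (UNLOCK_SILENT_CORRUPTION);
	}
	lk->holders--;
	return (UNLOCK_OK);
}
```

In this model a non-debug build "boots fine" after the bad unlock, but
the holder count is already wrong, which is why I would not read the
missing panic as proof that the locking is correct.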

I tried the non-debug kernel because I browsed old unionfs discussions
and, out of desperation, gave Attilio Rao's patch a try, since I couldn't
remember why I hadn't kept it locally:
https://people.freebsd.org/~attilio/unionfs_nodeget4.patch (he tried to
solve unionfs problems for RELENG_9 back in 2012:
https://lists.freebsd.org/pipermail/freebsd-stable/2012-November/070358.html)

It's still true that his patch leads to a panic only with a debugging
kernel. The same patch without KDB allows the system to boot and start
squid. But the result is the same as with plain r314856: the system
deadlocks reproducibly.

Also, the trace with his patch looks identical to the plain r314856
unionfs panic.

So I hope Rick or someone else can pick up the latest patch and polish
it to make KDB-kernels happy :-)
I can offer a small donation if that helps!
Of course, I'll also provide KDB info if needed/helpful.

thanks,

-harry


