From: Harry Schmalzbauer (freebsd@omnilan.de)
Organization: OmniLAN
Date: Tue, 07 Mar 2017 19:44:19 +0100
To: Rick Macklem
CC: Konstantin Belousov, "kib@FreeBSD.org", Mark Johnston, FreeBSD Stable
Subject: Re: unionfs bugs, a partial patch and some comments [Was: Re: 1-BETA3 Panic: __lockmgr_args: downgrade a recursed lockmgr nfs @ /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905]
Message-ID: <58BEFF83.9010906@omnilan.de>
In-Reply-To: <58BEAAAC.4090303@omnilan.de>

Regarding Harry Schmalzbauer's message of 07.03.2017 13:42 (localtime):
…
> Something ufs related seems to have tightened the unionfs locking
> problem in stable/11. Now the machine instantaneously panics during
> boot after mounting root with Rick's latest patch.
>
> Unfortunately I don't have SWAP available on that machine (yet), but
> maybe this is a hint for anybody.
>
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00982220e0
> vpanic() at vpanic+0x186/frame 0xfffffe0098222160
> kassert_panic() at kassert_panic+0x126/frame 0xfffffe00982221d0
> witness_assert() at witness_assert+0x35a/frame 0xfffffe0098222230
> __lockmgr_args() at __lockmgr_args+0x517/frame 0xfffffe00982222d0
> vop_stdunlock() at vop_stdunlock+0x3b/frame 0xfffffe00982222f0
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe0098222320
> unionfs_unlock() at unionfs_unlock+0x112/frame 0xfffffe0098222390
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe00982223c0
> unionfs_nodeget() at unionfs_nodeget+0x3ef/frame 0xfffffe0098222470
> unionfs_domount() at unionfs_domount+0x518/frame 0xfffffe00982226b0
> vfs_donmount() at vfs_donmount+0xe37/frame 0xfffffe00982228f0
> sys_nmount() at sys_nmount+0x72/frame 0xfffffe0098222930
> amd64_syscall() at amd64_syscall+0x2f9/frame 0xfffffe0098222ab0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0098222ab0
> --- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80086ecea, rsp = 0x7fffffffe318, rbp = 0x7fffffffeca0 ---

New discovery: Rick's latest patch causes the panic only with KDB. If I
compile a kernel without WITNESS and KDB, the machine boots fine!
Also, it's at least not so easy anymore to trigger the deadlock :-) .
I need to do more testing, but so far Rick's approach seems very
promising :-) .

Unfortunately I can't provide a fix or an explanation for why the KDB
kernel panics and the non-KDB kernel doesn't, just the vague guess that
the additional locking checks (KASSERT?), which prevent further damage,
are not in place in the non-debug kernel.
So I guess I'm in dangerous waters, but it is definitely a highly
appreciated improvement for me and my very best bet for now (neither
eliminating unionfs nor holding off 11 updates were real options for me,
especially because unionfs doesn't really work well on 10.3 either; it
just doesn't lead to deadlocks in as many environments)!

I tried the non-debug kernel because, while browsing old unionfs
discussions, I gave Attilio Rao's patch a try out of desperation, since
I couldn't remember why I hadn't kept it locally:
https://people.freebsd.org/~attilio/unionfs_nodeget4.patch
(he tried to solve unionfs problems for RELENG_9 back in 2012:
https://lists.freebsd.org/pipermail/freebsd-stable/2012-November/070358.html)

It's still true that his patch leads to a panic only with a debugging
kernel. The same patch without KDB allows the machine to boot and start
squid. But the result is the same as with plain r314856: the system
deadlocks reproducibly.
Also, the trace with his patch looks identical to the plain r314856
unionfs panic.

So I hope Rick or someone else can pick up the latest patch and polish
it to make KDB kernels happy :-)
I can offer a small donation if that helps!
Of course, I'll also provide KDB info if needed/helpful.

Thanks,

-harry