From: Harry Schmalzbauer
Organization: OmniLAN
Date: Wed, 08 Mar 2017 09:50:51 +0100
To: Konstantin Belousov
CC: Rick Macklem, Mark Johnston, FreeBSD Stable
Subject: Re: unionfs bugs, a partial patch and some comments [Was: Re: 1-BETA3 Panic: __lockmgr_args: downgrade a recursed lockmgr nfs @ /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905]
Message-ID: <58BFC5EB.8020905@omnilan.de>
In-Reply-To: <20170307235550.GP30979@kib.kiev.ua>

Regarding Konstantin Belousov's message of 08.03.2017 00:55 (localtime):
> On Tue, Mar 07, 2017 at 10:49:01PM +0000, Rick Macklem wrote:
>> Hmm, this is going to sound dumb, but I don't recall generating any
>> unionfs patch;-)
>> I'll go look for it. Maybe it was Kostik's?
> I did not touch unionfs, and have no plans to. It is equally broken in
> all relevant versions of FreeBSD.

ACK.
While this is no good news, I have more bad news: the deadlock came back…

I'd like to summarize, in case anybody else is interested in unionfs,
now or at any time in the future:

I observed locking problems back in 2012, and Attilio Rao's final attempt
was this:
https://people.freebsd.org/~attilio/unionfs_nodeget4.patch
I never used it, most likely because it didn't work even back with
RELENG_9. It applies to stable/11, but has no effect besides panicking
KDB kernels.

What I used up to 10.3 was the following simple patch:

--- src/sys/fs/unionfs/union_subr.c	(revision 231702)
+++ src/sys/fs/unionfs/union_subr.c	(working copy)
@@ -261,7 +261,9 @@ unionfs_nodeget(struct mount *mp, struct vnode *up
 		free(unp, M_UNIONFSNODE);
 		return (error);
 	}
+	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 	error = insmntque(vp, mp);	/* XXX: Too early for mpsafe fs */
+	VOP_UNLOCK(vp, 0);
 	if (error != 0) {
 		free(unp, M_UNIONFSNODE);
 		return (error);

This hasn't led to any panic or deadlock during the last 5 years on ~50
machines, up to 10.3.

In 2016 I did some tests with 11.0-Beta1, where this thread originates,
and Rick kindly looked into it and provided the following patch:
https://lists.freebsd.org/pipermail/freebsd-stable/attachments/20160818/d1d1691d/attachment.obj
(Explanation:
https://lists.freebsd.org/pipermail/freebsd-stable/2016-August/085294.html)

This also panics a KDB kernel (and works without KDB), but it at least
influences the deadlock: when symlinks are involved, deadlocks are
significantly postponed.
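
For readers who don't know the kernel interfaces, the ordering used by the simple 10.3-era patch (take the vnode lock exclusively, call insmntque(), drop the lock again) can be pictured with a rough userland analogy. This is only a sketch under loud assumptions: pthread mutexes stand in for vnode locks, a linked list stands in for the mount's vnode list, and every name in it (struct node, struct registry, registry_insert_locked) is hypothetical and not part of FreeBSD. It says nothing about why unionfs deadlocks.

```c
/*
 * Userland analogy only, NOT kernel code.
 * All type and function names here are hypothetical.
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct node {
	pthread_mutex_t	 lock;	/* stands in for the vnode lock */
	struct node	*next;
};

struct registry {
	pthread_mutex_t	 list_lock;	/* stands in for the per-mount list lock */
	struct node	*head;
	size_t		 count;
};

int
registry_init(struct registry *r)
{
	r->head = NULL;
	r->count = 0;
	return (pthread_mutex_init(&r->list_lock, NULL));
}

int
node_init(struct node *n)
{
	n->next = NULL;
	return (pthread_mutex_init(&n->lock, NULL));
}

/*
 * Insert a node into the registry while holding the node's own lock,
 * mirroring the order vn_lock(); insmntque(); VOP_UNLOCK() in the patch.
 */
int
registry_insert_locked(struct registry *r, struct node *n)
{
	int error;

	assert(r != NULL && n != NULL);

	error = pthread_mutex_lock(&n->lock);	/* analogue of vn_lock() */
	if (error != 0)
		return (error);

	pthread_mutex_lock(&r->list_lock);	/* analogue of insmntque() */
	n->next = r->head;
	r->head = n;
	r->count++;
	pthread_mutex_unlock(&r->list_lock);

	pthread_mutex_unlock(&n->lock);		/* analogue of VOP_UNLOCK() */
	return (0);
}
```

A caller would initialize one registry and its nodes once, then call registry_insert_locked() for each new node. The point of the sketch is only the lock order (node lock first, list lock second, released in reverse).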

…

>>>>
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>>> 0xfffffe00982220e0
>>>> vpanic() at vpanic+0x186/frame 0xfffffe0098222160
>>>> kassert_panic() at kassert_panic+0x126/frame 0xfffffe00982221d0
>>>> witness_assert() at witness_assert+0x35a/frame 0xfffffe0098222230
>>>> __lockmgr_args() at __lockmgr_args+0x517/frame 0xfffffe00982222d0
>>>> vop_stdunlock() at vop_stdunlock+0x3b/frame 0xfffffe00982222f0
>>>> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe0098222320
>>>> unionfs_unlock() at unionfs_unlock+0x112/frame 0xfffffe0098222390
>>>> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe00982223c0
>>>> unionfs_nodeget() at unionfs_nodeget+0x3ef/frame 0xfffffe0098222470
>>>> unionfs_domount() at unionfs_domount+0x518/frame 0xfffffe00982226b0
>>>> vfs_donmount() at vfs_donmount+0xe37/frame 0xfffffe00982228f0
>>>> sys_nmount() at sys_nmount+0x72/frame 0xfffffe0098222930
>>>> amd64_syscall() at amd64_syscall+0x2f9/frame 0xfffffe0098222ab0
>>>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0098222ab0
>>>> --- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80086ecea, rsp =
>>>> 0x7fffffffe318, rbp = 0x7fffffffeca0 ---
>>> New discovery:
>>> Rick's latest patch causes panic only with KDB. If I compile a kernel
>>> without witness and KDB, the machine boots fine!
>>> Also, it's at least not so easy anymore to trigger the deadlock :-) . I
>>> need to do more testing but until now Rick's approach seems very
>>> promising :-) .
>>
>> My unionfs deadlock problem isn't really solved with Rick's latest
>> patch, I still can reproduce it: krb5.conf and krb5.keytab are files on
>> unionfs referenced by /etc. libexec/negotiate_kerberos_auth reads these
>> and if I have enough helper processes handling requests, the deadlock
>> occurs.
>>
>> _But_: If I move the files outside the unionfs and create a symlink, I
>> cannot reproduce the deadlock anymore, which was similarly easily
>> reproducible without it or any of the other workarounds.

The picture has changed: the machine deadlocked overnight.
So Rick's patch does have a significant influence, but unfortunately it
isn't the real solution.

Thanks for any help,

-harry