From: bugzilla-noreply@freebsd.org
To: freebsd-net@FreeBSD.org
Subject: [Bug 204340] [panic] nfsd, em, msix, fatal trap 9
Date: Wed, 18 Nov 2015 22:44:59 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.2-RELEASE
X-Bugzilla-Keywords: IntelNetworking, crash
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: rmacklem@FreeBSD.org
X-Bugzilla-Status: In Progress
X-Bugzilla-Assigned-To: rmacklem@FreeBSD.org
X-Bugzilla-Flags: mfc-stable9? mfc-stable10?

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #10 from Rick Macklem ---
I have just added 2 more patches that might be relevant to the crashes.

When the nfsd threads are terminated, this is what is supposed to happen:
- All nfsd threads running in svc_run_internal() return to svc_run().
- svc_run() waits for all these threads to return.
- After svc_run() returns, the nfsd calls svcpool_destroy().
- svcpool_destroy() unregisters all the xprts (which represent the TCP sockets).
  - At this point, the reference count should be 1 for all xprts.
  --> Then svcpool_destroy() calls SVC_RELEASE(xprt) for all of them, which drops the reference count to 0 and calls SVC_DESTROY().
  --> This actually calls svc_vc_destroy(), which shuts down the socket upcall and, after that, destroys the mutexes.

My best guess w.r.t. the crashes is that the reference count gets messed up on an xprt, so it doesn't get SVC_DESTROY()'d. Then a socket upcall calls xprt_active() after the mutex has been destroyed and BOOM.

The two new patches should be applied along with the first one.

The second patch fixes the one other place that I can spot where the server side krpc code isn't quite SMP safe. Although unlikely, it is conceivable that this could cause the crashes.

The third patch makes sure that the backchannel xprt is dereferenced before the call to svcpool_destroy(). This one seems a more likely culprit, but only if you have clients doing NFSv4.1 mounts against the server.
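
For illustration only, the reference-count reasoning above can be modeled with a small user-space sketch. This is not the krpc code: struct pool, struct xprt, and every function name below are hypothetical stand-ins, the "mutex" and "upcall" are plain flags, and the sketch assumes the destroyed mutex is one that the pool teardown invalidates unconditionally. It only shows why one stray reference would leave an upcall armed after teardown.

```c
#include <assert.h>
#include <stdbool.h>

#define NXPRT 2

/* Hypothetical stand-in for an xprt; not the real SVCXPRT. */
struct xprt {
    int  refs;          /* reference count */
    bool upcall_armed;  /* models "socket upcall still installed" */
};

/* Hypothetical stand-in for the service pool. */
struct pool {
    bool        lock_valid;  /* models "mutex not yet destroyed" (assumption) */
    struct xprt xprts[NXPRT];
};

static void pool_init(struct pool *p)
{
    p->lock_valid = true;
    for (int i = 0; i < NXPRT; i++) {
        p->xprts[i].refs = 1;            /* expected count at teardown */
        p->xprts[i].upcall_armed = true;
    }
}

/* Models SVC_RELEASE(): only the last release tears the xprt down. */
static void xprt_release(struct xprt *x)
{
    assert(x->refs > 0);
    if (--x->refs == 0)
        x->upcall_armed = false;         /* models the SVC_DESTROY() step */
}

/* Models pool teardown: one release per xprt, then the lock goes away. */
static void pool_destroy(struct pool *p)
{
    for (int i = 0; i < NXPRT; i++)
        xprt_release(&p->xprts[i]);
    p->lock_valid = false;
}

/* True if some upcall could still fire against a destroyed lock. */
static bool upcall_hazard(const struct pool *p)
{
    for (int i = 0; i < NXPRT; i++)
        if (p->xprts[i].upcall_armed && !p->lock_valid)
            return true;
    return false;
}

/* Runs a teardown with `extra_refs` stray references on the first xprt. */
bool teardown_leaves_hazard(int extra_refs)
{
    struct pool p;

    pool_init(&p);
    p.xprts[0].refs += extra_refs;       /* simulate the messed-up count */
    pool_destroy(&p);
    return upcall_hazard(&p);
}
```

With balanced counts, teardown_leaves_hazard(0) reports no hazard; with a single stray reference, teardown_leaves_hazard(1) reports that an upcall is still armed after the lock is gone, which is the suspected crash scenario.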

If you could try the second patch (and the third if you have NFSv4.1 mounts), that would be appreciated.

One final comment: I am assuming that you are terminating the nfsd threads by sending a SIGUSR1 to the nfsd master. This is the only way the nfsd threads should be terminated. (If you are using /etc/rc.d/nfsd, it should be doing that, but you might try using "kill -USR1 " directly, just in case the shell script is busted.)

This pretty well exhausts what I can see that might cause the crashes, and I can't reproduce a crash here, so hopefully you can make some progress from here.

Good luck with it, rick