Date: Mon, 16 Nov 2015 00:42:04 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 204340] [panic] nfsd, em, msix, fatal trap 9
Message-ID: <bug-204340-8-rVAnfIlq7Y@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-204340-8@https.bugs.freebsd.org/bugzilla/>
References: <bug-204340-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Rick Macklem <rmacklem@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|New                         |In Progress
                 CC|                            |rmacklem@FreeBSD.org
           Assignee|freebsd-bugs@FreeBSD.org    |rmacklem@FreeBSD.org

--- Comment #2 from Rick Macklem <rmacklem@FreeBSD.org> ---
Created attachment 163160
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=163160&action=edit
patch that might fix this problem

I think this crash might have been caused by a race between
svcpool_destroy() and the socket upcall.
The code in svcpool_destroy() assumes that SVC_RELEASE(xprt) drops the
ref cnt to 0, so that SVC_DESTROY() is called.
--> SVC_DESTROY() shuts down the socket upcall.
--> If the ref cnt doesn't go to 0, svcpool_destroy() will mtx_destroy()
    the mutexes prematurely.

I am not sure, but the race might have been introduced by r267228 since,
prior to this there was a single mutex for the pool, held while all xprt's
are unregistered. After r267228, there is a group of mutexes, where the
code only held one at a time, so I think an xprt might get re-registered
on another group after that group has had all de-registered.

The attached little patch moves the mtx_lock() calls to a separate loop
before the xprt_unregister loops, so that all locks are held while all
are de-registered.

I've added mav@ to the cc list, since he might be the guy that actually
understands this.

Anyhow, if you could test the attached patch with msi interrupts re-enabled
and see if the crashes go away, that would be great.
(I don't think that this indicates that the em(4) driver is broken. I
suspect that it just affects timing of the interrupts that tripped over
this race.)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
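
The locking order described in the comment above (take every group mutex first, then de-register everything, then release) can be illustrated with a small userspace sketch. This is not the attached patch and not the kernel code: it uses POSIX threads in place of mtx_lock(), and the names struct pool, struct group, NGROUPS, pool_drain() and the closing flag are all hypothetical, introduced only for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

#define NGROUPS 4	/* hypothetical number of thread groups */

struct group {
	pthread_mutex_t lock;	/* stands in for the per-group mutex */
	int nxprt;		/* transports registered on this group */
};

struct pool {
	struct group groups[NGROUPS];
	bool closing;		/* assumption: set once teardown has begun */
};

void
pool_init(struct pool *p)
{
	p->closing = false;
	for (int g = 0; g < NGROUPS; g++) {
		pthread_mutex_init(&p->groups[g].lock, NULL);
		p->groups[g].nxprt = 0;
	}
}

/* Register a transport on group g; refused once teardown has begun. */
bool
xprt_register(struct pool *p, int g)
{
	bool ok;

	pthread_mutex_lock(&p->groups[g].lock);
	ok = !p->closing;
	if (ok)
		p->groups[g].nxprt++;
	pthread_mutex_unlock(&p->groups[g].lock);
	return (ok);
}

/*
 * De-register every transport while holding ALL group locks, so that
 * nothing can register on a group that has already been emptied.
 * Locks are always taken in ascending index order.
 * Returns the number of transports that were de-registered.
 */
int
pool_drain(struct pool *p)
{
	int total = 0;

	/* First loop: acquire every group lock. */
	for (int g = 0; g < NGROUPS; g++)
		pthread_mutex_lock(&p->groups[g].lock);

	/* Second loop: de-register with all locks held. */
	p->closing = true;
	for (int g = 0; g < NGROUPS; g++) {
		total += p->groups[g].nxprt;
		p->groups[g].nxprt = 0;
	}

	/* Third loop: release the locks. */
	for (int g = 0; g < NGROUPS; g++)
		pthread_mutex_unlock(&p->groups[g].lock);
	return (total);
}

/* Destroy the mutexes; only safe after pool_drain() has completed. */
void
pool_fini(struct pool *p)
{
	for (int g = 0; g < NGROUPS; g++)
		pthread_mutex_destroy(&p->groups[g].lock);
}
```

With one lock held at a time, a registration could slip onto group 0 while the teardown loop is busy with group 1; holding all the locks across the de-register loop closes that window in this model. Whether the same reasoning fully covers the kernel race is exactly what the requested test is meant to show.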