Date:      Mon, 16 Nov 2015 00:42:04 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 204340] [panic] nfsd, em, msix, fatal trap 9
Message-ID:  <bug-204340-8-rVAnfIlq7Y@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-204340-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-204340-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Rick Macklem <rmacklem@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|New                         |In Progress
                 CC|                            |rmacklem@FreeBSD.org
           Assignee|freebsd-bugs@FreeBSD.org    |rmacklem@FreeBSD.org

--- Comment #2 from Rick Macklem <rmacklem@FreeBSD.org> ---
Created attachment 163160
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=163160&action=edit
patch that might fix this problem

I think this crash might have been caused by a race
between svcpool_destroy() and the socket upcall.
The code in svcpool_destroy() assumes that SVC_RELEASE(xprt)
drops the reference count to 0, so that SVC_DESTROY() is called.
--> SVC_DESTROY() shuts down the socket upcall.
--> If the reference count doesn't go to 0, svcpool_destroy() will
    mtx_destroy() the mutexes prematurely, while the socket upcall
    can still run.
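
To illustrate the assumption described above, here is a minimal
single-threaded userspace sketch. It is not the kernel code: struct xprt,
its fields and xprt_release() are hypothetical stand-ins for the real
transport, SVC_RELEASE() and SVC_DESTROY(). It only shows that a release
tears the upcall down when it happens to be the last reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transport handle. */
struct xprt {
    int  refs;            /* outstanding references */
    bool upcall_active;   /* true while the socket upcall may still fire */
};

/*
 * Hypothetical analogue of SVC_RELEASE(): drop one reference and tear
 * the transport down only if that was the last one.
 * Returns true if the teardown ran.
 */
static bool
xprt_release(struct xprt *x)
{
    assert(x->refs > 0);
    if (--x->refs > 0)
        return false;           /* another holder remains, no teardown */
    x->upcall_active = false;   /* analogue of SVC_DESTROY() stopping the upcall */
    return true;
}
```

If some other path holds a second reference, the first xprt_release()
returns false and the upcall stays active, which is the case the
destroy path does not expect.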

I am not sure, but the race might have been introduced by
r267228. Prior to that commit there was a single mutex for
the pool, held while all xprts were unregistered.
After r267228 there is a mutex per group, and the code
only holds one at a time, so I think an xprt might get re-registered
on another group after that group has had all of its xprts de-registered.

The attached little patch moves the mtx_lock() calls to a
separate loop before the xprt_unregister loops, so that all
locks are held while all xprts are de-registered.
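
The locking pattern described above can be sketched in userspace with
pthread mutexes. This is not the attached patch and not the kernel code:
struct group, struct pool, NGROUPS, XPRTS_PER_GROUP, pool_init() and
pool_unregister_all() are all hypothetical names, and a plain counter
stands in for each group's list of registered xprts.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NGROUPS         4   /* hypothetical number of groups */
#define XPRTS_PER_GROUP 3   /* hypothetical initial xprts per group */

struct group {
    pthread_mutex_t lock;   /* per-group mutex */
    int             nxprt;  /* count of registered xprts in this group */
};

struct pool {
    struct group groups[NGROUPS];
};

static void
pool_init(struct pool *p)
{
    for (int g = 0; g < NGROUPS; g++) {
        pthread_mutex_init(&p->groups[g].lock, NULL);
        p->groups[g].nxprt = XPRTS_PER_GROUP;
    }
}

/* Unregister everything; returns how many xprts were removed. */
static int
pool_unregister_all(struct pool *p)
{
    int removed = 0;

    assert(p != NULL);

    /* Loop 1: take every group lock first, always in index order. */
    for (int g = 0; g < NGROUPS; g++)
        pthread_mutex_lock(&p->groups[g].lock);

    /*
     * Loop 2: with all locks held, empty each group. Any registration
     * path that takes a group lock cannot add to a group that was
     * already emptied until this function releases the locks.
     */
    for (int g = 0; g < NGROUPS; g++) {
        removed += p->groups[g].nxprt;
        p->groups[g].nxprt = 0;
    }

    /* Loop 3: release all locks. */
    for (int g = 0; g < NGROUPS; g++)
        pthread_mutex_unlock(&p->groups[g].lock);

    return removed;
}
```

Taking the locks in a fixed order matters in a sketch like this: any
other code that ever needs more than one group lock has to use the same
order, or the two paths can deadlock.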

I've added mav@ to the cc list, since he might be the guy
that actually understands this.

Anyhow, if you could test the attached patch with msi interrupts
re-enabled and see if the crashes go away, that would be great.
(I don't think that this indicates that the em(4) driver is broken.
 I suspect that it just affects timing of the interrupts that tripped
 over this race.)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.


