Date:      Mon, 31 Aug 2015 14:41:14 -0700
From:      John Baldwin <jhb@freebsd.org>
To:        "K. Macy" <kmacy@freebsd.org>
Cc:        freebsd-arch@freebsd.org, Sean Bruno <sbruno@freebsd.org>
Subject:   Re: Network card interrupt handling
Message-ID:  <1709356.SnmUAQFSba@ralph.baldwin.cx>
In-Reply-To: <CAHM0Q_N65J9OSaU=znjgJ_gEiu=M-cb9q1hrxskGSvYFhxL_NQ@mail.gmail.com>
References:  <55DDE9B8.4080903@freebsd.org> <24017021.PxBoCiQKDJ@ralph.baldwin.cx> <CAHM0Q_N65J9OSaU=znjgJ_gEiu=M-cb9q1hrxskGSvYFhxL_NQ@mail.gmail.com>

On Friday, August 28, 2015 06:25:53 PM K. Macy wrote:
> On Aug 28, 2015 12:59 PM, "John Baldwin" <jhb@freebsd.org> wrote:
> >
> > On Wednesday, August 26, 2015 09:30:48 AM Sean Bruno wrote:
> > > We've been diagnosing what appeared to be out of order processing in
> > > the network stack this week only to find out that the network card
> > > driver was shoveling bits to us out of order (em).
> > >
> > > This *seems* to be due to a design choice where the driver is allowed
> > > to assert a "soft interrupt" to the h/w device while real interrupts
> > > are disabled.  This allows a fake "em_msix_rx" to be started *while*
> > > "em_handle_que" is running from the taskqueue.  We've isolated and
> > > worked around this by setting our processing_limit in the driver to
> > > -1.  This means that *most* packet processing is now handled in the
> > > MSI-X handler instead of being deferred.  Some periodic interference
> > > is still detectable via em_local_timer() which causes one of these
> > > "fake" interrupt assertions in the normal, card is *not* hung case.
> > >
> > > Both functions use identical code for a start.  Both end up down
> > > inside of em_rxeof() to process packets.  Both drop the RX lock prior
> > > to handing the data up the network stack.
> > >
> > > This means that the em_handle_que running from the taskqueue will be
> > > preempted.  Dtrace confirms that this allows out of order processing
> > > to occur at times and generates a lot of resets.
> > >
> > > The reason I'm bringing this up on -arch and not on -net is that this
> > > is a common design pattern in some of the Ethernet drivers.  We've
> > > done preliminary tests on a patch that moves *all* processing of RX
> > > packets to the rx_task taskqueue, which means that em_handle_que is
> > > now the only path to get packets processed.
> >
> > It is only a common pattern in the Intel drivers. :-/  We (collectively)
> > spent quite a while fixing this in ixgbe and igb.  Longer (hopefully more
> > like medium) term I have an update to the interrupt API I want to push in
> > that allows drivers to manually schedule interrupt handlers using an
> > 'hwi' API to replace the manual taskqueues.  This also ensures that
> > the handler that dequeues packets is only ever running in an ithread
> > context and never concurrently.
> >
> 
> Jeff has a generalization of the net_task infrastructure used at Nokia
> called grouptaskq that I've used for iflib. That does essentially what you
> refer to. I've converted ixl and am currently about to test an ixgbe
> conversion. I anticipate converting mlxen, all Intel drivers as well as the
> remaining drivers with device specific code in netmap. The one catch is
> finding someone who will publicly admit to owning re hardware so that I can
> buy it from him and test my changes.

Note that the ithread changes I refer to are for all devices (not just
network interfaces) and fix some other issues as well.  For example,
INTR_FILTER is always enabled, races with tearing down filters are closed,
and idle ithreads use a more thread_lock()-friendly state.  The changes also
allow us to experiment with sharing ithreads among devices, as well as having
multiple threads service a queue of interrupt handlers if desired.  It may be that
this will make your life easier since you might be able to reuse the new
primitives more directly rather than bypassing ithreads.  I've posted the
changes to arch@ a few different times over the past several years; I just
haven't pushed them in.  (They aren't perfect in that I don't yet have
APIs for changing the plumbing around, due to a lack of use cases to build
the APIs from.)
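
To make the "never concurrently" property quoted above concrete, here is a
minimal userspace sketch in C11.  It is not the proposed 'hwi' API and not
code from any driver; every name in it (rxq_sched, rxq_sched_request,
rxq_sched_finish) is hypothetical.  It only shows one way a schedule request
that arrives while the handler is already running could be recorded rather
than starting a second, concurrent pass over the RX ring.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is not the FreeBSD interrupt API. */
enum rxq_state { RXQ_IDLE, RXQ_RUNNING, RXQ_RUNNING_PENDING };

struct rxq_sched {
	_Atomic int state;
};

static void
rxq_sched_init(struct rxq_sched *s)
{
	atomic_init(&s->state, RXQ_IDLE);
}

/*
 * Request a run of the RX handler.  Returns true if the caller now owns
 * the handler and must run it.  Returns false if a run is already in
 * progress; the request is then recorded so the owner makes another pass.
 */
static bool
rxq_sched_request(struct rxq_sched *s)
{
	int old = atomic_load(&s->state);

	for (;;) {
		int next = (old == RXQ_IDLE) ? RXQ_RUNNING : RXQ_RUNNING_PENDING;

		/* On failure, 'old' is reloaded with the current state. */
		if (atomic_compare_exchange_weak(&s->state, &old, next))
			return (old == RXQ_IDLE);
	}
}

/*
 * Called by the owner after one handler pass.  Returns true if a request
 * arrived during the pass and the owner should run the handler again.
 */
static bool
rxq_sched_finish(struct rxq_sched *s)
{
	int old = atomic_load(&s->state);

	for (;;) {
		assert(old != RXQ_IDLE);	/* only the owner may finish */
		int next = (old == RXQ_RUNNING_PENDING) ? RXQ_RUNNING : RXQ_IDLE;

		if (atomic_compare_exchange_weak(&s->state, &old, next))
			return (old == RXQ_RUNNING_PENDING);
	}
}
```

In this sketch, a caller that gets true from rxq_sched_request() would run
the handler in a loop until rxq_sched_finish() returns false.  A caller that
gets false returns immediately, so packets are only ever dequeued by one
context at a time, and requests made during a pass are coalesced into a
single extra pass.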

-- 
John Baldwin


