Date: Fri, 28 Aug 2015 11:48:00 -0700 From: John-Mark Gurney <jmg@funkthat.com> To: Sean Bruno <sbruno@freebsd.org> Cc: freebsd-arch@freebsd.org Subject: Re: Network card interrupt handling Message-ID: <20150828184800.GE33167@funkthat.com> In-Reply-To: <55DDE9B8.4080903@freebsd.org> References: <55DDE9B8.4080903@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--pWyiEgJYm5f9v55/ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sean Bruno wrote this message on Wed, Aug 26, 2015 at 09:30 -0700: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA512 > > We've been diagnosing what appeared to be out of order processing in > the network stack this week only to find out that the network card > driver was shoveling bits to us out of order (em). > > This *seems* to be due to a design choice where the driver is allowed > to assert a "soft interrupt" to the h/w device while real interrupts > are disabled. This allows a fake "em_msix_rx" to be started *while* > "em_handle_que" is running from the taskqueue. We've isolated and > worked around this by setting our processing_limit in the driver to > - -1. This means that *most* packet processing is now handled in the > MSI-X handler instead of being deferred. Some periodic interference > is still detectable via em_local_timer() which causes one of these > "fake" interrupt assertions in the normal, card is *not* hung case. I have a better question, for MSI-X, we have a dedicated interrupt thread to do the processing, so why are we even doing any moderation in this case? It's not any different than spinning in the task queue.. How about the attached patch that just disables taskqueue processing for MSI-X RX interrupts, and does all processing in the interrupt thread? Do you need to add the rx_task to the em_local_timer? If so, then I would look at setting a flag in the _rxeof that says that processing is happening... and in the case of the taskqueue, when it sees this flag set, it just exits, while for the interrupt filter case, we'd need to be more careful (possibly set a flag that the taskqueue will inspect, and cause it to stop processing the rx queue)... > Both functions use identical code for a start. Both end up down > inside of em_rxeof() to process packets. Both drop the RX lock prior > to handing the data up the network stack. > > This means that the em_handle_que running from the taskqueue will be > preempted. Dtrace confirms that this allows out of order processing > to occur at times and generates a lot of resets. > > The reason I'm bringing this up on -arch and not on -net is that this > is a common design pattern in some of the Ethernet drivers. We've > done preliminary tests on a patch that moves *all* processing of RX > packets to the rx_task taskqueue, which means that em_handle_que is > now the only path to get packets processed. > > <stable10 diff> > https://people.freebsd.org/~sbruno/em_interupt_to_taskqueue.diff > > My sense is that this is a slightly "better" method to handle the > packets but removes some immediacy from packet processing since all > processing is deferred. However, all packet processing is now > serialized per queue, which I think is the proper implementation. > > Am I smoking "le dope" here or is this the way forward? I think you discovered an interresting issue.. btw, since you're hacking on em a lot, interrested in fixing em's jumbo frames so it doesn't use 9k clusters, but instead page sized clusters? -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." --pWyiEgJYm5f9v55/ Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="em.patch" diff --git a/sys/dev/e1000/if_em.c b/sys/dev/e1000/if_em.c index 830325b..b3c333a 100644 --- a/sys/dev/e1000/if_em.c +++ b/sys/dev/e1000/if_em.c @@ -1629,13 +1629,14 @@ em_msix_rx(void *arg) ++rxr->rx_irq; if (!(if_getdrvflags(adapter->ifp) & IFF_DRV_RUNNING)) return; - more = em_rxeof(rxr, adapter->rx_process_limit, NULL); - if (more) - taskqueue_enqueue(rxr->tq, &rxr->rx_task); - else { - /* Reenable this interrupt */ - E1000_WRITE_REG(&adapter->hw, E1000_IMS, rxr->ims); - } + + do { + more = em_rxeof(rxr, -1, NULL); + } while (more); + + /* Reenable this interrupt */ + E1000_WRITE_REG(&adapter->hw, E1000_IMS, rxr->ims); + return; } --pWyiEgJYm5f9v55/--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150828184800.GE33167>