From owner-freebsd-arch@freebsd.org Fri Aug 28 18:48:01 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AFA5B9C5D73 for ; Fri, 28 Aug 2015 18:48:01 +0000 (UTC) (envelope-from jmg@gold.funkthat.com) Received: from gold.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "gold.funkthat.com", Issuer "gold.funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 92DAD1EA7; Fri, 28 Aug 2015 18:48:01 +0000 (UTC) (envelope-from jmg@gold.funkthat.com) Received: from gold.funkthat.com (localhost [127.0.0.1]) by gold.funkthat.com (8.14.5/8.14.5) with ESMTP id t7SIm0IP075980 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 28 Aug 2015 11:48:00 -0700 (PDT) (envelope-from jmg@gold.funkthat.com) Received: (from jmg@localhost) by gold.funkthat.com (8.14.5/8.14.5/Submit) id t7SIm0I4075979; Fri, 28 Aug 2015 11:48:00 -0700 (PDT) (envelope-from jmg) Date: Fri, 28 Aug 2015 11:48:00 -0700 From: John-Mark Gurney To: Sean Bruno Cc: freebsd-arch@freebsd.org Subject: Re: Network card interrupt handling Message-ID: <20150828184800.GE33167@funkthat.com> References: <55DDE9B8.4080903@freebsd.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="pWyiEgJYm5f9v55/" Content-Disposition: inline In-Reply-To: <55DDE9B8.4080903@freebsd.org> X-Operating-System: FreeBSD 9.1-PRERELEASE amd64 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? User-Agent: Mutt/1.5.21 (2010-09-15) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (gold.funkthat.com [127.0.0.1]); Fri, 28 Aug 2015 11:48:00 -0700 (PDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Aug 2015 18:48:01 -0000 --pWyiEgJYm5f9v55/ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sean Bruno wrote this message on Wed, Aug 26, 2015 at 09:30 -0700: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA512 > > We've been diagnosing what appeared to be out of order processing in > the network stack this week only to find out that the network card > driver was shoveling bits to us out of order (em). > > This *seems* to be due to a design choice where the driver is allowed > to assert a "soft interrupt" to the h/w device while real interrupts > are disabled. This allows a fake "em_msix_rx" to be started *while* > "em_handle_que" is running from the taskqueue. We've isolated and > worked around this by setting our processing_limit in the driver to > - -1. This means that *most* packet processing is now handled in the > MSI-X handler instead of being deferred. Some periodic interference > is still detectable via em_local_timer() which causes one of these > "fake" interrupt assertions in the normal, card is *not* hung case. I have a better question, for MSI-X, we have a dedicated interrupt thread to do the processing, so why are we even doing any moderation in this case? It's not any different than spinning in the task queue.. How about the attached patch that just disables taskqueue processing for MSI-X RX interrupts, and does all processing in the interrupt thread? Do you need to add the rx_task to the em_local_timer? If so, then I would look at setting a flag in the _rxeof that says that processing is happening... and in the case of the taskqueue, when it sees this flag set, it just exits, while for the interrupt filter case, we'd need to be more careful (possibly set a flag that the taskqueue will inspect, and cause it to stop processing the rx queue)... > Both functions use identical code for a start. Both end up down > inside of em_rxeof() to process packets. Both drop the RX lock prior > to handing the data up the network stack. > > This means that the em_handle_que running from the taskqueue will be > preempted. Dtrace confirms that this allows out of order processing > to occur at times and generates a lot of resets. > > The reason I'm bringing this up on -arch and not on -net is that this > is a common design pattern in some of the Ethernet drivers. We've > done preliminary tests on a patch that moves *all* processing of RX > packets to the rx_task taskqueue, which means that em_handle_que is > now the only path to get packets processed. > > > https://people.freebsd.org/~sbruno/em_interupt_to_taskqueue.diff > > My sense is that this is a slightly "better" method to handle the > packets but removes some immediacy from packet processing since all > processing is deferred. However, all packet processing is now > serialized per queue, which I think is the proper implementation. > > Am I smoking "le dope" here or is this the way forward? I think you discovered an interresting issue.. btw, since you're hacking on em a lot, interrested in fixing em's jumbo frames so it doesn't use 9k clusters, but instead page sized clusters? -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." --pWyiEgJYm5f9v55/ Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="em.patch" diff --git a/sys/dev/e1000/if_em.c b/sys/dev/e1000/if_em.c index 830325b..b3c333a 100644 --- a/sys/dev/e1000/if_em.c +++ b/sys/dev/e1000/if_em.c @@ -1629,13 +1629,14 @@ em_msix_rx(void *arg) ++rxr->rx_irq; if (!(if_getdrvflags(adapter->ifp) & IFF_DRV_RUNNING)) return; - more = em_rxeof(rxr, adapter->rx_process_limit, NULL); - if (more) - taskqueue_enqueue(rxr->tq, &rxr->rx_task); - else { - /* Reenable this interrupt */ - E1000_WRITE_REG(&adapter->hw, E1000_IMS, rxr->ims); - } + + do { + more = em_rxeof(rxr, -1, NULL); + } while (more); + + /* Reenable this interrupt */ + E1000_WRITE_REG(&adapter->hw, E1000_IMS, rxr->ims); + return; } --pWyiEgJYm5f9v55/--