From owner-freebsd-arch@freebsd.org Fri Aug 28 19:59:37 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A7DB39C56D3 for ; Fri, 28 Aug 2015 19:59:37 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 871D5911; Fri, 28 Aug 2015 19:59:37 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id D194AB93B; Fri, 28 Aug 2015 15:59:35 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Cc: Sean Bruno Subject: Re: Network card interrupt handling Date: Fri, 28 Aug 2015 10:38:36 -0700 Message-ID: <24017021.PxBoCiQKDJ@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; ) In-Reply-To: <55DDE9B8.4080903@freebsd.org> References: <55DDE9B8.4080903@freebsd.org> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Fri, 28 Aug 2015 15:59:35 -0400 (EDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Aug 2015 19:59:37 -0000 On Wednesday, August 26, 2015 09:30:48 AM Sean Bruno wrote: > We've been diagnosing what appeared to be out of order processing in > the network stack this week only to find out that the network card > driver was shoveling bits to us out of order (em). > > This *seems* to be due to a design choice where the driver is allowed > to assert a "soft interrupt" to the h/w device while real interrupts > are disabled. This allows a fake "em_msix_rx" to be started *while* > "em_handle_que" is running from the taskqueue. We've isolated and > worked around this by setting our processing_limit in the driver to > -1. This means that *most* packet processing is now handled in the > MSI-X handler instead of being deferred. Some periodic interference > is still detectable via em_local_timer() which causes one of these > "fake" interrupt assertions in the normal, card is *not* hung case. > > Both functions use identical code for a start. Both end up down > inside of em_rxeof() to process packets. Both drop the RX lock prior > to handing the data up the network stack. > > This means that the em_handle_que running from the taskqueue will be > preempted. Dtrace confirms that this allows out of order processing > to occur at times and generates a lot of resets. > > The reason I'm bringing this up on -arch and not on -net is that this > is a common design pattern in some of the Ethernet drivers. We've > done preliminary tests on a patch that moves *all* processing of RX > packets to the rx_task taskqueue, which means that em_handle_que is > now the only path to get packets processed. It is only a common pattern in the Intel drivers. :-/ We (collectively) spent quite a while fixing this in ixgbe and igb. Longer (hopefully more like medium) term I have an update to the interrupt API I want to push in that allows drivers to manually schedule interrupt handlers using an 'hwi' API to replace the manual taskqueues. This also ensures that the handler that dequeues packets is only ever running in an ithread context and never concurrently. -- John Baldwin