From: John Baldwin
To: Gleb Smirnoff
Cc: freebsd-net@freebsd.org, "Alexander V. Chernikov", Luigi Rizzo,
    Jack Vogel, net@freebsd.org
Subject: Re: ixgbe & if_igb RX ring locking
Date: Mon, 15 Oct 2012 14:14:27 -0400
Message-Id: <201210151414.27318.jhb@freebsd.org>
In-Reply-To: <20121015163210.GW89655@FreeBSD.org>

On Monday, October 15, 2012 12:32:10 pm Gleb Smirnoff wrote:
> On Mon, Oct 15, 2012 at 09:04:27AM -0400, John Baldwin wrote:
> J> > 3) in practice taskqueue routine is a nightmare for many people since
> J> > there is no way to stop "kernel {ix0 que}" thread eating 100% cpu after
> J> > some traffic burst happens: once it is called it starts to schedule
> J> > itself more and more replacing original ISR routine. Additionally,
> J> > increasing rx_process_limit does not help since taskqueue is called with
> J> > the same limit. Finally, currently netisr taskq threads are not bound to
> J> > any CPU which makes the process even more uncontrollable.
> J>
> J> I think part of the problem here is that the taskqueue in ixgbe(4) is
> J> bogusly rescheduled for TX handling. Instead, ixgbe_msix_que() should
> J> just start transmitting packets directly.
> J>
> J> I fixed this in igb(4) here:
> J>
> J> http://svnweb.freebsd.org/base?view=revision&revision=233708
>
> The problem Alexander describes in 3) definitely wasn't fixed in r233708.
>
> It is still present in head/, and it prevents me to do good benchmarking
> of pf(4) on igb(4).
>
> The problem is related to RX handling, so I don't see how r233708 could
> fix it.

Before 233708, if you had a single TX packet waiting to go out and an RX
interrupt arrived, the task queue would be constantly rescheduled, causing
it to effectively spin at 100% until the TX packet was completely
transmitted and the hardware had updated the descriptor to mark it as
complete.
In fact, as long as you have any pending TX packets at all, it will keep
spinning until it gets into a state where you have no pending TX packets
(so a steady stream of TX packets, including, say, ACKs, would cause the
taskqueue to run forever).

In general I think that with MSI-X you should just use an RX processing
limit of -1. Anything else is just adding overhead in the form of extra
context switches. Neither the task nor the MSI-X interrupt handler runs on
a thread that is shared with any other tasks or handlers, so all that
scheduling (or rescheduling) the task will do is result in the task being
immediately run (after either a context switch or a return to the main
loop of the taskqueue thread).

If you look at the drivers, once a burst of RX traffic ends, the taskqueue
should stop running and stop polling the hardware. It is only the TX side
that gets stuck needlessly polling. The watchdog timer rescheduling the
handler once a second when there is no watchdog condition doesn't help
matters either, but I think that is unique to ixgbe(4).

It would be good if you could determine exactly why igb(4) thinks it needs
to reschedule the taskqueue in your test case after 233708.

-- 
John Baldwin
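
To make the rescheduling argument concrete, here is a small userland model
of the behaviour described above. It is only a sketch and not code from
igb(4) or ixgbe(4): struct queue_model, run_task(), count_runs() and the
resched_on_tx flag are hypothetical names invented for illustration, and
the TX completion delay is simulated with a simple countdown.

```c
#include <stdbool.h>

/* Hypothetical model of one queue; not the real driver structures. */
struct queue_model {
	int rx_pending;    /* received packets waiting to be processed */
	int tx_pending;    /* TX packets the hardware has not completed yet */
	int tx_done_after; /* task runs until the hardware completes TX */
};

/*
 * One run of the queue task.  Returns true if the task asks to be
 * rescheduled.  A negative rx_limit means "no limit".
 */
static bool
run_task(struct queue_model *q, int rx_limit, bool resched_on_tx)
{
	int n = q->rx_pending;

	/* RX side: process at most rx_limit packets per run. */
	if (rx_limit >= 0 && n > rx_limit)
		n = rx_limit;
	q->rx_pending -= n;

	/* TX side: completion happens on the hardware's schedule. */
	if (q->tx_pending > 0) {
		if (q->tx_done_after > 0)
			q->tx_done_after--;
		if (q->tx_done_after == 0)
			q->tx_pending = 0;
	}

	/*
	 * resched_on_tx == true models the behaviour described for
	 * "before 233708": pending TX work alone keeps the task running.
	 */
	return (q->rx_pending > 0 ||
	    (resched_on_tx && q->tx_pending > 0));
}

/* Count how many times the task runs before it stops rescheduling. */
static int
count_runs(struct queue_model q, int rx_limit, bool resched_on_tx)
{
	int runs = 0;
	bool again;

	do {
		again = run_task(&q, rx_limit, resched_on_tx);
		runs++;
	} while (again);
	return (runs);
}
```

In this model, a queue with 4 pending RX packets and 1 TX packet that takes
1000 task runs to complete gives count_runs(q, -1, true) == 1000, while
count_runs(q, -1, false) == 1. With an RX limit of 1 and no TX rescheduling,
count_runs(q, 1, false) == 4: the limit only adds extra task runs for the
same work.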