From: John Baldwin
To: freebsd-net@freebsd.org
Cc: "Alexander V. Chernikov", Luigi Rizzo, Jack Vogel, net@freebsd.org
Subject: Re: ixgbe & if_igb RX ring locking
Date: Wed, 17 Oct 2012 10:06:51 -0400
Message-Id: <201210171006.51214.jhb@freebsd.org>
In-Reply-To: <201210150904.27567.jhb@freebsd.org>

On Monday, October 15, 2012 9:04:27 am John Baldwin wrote:
> On Monday, October 15, 2012 10:10:40 am Alexander V. Chernikov wrote:
> > On 13.10.2012 23:24, Jack Vogel wrote:
> > > On Sat, Oct 13, 2012 at 11:22 AM, Luigi Rizzo wrote:
> > >>
> > >> one option could be (same as it is done in the timer
> > >> routine in dummynet) to build a list of all the packets
> > >> that need to be sent to if_input(), and then call
> > >> if_input with the entire list outside the lock.
> > >>
> > >> It would be even easier if we modify the various *_input()
> > >> routines to handle a list of mbufs instead of just one.
> >
> > Bulk processing is generally a good idea we probably should implement.
> > Probably starting from driver queue ending with marked mbufs
> > (OURS/forward/legacy processing (appletalk and similar))?
> >
> > This can minimize an impact for all
> > locks on RX side:
> > L2
> > * rx PFIL hook
> > L3 (both IPv4 and IPv6)
> > * global IF_ADDR_RLOCK (currently commented out)
> > * Per-interface ADDR_RLOCK
> > * PFIL hook
> >
> > From the first glance, there can be problems with:
> > * Increased latency (we should have some kind of rx_process_limit), but
> > still
> > * reader locks being acquired for much longer amount of time
> >
> > >>
> > >> cheers
> > >> luigi
> > >>
> > >> Very interesting idea Luigi, will have to get that some thought.
> > >
> > > Jack
> >
> > Returning to original post topic:
> >
> > Given
> > 1) we are currently binding ixgbe ithreads to CPU cores
> > 2) RX queue lock is used by (indirectly) in only 2 places:
> > a) ISR routine (msix or legacy irq)
> > b) taskqueue routine which is scheduled if some packets remains in RX
> > queue and rx_process_limit ended OR we need something to TX
> >
> > 3) in practice taskqueue routine is a nightmare for many people since
> > there is no way to stop "kernel {ix0 que}" thread eating 100% cpu after
> > some traffic burst happens: once it is called it starts to schedule
> > itself more and more replacing original ISR routine.
> > Additionally, increasing rx_process_limit does not help since taskqueue
> > is called with the same limit. Finally, currently netisr taskq threads
> > are not bound to any CPU which makes the process even more uncontrollable.
>
> I think part of the problem here is that the taskqueue in ixgbe(4) is
> bogusly rescheduled for TX handling. Instead, ixgbe_msix_que() should
> just start transmitting packets directly.
>
> I fixed this in igb(4) here:
>
> http://svnweb.freebsd.org/base?view=revision&revision=233708
>
> You can try this for ixgbe(4). It also comments out a spurious taskqueue
> reschedule from the watchdog handler that might also lower the taskqueue
> usage. You can try changing that #if 0 to an #if 1 to test just the txeof
> changes:

Is anyone able to test this btw to see if it improves things on ixgbe at
all? (I don't have any ixgbe hardware.)

--
John Baldwin
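
For reference, the list-based delivery quoted earlier in the thread
(collect packets while holding the RX lock, then hand the whole list to
the stack after dropping it) can be sketched in userland C roughly as
follows. This is only an illustration under stated assumptions: struct pkt,
struct rx_ring, rx_harvest() and stack_input_list() are hypothetical
stand-ins, not the ixgbe(4) or mbuf(9) structures, and a pthread mutex
takes the place of the driver's RX lock. The limit argument plays the role
of rx_process_limit.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical stand-in for an mbuf: just a list link and an id. */
struct pkt {
    struct pkt *next;
    int id;
};

/* Hypothetical stand-in for a driver RX ring. */
struct rx_ring {
    pthread_mutex_t lock;
    int locked;                /* debug flag: 1 while the lock is held */
    struct pkt slots[RING_SIZE];
    int head;                  /* next slot to harvest */
    int avail;                 /* completed slots waiting to be harvested */
    int delivered;             /* packets handed to the stack */
    int delivered_under_lock;  /* should stay 0 with this scheme */
};

void rx_ring_init(struct rx_ring *r, int avail)
{
    assert(avail >= 0 && avail <= RING_SIZE);
    pthread_mutex_init(&r->lock, NULL);
    r->locked = 0;
    r->head = 0;
    r->avail = avail;
    r->delivered = 0;
    r->delivered_under_lock = 0;
    for (int i = 0; i < RING_SIZE; i++) {
        r->slots[i].next = NULL;
        r->slots[i].id = i;
    }
}

/* Stand-in for an if_input() variant that accepts a chain of packets. */
void stack_input_list(struct rx_ring *r, struct pkt *head)
{
    for (struct pkt *p = head; p != NULL; p = p->next) {
        /* Single-threaded sketch, so reading the flag unlocked is fine. */
        if (r->locked)
            r->delivered_under_lock++;
        r->delivered++;
    }
}

/* Harvest up to 'limit' packets under the lock, deliver them after unlock. */
int rx_harvest(struct rx_ring *r, int limit)
{
    struct pkt *head = NULL;
    struct pkt **tailp = &head;
    int n = 0;

    pthread_mutex_lock(&r->lock);
    r->locked = 1;
    while (r->avail > 0 && n < limit) {
        struct pkt *p = &r->slots[r->head];
        p->next = NULL;
        *tailp = p;            /* append to the tail of the local list */
        tailp = &p->next;
        r->head = (r->head + 1) % RING_SIZE;
        r->avail--;
        n++;
    }
    r->locked = 0;
    pthread_mutex_unlock(&r->lock);

    /* The stack is entered only after the RX lock has been dropped. */
    stack_input_list(r, head);
    return n;
}
```

The lock is held only for the time it takes to unlink packets from the
ring, so the hold time no longer depends on how long the stack spends on
each packet. The cost is the one raised in the thread: packets at the head
of the list wait until the whole batch has been collected, which is why
some per-call limit is still needed.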