From owner-freebsd-net@FreeBSD.ORG  Mon Oct 15 19:23:26 2012
From: John Baldwin <jhb@freebsd.org>
To: Gleb Smirnoff <glebius@freebsd.org>
Subject: Re: ixgbe & if_igb RX ring locking
Date: Mon, 15 Oct 2012 14:14:27 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; )
References: <5079A9A1.4070403@FreeBSD.org>
 <201210150904.27567.jhb@freebsd.org> <20121015163210.GW89655@FreeBSD.org>
In-Reply-To: <20121015163210.GW89655@FreeBSD.org>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="koi8-r"
Content-Transfer-Encoding: 7bit
Message-Id: <201210151414.27318.jhb@freebsd.org>
Cc: freebsd-net@freebsd.org, "Alexander V. Chernikov" <melifaro@freebsd.org>,
 Luigi Rizzo <rizzo@iet.unipi.it>, Jack Vogel <jfvogel@gmail.com>,
 net@freebsd.org

On Monday, October 15, 2012 12:32:10 pm Gleb Smirnoff wrote:
> On Mon, Oct 15, 2012 at 09:04:27AM -0400, John Baldwin wrote:
> J> > 3) in practice taskqueue routine is a nightmare for many people since 
> J> > there is no way to stop "kernel {ix0 que}" thread eating 100% cpu after 
> J> > some traffic burst happens: once it is called it starts to schedule 
> J> > itself more and more replacing original ISR routine. Additionally, 
> J> > increasing rx_process_limit does not help since taskqueue is called with 
> J> > the same limit. Finally, currently netisr taskq threads are not bound to 
> J> > any CPU which makes the process even more uncontrollable.
> J> 
> J> I think part of the problem here is that the taskqueue in ixgbe(4) is
> J> bogusly rescheduled for TX handling.  Instead, ixgbe_msix_que() should
> J> just start transmitting packets directly.
> J> 
> J> I fixed this in igb(4) here:
> J> 
> J> http://svnweb.freebsd.org/base?view=revision&revision=233708
> 
> The problem Alexander describes in 3) definitely wasn't fixed in r233708.
> 
> It is still present in head/, and it prevents me from doing good
> benchmarking of pf(4) on igb(4).
> 
> The problem is related to RX handling, so I don't see how r233708 could
> fix it.

Before r233708, if you had a single TX packet waiting to go out and an RX
interrupt arrived, the task queue would be constantly rescheduled, causing
it to effectively spin at 100% until the TX packet was completely transmitted
and the hardware had updated the descriptor to mark it as complete.  In fact,
as long as you have any pending TX packets at all, it will keep spinning until
it gets into a state where you have no pending TX packets (so a steady stream
of TX packets, including, say, ACKs, would cause the taskqueue to run forever).
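
As a rough illustration of that spin, here is a toy model in C.  It is not
code from igb(4) or ixgbe(4); the function name and the "hw_delay" parameter
are made up for this sketch.  It assumes the hardware needs hw_delay task
runs before it marks a single TX descriptor complete, and counts how many
times the task runs after one RX interrupt, with and without the
"reschedule while TX is pending" rule.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model only (hypothetical names, not driver code).
 *
 * hw_delay: number of task runs that elapse before the hardware marks
 *           the one pending TX descriptor as complete.
 * resched_on_tx_pending: true models the old behaviour, where the task
 *           re-enqueues itself for as long as TX work is outstanding.
 *
 * Returns how many times the task body runs after a single RX interrupt.
 */
static int
toy_count_task_runs(int hw_delay, bool resched_on_tx_pending)
{
	int remaining, runs;
	bool again, tx_pending;

	assert(hw_delay >= 0);
	remaining = hw_delay;
	runs = 0;
	do {
		runs++;
		/* RX side: the burst is over, so there is nothing to do. */

		/* TX side: is the descriptor still owned by the hardware? */
		tx_pending = (remaining > 0);
		if (remaining > 0)
			remaining--;

		/* Old rule: pending TX alone is enough to run again. */
		again = resched_on_tx_pending && tx_pending;
	} while (again);
	return (runs);
}
```

In this model toy_count_task_runs(1000, true) runs the task 1001 times for
a single packet, while toy_count_task_runs(1000, false) runs it once.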

In general I think that with MSI-X you should just use an RX processing limit
of -1.  Anything else is just adding overhead in the form of extra context
switches.  Neither the task nor the MSI-X interrupt handler is on a thread
that is shared with any other tasks or handlers, so all that scheduling (or
rescheduling) the task will do is result in the task being immediately run
(after either a context switch or a return to the main loop of the
taskqueue thread).
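
To put a number on that overhead, here is another toy model in C (again a
sketch with made-up names, not driver code).  It assumes a burst of packets
is already sitting in the RX ring and counts how many passes are needed to
drain it for a given processing limit, where -1 means "no limit".  Every
pass after the first stands for one reschedule of the task.

```c
#include <assert.h>

/*
 * Toy model only (hypothetical names, not driver code).
 *
 * burst: number of packets waiting in the RX ring.
 * limit: maximum packets handled per pass, or -1 for unlimited.
 *
 * Returns the number of passes needed to drain the ring.  The handler
 * always makes at least one pass, even if the ring is empty.
 */
static int
toy_rx_passes(int burst, int limit)
{
	int n, passes;

	assert(burst >= 0);
	assert(limit == -1 || limit > 0);
	passes = 0;
	do {
		/* Take the whole backlog, or at most 'limit' packets. */
		n = (limit == -1 || burst < limit) ? burst : limit;
		burst -= n;
		passes++;
	} while (burst > 0);	/* leftover packets force another pass */
	return (passes);
}
```

For a burst of 1000 packets, a limit of 100 takes 10 passes (9 reschedules),
while a limit of -1 takes a single pass.  The same packets are processed on
the same thread either way.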

If you look at the drivers, once a burst of RX traffic ends, the taskqueue
should stop running and stop polling the hardware.  It is only the TX side
that gets stuck needlessly polling.  The watchdog timer rescheduling the
handler once a second when there is no watchdog condition doesn't help
matters either, but I think that is unique to ixgbe(4).

It would be good if you could determine exactly why igb(4) thinks it needs to
reschedule the taskqueue in your test case after r233708.

-- 
John Baldwin