Date:      Tue, 16 Oct 2012 16:46:49 +0400
From:      "Alexander V. Chernikov" <melifaro@FreeBSD.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-net@freebsd.org, Luigi Rizzo <rizzo@iet.unipi.it>, Jack Vogel <jfvogel@gmail.com>, net@freebsd.org
Subject:   Re: ixgbe & if_igb RX ring locking
Message-ID:  <507D5739.70509@FreeBSD.org>
In-Reply-To: <201210151414.27318.jhb@freebsd.org>
References:  <5079A9A1.4070403@FreeBSD.org> <201210150904.27567.jhb@freebsd.org> <20121015163210.GW89655@FreeBSD.org> <201210151414.27318.jhb@freebsd.org>

On 15.10.2012 22:14, John Baldwin wrote:
> On Monday, October 15, 2012 12:32:10 pm Gleb Smirnoff wrote:
>> On Mon, Oct 15, 2012 at 09:04:27AM -0400, John Baldwin wrote:
>> J> > 3) in practice the taskqueue routine is a nightmare for many people, since
>> J> > there is no way to stop the "kernel {ix0 que}" thread from eating 100% CPU
>> J> > after some traffic burst happens: once it is called, it starts to schedule
>> J> > itself more and more, replacing the original ISR routine. Additionally,
>> J> > increasing rx_process_limit does not help, since the taskqueue is called with
>> J> > the same limit. Finally, the netisr taskq threads are currently not bound to
>> J> > any CPU, which makes the process even more uncontrollable.
>> J>
>> J> I think part of the problem here is that the taskqueue in ixgbe(4) is
>> J> bogusly rescheduled for TX handling.  Instead, ixgbe_msix_que() should
>> J> just start transmitting packets directly.
>> J>
>> J> I fixed this in igb(4) here:
>> J>
>> J> http://svnweb.freebsd.org/base?view=revision&revision=233708
>>
>> The problem Alexander describes in 3) definitely wasn't fixed in r233708.
>>
>> It is still present in head/, and it prevents me from doing good benchmarking
>> of pf(4) on igb(4).
>>
>> The problem is related to RX handling, so I don't see how r233708 could
>> fix it.
>
> Before 233708, if you had a single TX packet waiting to go out and an RX
> interrupt arrived, the task queue would be constantly rescheduled, causing
> it to effectively spin at 100% until the TX packet was completely transmitted
> and the hardware had updated the descriptor to mark it as complete.  In fact,
> as long as you have any pending TX packets at all, it will keep spinning until
> it gets into a state where you have no pending TX packets (so a steady stream
> of TX packets, including, say, ACKs, would cause the taskqueue to run forever).
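
To make that failure mode concrete, here is a rough sketch of the pattern
being described (simplified and with hypothetical sketch_* names; the real
driver code differs in detail):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/taskqueue.h>

struct sketch_que {
	struct taskqueue	*tq;
	struct task		 task;
};

/* Stand-ins for the driver's RX/TX cleanup routines. */
static int	sketch_rxeof(struct sketch_que *, int);
static int	sketch_txeof(struct sketch_que *);
static void	sketch_enable_queue_intr(struct sketch_que *);
static int	sketch_rx_limit = 100;

/*
 * Simplified pre-r233708 per-queue task: because it re-enqueues itself
 * whenever any TX descriptor is still pending, a single in-flight TX
 * packet is enough to keep the taskq spinning at 100%.
 */
static void
sketch_handle_que(void *context, int pending)
{
	struct sketch_que *que = context;
	int more_rx, more_tx;

	more_rx = sketch_rxeof(que, sketch_rx_limit);
	more_tx = sketch_txeof(que);	/* nonzero while TX descs pending */
	if (more_rx || more_tx)
		taskqueue_enqueue(que->tq, &que->task);	/* respin */
	else
		sketch_enable_queue_intr(que);	/* re-arm the MSI-X vector */
}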
>
> In general I think that with MSI-X you should just use an RX processing limit
> of -1.  Anything else is just adding overhead in the form of extra context
Yes, this is the obvious next step after binding threads to CPUs.
> switches.  Neither the task nor the MSI-X interrupt handler is on a thread
> that is shared with any other tasks or handlers, so all that scheduling (or
> rescheduling) the task will do is result in the task being immediately run
> (after either a context switch or returning back to the main loop of the
> taskqueue thread).

>
> If you look at the drivers, if a burst of RX traffic ends, the taskqueue
It is questionable whether this behavior is good during a burst:

1) Due to RX locking, the taskq eats a significant share (if not all) of
the RX packets from a given queue.
2) The taskq can run on any CPU, so packets within a connection can be
delivered out of order, which is bad for forwarding (and caused some
problems in our TCP stack in the past). Additionally, this behavior is
completely uncontrollable and unscalable (we run _one_ task _instead_ of
the RX handler) and leads to significant performance flapping on heavily
loaded forwarding setups; see the binding sketch below for contrast.
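
For contrast, the MSI-X interrupt path can at least be pinned to a CPU
via bus_bind_intr(9); a minimal sketch of that kind of binding follows
(the queue structure and field names are assumptions, not the actual
driver code). The taskq path has no equivalent binding, which is exactly
what allows the reordering above:

#include <sys/param.h>
#include <sys/bus.h>
#include <sys/smp.h>

/* Assumed per-queue state; the real drivers keep much more here. */
struct sketch_msix_que {
	struct resource	*res;	/* MSI-X interrupt resource */
};

/*
 * Bind each queue's MSI-X vector to a fixed CPU so that RX processing
 * for a given queue always runs on the same core.
 */
static void
sketch_bind_que_vectors(device_t dev, struct sketch_msix_que *que, int nque)
{
	int i;

	for (i = 0; i < nque; i++)
		(void)bus_bind_intr(dev, que[i].res, i % mp_ncpus);
}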

> should stop running and stop polling the hardware.  It is only the TX side
> that gets stuck needlessly polling.  The watchdog timer rescheduling the
Unfortunately, as long as at least a single call from the driver into
this function remains, a traffic burst can still be consumed by the
taskq (especially if a large rx_processing_limit is set).

If there are reasons not to change the taskq RX processing behavior,
maybe adding an additional sysctl like
ix.0.loop_forever=1 could be a compromise?

E.g. the main processing loop would not decrease the 'count' variable
when loop_forever is set, while the taskq invocation limit remains
controlled by the current rx_processing_limit.

Nothing changes by default, but people wishing to get predictable
results simply set loop_forever to 1 and rx_processing_limit to 1 (or 0).
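
A minimal sketch of what I mean, assuming a count-based RX loop like
the drivers use today (loop_forever and all the sketch_* names are
hypothetical):

struct sketch_ring;
static int	sketch_ring_has_packet(struct sketch_ring *);
static void	sketch_process_one(struct sketch_ring *);
static int	loop_forever = 0;	/* would be sysctl-backed */

/*
 * With loop_forever set, 'count' is never decremented, so the loop
 * exits only when the ring drains; otherwise it exits after 'count'
 * descriptors, exactly like the current rx_processing_limit behavior.
 */
static int
sketch_rx_loop(struct sketch_ring *rxr, int count)
{
	int done = 0;

	while (count != 0 && sketch_ring_has_packet(rxr)) {
		sketch_process_one(rxr);
		done++;
		if (!loop_forever)
			count--;
	}
	return (done);
}

Note that the 'count != 0' exit test is also why the -1 limit suggested
above effectively means "drain the ring": -1 is decremented away from
zero and never terminates the loop on its own.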

> handler once a second when there is no watchdog condition doesn't help
> matters either, but I think that is unique to ixgbe(4).
>
> It would be good if you could determine exactly why igb thinks it needs to
> reschedule the taskqueue in your test case on igb(4) post 233708.
>


-- 
WBR, Alexander
