Date: Tue, 16 Oct 2012 12:09:55 -0400
From: John Baldwin <jhb@freebsd.org>
To: "Alexander V. Chernikov" <melifaro@freebsd.org>
Cc: freebsd-net@freebsd.org, Luigi Rizzo <rizzo@iet.unipi.it>, Jack Vogel <jfvogel@gmail.com>, net@freebsd.org
Subject: Re: ixgbe & if_igb RX ring locking
Message-ID: <201210161209.55979.jhb@freebsd.org>
In-Reply-To: <507D5739.70509@FreeBSD.org>
References: <5079A9A1.4070403@FreeBSD.org> <201210151414.27318.jhb@freebsd.org> <507D5739.70509@FreeBSD.org>
On Tuesday, October 16, 2012 8:46:49 am Alexander V. Chernikov wrote:
> On 15.10.2012 22:14, John Baldwin wrote:
> > On Monday, October 15, 2012 12:32:10 pm Gleb Smirnoff wrote:
> >> On Mon, Oct 15, 2012 at 09:04:27AM -0400, John Baldwin wrote:
> >> J> > 3) in practice the taskqueue routine is a nightmare for many people,
> >> J> > since there is no way to stop the "kernel {ix0 que}" thread from eating
> >> J> > 100% CPU after some traffic burst happens: once it is called, it starts
> >> J> > to schedule itself more and more, replacing the original ISR routine.
> >> J> > Additionally, increasing rx_process_limit does not help, since the
> >> J> > taskqueue is called with the same limit. Finally, netisr taskq threads
> >> J> > are currently not bound to any CPU, which makes the process even more
> >> J> > uncontrollable.
> >> J>
> >> J> I think part of the problem here is that the taskqueue in ixgbe(4) is
> >> J> bogusly rescheduled for TX handling. Instead, ixgbe_msix_que() should
> >> J> just start transmitting packets directly.
> >> J>
> >> J> I fixed this in igb(4) here:
> >> J>
> >> J> http://svnweb.freebsd.org/base?view=revision&revision=233708
> >>
> >> The problem Alexander describes in 3) definitely wasn't fixed in r233708.
> >>
> >> It is still present in head/, and it prevents me from doing good
> >> benchmarking of pf(4) on igb(4).
> >>
> >> The problem is related to RX handling, so I don't see how r233708 could
> >> fix it.
> >
> > Before r233708, if you had a single TX packet waiting to go out and an RX
> > interrupt arrived, the taskqueue would be constantly rescheduled, causing
> > it to effectively spin at 100% until the TX packet was completely
> > transmitted and the hardware had updated the descriptor to mark it as
> > complete. In fact, as long as you have any pending TX packets at all, it
> > will keep spinning until it gets into a state where you have no pending TX
> > packets (so a steady stream of TX packets, including, say, ACKs, would
> > cause the taskqueue to run forever).
> >
> > In general I think that with MSI-X you should just use an RX processing
> > limit of -1. Anything else is just adding overhead in the form of extra
> > context
>
> Yes, this is the obvious next step after binding threads to CPUs.
>
> > switches. Neither the task nor the MSI-X interrupt handler is on a thread
> > that is shared with any other tasks or handlers, so all that scheduling
> > (or rescheduling) the task will do is result in the task being immediately
> > run (after either a context switch or returning back to the main loop of
> > the taskqueue thread).
> >
> > If you look at the drivers, if a burst of RX traffic ends, the taskqueue
>
> It is questionable whether this behavior is good during a burst:
>
> 1) Due to RX locking, the taskqueue eats a significant share (if not all)
> of the RX packets from a given queue.
>
> 2) The task can run on any CPU, so this introduces possible out-of-order
> packets within a connection, which is bad for forwarding (and there were
> some problems in our TCP stack in the past). Additionally, this behavior
> is totally uncontrollable and unscalable (we run _one_ task _instead_ of
> the RX handler), and it leads to significant performance flapping on
> heavily loaded forwarding setups.

The taskqueue and interrupt handler should never run concurrently. If they
are doing so now, that is a _bug_, and my patch fixes some of those already,
just as r233708 fixed similar bugs in igb.
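To make the intended division of labor concrete, the MSI-X handler should be
shaped roughly like this. This is only a sketch: my_msix_que(), my_rxeof(),
my_txeof(), my_mq_start_locked(), and the interrupt enable/disable helpers
are made-up names, not the shipping igb(4)/ixgbe(4) code. The point is the
r233708 shape: TX is drained directly in the handler, and the task is
rescheduled only for genuinely unfinished RX work:

	/* Sketch only: hypothetical names, locking and errors simplified. */
	static void
	my_msix_que(void *arg)
	{
		struct my_queue *que = arg;
		bool more;

		/* Mask this queue's MSI-X vector; nothing else runs for
		 * this queue until it is unmasked below or by the task. */
		my_disable_queue_intr(que);

		/* With MSI-X, a limit of -1 (no limit) avoids needless
		 * reschedules and their context-switch overhead. */
		more = my_rxeof(que, que->rx_process_limit);

		MY_TX_LOCK(que->txr);
		my_txeof(que->txr);	/* reclaim completed TX descriptors */
		if (!drbr_empty(que->ifp, que->txr->br))
			my_mq_start_locked(que->ifp, que->txr);	/* TX directly */
		MY_TX_UNLOCK(que->txr);

		if (more)
			/* Leftover RX work only: hand off to the task,
			 * leaving the vector masked. */
			taskqueue_enqueue(que->tq, &que->que_task);
		else
			my_enable_queue_intr(que);
	}

Before r233708, the handler instead rescheduled the task whenever any TX
descriptors were still outstanding, which is exactly the 100% CPU spin
described above.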
Normally the interrupt handler should disable the specific MSI-X interrupt
when it schedules the task, and the interrupt is not re-enabled until the
task decides it doesn't need to reschedule itself (the task side is sketched
below). If this is done correctly, then you shouldn't see RX lock contention
unless someone is doing 'ifconfig' or something else that triggers an ioctl.
Anything else is just papering over these bugs (which are quite bad, since
they result in out-of-order handling besides the lock contention).

In fact, my original motivation for using a separate TX-only task for the
if_transmit case for igb was specifically to avoid out-of-order processing
on RX, not to prevent lock contention.

Can you describe the specific situation in which you now see both the task
and the interrupt handler running concurrently? Do you have KTR traces from
KTR_SCHED, perhaps?

--
John Baldwin
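For completeness, the task side of the handshake described above would look
roughly like this (again a hedged sketch using the same made-up names as the
earlier fragment, not the actual igb(4)/ixgbe(4) code):

	static void
	my_handle_que(void *context, int pending)
	{
		struct my_queue *que = context;
		bool more;

		/*
		 * The MSI-X vector is still masked while this task runs,
		 * so the interrupt handler cannot run concurrently with it.
		 */
		more = my_rxeof(que, que->rx_process_limit);

		MY_TX_LOCK(que->txr);
		my_txeof(que->txr);
		MY_TX_UNLOCK(que->txr);

		if (more) {
			/* Unfinished RX work: run again, vector stays masked. */
			taskqueue_enqueue(que->tq, &que->que_task);
			return;
		}
		/* All caught up: only now re-enable this queue's interrupt. */
		my_enable_queue_intr(que);
	}

With this shape, the vector is unmasked in exactly one place at the end of
the chain, which is why correct drivers cannot see the handler and the task
contending on the RX lock.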