Date: Thu, 19 Oct 2006 18:19:58 -0600
From: Scott Long <scottl@samsco.org>
To: Bruce Evans <bde@zeta.org.au>
Cc: freebsd-net <freebsd-net@FreeBSD.org>, John Polstra <jdp@polstra.com>
Subject: Re: em network issues
Message-ID: <4538162E.1050006@samsco.org>
In-Reply-To: <20061020090022.V79425@delplex.bde.org>
References: <XFMail.20061019152433.jdp@polstra.com> <20061020090022.V79425@delplex.bde.org>
Bruce Evans wrote:
> On Thu, 19 Oct 2006, John Polstra wrote:
>
>> On 19-Oct-2006 Scott Long wrote:
>>> The performance measurements that Andre and I did early this year showed
>>> that the INTR_FAST handler provided a very large benefit.
>>
>> I'm trying to understand why that's the case.  Is it because an
>> INTR_FAST interrupt doesn't have to be masked and unmasked in the
>> APIC?  I can't see any other reason for much of a performance
>> difference in that driver.  With or without INTR_FAST, you've got
>> the bulk of the work being done in a background thread -- either the
>> ithread or the taskqueue thread.  It's not clear to me that it's any
>> cheaper to run a task than it is to run an ithread.
>
> It's very unlikely to be because masking in the APIC is slow.  The
> APIC is fast compared with the PIC, and even with the PIC it takes a
> very high interrupt rate (say 20 kHz) for the PIC overhead to become
> noticeable (say 5-10%).  Such interrupt rates may occur, but if they
> do you've probably already lost.
>
> Previously I said that the difference might be due to interrupt
> coalescing, but that I wouldn't expect that to happen.  Now I see how
> it can happen on loaded systems: the system might be so loaded that
> it often doesn't get around to running the task before a new device
> interrupt would occur if device interrupts weren't turned off.  The
> scheduling of the task might accidentally be best or good enough.  A
> task might work better than a software ithread accidentally because
> it has lower priority, and similarly, a software ithread might work
> better than a hardware ithread.  The lower-priority threads can also
> be preempted, at least with PREEMPTION configured.  This is bad for
> them but good for whatever preempts them.
> Apart from this, it's _more_
> expensive to run a task plus an interrupt handler (even if the interrupt
> handler is fast) than to run a single interrupt handler, and more
> expensive to switch between the handlers, and more expensive again if
> PREEMPTION actually has much effect -- then more switches occur.

That's all fine and good, but the em task thread runs at the same
priority as a PI_NET ithread.  The whole taskqueue thing was just a
prototype for getting to ifilters.  I've demonstrated positive results
with it for the aac, em, and mpt drivers.

Scott

>> A difference might show up if you had two or more em devices sharing
>> the same IRQ.  Then they'd share one ithread, but would each get their
>> own taskqueue thread.  But sharing an IRQ among multiple gigabit NICs
>> would be avoided by anyone who cared about performance, so it's not a
>> very interesting case.  Besides, when you first committed this
>> stuff, INTR_FAST interrupts were not sharable.
>
> Sharing an IRQ among a single gigabit NIC and other slower devices is
> even less interesting :-).
>
> It can be hard to measure performance, especially when there are a lot
> of threads or a lot of fast interrupt handlers.  If the performance
> benefits are due to accidental scheduling then they might vanish under
> different loads.

It's easy to measure performance when you have a Smartbits: more kpps
means more kpps.  Thanks again to Andre for making this resource
available.

Scott