From owner-freebsd-net@FreeBSD.ORG Fri Oct 20 00:35:18 2006
Message-ID: <4538162E.1050006@samsco.org>
Date: Thu, 19 Oct 2006 18:19:58 -0600
From: Scott Long <scottl@samsco.org>
To: Bruce Evans
Cc: freebsd-net <freebsd-net@FreeBSD.org>, John Polstra
In-Reply-To: <20061020090022.V79425@delplex.bde.org>
Subject: Re: em network issues

Bruce Evans wrote:
> On Thu, 19 Oct 2006, John Polstra wrote:
>
>> On 19-Oct-2006 Scott Long wrote:
>>> The performance measurements that Andre and I did early this year
>>> showed that the INTR_FAST handler provided a very large benefit.
>>
>> I'm trying to understand why that's the case.  Is it because an
>> INTR_FAST interrupt doesn't have to be masked and unmasked in the
>> APIC?  I can't see any other reason for much of a performance
>> difference in that driver.  With or without INTR_FAST, you've got
>> the bulk of the work being done in a background thread -- either the
>> ithread or the taskqueue thread.  It's not clear to me that it's any
>> cheaper to run a task than it is to run an ithread.
>
> It's very unlikely to be because masking in the APIC is slow.  The
> APIC is fast compared with the PIC, and even with the PIC it takes a
> very high interrupt rate (say 20 kHz) for the PIC overhead to become
> noticeable (say 5-10%).  Such interrupt rates may occur, but if they
> do you've probably already lost.
>
> Previously I said that the difference might be due to interrupt
> coalescing, but that I wouldn't expect that to happen.  Now I see how
> it can happen on loaded systems: the system might be so loaded that
> it often doesn't get around to running the task before a new device
> interrupt would occur if device interrupts weren't turned off.  The
> scheduling of the task might accidentally be best or good enough.  A
> task might work better than a software ithread accidentally because
> it has lower priority, and similarly, a software ithread might work
> better than a hardware ithread.  The lower-priority threads can also
> be preempted, at least with PREEMPTION configured.  This is bad for
> them but good for whatever preempts them.
>
> Apart from this, it's _more_ expensive to run a task plus an
> interrupt handler (even if the interrupt handler is fast) than to run
> a single interrupt handler, and more expensive to switch between the
> handlers, and more expensive again if PREEMPTION actually has much
> effect -- then more switches occur.
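
For concreteness, the pattern being discussed -- a fast handler that
only masks the device and queues the heavy lifting -- looks roughly
like the following.  This is a minimal sketch against the 6.x
bus/taskqueue APIs; the foo_* softc and helper names are illustrative
placeholders, not the actual em(4) code:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bus.h>
    #include <sys/taskqueue.h>

    struct foo_softc {
            device_t          dev;
            struct resource  *irq_res;
            void             *irq_cookie;
            struct taskqueue *tq;
            struct task       rxtx_task;
    };

    static void foo_disable_intr(struct foo_softc *); /* mask in device regs */
    static void foo_enable_intr(struct foo_softc *);  /* unmask in device regs */
    static void foo_rxeof(struct foo_softc *);        /* drain the rx ring */
    static void foo_txeof(struct foo_softc *);        /* reclaim the tx ring */

    /*
     * INTR_FAST handler: runs in interrupt context, so it may not
     * sleep or take ordinary mutexes.  It only masks further
     * interrupts at the device (not the APIC) and defers the real
     * work to the taskqueue thread.
     */
    static void
    foo_intr_fast(void *arg)
    {
            struct foo_softc *sc = arg;

            foo_disable_intr(sc);
            taskqueue_enqueue(sc->tq, &sc->rxtx_task);
    }

    /*
     * Deferred handler: any interrupts the device would have raised
     * while it was masked are absorbed into this single run -- the
     * accidental coalescing Bruce describes above.
     */
    static void
    foo_rxtx(void *arg, int pending)
    {
            struct foo_softc *sc = arg;

            foo_rxeof(sc);
            foo_txeof(sc);
            foo_enable_intr(sc);
    }

Because the masking happens in the device's own registers, the APIC
never has to be touched on the hot path, and a burst of would-be
interrupts collapses into one pass through foo_rxtx().
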
That's all fine and good, but the em task thread runs at the same
priority as a PI_NET ithread, so the lower-priority argument doesn't
apply here (see the setup sketch at the end of this message).  The
whole taskqueue approach was just a prototype for getting to ifilters.
I've demonstrated positive results with it for the aac, em, and mpt
drivers.

Scott

>> A difference might show up if you had two or more em devices sharing
>> the same IRQ.  Then they'd share one ithread, but would each get
>> their own taskqueue thread.  But sharing an IRQ among multiple
>> gigabit NICs would be avoided by anyone who cared about performance,
>> so it's not a very interesting case.  Besides, when you first
>> committed this stuff, INTR_FAST interrupts were not sharable.
>
> Sharing an IRQ among a single gigabit NIC and other slower devices is
> even less interesting :-).
>
> It can be hard to measure performance, especially when there are a
> lot of threads or a lot of fast interrupt handlers.  If the
> performance benefits are due to accidental scheduling then they might
> vanish under different loads.
>

It's easy to measure performance when you have a Smartbits: more kpps
means more kpps.  Thanks again to Andre for making this resource
available.

Scott
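
P.S.  The PI_NET priority equivalence mentioned above is established
when the taskqueue thread is started at attach time.  Again a rough
sketch of the attach-time fragment, with the same illustrative foo_*
names rather than the real em(4) attach code, and assuming the usual
kernel includes (<sys/malloc.h> for M_NOWAIT, <sys/priority.h> for
PI_NET):

        int error;

        TASK_INIT(&sc->rxtx_task, 0, foo_rxtx, sc);
        sc->tq = taskqueue_create_fast("foo_taskq", M_NOWAIT,
            taskqueue_thread_enqueue, &sc->tq);
        /* The taskqueue thread runs at PI_NET, same as a net ithread. */
        taskqueue_start_threads(&sc->tq, 1, PI_NET, "%s taskq",
            device_get_nameunit(sc->dev));
        /* INTR_FAST: the handler runs in interrupt context and no
         * ithread is created; the PI_NET taskqueue thread stands in
         * for it. */
        error = bus_setup_intr(sc->dev, sc->irq_res,
            INTR_TYPE_NET | INTR_FAST, foo_intr_fast, sc,
            &sc->irq_cookie);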