From: John Baldwin
To: arch@freebsd.org
Date: Fri, 17 Sep 2010 11:23:39 -0400
Subject: Interrupt Threads

I have wanted to rework some of the interrupt threads stuff and enable interrupt filters by default for a while.
I finally sat down and hacked out a new ithreads implementation at BSDCan and the following week. The new ithreads stuff moves away from dedicated threads per handler or per IRQ. Instead, it adopts a model more akin to what Solaris does (though probably not completely identical).

Each CPU has a queue of "pending handlers". When an interrupt fires, all of the handlers for that interrupt are placed onto that CPU's queue. There is a pool of hardware interrupt threads. If the current CPU does not already have an active hardware interrupt thread, it grabs a free one from the pool, pins it to the current CPU, and schedules it. The ithread continues to drain interrupt handlers from its CPU's queue until the queue is empty. Once that happens, it disassociates itself from the CPU and goes back into the free pool. The effect is that interrupt handlers are now sort of like DPCs in Windows.

If an interrupt handler blocks on a turnstile and there are other handlers pending for this CPU, then the current ithread is divorced from the current CPU and a new ithread is allocated for the current CPU. If we ever fail to allocate an ithread for a given CPU, then a flag is set. All ithreads check that flag before going idle, and if it is set, they find the first CPU that needs an ithread, move to that CPU, and start draining events. The ithread pool can be dynamically resized at runtime via sysctl, but it can't be smaller than NCPU * 2 or larger than the total number of handlers.

Interrupt filters fit into this nicely, since this avoids a problem with the old interrupt filter design: fixing its design bug could require scheduling multiple ithreads for a single interrupt. Now at most one ithread is scheduled per interrupt. To handle masking the interrupt and unmasking it when filters w/o handlers complete, I use a simple reference count, updated with atomic ops, that tracks the number of queued handlers that need the interrupt masked; the interrupt is unmasked once the count drops to 0.
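To make the per-CPU model above concrete, here is a rough single-threaded user-space sketch in C. It is not code from the patch. Every name in it (intr_fire(), ithread_drain(), ithread_idle(), free_ithreads, need_ithread, and so on) is hypothetical, "threads" are just flags, and blocking on a turnstile is not modeled. It shows three things: an ithread is taken from the pool only when the CPU has none bound, a reference count keeps a source masked until its last queued handler has run, and an idle ithread moves to a CPU that missed out when the pool was empty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; names do not come from the actual patch. */
#define MAXQ 16

struct intr_src {
	atomic_int mask_refs;	/* queued handlers needing the source masked */
	bool masked;		/* stand-in for the PIC mask bit */
};

struct handler {
	struct intr_src *src;
	bool needs_mask;	/* this handler needs the source kept masked */
	int runs;
};

struct cpu {
	struct handler *queue[MAXQ];	/* per-CPU FIFO of pending handlers */
	size_t head, tail;		/* head == tail: queue is empty */
	bool has_ithread;		/* an ithread is bound to this CPU */
};

static int free_ithreads;	/* ithreads currently in the free pool */
static bool need_ithread;	/* set when the pool was empty */

/* An interrupt fired on CPU c: queue its handlers, bind an ithread if needed. */
static void
intr_fire(struct cpu *c, struct handler **hs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct handler *h = hs[i];

		if (h->needs_mask && atomic_fetch_add(&h->src->mask_refs, 1) == 0)
			h->src->masked = true;
		assert(c->tail - c->head < MAXQ);
		c->queue[c->tail++ % MAXQ] = h;
	}
	if (!c->has_ithread) {
		if (free_ithreads > 0) {
			free_ithreads--;
			c->has_ithread = true;	/* "pin" it and schedule it */
		} else
			need_ithread = true;	/* some CPU is left waiting */
	}
}

/* The bound ithread drains the queue, then leaves the CPU. */
static void
ithread_drain(struct cpu *c)
{
	assert(c->has_ithread);
	while (c->head != c->tail) {
		struct handler *h = c->queue[c->head++ % MAXQ];

		h->runs++;			/* stand-in for calling the handler */
		if (h->needs_mask && atomic_fetch_sub(&h->src->mask_refs, 1) == 1)
			h->src->masked = false;	/* last one out unmasks */
	}
	c->has_ithread = false;			/* disassociate from the CPU */
}

/*
 * Before going idle, check the flag: move to the first CPU with pending
 * handlers but no ithread, or return to the pool (NULL) if there is none.
 */
static struct cpu *
ithread_idle(struct cpu *cpus, size_t ncpu)
{
	if (need_ithread) {
		for (size_t i = 0; i < ncpu; i++) {
			if (!cpus[i].has_ithread && cpus[i].head != cpus[i].tail) {
				cpus[i].has_ithread = true;
				return (&cpus[i]);
			}
		}
		need_ithread = false;	/* nobody is waiting any more */
	}
	free_ithreads++;
	return (NULL);
}
```

The flag is cleared only when an idle ithread finds no waiting CPU, so several starved CPUs are picked up one at a time. A real implementation would also have to order mask and unmask against newly arriving interrupts, which this model ignores.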
Software interrupts still use a dedicated ithread, but the queue of pending handlers lives in the ithread, not in the CPU.

I've also added some extensions to the current ithreads stuff based on some tricks that existing drivers use. Specifically, an interrupt handler can now call hwi_sched() on itself to reschedule itself at the back of the current CPU's queue. Thus, you can have NIC interrupt handlers do cooperative timesharing by just punting after N packets and using hwi_sched() to reschedule themselves. I also added a new type of interrupt handler that is registered with INTR_MANUAL. It is never automatically scheduled, but a filter can schedule it.

As a test, I've ported the igb(4) driver to this framework. It uses hwi_sched() and an INTR_MANUAL handler for link events to replace almost all of the taskqueue usage in igb(4). (The multiqueue transmit bits still need a task for one case, but all the interrupt handler stuff is now "simpler".)

Some downsides to this approach include:

1) If you have two busy devices whose interrupts both go to the same CPU but via different IRQs, in the old model their ithreads could run concurrently on separate CPUs, but in the new model the handlers are tied to the same CPU and compete for CPU time on that CPU. In other words, the new model really wants interrupts to be evenly distributed amongst CPUs to work properly. I'm not entirely sure what I think about that.

2) Many folks find it useful to see in top how much CPU IRQ N's thread has used, but the new model loses all of that since there is no longer a tight coupling between IRQs and threads.

One unresolved issue is that the cardbus code currently uses a filter that returns just FILTER_SCHEDULE_THREAD without FILTER_HANDLED. This is not supported in the new code. I have some ideas on how to fix the cardbus code (most likely using wrappers around the child interrupt handlers) but need to hash the details out with Warner.
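The hwi_sched() and INTR_MANUAL extensions described above can be modeled the same way. This is again a hypothetical user-space sketch, not the patch's API. model_hwi_sched() stands in for hwi_sched(), whose real signature this mail does not show. The manual field stands in for registering a handler with INTR_MANUAL, and RX_BUDGET is an arbitrary value for the "N packets". The receive handler processes at most RX_BUDGET packets per call and requeues itself at the back of the queue if more remain, so other pending handlers run in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; names do not come from the actual patch. */
#define QLEN		8
#define RX_BUDGET	4	/* the "N packets"; the value is arbitrary */

struct hwi;
typedef void hwi_fn(struct hwi *);

struct hwi {
	hwi_fn	*fn;
	bool	manual;		/* models INTR_MANUAL: intr_fire() skips it */
	bool	queued;		/* already on the run queue */
	int	pending;	/* packets waiting in the model NIC ring */
	int	runs;
};

static struct hwi *runq[QLEN];	/* this CPU's queue of pending handlers */
static size_t rq_head, rq_tail;
static char trace[32];		/* one letter per handler invocation */
static size_t ntrace;

/* Stand-in for hwi_sched(): put h at the back of the queue, at most once. */
static void
model_hwi_sched(struct hwi *h)
{
	if (h->queued)
		return;
	assert(rq_tail - rq_head < QLEN);
	h->queued = true;
	runq[rq_tail++ % QLEN] = h;
}

/* NIC receive handler: handle at most RX_BUDGET packets, then punt. */
static void
rx_handler(struct hwi *h)
{
	int n = h->pending < RX_BUDGET ? h->pending : RX_BUDGET;

	h->pending -= n;
	trace[ntrace++] = 'r';
	if (h->pending > 0)
		model_hwi_sched(h);	/* reschedule at the back of the queue */
}

static void
other_handler(struct hwi *h)
{
	(void)h;
	trace[ntrace++] = 'o';
}

/* Link-state handler; it only runs when a filter schedules it. */
static void
link_handler(struct hwi *h)
{
	(void)h;
	trace[ntrace++] = 'l';
}

/* An interrupt fired: queue every handler except the manual ones. */
static void
intr_fire(struct hwi **hs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!hs[i]->manual)
			model_hwi_sched(hs[i]);
}

/* Model filter: on a link change, schedule the manual handler explicitly. */
static void
model_filter(struct hwi *link, bool link_changed)
{
	if (link_changed)
		model_hwi_sched(link);
}

/* The ithread: run handlers in FIFO order until the queue is empty. */
static void
drain(void)
{
	while (rq_head != rq_tail) {
		struct hwi *h = runq[rq_head++ % QLEN];

		h->queued = false;	/* allow h to requeue itself */
		h->runs++;
		assert(ntrace < sizeof(trace));
		h->fn(h);
	}
}
```

With 10 packets pending and a link change reported by the filter, the handlers run in the order rx, other, link, rx, rx. On a later interrupt with no link change, the manual link handler is not queued at all.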
A second unresolved issue is that interrupt storm detection is currently broken. I have some thoughts on how to re-add it, but it will likely be a bit tricky.

The code currently lives in p4 at //depot/user/jhb/intr/... I have also put up a patch at http://www.freebsd.org/~jhb/patches/intr_threads.patch. This patch includes the changes to the igb(4) driver.

-- 
John Baldwin