From owner-freebsd-current@FreeBSD.ORG Sun Mar 22 22:06:54 2009
Message-ID: <976309.24341.qm@web63902.mail.re1.yahoo.com>
Date: Sun, 22 Mar 2009 15:06:52 -0700 (PDT)
From: Barney Cordoba
To: Scott Long
In-Reply-To: <20090318150721.U22014@pooker.samsco.org>
Cc: current@freebsd.org
Subject: Re: Interrupt routine usage not shown by top in 8.0
Reply-To: barney_cordoba@yahoo.com
List-Id: Discussions about the use of FreeBSD-current

--- On Wed, 3/18/09, Scott Long wrote:

> From: Scott Long
> Subject: Re: Interrupt routine usage not shown by top in 8.0
> To: "Barney Cordoba"
> Cc: "Sam Leffler", current@freebsd.org
> Date: Wednesday, March 18, 2009, 5:25 PM
> On Wed, 18 Mar 2009, Barney Cordoba wrote:
> > --- On Wed, 3/18/09, Scott Long wrote:
> >>
> >> Filters were introduced into the em driver to get around a problem in
> >> certain Intel chipsets that caused aliased interrupts. That's a
> >> different topic of discussion that you are welcome to search the mail
> >> archives on. The filter also solves performance and latency problems
> >> that are inherent to the ithread model when interrupts are shared
> >> between multiple devices. This is especially bad when a high speed
> >> device like em shares an interrupt with a low speed device like usb.
> >> In the course of testing and validating the filter work, I found that
> >> filters caused no degradation in performance or excess context switches,
> >> while cleanly solving the above two problems that were common on
> >> workstation and server class machines of only a few years ago.
> >>
> >> However, both of these problems stemmed from using legacy PCI
> >> interrupts. At the time, MSI was still very new and very unreliable.
> >> As the state of the art progressed and MSI became more reliable, its
> >> use has become more common and is the default in several drivers.
> >> The igb and ixgbe drivers and hardware both prefer MSI over legacy
> >> interrupts, while the em driver and hardware still has a lot of legacy
> >> hardware to deal with. So when MSI is the common/expected/default case,
> >> there is less of a need for the filter/taskqueue method.
> >>
> >> Filters rely on the driver being able to reliably control the interrupt
> >> enable state of the hardware. This is possible with em hardware, but
> >> not as reliable with bge hardware, so the stock driver code does not
> >> have it implemented. I am running a filter-enabled bge driver in
> >> large-scale production, but I also have precise control over the
> >> hardware being used. I also have filter patches for the bce driver, but
> >> bce also tends to prefer MSI, so there isn't a compelling reason to
> >> continue to develop the patches.
> >>
> >>
> >> Scott
> >
> > Assuming same technique is used within an ithread as with a fast
> > interrupt, that is:
> >
> > filtered_foo(){
> >     taskqueue_enqueue();
> >     return FILTER_HANDLED;
> > }
>
> This will give you two context switches, one for the actual interrupt, and
> one for the taskqueue. It'll also encounter a spinlock in the taskqueue
> code, and a spinlock or two in the scheduler.
>
> >
> > ithread_foo(){
> >     taskqueue_enqueue();
> >     return;
> > }
> >
> > Is there any additional overhead/locking in the ithread method? I'm
> > looking to get better control over cpu distribution.
> >
>
> This will give you 3 context switches. First one will be for the actual
> interrupts. Second one will be for the ithread (recall that ithreads are
> full process contexts and are scheduled as such). Third one will be for
> the taskqueue. Along with the spinlocks for the scheduler and taskqueue
> code mentioned above, there will also be spinlocks to protect the APIC
> registers, as well as extra bus cycles to service the APIC.
>
> So, that's 2 trips through the scheduler, plus the associated spinlocks,
> plus the overhead of going through the APIC code, whereas the first method
> only goes through the scheduler once. Both will have a context switch to
> service the low-level interrupt. The second method will definitely have
> more context switches, and will almost certainly have higher overall
> service latency and CPU usage.
>
> Scott

Scott,

I'm sure you're going to yell at me, but here I go anyway.

I set up a little task that basically does:

foo_task(){
    while(1){
        foo_doreceive();
        pause("foo",1);
    }
}

which wakes hz times per second in 7 and hz/2 times per second in 8.

The same accounting issue exists for this case, as I have it bridging
400K pps and usage shows 0 most of the time. I've added some firewall
rules which should substantially increase the load, but still no usage.
If I really hammer it, like 600Kpps, it starts registering 30% usage,
with no ramp up in between. I suppose it could be just falling out of
the cache or something, but it doesn't seem realistic.

Is there some hack I can implement to make sure a task is accounted for,
or some other way to monitor its usage?

Barney
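
One possible explanation for the missing usage, not confirmed anywhere in
this thread, is sampling bias: if CPU time is charged to whichever thread is
running when a periodic clock interrupt fires, a task that wakes on a tick
and goes back to sleep before the next one is rarely or never caught. The toy
model below is a userland sketch of that effect under stated assumptions, not
FreeBSD scheduler code. `simulate`, `struct usage`, and every parameter are
hypothetical names chosen for illustration.

```c
#include <assert.h>

/*
 * Toy model, NOT FreeBSD code. All names here are hypothetical.
 *
 * A task wakes on a clock-tick boundary, runs for work_us microseconds,
 * then sleeps until the first tick boundary strictly after it finished
 * (roughly what a loop around pause("foo", 1) does). A sampler fires every
 * sample_period_us, starting at sample_phase_us, and charges the task
 * whenever the task is found running at that instant.
 */
struct usage {
	double true_pct;	/* share of wall time the task really ran */
	double sampled_pct;	/* share of samples that caught it running */
};

static struct usage
simulate(long tick_us, long work_us, long sample_period_us,
    long sample_phase_us, long nticks)
{
	struct usage u;
	long horizon, nsamples, busy = 0, hits = 0;
	long sample = sample_phase_us;	/* time of the next sample */
	long start = 0;			/* start of the current burst */

	assert(tick_us > 0 && work_us > 0 && nticks > 0);
	assert(sample_period_us > 0);
	horizon = tick_us * nticks;
	assert(sample_phase_us >= 0 && sample_phase_us < horizon);

	/* Number of samples taken in [0, horizon). */
	nsamples = (horizon - sample_phase_us + sample_period_us - 1) /
	    sample_period_us;

	while (start < horizon) {
		long end = start + work_us;

		if (end > horizon)
			end = horizon;
		busy += end - start;

		/*
		 * Assumption: a sample taken at the wakeup instant still
		 * sees the previously running thread, so the burst is
		 * treated as the open interval (start, end).
		 */
		while (sample <= start)
			sample += sample_period_us;
		while (sample < end) {
			hits++;
			sample += sample_period_us;
		}

		/* Sleep until the first tick boundary after the burst. */
		start = (end / tick_us + 1) * tick_us;
	}

	u.true_pct = 100.0 * (double)busy / (double)horizon;
	u.sampled_pct = 100.0 * (double)hits / (double)nsamples;
	return (u);
}
```

Under these assumptions, simulate(1000, 300, 1000, 0, 1000) gives 30% true
usage but 0% sampled usage. Raising the burst to 1300 us makes the sampled
figure jump straight from 0% to 50% (true usage 65%), which resembles the
"no ramp up" symptom. Sampling on a period unrelated to the wakeup tick, for
example 7812 us, brings the sampled figure close to the true 30%.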