Date: Sun, 22 Mar 2009 15:06:52 -0700 (PDT)
From: Barney Cordoba <barney_cordoba@yahoo.com>
To: Scott Long <scottl@samsco.org>
Cc: current@freebsd.org
Subject: Re: Interrupt routine usage not shown by top in 8.0
Message-ID: <976309.24341.qm@web63902.mail.re1.yahoo.com>
In-Reply-To: <20090318150721.U22014@pooker.samsco.org>
--- On Wed, 3/18/09, Scott Long <scottl@samsco.org> wrote:

> From: Scott Long <scottl@samsco.org>
> Subject: Re: Interrupt routine usage not shown by top in 8.0
> To: "Barney Cordoba" <barney_cordoba@yahoo.com>
> Cc: "Sam Leffler" <sam@freebsd.org>, current@freebsd.org
> Date: Wednesday, March 18, 2009, 5:25 PM
>
> On Wed, 18 Mar 2009, Barney Cordoba wrote:
> > --- On Wed, 3/18/09, Scott Long <scottl@samsco.org> wrote:
> >>
> >> Filters were introduced into the em driver to get around a
> >> problem in certain Intel chipsets that caused aliased
> >> interrupts. That's a different topic of discussion that you
> >> are welcome to search the mail archives on. The filter also
> >> solves performance and latency problems that are inherent to
> >> the ithread model when interrupts are shared between multiple
> >> devices. This is especially bad when a high speed device like
> >> em shares an interrupt with a low speed device like usb. In
> >> the course of testing and validating the filter work, I found
> >> that filters caused no degradation in performance or excess
> >> context switches, while cleanly solving the above two problems
> >> that were common on workstation and server class machines of
> >> only a few years ago.
> >>
> >> However, both of these problems stemmed from using legacy PCI
> >> interrupts. At the time, MSI was still very new and very
> >> unreliable. As the state of the art progressed and MSI became
> >> more reliable, its use has become more common and is the
> >> default in several drivers. The igb and ixgbe drivers and
> >> hardware both prefer MSI over legacy interrupts, while the em
> >> driver still has a lot of legacy hardware to deal with. So
> >> when MSI is the common/expected/default case, there is less of
> >> a need for the filter/taskqueue method.
> >>
> >> Filters rely on the driver being able to reliably control the
> >> interrupt enable state of the hardware. This is possible with
> >> em hardware, but not as reliable with bge hardware, so the
> >> stock driver code does not have it implemented. I am running a
> >> filter-enabled bge driver in large-scale production, but I
> >> also have precise control over the hardware being used. I also
> >> have filter patches for the bce driver, but bce also tends to
> >> prefer MSI, so there isn't a compelling reason to continue to
> >> develop the patches.
> >>
> >> Scott
> >
> > Assuming the same technique is used within an ithread as with
> > a fast interrupt, that is:
> >
> > filtered_foo() {
> >         taskqueue_enqueue();
> >         return FILTER_HANDLED;
> > }
>
> This will give you two context switches, one for the actual
> interrupt and one for the taskqueue. It'll also encounter a
> spinlock in the taskqueue code, and a spinlock or two in the
> scheduler.
>
> > ithread_foo() {
> >         taskqueue_enqueue();
> >         return;
> > }
> >
> > Is there any additional overhead/locking in the ithread
> > method? I'm looking to get better control over cpu
> > distribution.
>
> This will give you 3 context switches. The first will be for the
> actual interrupt. The second will be for the ithread (recall that
> ithreads are full process contexts and are scheduled as such).
> The third will be for the taskqueue. Along with the spinlocks for
> the scheduler and taskqueue code mentioned above, there will also
> be spinlocks to protect the APIC registers, as well as extra bus
> cycles to service the APIC.
>
> So that's 2 trips through the scheduler, plus the associated
> spinlocks, plus the overhead of going through the APIC code,
> whereas the first method only goes through the scheduler once.
> Both will have a context switch to service the low-level
> interrupt.
> The second method will definitely have more context switches,
> and will almost certainly have higher overall service latency
> and CPU usage.
>
> Scott

Scott, I'm sure you're going to yell at me, but here I go anyway.
I set up a little task that basically does:

foo_task() {
        while (1) {
                foo_doreceive();
                pause("foo", 1);
        }
}

which wakes hz times per second in 7 and hz/2 times per second in 8.
The same accounting issue exists for this case: I have it bridging
400K pps and usage shows 0 most of the time. I've added some firewall
rules which should substantially increase the load, but still no
usage. If I really hammer it, like 600K pps, it starts registering
30% usage, with no ramp-up in between. I suppose it could just be
falling out of the cache or something, but that doesn't seem
realistic. Is there some hack I can implement to make sure a task is
accounted for, or some other way to monitor its usage?

Barney