Date: Wed, 25 Oct 2006 11:33:06 -0700
From: "Jack Vogel" <jfvogel@gmail.com>
To: "Doug Ambrisko" <ambrisko@ambrisko.com>
Cc: freebsd-net <freebsd-net@freebsd.org>, Scott Long <scottl@samsco.org>,
    John Polstra <jdp@polstra.com>
Subject: Re: em network issues
Message-ID: <2a41acea0610251133s7eadf41fn937aa6c43e6136a2@mail.gmail.com>
In-Reply-To: <200610251818.k9PIIe7p062530@ambrisko.com>
References: <XFMail.20061019152433.jdp@polstra.com> <200610251818.k9PIIe7p062530@ambrisko.com>
On 10/25/06, Doug Ambrisko <ambrisko@ambrisko.com> wrote:
> John Polstra writes:
> | On 19-Oct-2006 Scott Long wrote:
> | > The performance measurements that Andre and I did early this year
> | > showed that the INTR_FAST handler provided a very large benefit.
> |
> | I'm trying to understand why that's the case.  Is it because an
> | INTR_FAST interrupt doesn't have to be masked and unmasked in the
> | APIC?  I can't see any other reason for much of a performance
> | difference in that driver.  With or without INTR_FAST, you've got
> | the bulk of the work being done in a background thread -- either the
> | ithread or the taskqueue thread.  It's not clear to me that it's any
> | cheaper to run a task than it is to run an ithread.
> |
> | A difference might show up if you had two or more em devices sharing
> | the same IRQ.  Then they'd share one ithread, but would each get their
> | own taskqueue thread.  But sharing an IRQ among multiple gigabit NICs
> | would be avoided by anyone who cared about performance, so it's not a
> | very interesting case.  Besides, when you first committed this
> | stuff, INTR_FAST interrupts were not sharable.
> |
> | Another change you made in the same commit (if_em.c revision 1.98)
> | greatly reduced the number of PCI writes made to the RX ring consumer
> | pointer register.  That would yield a significant performance
> | improvement.  Did you see gains from INTR_FAST even without this
> | independent change?
>
> Something that we've fixed locally in at least one version is:
>   1) Limit the loop in em_intr to 3 iterations.
>   2) Pass a valid value to em_process_receive_interrupts/em_rxeof --
>      a sensible value like 100 instead of -1 -- since this is the count
>      of how many times to iterate over the rx stuff.  Seems this got
>      lost in some change of APIs.
>   3) In em_process_receive_interrupts/em_rxeof, always decrement the
>      count on every run through the loop.  If you notice, count is an
>      int that starts at the passed-in value of -1.  It then does
>      count-- until count == 0.  Going -1, -2, -3, ... takes a while
>      until the int rolls over to 0.  Passing 100 limits it much more :-)
>      So this can run 3 * 100 iterations versus infinite * int rollover,
>      assuming we don't skip a count--.
> Making these changes made our multiple-em machines a lot happier when
> slammed with traffic, without starving other things that shared
> interrupts, like other em cards (especially in 4.X).  Interrupt handlers
> should have limits on how long they can run, then let someone else go.
> We use this in 6.X as well and haven't had any problems with our
> configs that use it.  We haven't tested much without these changes,
> since we needed to fix other issues and this is now a non-issue for us.
>
> I haven't pushed this more since I first found issue 1 and the concept
> was rejected ... my machine hung in the interrupt spin loop :-(
>
> If someone wants to examine/play with it more, that's great.
> These issues (I think they are bugs) have been in there a while.
>
> That's my 2 cents.
>
> Doug A.

Interesting, I had forgotten about a couple of these issues.  Timely
email, because I now have a test setup that has repro'd at least one
version of the reported problems, and I am currently debugging.  This
is something I can test.

Thanks Doug,

Jack
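[Editor's note: the following is a minimal standalone C sketch of the
count behaviour Doug describes in items 2 and 3 above.  It is not the
actual if_em.c code; rx_desc_ready() and process_one_rx_desc() are
hypothetical stand-ins for "another received descriptor is waiting" and
"consume one descriptor".]

    /*
     * Sketch only: contrasts the buggy iteration-budget shape with the
     * bounded one.  Stand-in helpers, not driver code.
     */
    #include <stdbool.h>
    #include <stdio.h>

    static bool rx_desc_ready(void) { return (true); }  /* busy ring never drains */
    static void process_one_rx_desc(void) { }

    /*
     * Buggy shape: the caller passes -1 and the loop only stops when
     * count reaches exactly 0, so a busy ring keeps it spinning until
     * the signed int rolls all the way back around to 0 (in practice,
     * billions of iterations before the handler yields).
     */
    static void
    rxeof_unbounded(int count)
    {
            while (rx_desc_ready() && count != 0) {
                    process_one_rx_desc();
                    count--;        /* -1, -2, -3, ... a long road to 0 */
            }
    }

    /* Fixed shape: a real budget, decremented on every pass. */
    static void
    rxeof_bounded(int count)
    {
            while (rx_desc_ready() && count-- > 0)
                    process_one_rx_desc();
    }

    int
    main(void)
    {
            int pass;

            (void)rxeof_unbounded;  /* shown for contrast, intentionally not run */

            /* Interrupt handler bounded to 3 passes of at most 100 each. */
            for (pass = 0; pass < 3; pass++)
                    rxeof_bounded(100);     /* instead of rxeof_unbounded(-1) */
            printf("handled at most 3 * 100 descriptors, then yielded\n");
            return (0);
    }

The point of the bounded version is exactly Doug's: the handler does a
fixed amount of work and then returns, letting other devices sharing the
interrupt get a turn.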