Date: Thu, 24 Nov 2005 15:22:11 +1100
From: Michael Vince <mv@roq.com>
To: Scott Long <scottl@samsco.org>
Cc: Kris Kennaway <kris@obsecurity.org>, current@freebsd.org, net@freebsd.org
Subject: Re: em interrupt storm
Message-ID: <43853FF3.2050103@roq.com>
In-Reply-To: <43851B69.5090701@samsco.org>
References: <20051123030304.GA84202@xor.obsecurity.org> <XFMail.20051122205449.jdp@polstra.com> <20051123084653.GA90927@xor.obsecurity.org> <43851A08.5080802@roq.com> <43851B69.5090701@samsco.org>
Scott Long wrote:

> Michael Vince wrote:
>
>> Kris Kennaway wrote:
>>
>>> On Tue, Nov 22, 2005 at 08:54:49PM -0800, John Polstra wrote:
>>>
>>>> On 23-Nov-2005 Kris Kennaway wrote:
>>>>
>>>>> I am seeing the em driver undergoing an interrupt storm whenever the
>>>>> amr driver receives interrupts. In this case I was running newfs on
>>>>> the amr array and em0 was not in use:
>>>>>
>>>>> 28 root 1 -68 -187 0K 8K CPU1 1 0:32 53.98%
>>>>> irq16: em0
>>>>> 36 root 1 -64 -183 0K 8K RUN 1 0:37 27.75%
>>>>> irq24: amr0
>>>>>
>>>>> # vmstat -i
>>>>> interrupt                          total    rate
>>>>> irq1: atkbd0                           2       0
>>>>> irq4: sio0                           199       1
>>>>> irq6: fdc0                            32       0
>>>>> irq13: npx0                            1       0
>>>>> irq14: ata0                           47       0
>>>>> irq15: ata1                          931       5
>>>>> irq16: em0                       6321801   37187
>>>>> irq24: amr0                        28023     164
>>>>> cpu0: timer                       337533    1985
>>>>> cpu1: timer                       337285    1984
>>>>> Total                            7025854   41328
>>>>>
>>>>> When newfs finished (i.e. amr was idle), em0 stopped storming.
>>>>>
>>>>> MPTable: <INTEL SE7520BD22 >
>>>>>
>>>>
>>>> This is the dreaded interrupt aliasing problem that several of us have
>>>> experienced with this chipset. High-numbered interrupts alias down to
>>>> interrupts in the range 16..19 (or maybe 16..23), a multiple of 8 less
>>>> than the original interupt.
>>>>
>>>> Nobody knows what causes it, and nobody knows how to fix it.
>>>>
>>>
>>> This would be good to document somewhere so that people don't either
>>> accidentally buy this hardware, or know what to expect when they run
>>> it.
>>>
>>> Kris
>>>
>> This is Intels latest server chipset designs and Dell are putting
>> that chipset in all their servers.
>> Luckily I haven't not seen the problem on any of my Dell servers (as
>> long as I am looking at this right).
>>
>> This server has been running for a long time.
>> vmstat -i
>> interrupt                          total    rate
>> irq1: atkbd0                           6       0
>> irq4: sio0                         23433       0
>> irq6: fdc0                            10       0
>> irq8: rtc                     2631238611     128
>> irq13: npx0                            1       0
>> irq14: ata0                           99       0
>> irq16: uhci0                  1507608958      73
>> irq18: uhci2                    42005524       2
>> irq19: uhci1                           3       0
>> irq23: atapci0                       151       0
>> irq46: amr0                     41344088       2
>> irq64: em0                    1513106157      73
>> irq0: clk                     2055605782      99
>> Total                         7790932823     379
>>
>> This one just transfered over 8gigs of data in 77seconds with around
>> 1000 simultaneous tcp connections under a load of 35. Both seem OK.
>> vmstat -i
>> interrupt                          total    rate
>> irq4: sio0                           315       0
>> irq13: npx0                            1       0
>> irq14: ata0                           47       0
>> irq16: uhci0                     2894669       2
>> irq18: uhci2                      977413       0
>> irq23: ehci0                           3       0
>> irq46: amr0                       883138       0
>> irq64: em0                       2890414       2
>> cpu0: timer                   2763566717    1999
>> cpu3: timer                   2763797300    1999
>> cpu1: timer                   2763551479    1999
>> cpu2: timer                   2763797870    1999
>> Total                        11062359366    8004
>>
>> Mike
>>
>
> Looks like at least some of your interrupts are being aliased to
> irq16, which just happens to be USB(uhci) in this case. Note that the
> rate is the same between irq64 and irq16, and the totals are pretty
> close. If you don't need USB, I'd suggest turning it off.
>
> Scott

Most of my Dell servers occasionally use their USB ports for serial
output: tip over a usb2serial cable with the uplcom driver, into
another server's real serial port (sio). So disabling USB is not really
an option. How much do you think the aliasing affects performance if
the USB device is only rarely used?

I also have a 6-stable machine and noticed that its vmstat -i output
lists em and USB together on irq16. But em0 isn't used at all; em2 and
em3 are the active ones. It doesn't seem reasonable that my USB serial
usage would be that high for irq16. Could it be that em2 and em3 are
also going through irq16?

vmstat -i
interrupt                          total    rate
irq4: sio0                           228       0
irq14: ata0                           47       0
irq16: em0 uhci0                  917039      11
irq18: uhci2                       54823       0
irq23: ehci0                           3       0
irq46: amr0                        45998       0
irq64: em2                        898628      11
lapic0: timer                  159140889    1999
Total                          161057655    2024

Mike
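
For anyone who wants to apply the same check to their own vmstat -i output, here is a rough sketch. It combines John's description (an aliased interrupt shows up in the range 16..23, a multiple of 8 below the original) with Scott's observation that the two totals end up close. The function names and the 5% tolerance are assumptions made up for illustration, and the sample rows are copied from the first listing above. It is only a heuristic: it would not flag a storm like Kris's, where the totals diverge.

```python
import re

# Sample rows copied from the first `vmstat -i` listing in this message.
SAMPLE = """
interrupt total rate
irq16: uhci0 1507608958 73
irq18: uhci2 42005524 2
irq23: atapci0 151 0
irq46: amr0 41344088 2
irq64: em0 1513106157 73
Total 7790932823 379
"""

def parse_vmstat(text):
    """Parse `vmstat -i` text into {irq_number: (devices, total, rate)}.

    Only lines starting with "irqN:" are kept; timer and Total rows are skipped.
    """
    rows = {}
    for line in text.splitlines():
        m = re.match(r"\s*irq(\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$", line)
        if m:
            rows[int(m.group(1))] = (m.group(2), int(m.group(3)), int(m.group(4)))
    return rows

def alias_suspects(rows, tolerance=0.05):
    """Return (high_irq, low_irq) pairs that fit the aliasing pattern.

    A pair is reported when low_irq is in 16..23, high_irq is a multiple
    of 8 above it, and the totals differ by at most `tolerance` (a
    hypothetical threshold) of the larger total.
    """
    suspects = []
    for high, (_, high_total, _) in rows.items():
        for low, (_, low_total, _) in rows.items():
            if not (16 <= low <= 23 and high > low and (high - low) % 8 == 0):
                continue
            if abs(high_total - low_total) <= tolerance * max(high_total, low_total):
                suspects.append((high, low))
    return sorted(suspects)
```

Calling alias_suspects(parse_vmstat(SAMPLE)) pairs irq64 with irq16 and nothing else. The tolerance is a guess; a looser or tighter value may suit machines where the low interrupt also carries real traffic of its own.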