Date: Thu, 24 Nov 2005 12:40:24 +1100
From: Michael Vince <mv@roq.com>
To: Kris Kennaway <kris@obsecurity.org>
Cc: net@freebsd.org, current@freebsd.org, John Polstra <jdp@polstra.com>
Subject: Re: em interrupt storm
Message-ID: <43851A08.5080802@roq.com>
In-Reply-To: <20051123084653.GA90927@xor.obsecurity.org>
References: <20051123030304.GA84202@xor.obsecurity.org> <XFMail.20051122205449.jdp@polstra.com> <20051123084653.GA90927@xor.obsecurity.org>
Kris Kennaway wrote:
>On Tue, Nov 22, 2005 at 08:54:49PM -0800, John Polstra wrote:
>
>>On 23-Nov-2005 Kris Kennaway wrote:
>>
>>>I am seeing the em driver undergoing an interrupt storm whenever the
>>>amr driver receives interrupts. In this case I was running newfs on
>>>the amr array and em0 was not in use:
>>>
>>>   28 root        1 -68 -187     0K     8K CPU1   1   0:32 53.98% irq16: em0
>>>   36 root        1 -64 -183     0K     8K RUN    1   0:37 27.75% irq24: amr0
>>>
>>># vmstat -i
>>>interrupt                total    rate
>>>irq1: atkbd0                 2       0
>>>irq4: sio0                 199       1
>>>irq6: fdc0                  32       0
>>>irq13: npx0                  1       0
>>>irq14: ata0                 47       0
>>>irq15: ata1                931       5
>>>irq16: em0             6321801   37187
>>>irq24: amr0              28023     164
>>>cpu0: timer             337533    1985
>>>cpu1: timer             337285    1984
>>>Total                  7025854   41328
>>>
>>>When newfs finished (i.e. amr was idle), em0 stopped storming.
>>>
>>>MPTable: <INTEL SE7520BD22 >
>>
>>This is the dreaded interrupt aliasing problem that several of us have
>>experienced with this chipset. High-numbered interrupts alias down to
>>interrupts in the range 16..19 (or maybe 16..23), a multiple of 8 less
>>than the original interrupt.
>>
>>Nobody knows what causes it, and nobody knows how to fix it.
>
>This would be good to document somewhere so that people don't either
>accidentally buy this hardware, or know what to expect when they run
>it.
>
>Kris

This is Intel's latest server chipset design, and Dell are putting that chipset in all their servers. Luckily I haven't seen the problem on any of my Dell servers (as long as I am looking at this right). This server has been running for a long time.
vmstat -i
interrupt                total    rate
irq1: atkbd0                 6       0
irq4: sio0               23433       0
irq6: fdc0                  10       0
irq8: rtc           2631238611     128
irq13: npx0                  1       0
irq14: ata0                 99       0
irq16: uhci0        1507608958      73
irq18: uhci2          42005524       2
irq19: uhci1                 3       0
irq23: atapci0             151       0
irq46: amr0           41344088       2
irq64: em0          1513106157      73
irq0: clk           2055605782      99
Total               7790932823     379

This one just transferred over 8 GB of data in 77 seconds with around 1000 simultaneous TCP connections under a load of 35. Both seem OK.

vmstat -i
interrupt                total    rate
irq4: sio0                 315       0
irq13: npx0                  1       0
irq14: ata0                 47       0
irq16: uhci0           2894669       2
irq18: uhci2            977413       0
irq23: ehci0                 3       0
irq46: amr0             883138       0
irq64: em0             2890414       2
cpu0: timer         2763566717    1999
cpu3: timer         2763797300    1999
cpu1: timer         2763551479    1999
cpu2: timer         2763797870    1999
Total              11062359366    8004

Mike
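John's description of the aliasing reduces to simple arithmetic: an interrupt above the affected range shows up at a lower IRQ that is a multiple of 8 below it, landing in 16..23 (or possibly only 16..19). The following is a minimal sketch that enumerates those candidate targets, assuming that informal rule holds exactly, which nobody in the thread has confirmed. The function name `alias_candidates` and its `lo`/`hi` window parameters are hypothetical, not anything from FreeBSD or the em/amr drivers.

```python
def alias_candidates(irq, lo=16, hi=23):
    """Return the IRQs in [lo, hi] that lie a positive multiple of 8 below irq.

    Hypothetical helper: it encodes the informal "aliases down by a multiple
    of 8" observation from this thread, not a documented chipset rule.
    """
    candidates = []
    # Step down from irq in strides of 8, stopping before the result would go negative.
    for k in range(1, irq // 8 + 1):
        target = irq - 8 * k
        if lo <= target <= hi:
            candidates.append(target)
    return candidates


# Kris's board: amr0 on irq24, the storming em0 on irq16.
print(alias_candidates(24))          # [16]
# Mike's board: amr0 on irq46, with the wider and the narrower window.
print(alias_candidates(46))          # [22]
print(alias_candidates(46, hi=19))   # []
```

On Kris's board the rule is consistent with the reported pairing (amr0 on irq24 landing on em0's irq16). For Mike's amr0 on irq46 it points at irq22 under the wider window, where his `vmstat -i` output shows no device, and at nothing under the narrower one; that is only what the stated rule implies, not a diagnosis of either machine.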