Date: Mon, 21 Jan 2013 19:30:01 GMT
From: George Neville-Neil <gnn@FreeBSD.org>
To: freebsd-net@FreeBSD.org
Subject: Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4): m_getjcl: invalid cluster type
Message-ID: <201301211930.r0LJU1jl057381@freefall.freebsd.org>
The following reply was made to PR kern/172113; it has been noted by GNATS.

From: George Neville-Neil <gnn@FreeBSD.org>
To: John Baldwin <jhb@FreeBSD.org>
Cc: bug-followup@FreeBSD.org, egrosbein@rdtc.ru, jfv@FreeBSD.org
Subject: Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4): m_getjcl: invalid cluster type
Date: Mon, 21 Jan 2013 14:25:00 -0500

 On Jan 19, 2013, at 23:26 , John Baldwin <jhb@FreeBSD.org> wrote:

 > I was able to finally reproduce this panic today. It seems to require
 > a server configured for PXE but that receives no DHCP reply (and
 > possibly with the requisite SuperMicro X8 board). I was able to
 > prevent the panic with a subset of the referenced patch by only adding
 > the 'if_drv_flags & IFF_DRV_RUNNING' check to the start of
 > igb_msix_que(). The rest of the patch was unnecessary. I also added
 > some debugging to print out the ICR, EICR, IMS, and EIMS registers in
 > this case. It does look like the hardware is sending an interrupt that
 > is not enabled in the interrupt mask (specifically LSC). In fact, the
 > 82576 datasheet specifically mentions masking LSC until initialization
 > is complete to avoid spurious interrupts during boot and AFAICT igb(4)
 > does this since e1000_reset_hw() clears the interrupt mask via writes
 > to IMC and doesn't re-enable interrupts until igb_init_locked() is
 > invoked via 'ifconfig up'. Here is my debug output:
 >
 > SMP: AP CPU #6 Launched!
 > SMP: AP CPU #4 Launched!
 > stray irq0
 > igb0: interrupt on que 0: icr 0x1000004 eicr 0
 > ims 0 eims 0x80000000
 >
 > Hmmm. Nothing clears EIMS. After some more debugging, I determined
 > that e1000_reset_hw() always turns this bit in EIMS on, even if it is
 > off before e1000_reset_hw() is called(!). I added explicit calls to
 > igb_disable_intr() to clear EIMS after each call to e1000_reset_hw().
 > This removes the 'stray irq0', but I still get a spurious interrupt
 > during boot (albeit with eims 0). I can use the IFF_DRV_RUNNING hack
 > for now, but I think the real fix is something else.
 >

 I think Jack will have to chime in on this one. Do you think it's all
 SM X8 boards or just the one we happen to have? I wonder if Jack or
 Jeffrey (the testing guy he works with) have access to the right board.

 Best,
 George
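
For readers following the thread, the workaround John describes (returning early from igb_msix_que() unless IFF_DRV_RUNNING is set in if_drv_flags) can be sketched outside the kernel roughly as below. This is a minimal userland mock under stated assumptions, not the driver code and not the patch attached to the PR: struct mock_ifnet, struct mock_queue, mock_msix_que(), mock_demo(), and MOCK_IFF_DRV_RUNNING are all hypothetical names invented for illustration, and the real handler does considerably more work than bumping a counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver "running" flag discussed in the thread. */
#define MOCK_IFF_DRV_RUNNING 0x40u

/* Hypothetical, heavily reduced interface state: only the driver flags. */
struct mock_ifnet {
	uint32_t if_drv_flags;
};

/* Hypothetical per-queue state with counters so the effect is observable. */
struct mock_queue {
	struct mock_ifnet *ifp;
	unsigned int serviced;	/* interrupts that were processed */
	unsigned int ignored;	/* interrupts dropped by the guard */
};

/*
 * Mock queue interrupt handler. The guard at the top is the idea from the
 * thread: if the interface has not been brought up yet, treat the interrupt
 * as spurious and return before touching any ring or mbuf state.
 * Returns 1 when the interrupt was serviced, 0 when it was ignored.
 */
static int
mock_msix_que(struct mock_queue *que)
{
	if ((que->ifp->if_drv_flags & MOCK_IFF_DRV_RUNNING) == 0) {
		que->ignored++;
		return (0);
	}

	que->serviced++;	/* placeholder for the real RX/TX processing */
	return (1);
}

/* Usage: one interrupt before the interface is up, one after. */
static void
mock_demo(void)
{
	struct mock_ifnet ifp = { 0 };
	struct mock_queue que = { &ifp, 0, 0 };

	assert(mock_msix_que(&que) == 0);	/* not running: ignored */
	ifp.if_drv_flags |= MOCK_IFF_DRV_RUNNING;
	assert(mock_msix_que(&que) == 1);	/* running: serviced */
	assert(que.serviced == 1 && que.ignored == 1);
}
```

As John notes, a guard like this only hides the symptom; it does not explain why the hardware raises an interrupt while the mask is clear.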