Date: Sun, 20 Jan 2013 04:30:01 GMT
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-net@FreeBSD.org
Subject: Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4): m_getjcl: invalid cluster type
Message-ID: <201301200430.r0K4U1A4093891@freefall.freebsd.org>
The following reply was made to PR kern/172113; it has been noted by GNATS.

From: John Baldwin <jhb@FreeBSD.org>
To: bug-followup@FreeBSD.org, egrosbein@rdtc.ru
Cc: jfv@FreeBSD.org, George Neville-Neil <gnn@FreeBSD.org>
Subject: Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4): m_getjcl: invalid cluster type
Date: Sat, 19 Jan 2013 23:26:17 -0500

I was able to finally reproduce this panic today. It seems to require a server configured for PXE but that receives no DHCP reply (and possibly with the requisite SuperMicro X8 board).

I was able to prevent the panic with a subset of the referenced patch by only adding the 'if_drv_flags & IFF_DRV_RUNNING' check to the start of igb_msix_que(). The rest of the patch was unnecessary.

I also added some debugging to print out the ICR, EICR, IMS, and EIMS registers in this case. It does look like the hardware is sending an interrupt that is not enabled in the interrupt mask (specifically LSC). In fact, the 82576 datasheet specifically mentions masking LSC until initialization is complete to avoid spurious interrupts during boot and AFAICT igb(4) does this since e1000_reset_hw() clears the interrupt mask via writes to IMC and doesn't re-enable interrupts until igb_init_locked() is invoked via 'ifconfig up'.

Here is my debug output:

    SMP: AP CPU #6 Launched!
    SMP: AP CPU #4 Launched!
    stray irq0
    igb0: interrupt on que 0: icr 0x1000004 eicr 0 ims 0 eims 0x80000000

Hmmm. Nothing clears EIMS. After some more debugging, I determined that e1000_reset_hw() always turns this bit in EIMS on, even if it is off before e1000_reset_hw() is called(!). I added explicit calls to igb_disable_intr() to clear EIMS after each call to e1000_reset_hw(). This removes the 'stray irq0', but I still get a spurious interrupt during boot (albeit with eims 0).

I can use the IFF_DRV_RUNNING hack for now, but I think the real fix is something else.

-- 
John Baldwin
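
For illustration, here is a minimal sketch of the IFF_DRV_RUNNING guard described above. It is not the actual igb(4) code. The types, functions, and constants below (mock_ifnet, mock_queue, mock_msix_que, mock_icr_has_lsc, and the MOCK_* values) are hypothetical stand-ins so that the fragment compiles outside the kernel. The LSC bit position is an assumption that happens to agree with the icr value 0x1000004 in the debug output. The idea is simply that the queue handler returns early while the interface has not been brought up, so a spurious interrupt during boot does no work.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch only; not the kernel definitions. */
#define MOCK_IFF_DRV_RUNNING 0x40u        /* "driver is running" flag */
#define MOCK_ICR_LSC         0x00000004u  /* assumed link-status-change bit */

struct mock_ifnet {
    uint32_t if_drv_flags;                /* driver-managed state flags */
};

struct mock_queue {
    struct mock_ifnet *ifp;
    unsigned handled;                     /* interrupts that were processed */
    unsigned ignored;                     /* interrupts dropped by the guard */
};

/* Report whether a raw ICR value has the (assumed) LSC cause bit set. */
static bool
mock_icr_has_lsc(uint32_t icr)
{
    return (icr & MOCK_ICR_LSC) != 0;
}

/*
 * Mock MSI-X queue handler. The guard at the top mirrors the check
 * described in the message: if the interface is not running, return
 * before touching any ring state.
 */
static void
mock_msix_que(struct mock_queue *que)
{
    if ((que->ifp->if_drv_flags & MOCK_IFF_DRV_RUNNING) == 0) {
        que->ignored++;
        return;
    }
    /* Normal RX/TX processing would go here in a real handler. */
    que->handled++;
}
```

Calling mock_msix_que() on a queue whose interface flags are zero only bumps the ignored counter; once MOCK_IFF_DRV_RUNNING is set, the same call takes the normal path. As the message says, this only hides the symptom and leaves the source of the spurious interrupt unexplained.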