From owner-freebsd-net@FreeBSD.ORG Mon Jan 21 20:30:03 2013
Delivered-To: freebsd-net@smarthost.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
    by hub.freebsd.org (Postfix) with ESMTP id A41B09C0
    for ; Mon, 21 Jan 2013 20:30:03 +0000 (UTC)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87])
    by mx1.freebsd.org (Postfix) with ESMTP id 86689B5E
    for ; Mon, 21 Jan 2013 20:30:03 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
    by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r0LKU3k3068338
    for ; Mon, 21 Jan 2013 20:30:03 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r0LKU3wi068337;
    Mon, 21 Jan 2013 20:30:03 GMT
    (envelope-from gnats)
Date: Mon, 21 Jan 2013 20:30:03 GMT
Message-Id: <201301212030.r0LKU3wi068337@freefall.freebsd.org>
To: freebsd-net@FreeBSD.org
From: Jack Vogel
Subject: Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4): m_getjcl: invalid cluster type
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
Reply-To: Jack Vogel
List-Id: Networking and TCP/IP with FreeBSD
X-List-Received-Date: Mon, 21 Jan 2013 20:30:03 -0000

The following reply was made to PR kern/172113; it has been noted by GNATS.

From: Jack Vogel
To: George Neville-Neil
Cc: John Baldwin, bug-followup@freebsd.org, egrosbein@rdtc.ru, jfv@freebsd.org
Subject: Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4): m_getjcl: invalid cluster type
Date: Mon, 21 Jan 2013 12:28:40 -0800

Well, do you have a more complete designation of the motherboard?
We can look into it, although if the one check stops the problem it may be a
low priority.

Jack

On Mon, Jan 21, 2013 at 11:25 AM, George Neville-Neil <gnn@freebsd.org> wrote:

>
> On Jan 19, 2013, at 23:26 , John Baldwin <jhb@FreeBSD.org> wrote:
>
> > I was able to finally reproduce this panic today.  It seems to require
> > a server configured for PXE but that receives no DHCP reply (and
> > possibly with the requisite SuperMicro X8 board).  I was able to
> > prevent the panic with a subset of the referenced patch by only adding
> > the 'if_drv_flags & IFF_DRV_RUNNING' check to the start of
> > igb_msix_que().  The rest of the patch was unnecessary.  I also added
> > some debugging to print out the ICR, EICR, IMS, and EIMS registers in
> > this case.  It does look like the hardware is sending an interrupt that
> > is not enabled in the interrupt mask (specifically LSC).  In fact, the
> > 82576 datasheet specifically mentions masking LSC until initialization
> > is complete to avoid spurious interrupts during boot and AFAICT igb(4)
> > does this since e1000_reset_hw() clears the interrupt mask via writes
> > to IMC and doesn't re-enable interrupts until igb_init_locked() is
> > invoked via 'ifconfig up'.  Here is my debug output:
> >
> > SMP: AP CPU #6 Launched!
> > SMP: AP CPU #4 Launched!
> > stray irq0
> > igb0: interrupt on que 0: icr 0x1000004 eicr 0
> >     ims 0 eims 0x80000000
> >
> > Hmmm.  Nothing clears EIMS.  After some more debugging, I determined
> > that e1000_reset_hw() always turns this bit in EIMS on, even if it is
> > off before e1000_reset_hw() is called(!).  I added explicit calls to
> > igb_disable_intr() to clear EIMS after each call to e1000_reset_hw().
> > This removes the 'stray irq0', but I still get a spurious interrupt
> > during boot (albeit with eims 0).  I can use the IFF_DRV_RUNNING hack
> > for now, but I think the real fix is something else.
> >
>
> I think Jack will have to chime in on this one.  Do you think it's all SM X8 boards
> or just the one we happen to have?
> I wonder if Jack or Jeffrey (the testing guy he works
> with) have access to the right board.
>
> Best,
> George
>
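A minimal userspace sketch (not igb(4) code) of the two observations in John's report: the logged ICR value has the LSC cause bit set while IMS is 0, and the workaround returns early from the MSI-X queue handler until IFF_DRV_RUNNING is set. Every name below (struct sketch_softc, sketch_msix_que(), sketch_cause_masked(), the SKETCH_* constants) is a hypothetical stand-in; the flag value is local to the sketch, and LSC is assumed to be bit 2 (0x4), which is consistent with the icr 0x1000004 reading and the thread's statement that the unexpected cause was LSC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for illustration only; these are not the
 * igb(4), e1000, or <net/if.h> definitions.
 */
#define SKETCH_IFF_DRV_RUNNING 0x40u        /* local "interface is up" flag */
#define SKETCH_ICR_LSC         0x00000004u  /* assumed: LSC cause is bit 2 */

struct sketch_softc {
	unsigned int if_drv_flags; /* driver-owned interface flags */
	int handled;               /* interrupts actually processed */
};

/* Nonzero if 'bit' is asserted in the cause register but not enabled in the mask. */
int
sketch_cause_masked(uint32_t icr, uint32_t ims, uint32_t bit)
{
	return ((icr & bit) != 0 && (ims & bit) == 0);
}

/*
 * Queue-handler guard modeled on the workaround: ignore any interrupt
 * that arrives before the interface has been initialized (for example
 * by 'ifconfig up').
 */
void
sketch_msix_que(struct sketch_softc *sc)
{
	assert(sc != NULL);
	if ((sc->if_drv_flags & SKETCH_IFF_DRV_RUNNING) == 0)
		return;
	/* Real RX/TX processing would happen here. */
	sc->handled++;
}
```

With the logged values, sketch_cause_masked(0x1000004, 0, SKETCH_ICR_LSC) returns 1, and a sketch_msix_que() call made before the running flag is set leaves handled at 0. As John notes, such a guard only hides the spurious interrupt; it does not explain why the hardware raises it.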