Date:      Wed, 31 Oct 2012 12:58:36 -0400
From:      Charles Owens <cowens@greatbaysoftware.com>
To:        freebsd-stable@freebsd.org, Steve McCoy <smccoy@greatbaysoftware.com>,  Jack Vogel <jfvogel@gmail.com>, jdc@parodius.com
Subject:   Panic during kernel boot, igb-init related? (8.3-RELEASE)
Message-ID:  <509158BC.7090901@greatbaysoftware.com>

Hello,

We're seeing boot-time panics in about 4% of cases when upgrading from
FreeBSD 8.1 to 8.3-RELEASE (i386).  The problem is subtle enough that
it escaped detection during our regular testing cycle; now, with over
100 systems upgraded, we're convinced there's a real issue.  Our kernel
config is essentially PAE (i.e. static modules ... with a few drivers
added/removed).  The hardware is Intel Server System SR1625UR.

This appears to match a finding discussed in these threads, having to do 
with timing of initialization of the igb(4)-based NICs (if I'm 
understanding it properly):

http://lists.freebsd.org/pipermail/freebsd-stable/2011-May/062596.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062949.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063867.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063958.html


These threads include some potential patches and mention a possible
commit/MFC, but it isn't clear that there was ever a final resolution
(or an MFC to 8-stable).  I've cc'd a few folks from back then.

A real challenge here is the frequency of occurrence.  As mentioned, it
only hits a fraction of our systems.  When it _does_ hit, the system
may enter a reboot loop for days and then mysteriously break out of
it... and thereafter seem to work fine.

I'd be very grateful for any help.  Some questions:

  * Was there ever a final "blessed" patch?
      o if so, will it apply to RELENG_8_3?
  * Is there anything that could be said that might help us with
    reproducing-the-problem / testing / validating-a-fix?


Panic message is --

panic: m_getzone: m_getjcl: invalid cluster type
cpuid = 0
KDB: stack backtrace:
#0 0xc059c717 at kdb_backtrace+0x47
#1 0xc056caf7 at panic+0x117
#2 0xc03c979e at igb_refresh_mbufs+0x25e
#3 0xc03c9f98 at igb_rxeof+0x638
#4 0xc03ca135 at igb_msix_que+0x105
#5 0xc0541e2b at intr_event_execute_handlers+0x13b
#6 0xc05434eb at ithread_loop+0x6b
#7 0xc053efb7 at fork_exit+0x97
#8 0xc0806744 at fork_trampoline+0x8
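
For reference, here is a rough userland model of where I believe that
panic string comes from, in case it helps narrow things down.  As far as
I can tell, m_getzone() in sys/sys/mbuf.h is a switch on the requested
size, and the panic is its default arm, so igb_refresh_mbufs() would have
to be asking for a size that is not one of the known cluster sizes.  One
way that could happen (only a guess on my part) is a receive buffer size
that has not been set yet, e.g. 0.  The MODEL_* constants and
model_getzone() below are hypothetical names for illustration, and the
values are what I'd expect on i386, not copied from our build.

```c
#include <stddef.h>

/*
 * Illustrative sizes only (assumed i386 values).  The real constants
 * are MSIZE, MCLBYTES, MJUMPAGESIZE, MJUM9BYTES and MJUM16BYTES.
 */
#define MODEL_MSIZE         256
#define MODEL_MCLBYTES      2048
#define MODEL_MJUMPAGESIZE  4096
#define MODEL_MJUM9BYTES    (9 * 1024)
#define MODEL_MJUM16BYTES   (16 * 1024)

/*
 * Hypothetical stand-in for the kernel's size-to-zone switch.
 * Returns a zone name for a recognized size, or NULL in the case
 * where the kernel would hit its default arm and panic.
 */
static const char *
model_getzone(int size)
{
	switch (size) {
	case MODEL_MSIZE:
		return ("mbuf");
	case MODEL_MCLBYTES:
		return ("clust");
	case MODEL_MJUMPAGESIZE:
		return ("jumbop");
	case MODEL_MJUM9BYTES:
		return ("jumbo9");
	case MODEL_MJUM16BYTES:
		return ("jumbo16");
	default:
		/* Kernel: panic("%s: m_getjcl: invalid cluster type") */
		return (NULL);
	}
}
```

In this model a size of 0, or anything else that is not an exact match
such as 1500, falls through to the default arm, which is the case that
would correspond to our panic.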

Thanks very much,

Charles


-- 
Charles Owens
Great Bay Software, Inc.



