Date:      Tue, 26 Mar 2013 18:46:41 GMT
From:      Allan Jude <freebsd@scaleengine.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/177399: [igb] [panic] Kernel panic in igb(4) at random intervals
Message-ID:  <201303261846.r2QIkf1C021995@red.freebsd.org>
Resent-Message-ID: <201303261850.r2QIo0Hc072910@freefall.freebsd.org>

>Number:         177399
>Category:       kern
>Synopsis:       [igb] [panic] Kernel panic in igb(4) at random intervals
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Mar 26 18:50:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Allan Jude
>Release:        9.1-RELEASE (also 9.0-RELEASE)
>Organization:
ScaleEngine Inc.
>Environment:
FreeBSD Yankee.HML1.ScaleEngine.net 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec  4 09:23:10 UTC 2012     root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

FreeBSD Whiskey.HML1.ScaleEngine.net 9.0-RELEASE-p3 FreeBSD 9.0-RELEASE-p3 #0: Tue Jun 12 02:52:29 UTC 2012     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64



FreeBSD Tango.HML1.ScaleEngine.net 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec  4 09:23:10 UTC 2012     root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
Yankee (9.1):
System: SuperMicro 6026T-3RF
Motherboard: Super X8DT3-F
NIC: Intel® 82576 Dual-Port Gigabit Ethernet

igb0: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0xef80-0xef9f mem 0xfafe0000-0xfaffffff,0xfafc0000-0xfafdffff,0xfaf9c000-0xfaf9ffff irq 28 at device 0.0 on pci7
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:49:b7:c4
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: Bound queue 4 to cpu 4
igb0: Bound queue 5 to cpu 5
igb0: Bound queue 6 to cpu 6
igb0: Bound queue 7 to cpu 7
igb1: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0xef40-0xef5f mem 0xfaf60000-0xfaf7ffff,0xfaf40000-0xfaf5ffff,0xfaf1c000-0xfaf1ffff irq 40 at device 0.1 on pci7
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:49:b7:c5
igb1: Bound queue 0 to cpu 8
igb1: Bound queue 1 to cpu 9
igb1: Bound queue 2 to cpu 10
igb1: Bound queue 3 to cpu 11
igb1: Bound queue 4 to cpu 12
igb1: Bound queue 5 to cpu 13
igb1: Bound queue 6 to cpu 14
igb1: Bound queue 7 to cpu 15

Whiskey (9.0):
System: SuperMicro 6016T-NTRF
Motherboard: Super X8DTU-F
NIC: Intel® 82576 Dual-Port Gigabit Ethernet

igb0: <Intel(R) PRO/1000 Network Connection version - 2.2.5> port 0xec00-0xec1f mem 0xfbde0000-0xfbdfffff,0xfbdc0000-0xfbddffff,0xfbd9c000-0xfbd9ffff irq 28 at device 0.0 on pci1
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:69:f0:40
igb1: <Intel(R) PRO/1000 Network Connection version - 2.2.5> port 0xe880-0xe89f mem 0xfbd60000-0xfbd7ffff,0xfbd40000-0xfbd5ffff,0xfbd1c000-0xfbd1ffff irq 40 at device 0.1 on pci1
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:69:f0:41

Victor (9.1):
System: SuperMicro 6016T-NTRF
Motherboard: Super X8DTU-F
NIC: Intel® 82576 Dual-Port Gigabit Ethernet

igb0: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0xec00-0xec1f mem 0xfbde0000-0xfbdfffff,0xfbdc0000-0xfbddffff,0xfbd9c000-0xfbd9ffff irq 28 at device 0.0 on pci1
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:69:ee:44
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: Bound queue 4 to cpu 4
igb0: Bound queue 5 to cpu 5
igb0: Bound queue 6 to cpu 6
igb0: Bound queue 7 to cpu 7
igb1: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0xe880-0xe89f mem 0xfbd60000-0xfbd7ffff,0xfbd40000-0xfbd5ffff,0xfbd1c000-0xfbd1ffff irq 40 at device 0.1 on pci1
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:69:ee:45
igb1: Bound queue 0 to cpu 8
igb1: Bound queue 1 to cpu 9
igb1: Bound queue 2 to cpu 10
igb1: Bound queue 3 to cpu 11
igb1: Bound queue 4 to cpu 12
igb1: Bound queue 5 to cpu 13
igb1: Bound queue 6 to cpu 14
igb1: Bound queue 7 to cpu 15

Tango (9.1, different NIC):
System: SuperMicro 6027R-N3RF4+
Motherboard: Super X9DRW-3LN4F+
NIC: Intel® i350 Quad-Port Gigabit Ethernet Controller

igb0: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0x7020-0x703f mem 0xdf920000-0xdf93ffff,0xdf9c4000-0xdf9c7fff irq 27 at device 0.0 on pci4
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:78:e4:d4
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: Bound queue 4 to cpu 4
igb0: Bound queue 5 to cpu 5
igb0: Bound queue 6 to cpu 6
igb0: Bound queue 7 to cpu 7
igb1: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0x7000-0x701f mem 0xdf900000-0xdf91ffff,0xdf9c0000-0xdf9c3fff irq 30 at device 0.1 on pci4
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:78:e4:d5
igb1: Bound queue 0 to cpu 8
igb1: Bound queue 1 to cpu 9
igb1: Bound queue 2 to cpu 10
igb1: Bound queue 3 to cpu 11
igb1: Bound queue 4 to cpu 12
igb1: Bound queue 5 to cpu 13
igb1: Bound queue 6 to cpu 14
igb1: Bound queue 7 to cpu 15
igb2: <Intel(R) PRO/1000 Network Connection version - 2.3.4> mem 0xfbe20000-0xfbe3ffff,0xfbec4000-0xfbec7fff irq 52 at device 0.0 on pci129
igb2: Using MSIX interrupts with 9 vectors
igb2: Ethernet address: 00:25:90:78:e4:d7
igb2: Bound queue 0 to cpu 16
igb2: Bound queue 1 to cpu 17
igb2: Bound queue 2 to cpu 18
igb2: Bound queue 3 to cpu 19
igb2: Bound queue 4 to cpu 20
igb2: Bound queue 5 to cpu 21
igb2: Bound queue 6 to cpu 22
igb2: Bound queue 7 to cpu 23
igb3: <Intel(R) PRO/1000 Network Connection version - 2.3.4> mem 0xfbe00000-0xfbe1ffff,0xfbec0000-0xfbec3fff irq 50 at device 0.3 on pci129
igb3: Using MSIX interrupts with 9 vectors
igb3: Ethernet address: 00:25:90:78:e4:d6
igb3: Bound queue 0 to cpu 0
igb3: Bound queue 1 to cpu 1
igb3: Bound queue 2 to cpu 2
igb3: Bound queue 3 to cpu 3
igb3: Bound queue 4 to cpu 4
igb3: Bound queue 5 to cpu 5
igb3: Bound queue 6 to cpu 6
igb3: Bound queue 7 to cpu 7


I have a number of Supermicro servers, several of which reboot unexpectedly every 5-15 days, seemingly at random. Each machine has 96 GB of RAM.
I have managed to capture a complete crash message only once, along with a few partial ones:


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80a84414
stack pointer           = 0x28:0xffffff9a3d420650
frame pointer           = 0x28:0xffffff9a3d4206e0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq271: igb0:que 3)
trap number             = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0xffffffff809208a6 at kdb_backtrace+0x66
#1 0xffffffff808ea8be at panic+0x1ce
#2 0xffffffff80bd8240 at trap_fatal+0x290
#3 0xffffffff80bd857d at trap_pfault+0x1ed
#4 0xffffffff80bd8b9e at trap+0x3ce
#5 0xffffffff80bc315f at calltrap+0x8
#6 0xffffffff80a861d5 at udp_input+0x475
#7 0xffffffff80a043dc at ip_input+0xac
#8 0xffffffff809adafb at netisr_dispatch_src+0x20b
#9 0xffffffff809a35cd at ether_demux+0x14d
#10 0xffffffff809a38a4 at ether_nh_input+0x1f4
#11 0xffffffff809adafb at netisr_dispatch_src+0x20b
#12 0xffffffff804c525c at igb_rxeof+0x3fc
#13 0xffffffff804c97f4 at igb_msix_que+0xe4
#14 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
#15 0xffffffff808c0076 at ithread_loop+0xa6
#16 0xffffffff808bb9ef at fork_exit+0x11f
#17 0xffffffff80bc368e at fork_trampoline+0xe
Uptime: 38d11h42m48s
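
As a rough cross-check of the trace above, the start address of each function can be recovered by subtracting the frame offset from the frame address, and the faulting instruction pointer can then be compared against those starts. This is only a sketch: the frame strings and the instruction pointer are copied from the trace, while the names BACKTRACE, FAULT_IP, FRAME_RE and function_bases are made up for illustration. Resolving the instruction pointer to an actual symbol would still need the symbol table of the exact kernel that crashed.

```python
import re

# Frames #6, #7 and #12, copied from the backtrace above.
BACKTRACE = """\
#6 0xffffffff80a861d5 at udp_input+0x475
#7 0xffffffff80a043dc at ip_input+0xac
#12 0xffffffff804c525c at igb_rxeof+0x3fc
"""

# Instruction pointer from the trap message above (segment selector dropped).
FAULT_IP = 0xffffffff80a84414

# Matches "#N 0xADDR at name+0xOFFSET" (hypothetical helper pattern).
FRAME_RE = re.compile(r"#(\d+) (0x[0-9a-f]+) at (\w+)\+(0x[0-9a-f]+)")


def function_bases(text):
    """Return {function name: start address} for every frame found in text."""
    bases = {}
    for _num, addr, name, offset in FRAME_RE.findall(text):
        bases[name] = int(addr, 16) - int(offset, 16)
    return bases


bases = function_bases(BACKTRACE)
for name, base in bases.items():
    print(f"{name:10s} starts at {base:#x}")

# Signed distance from the start of udp_input to the faulting instruction.
delta = FAULT_IP - bases["udp_input"]
print(f"fault IP relative to udp_input: {delta:+#x}")
```

The distance comes out negative (-0x194c), so the faulting instruction lies before the start of udp_input rather than inside it. That would be consistent with the fault occurring in some other function called from udp_input+0x475 that the backtrace does not name, though this is an inference from the addresses and not something the trace states directly.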




Feb  3 16:37:25 Yankee kernel: Fatal trap 12: page fault while in kernel mode
Feb  3 16:37:25 Yankee kernel: cpuid = 2; apic id = 02
Feb  3 16:37:25 Yankee kernel: fault virtual address    = 0x0
Feb  3 16:37:25 Yankee kernel: fault code               = supervisor read data, page not present
Feb  3 16:37:25 Yankee kernel: instruction pointer      = 0x20:0xffffffff809c6304
Feb  3 16:37:25 Yankee kernel: stack pointer            = 0x28:0xffffff9a4077d7a0
Feb  3 16:37:25 Yankee kernel: frame pointer            = 0x28:0xffffff9a4077d830
Feb  3 16:37:25 Yankee kernel: code segment             = base 0x0, limit 0xfffff, type 0x1b
Feb  3 16:37:25 Yankee kernel: = DPL 0, pres 1, long 1, def32 0, gran 1




Feb 27 00:37:24 Victor kernel: Fatal trap 12: page fault while in kernel mode
Feb 27 00:37:24 Victor kernel: cpuid = 7; apic id = 15
Feb 27 00:37:24 Victor kernel: fault virtual address    = 0x0
Feb 27 00:37:24 Victor kernel: fault code               = supervisor read data, page not present
Feb 27 00:37:24 Victor kernel: instruction pointer      = 0x20:0xffffffff80a84414
Feb 27 00:37:24 Victor kernel: stack pointer            = 0x28:0xffffff9a40810790
Feb 27 00:37:24 Victor kernel: frame pointer            = 0x28:0xffffff9a40810820
Feb 27 00:37:24 Victor kernel: code segment             = base 0x0, limit 0xfffff, type 0x1b
Feb 27 00:37:24 Victor kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Feb 27 00:37:24 Victor kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Feb 27 00:37:24 Victor kernel: current process          = 12 (irq263: igb0:que 7)
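
For comparing the three captures, the instruction pointers can be grouped mechanically. The sketch below uses only the instruction pointer lines quoted in this report. The label "complete-trace" is invented here because the report does not say which host produced the complete trace, and the names LINES, IP_RE and by_ip are illustrative. Addresses are only comparable between machines running an identical kernel build.

```python
import re
from collections import defaultdict

# Instruction pointer lines from the three captures above. The first one has
# no syslog prefix in the report, so a made-up "complete-trace" label is used.
LINES = [
    "complete-trace: instruction pointer     = 0x20:0xffffffff80a84414",
    "Feb  3 16:37:25 Yankee kernel: instruction pointer      = 0x20:0xffffffff809c6304",
    "Feb 27 00:37:24 Victor kernel: instruction pointer      = 0x20:0xffffffff80a84414",
]

# Captures the source label (host name or made-up label) and the address.
IP_RE = re.compile(
    r"(\S+?)(?: kernel)?: instruction pointer\s+= 0x20:(0x[0-9a-f]+)"
)

# Map each faulting address to the list of captures that reported it.
by_ip = defaultdict(list)
for line in LINES:
    match = IP_RE.search(line)
    if match:
        source, ip = match.groups()
        by_ip[ip].append(source)

for ip, sources in by_ip.items():
    print(ip, "seen in:", ", ".join(sources))
```

Grouped this way, the Victor capture shares its instruction pointer with the complete trace, while the Yankee capture faults at a different address. All three report a fault virtual address of 0x0.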



The problem appears to be in the igb(4) driver. I have repartitioned the Whiskey machine with a 100 GB dump device in the hope of capturing a proper crash dump.

Possibly related:

http://lists.freebsd.org/pipermail/freebsd-net/2012-September/033192.html
http://lists.freebsd.org/pipermail/freebsd-net/2012-September/033193.html
http://lists.freebsd.org/pipermail/freebsd-current/2012-November/037968.html
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/172113



>How-To-Repeat:
Use an igb(4)-based NIC on various SuperMicro motherboards; after a while the kernel will panic. The servers are under different levels of load, and there is no clear indication of a trigger.
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:


