Date: Tue, 26 Mar 2013 18:46:41 GMT
From: Allan Jude <freebsd@scaleengine.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/177399: [igb] [panic] Kernel panic in igb(4) at random intervals
Message-ID: <201303261846.r2QIkf1C021995@red.freebsd.org>
Resent-Message-ID: <201303261850.r2QIo0Hc072910@freefall.freebsd.org>

>Number: 177399
>Category: kern
>Synopsis: [igb] [panic] Kernel panic in igb(4) at random intervals
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Mar 26 18:50:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator: Allan Jude
>Release: 9.1-RELEASE (also 9.0-RELEASE)
>Organization:
ScaleEngine Inc.
>Environment:
FreeBSD Yankee.HML1.ScaleEngine.net 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD Whiskey.HML1.ScaleEngine.net 9.0-RELEASE-p3 FreeBSD 9.0-RELEASE-p3 #0: Tue Jun 12 02:52:29 UTC 2012 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD Tango.HML1.ScaleEngine.net 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
>Description:
Yankee (9.1):
System: SuperMicro 6026T-3RF
Motherboard: Super X8DT3-F
NIC: Intel® 82576 Dual-Port Gigabit Ethernet

igb0: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0xef80-0xef9f mem 0xfafe0000-0xfaffffff,0xfafc0000-0xfafdffff,0xfaf9c000-0xfaf9ffff irq 28 at device 0.0 on pci7
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:49:b7:c4
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: Bound queue 4 to cpu 4
igb0: Bound queue 5 to cpu 5
igb0: Bound queue 6 to cpu 6
igb0: Bound queue 7 to cpu 7
igb1: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0xef40-0xef5f mem 0xfaf60000-0xfaf7ffff,0xfaf40000-0xfaf5ffff,0xfaf1c000-0xfaf1ffff irq 40 at device 0.1 on pci7
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:49:b7:c5
igb1: Bound queue 0 to cpu 8
igb1: Bound queue 1 to cpu 9
igb1: Bound queue 2 to cpu 10
igb1: Bound queue 3 to cpu 11
igb1: Bound queue 4 to cpu 12
igb1: Bound queue 5 to cpu 13
igb1: Bound queue 6 to cpu 14
igb1: Bound queue 7 to cpu 15

Whiskey (9.0):
System: SuperMicro 6016T-NTRF
Motherboard: Super X8DTU-F
NIC: Intel® 82576 Dual-Port Gigabit Ethernet

igb0: <Intel(R) PRO/1000 Network Connection version - 2.2.5> port 0xec00-0xec1f mem 0xfbde0000-0xfbdfffff,0xfbdc0000-0xfbddffff,0xfbd9c000-0xfbd9ffff irq 28 at device 0.0 on pci1
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:69:f0:40
igb1: <Intel(R) PRO/1000 Network Connection version - 2.2.5> port 0xe880-0xe89f mem 0xfbd60000-0xfbd7ffff,0xfbd40000-0xfbd5ffff,0xfbd1c000-0xfbd1ffff irq 40 at device 0.1 on pci1
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:69:f0:41

Victor (9.1):
System: SuperMicro 6016T-NTRF
Motherboard: Super X8DTU-F
NIC: Intel® 82576 Dual-Port Gigabit Ethernet

igb0: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0xec00-0xec1f mem 0xfbde0000-0xfbdfffff,0xfbdc0000-0xfbddffff,0xfbd9c000-0xfbd9ffff irq 28 at device 0.0 on pci1
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:69:ee:44
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: Bound queue 4 to cpu 4
igb0: Bound queue 5 to cpu 5
igb0: Bound queue 6 to cpu 6
igb0: Bound queue 7 to cpu 7
igb1: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0xe880-0xe89f mem 0xfbd60000-0xfbd7ffff,0xfbd40000-0xfbd5ffff,0xfbd1c000-0xfbd1ffff irq 40 at device 0.1 on pci1
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:69:ee:45
igb1: Bound queue 0 to cpu 8
igb1: Bound queue 1 to cpu 9
igb1: Bound queue 2 to cpu 10
igb1: Bound queue 3 to cpu 11
igb1: Bound queue 4 to cpu 12
igb1: Bound queue 5 to cpu 13
igb1: Bound queue 6 to cpu 14
igb1: Bound queue 7 to cpu 15

Tango (9.1, different nic)
System: SuperMicro 6027R-N3RF4+
Motherboard: Super X9DRW-3LN4F+
NIC: Intel® i350 Quad-Port Gigabit Ethernet Controller

igb0: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0x7020-0x703f mem 0xdf920000-0xdf93ffff,0xdf9c4000-0xdf9c7fff irq 27 at device 0.0 on pci4
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:78:e4:d4
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: Bound queue 4 to cpu 4
igb0: Bound queue 5 to cpu 5
igb0: Bound queue 6 to cpu 6
igb0: Bound queue 7 to cpu 7
igb1: <Intel(R) PRO/1000 Network Connection version - 2.3.4> port 0x7000-0x701f mem 0xdf900000-0xdf91ffff,0xdf9c0000-0xdf9c3fff irq 30 at device 0.1 on pci4
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:78:e4:d5
igb1: Bound queue 0 to cpu 8
igb1: Bound queue 1 to cpu 9
igb1: Bound queue 2 to cpu 10
igb1: Bound queue 3 to cpu 11
igb1: Bound queue 4 to cpu 12
igb1: Bound queue 5 to cpu 13
igb1: Bound queue 6 to cpu 14
igb1: Bound queue 7 to cpu 15
igb2: <Intel(R) PRO/1000 Network Connection version - 2.3.4> mem 0xfbe20000-0xfbe3ffff,0xfbec4000-0xfbec7fff irq 52 at device 0.0 on pci129
igb2: Using MSIX interrupts with 9 vectors
igb2: Ethernet address: 00:25:90:78:e4:d7
igb2: Bound queue 0 to cpu 16
igb2: Bound queue 1 to cpu 17
igb2: Bound queue 2 to cpu 18
igb2: Bound queue 3 to cpu 19
igb2: Bound queue 4 to cpu 20
igb2: Bound queue 5 to cpu 21
igb2: Bound queue 6 to cpu 22
igb2: Bound queue 7 to cpu 23
igb3: <Intel(R) PRO/1000 Network Connection version - 2.3.4> mem 0xfbe00000-0xfbe1ffff,0xfbec0000-0xfbec3fff irq 50 at device 0.3 on pci129
igb3: Using MSIX interrupts with 9 vectors
igb3: Ethernet address: 00:25:90:78:e4:d6
igb3: Bound queue 0 to cpu 0
igb3: Bound queue 1 to cpu 1
igb3: Bound queue 2 to cpu 2
igb3: Bound queue 3 to cpu 3
igb3: Bound queue 4 to cpu 4
igb3: Bound queue 5 to cpu 5
igb3: Bound queue 6 to cpu 6
igb3: Bound queue 7 to cpu 7

I have a number of Supermicro servers and a number of them unexpectedly reboot every 5-15 days, seemingly at random. The machines each have 96GB of ram.

I have only managed to catch a complete crash message once, and a few partials:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80a84414
stack pointer = 0x28:0xffffff9a3d420650
frame pointer = 0x28:0xffffff9a3d4206e0
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq271: igb0:que 3)
trap number = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0xffffffff809208a6 at kdb_backtrace+0x66
#1 0xffffffff808ea8be at panic+0x1ce
#2 0xffffffff80bd8240 at trap_fatal+0x290
#3 0xffffffff80bd857d at trap_pfault+0x1ed
#4 0xffffffff80bd8b9e at trap+0x3ce
#5 0xffffffff80bc315f at calltrap+0x8
#6 0xffffffff80a861d5 at udp_input+0x475
#7 0xffffffff80a043dc at ip_input+0xac
#8 0xffffffff809adafb at netisr_dispatch_src+0x20b
#9 0xffffffff809a35cd at ether_demux+0x14d
#10 0xffffffff809a38a4 at ether_nh_input+0x1f4
#11 0xffffffff809adafb at netisr_dispatch_src+0x20b
#12 0xffffffff804c525c at igb_rxeof+0x3fc
#13 0xffffffff804c97f4 at igb_msix_que+0xe4
#14 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
#15 0xffffffff808c0076 at ithread_loop+0xa6
#16 0xffffffff808bb9ef at fork_exit+0x11f
#17 0xffffffff80bc368e at fork_trampoline+0xe
Uptime: 38d11h42m48s

Feb 3 16:37:25 Yankee kernel: Fatal trap 12: page fault while in kernel mode
Feb 3 16:37:25 Yankee kernel: cpuid = 2; apic id = 02
Feb 3 16:37:25 Yankee kernel: fault virtual address = 0x0
Feb 3 16:37:25 Yankee kernel: fault code = supervisor read data, page not present
Feb 3 16:37:25 Yankee kernel: instruction pointer = 0x20:0xffffffff809c6304
Feb 3 16:37:25 Yankee kernel: stack pointer = 0x28:0xffffff9a4077d7a0
Feb 3 16:37:25 Yankee kernel: frame pointer = 0x28:0xffffff9a4077d830
Feb 3 16:37:25 Yankee kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Feb 3 16:37:25 Yankee kernel: = DPL 0, pres 1, long 1, def32 0, gran 1

Feb 27 00:37:24 Victor kernel: Fatal trap 12: page fault while in kernel mode
Feb 27 00:37:24 Victor kernel: cpuid = 7; apic id = 15
Feb 27 00:37:24 Victor kernel: fault virtual address = 0x0
Feb 27 00:37:24 Victor kernel: fault code = supervisor read data, page not present
Feb 27 00:37:24 Victor kernel: instruction pointer = 0x20:0xffffffff80a84414
Feb 27 00:37:24 Victor kernel: stack pointer = 0x28:0xffffff9a40810790
Feb 27 00:37:24 Victor kernel: frame pointer = 0x28:0xffffff9a40810820
Feb 27 00:37:24 Victor kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Feb 27 00:37:24 Victor kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Feb 27 00:37:24 Victor kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Feb 27 00:37:24 Victor kernel: current process = 12 (irq263: igb0:que 7)

The problem seems to be with the igb(4) driver.

I have repartitioned the Whiskey machine with a 100gb dump device in hopes of capturing a proper crash dump.

Possibly related:
http://lists.freebsd.org/pipermail/freebsd-net/2012-September/033192.html
http://lists.freebsd.org/pipermail/freebsd-net/2012-September/033193.html
http://lists.freebsd.org/pipermail/freebsd-current/2012-November/037968.html
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/172113
>How-To-Repeat:
Use igb(4) based NIC on various SuperMicro motherboards, after a while kernel will panic.

Servers are under different levels of load, no real indication of a 'trigger'
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:
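
When comparing trap messages like the ones in this report across several hosts, it can help to group them by faulting instruction pointer. The following is a minimal Python sketch, not part of the original report. It assumes the console or syslog lines have been saved as plain text, and the function name, regular expressions, and "(console)" label are illustrative choices only.

```python
import re
from collections import defaultdict

# Matches e.g. "instruction pointer = 0x20:0xffffffff80a84414" (any spacing).
IP_RE = re.compile(r"instruction pointer\s*=\s*0x[0-9a-f]+:(0x[0-9a-f]+)", re.I)
# Optional syslog prefix, e.g. "Feb 27 00:37:24 Victor kernel: ..."
HOST_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+(\S+)\s+kernel:")


def group_by_instruction_pointer(lines):
    """Map each faulting instruction pointer to the hosts that logged it."""
    groups = defaultdict(list)
    for line in lines:
        ip = IP_RE.search(line)
        if not ip:
            continue  # not an "instruction pointer" line
        host = HOST_RE.match(line)
        # Lines without a syslog prefix came straight from the console.
        groups[ip.group(1).lower()].append(host.group(1) if host else "(console)")
    return dict(groups)


# The three "instruction pointer" lines quoted in the report.
sample = [
    "instruction pointer = 0x20:0xffffffff80a84414",
    "Feb 3 16:37:25 Yankee kernel: instruction pointer = 0x20:0xffffffff809c6304",
    "Feb 27 00:37:24 Victor kernel: instruction pointer = 0x20:0xffffffff80a84414",
]

for ip, hosts in group_by_instruction_pointer(sample).items():
    print(ip, hosts)
```

On the lines quoted in the report, the complete crash message and the Victor partial share the instruction pointer 0xffffffff80a84414, while the Yankee partial faulted at 0xffffffff809c6304.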