Date: Thu, 14 Sep 2017 13:22:55 -0700
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Cc: Jung-uk Kim <jkim@freebsd.org>, Sean Bruno <sbruno@freebsd.org>, dhw@freebsd.org
Subject: Re: Panic: @r323525: iflib
Message-ID: <3360405.uEp2nAF1Iy@ralph.baldwin.cx>
In-Reply-To: <a268f351-cec4-ca98-28a2-d35561463f8f@FreeBSD.org>
References: <20170913131042.GZ1351@albert.catwhisker.org> <dd34f001-cafb-6fc0-e21d-9fc100661358@freebsd.org> <a268f351-cec4-ca98-28a2-d35561463f8f@FreeBSD.org>
On Wednesday, September 13, 2017 07:08:26 PM Jung-uk Kim wrote:
> On 09/13/2017 11:21, Sean Bruno wrote:
> >> Previous successful build was:
> >> FreeBSD g1-252.catwhisker.org 12.0-CURRENT FreeBSD 12.0-CURRENT #398 r323483M/323489:1200044: Tue Sep 12 04:31:08 PDT 2017 root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY amd64
> >>
> >> The usual historical information, including a verbose-boot dmesg.boot
> >> from the above-cited build, may be found at
> >> <http://www.catwhisker.org/~david/FreeBSD/history/>.
> >>
> >> I will try hand-transcribing some of the lock & backtrace info:
> >>
> >> ...
> >> em0: allocated for 1 rx_queues
> >> Kernel page fault with the following non-sleepable locks held:
> >> exclusive sleep mutex taskqgroup (taskqgroup) r = 0 (0xfffffe07be2e4800) locked @ /usr/src/sys/kern/subr_gtaskqueue.c:803
> >> stack backtrace: [which I am abbreviating at this point -- dhw]
> >> #0 ... at witness_debugger+0x73
> >> #1 ... at witness_warn+0x43f
> >> #2 ... at trap_pfault+0x53
> >> #3 ... at trap+0x2c5
> >> #4 ... at calltrap+0x8
> >> #5 ... at iflib_device_register+0x2a61
> >> #6 ... at iflib_device_attach+0xb7
> >> #7 ... at device_attach+0x3ee
> >> #8 ... at bus_generic_attach+0x5a
> >> #9 ... at pci_attach+0xd5
> >> #10 ... at device_attach+0x3ee
> >> #11 ... at bus_generic_attach+0x5a
> >> #12 ... at acpi_pcib_acpi_attach+0x3bc
> >> #13 ... at device_attach+0x3ee
> >> #14 ... at bus_generic_attach+0x5a
> >> #15 ... at acpi_attach+0xe85
> >> #16 ... at device_attach+0x3ee
> >> #17 ... at bus_generic_attach+0x5a
> >>
> >> Fatal trap 12: page fault while in kernel mode
> >> cpuid = 2; apic id = 02
> >> fault virtual address = 0xffffffff8b530c20
> >> fault code = supervisor write data, page not present
> >> ...
> >> [ thread pid 0 tid 100000 ]
> >> Stopped at 0xffffffff80a743b0 = taskqgroup_attach+0x230: orq %rax,-0x 58(%rbp,%xrx,8)
> >>
> >> I can provide more specific excerpts, but I need to focus on some
> >> other activities for a while.
> >>
> >> Peace,
> >> david
> >>
> >
> >
> > When you get a chance, let me know what em(4) device is in your machine
> > (pciconf -lvbc). I'll see if I have one around here to test.
>
> FYI, I have very similar panics after the commit. Reverting the commit
> from today's head, i.e., r323566 - (r323516 & r323517), fixed the
> problem for me.
>
> em0@pci0:7:0:0: class=0x020000 card=0x115e8086 chip=0x105e8086 rev=0x06 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82571EB Gigabit Ethernet Controller'
>     class = network
>     subclass = ethernet
>     bar [10] = type Memory, range 32, base 0xd0ca0000, size 131072, enabled
>     bar [14] = type Memory, range 32, base 0xd0c80000, size 131072, enabled
>     bar [18] = type I/O Port, range 32, base 0x2020, size 32, enabled
>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>     cap 10[e0] = PCI-Express 1 endpoint max data 256(256) NS
>                  link x4(x4) speed 2.5(2.5) ASPM disabled(L0s)
>     ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
>     ecap 0003[140] = Serial 1 001517ffff51bcba
> em1@pci0:7:0:1: class=0x020000 card=0x115e8086 chip=0x105e8086 rev=0x06 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82571EB Gigabit Ethernet Controller'
>     class = network
>     subclass = ethernet
>     bar [10] = type Memory, range 32, base 0xd0c40000, size 131072, enabled
>     bar [14] = type Memory, range 32, base 0xd0c20000, size 131072, enabled
>     bar [18] = type I/O Port, range 32, base 0x2000, size 32, enabled
>     cap 01[c8] = powerspec 2 supports D0 D3 current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>     cap 10[e0] = PCI-Express 1 endpoint max data 256(256) NS
>                  link x4(x4) speed 2.5(2.5) ASPM disabled(L0s)
>     ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
>     ecap 0003[140] = Serial 1 001517ffff51bcba
>
> > I'm assuming you do *not* have any iflib or em(4) tuning options set either.
>
> Nope.

I don't get panics, but igb0 and igb1 fail to attach for me on two
identical systems:

igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> port 0xe020-0xe03f mem 0xfb220000-0xfb23ffff,0xfb244000-0xfb247fff irq 43 at device 0.0 on pci6
igb0: attach_pre capping queues at 8
igb0: using 1024 tx descriptors and 1024 rx descriptors
igb0: msix_init qsets capped at 8
igb0: pxm cpus: 4 queue msgs: 9 admincnt: 1
igb0: trying 4 rx queues 4 tx queues
igb0: Using MSIX interrupts with 9 vectors
igb0: allocated for 4 tx_queues
igb0: allocated for 4 rx_queues
taskqgroup_attach_cpu: qid not found for cpu=0
igb0: taskqgroup_attach_cpu failed 22
igb0: Failed to allocate que int 0 err: 22
igb0: IFDI_MSIX_INTR_ASSIGN failed 22
device_attach: igb0 attach returned 22

This is on a quad-core CPU: Intel(R) Xeon(R) CPU E5-1620 v3. It fails
with SMT either enabled or disabled in the BIOS.

-- 
John Baldwin
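
For readers decoding the attach log above: the repeated "22" is a numeric errno value, and 22 is EINVAL ("Invalid argument") in FreeBSD's errno table. A minimal sketch for looking up such codes follows. The helper name describe_errno is hypothetical and not part of the thread, and the lookup reflects the errno table of whatever system runs the script, which is assumed here to agree with FreeBSD for this value.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno value on this platform.

    Hypothetical helper for reading driver logs; the mapping comes from
    the local C library, which is assumed to match the kernel that
    produced the log.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 22 is the value reported by taskqgroup_attach_cpu and device_attach.
    print(describe_errno(22))
```

The same lookup works for any other numeric error that shows up in a device_attach failure line.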