Date:      Thu, 14 Sep 2017 13:22:55 -0700
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-current@freebsd.org
Cc:        Jung-uk Kim <jkim@freebsd.org>, Sean Bruno <sbruno@freebsd.org>, dhw@freebsd.org
Subject:   Re: Panic: @r323525: iflib
Message-ID:  <3360405.uEp2nAF1Iy@ralph.baldwin.cx>
In-Reply-To: <a268f351-cec4-ca98-28a2-d35561463f8f@FreeBSD.org>
References:  <20170913131042.GZ1351@albert.catwhisker.org> <dd34f001-cafb-6fc0-e21d-9fc100661358@freebsd.org> <a268f351-cec4-ca98-28a2-d35561463f8f@FreeBSD.org>

On Wednesday, September 13, 2017 07:08:26 PM Jung-uk Kim wrote:
> On 09/13/2017 11:21, Sean Bruno wrote:
> >> Previous successful build was:
> >> FreeBSD g1-252.catwhisker.org 12.0-CURRENT FreeBSD 12.0-CURRENT #398  r323483M/323489:1200044: Tue Sep 12 04:31:08 PDT 2017     root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
> >>
> >> The usual historical information, including a verbose-boot dmesg.boot
> >> from the above-cited build, may be found at
> >> <http://www.catwhisker.org/~david/FreeBSD/history/>.
> >>
> >> I will try hand-transcribing some of the lock & backtrace info:
> >>
> >> ...
> >> em0: allocated for 1 rx_queues
> >> Kernel page fault with the following non-sleepable locks held:
> >> exclusive sleep mutex taskqgroup (taskqgroup) r = 0 (0xfffffe07be2e4800) locked @ /usr/src/sys/kern/subr_gtaskqueue.c:803
> >> stack backtrace:  [which I am abbreviating at this point -- dhw]
> >> #0 ... at witness_debugger+0x73
> >> #1 ... at witness_warn+0x43f
> >> #2 ... at trap_pfault+0x53
> >> #3 ... at trap+0x2c5
> >> #4 ... at calltrap+0x8
> >> #5 ... at iflib_device_register+0x2a61
> >> #6 ... at iflib_device_attach+0xb7
> >> #7 ... at device_attach+0x3ee
> >> #8 ... at bus_generic_attach+0x5a
> >> #9 ... at pci_attach+0xd5
> >> #10 ... at device_attach+0x3ee
> >> #11 ... at bus_generic_attach+0x5a
> >> #12 ... at acpi_pcib_acpi_attach+0x3bc
> >> #13 ... at device_attach+0x3ee
> >> #14 ... at bus_generic_attach+0x5a
> >> #15 ... at acpi_attach+0xe85
> >> #16 ... at device_attach+0x3ee
> >> #17 ... at bus_generic_attach+0x5a
> >>
> >> Fatal trap 12: page fault while in kernel mode
> >> cpuid = 2; apic id = 02
> >> fault virtual address   = 0xffffffff8b530c20
> >> fault code              = supervisor write data, page not present
> >> ...
> >> [ thread pid 0 tid 100000 ]
> >> Stopped at      0xffffffff80a743b0 = taskqgroup_attach+0x230:    orq   %rax,-0x58(%rbp,%xrx,8)
> >>
> >> I can provide more specific excerpts, but I need to focus on some
> >> other activities for a while.
> >>
> >> Peace,
> >> david
> >>
> > 
> > 
> > When you get a chance, let me know what em(4) device is in your machine
> > (pciconf -lvbc).  I'll see if I have one around here to test.
> 
> FYI, I have very similar panics after the commit.  Reverting the commit
> from today's head, i.e., r323566 - (r323516 & r323517), fixed the
> problem for me.
> 
> em0@pci0:7:0:0:	class=0x020000 card=0x115e8086 chip=0x105e8086 rev=0x06 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82571EB Gigabit Ethernet Controller'
>     class      = network
>     subclass   = ethernet
>     bar   [10] = type Memory, range 32, base 0xd0ca0000, size 131072, enabled
>     bar   [14] = type Memory, range 32, base 0xd0c80000, size 131072, enabled
>     bar   [18] = type I/O Port, range 32, base 0x2020, size 32, enabled
>     cap 01[c8] = powerspec 2  supports D0 D3  current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>     cap 10[e0] = PCI-Express 1 endpoint max data 256(256) NS
>                  link x4(x4) speed 2.5(2.5) ASPM disabled(L0s)
>     ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
>     ecap 0003[140] = Serial 1 001517ffff51bcba
> em1@pci0:7:0:1:	class=0x020000 card=0x115e8086 chip=0x105e8086 rev=0x06 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82571EB Gigabit Ethernet Controller'
>     class      = network
>     subclass   = ethernet
>     bar   [10] = type Memory, range 32, base 0xd0c40000, size 131072, enabled
>     bar   [14] = type Memory, range 32, base 0xd0c20000, size 131072, enabled
>     bar   [18] = type I/O Port, range 32, base 0x2000, size 32, enabled
>     cap 01[c8] = powerspec 2  supports D0 D3  current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>     cap 10[e0] = PCI-Express 1 endpoint max data 256(256) NS
>                  link x4(x4) speed 2.5(2.5) ASPM disabled(L0s)
>     ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
>     ecap 0003[140] = Serial 1 001517ffff51bcba
> 
> > I'm assuming you do *not* have any iflib or em(4) tuning options set either.
> 
> Nope.

I don't get panics, but igb0 and igb1 fail to attach for me on two identical
systems:

igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> port 0xe020-0xe03f mem 0xfb220000-0xfb23ffff,0xfb244000-0xfb247fff irq 43 at device 0.0 on pci6
igb0: attach_pre capping queues at 8
igb0: using 1024 tx descriptors and 1024 rx descriptors
igb0: msix_init qsets capped at 8
igb0: pxm cpus: 4 queue msgs: 9 admincnt: 1
igb0: trying 4 rx queues 4 tx queues
igb0: Using MSIX interrupts with 9 vectors
igb0: allocated for 4 tx_queues
igb0: allocated for 4 rx_queues
taskqgroup_attach_cpu: qid not found for cpu=0
igb0: taskqgroup_attach_cpu failed 22
igb0: Failed to allocate que int 0 err: 22
igb0: IFDI_MSIX_INTR_ASSIGN failed 22
device_attach: igb0 attach returned 22

This is on a quad-core CPU: Intel(R) Xeon(R) CPU E5-1620 v3.  It fails with
SMT either enabled or disabled in the BIOS.
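The "qid not found for cpu=0" line followed by "failed 22" suggests a lookup that found no taskqueue in the group bound to CPU 0 and returned EINVAL (errno 22 on FreeBSD).  Below is a minimal, hypothetical user-space model of that failure path, not the actual code in subr_gtaskqueue.c; struct tqg_model, tqg_find_qid() and tqg_attach_cpu() are invented names used only for illustration.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical, simplified model of a taskqueue group in which each
 * queue is pinned to one CPU.  This is NOT the FreeBSD kernel code;
 * all names here are invented for illustration.
 */
#define TQG_MAXQ 8

struct tqg_model {
	int cnt;		/* number of queues in the group */
	int cpu[TQG_MAXQ];	/* CPU each queue is bound to */
};

/* Return the index of the queue bound to 'cpu', or -1 if there is none. */
static int
tqg_find_qid(const struct tqg_model *g, int cpu)
{
	for (int i = 0; i < g->cnt; i++)
		if (g->cpu[i] == cpu)
			return (i);
	return (-1);
}

/*
 * Mimic the failure seen in the igb0 log: if no queue is bound to the
 * requested CPU, print a message and fail with EINVAL (22).
 */
static int
tqg_attach_cpu(const struct tqg_model *g, int cpu, int *qidp)
{
	int qid = tqg_find_qid(g, cpu);

	if (qid == -1) {
		printf("%s: qid not found for cpu=%d\n", __func__, cpu);
		return (EINVAL);
	}
	*qidp = qid;
	return (0);
}
```

In this model, calling tqg_attach_cpu() with cpu=0 on a group whose queues are bound only to CPUs 1-3 prints a message in the same style as the log and returns 22, which the caller would then propagate up to device_attach.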

-- 
John Baldwin


