From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Jung-uk Kim, Sean Bruno, dhw@freebsd.org
Subject: Re: Panic: @r323525: iflib
Date: Thu, 14 Sep 2017 13:22:55 -0700
Message-ID: <3360405.uEp2nAF1Iy@ralph.baldwin.cx>
References: <20170913131042.GZ1351@albert.catwhisker.org>

On Wednesday, September 13, 2017 07:08:26 PM Jung-uk Kim wrote:
> On 09/13/2017 11:21, Sean Bruno wrote:
> >> Previous successful build was:
> >> FreeBSD g1-252.catwhisker.org 12.0-CURRENT FreeBSD 12.0-CURRENT #398
> >> r323483M/323489:1200044: Tue Sep 12 04:31:08 PDT 2017 root@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY amd64
> >>
> >> The usual historical information, including a verbose-boot dmesg.boot
> >> from the above-cited build, may be found at
> >> .
> >>
> >> I will try hand-transcribing some of the lock & backtrace info:
> >>
> >> ...
> >> em0: allocated for 1 rx_queues
> >> Kernel page fault with the following non-sleepable locks held:
> >> exclusive sleep mutex taskqgroup (taskqgroup) r = 0 (0xfffffe07be2e4800) locked @ /usr/src/sys/kern/subr_gtaskqueue.c:803
> >> stack backtrace: [which I am abbreviating at this point -- dhw]
> >> #0 ... at witness_debugger+0x73
> >> #1 ... at witness_warn+0x43f
> >> #2 ... at trap_pfault+0x53
> >> #3 ... at trap+0x2c5
> >> #4 ... at calltrap+0x8
> >> #5 ... at iflib_device_register+0x2a61
> >> #6 ... at iflib_device_attach+0xb7
> >> #7 ... at device_attach+0x3ee
> >> #8 ... at bus_generic_attach+0x5a
> >> #9 ... at pci_attach+0xd5
> >> #10 ... at device_attach+0x3ee
> >> #11 ... at bus_generic_attach+0x5a
> >> #12 ... at acpi_pcib_acpi_attach+0x3bc
> >> #13 ... at device_attach+0x3ee
> >> #14 ... at bus_generic_attach+0x5a
> >> #15 ... at acpi_attach+0xe85
> >> #16 ... at device_attach+0x3ee
> >> #17 ... at bus_generic_attach+0x5a
> >>
> >> Fatal trap 12: page fault while in kernel mode
> >> cpuid = 2; apic id = 02
> >> fault virtual address = 0xffffffff8b530c20
> >> fault code = supervisor write data, page not present
> >> ...
> >> [ thread pid 0 tid 100000 ]
> >> Stopped at 0xffffffff80a743b0 = taskqgroup_attach+0x230: orq %rax,-0x 58(%rbp,%xrx,8)
> >>
> >> I can provide more specific excerpts, but I need to focus on some
> >> other activities for a while.
> >>
> >> Peace,
> >> david
> >>
> >
> >
> > When you get a chance, let me know what em(4) device is in your machine
> > (pciconf -lvbc). I'll see if I have one around here to test.
>
> FYI, I have very similar panics after the commit.
> Reverting the commit
> from today's head, i.e., r323566 - (r323516 & r323517), fixed the
> problem for me.
>
> em0@pci0:7:0:0: class=0x020000 card=0x115e8086 chip=0x105e8086 rev=0x06
> hdr=0x00
> vendor = 'Intel Corporation'
> device = '82571EB Gigabit Ethernet Controller'
> class = network
> subclass = ethernet
> bar [10] = type Memory, range 32, base 0xd0ca0000, size 131072,
> enabled
> bar [14] = type Memory, range 32, base 0xd0c80000, size 131072,
> enabled
> bar [18] = type I/O Port, range 32, base 0x2020, size 32, enabled
> cap 01[c8] = powerspec 2 supports D0 D3 current D0
> cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
> cap 10[e0] = PCI-Express 1 endpoint max data 256(256) NS
> link x4(x4) speed 2.5(2.5) ASPM disabled(L0s)
> ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
> ecap 0003[140] = Serial 1 001517ffff51bcba
> em1@pci0:7:0:1: class=0x020000 card=0x115e8086 chip=0x105e8086 rev=0x06
> hdr=0x00
> vendor = 'Intel Corporation'
> device = '82571EB Gigabit Ethernet Controller'
> class = network
> subclass = ethernet
> bar [10] = type Memory, range 32, base 0xd0c40000, size 131072,
> enabled
> bar [14] = type Memory, range 32, base 0xd0c20000, size 131072,
> enabled
> bar [18] = type I/O Port, range 32, base 0x2000, size 32, enabled
> cap 01[c8] = powerspec 2 supports D0 D3 current D0
> cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
> cap 10[e0] = PCI-Express 1 endpoint max data 256(256) NS
> link x4(x4) speed 2.5(2.5) ASPM disabled(L0s)
> ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
> ecap 0003[140] = Serial 1 001517ffff51bcba
>
> > I'm assuming you do *not* have any iflib or em(4) tuning options set either.
>
> Nope.

I don't get panics, but igb0 and igb1 fail to attach for me on two identical
systems:

igb0: port 0xe020-0xe03f mem 0xfb220000-0xfb23ffff,0xfb244000-0xfb247fff irq 43 at device 0.0 on pci6
igb0: attach_pre capping queues at 8
igb0: using 1024 tx descriptors and 1024 rx descriptors
igb0: msix_init qsets capped at 8
igb0: pxm cpus: 4 queue msgs: 9 admincnt: 1
igb0: trying 4 rx queues 4 tx queues
igb0: Using MSIX interrupts with 9 vectors
igb0: allocated for 4 tx_queues
igb0: allocated for 4 rx_queues
taskqgroup_attach_cpu: qid not found for cpu=0
igb0: taskqgroup_attach_cpu failed 22
igb0: Failed to allocate que int 0 err: 22
igb0: IFDI_MSIX_INTR_ASSIGN failed 22
device_attach: igb0 attach returned 22

This is on a quad-core CPU: Intel(R) Xeon(R) CPU E5-1620 v3. It fails
whether SMT is enabled or disabled in the BIOS.

--
John Baldwin