Date: Mon, 22 Mar 2021 03:02:50 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 254474] mlx4 causes kernel panic at boot if compiled into the kernel
Message-ID: <bug-254474-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254474

Bug ID: 254474
Summary: mlx4 causes kernel panic at boot if compiled into the kernel
Product: Base System
Version: 13.0-STABLE
Hardware: amd64
OS: Any
Status: New
Keywords: panic
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: matsuo.hiroshi.39@gmail.com

To try 13-RC3 on my box with a Mellanox ConnectX-2 card, I checked out the
13.0 branch and created a KERNCONF file from GENERIC, adding and removing a
few lines with reference to the FreeBSD InfiniBand wiki.
This kernel panics at boot time.

On the other hand, I have confirmed that both of the following work correctly:
the 13-RC3 GENERIC kernel with the mlx4 drivers built as modules, and a 12.2
custom kernel with the mlx4 drivers compiled in (not as modules).
I don't know why the mlx4 drivers cause a panic when compiled into the 13.0
kernel.

---<<BOOT>>---
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-RC3 #1: Mon Mar 22 18:37:58 JST 2021
    matsuo@build:/usr/obj/usr/src/amd64.amd64/sys/MICROSERVER-PR amd64
FreeBSD clang version 11.0.1 (git@github.com:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe)
VT(vga): resolution 640x480
CPU: AMD Turion(tm) II Neo N54L Dual-Core Processor (2196.39-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x100f63  Family=0x10  Model=0x6  Stepping=3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x837ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,NodeId>
  SVM: NP,NRIP,NAsids=64
  TSC: P-state invariant
real memory  = 8589934592 (8192 MB)
avail memory = 8249397248 (7867 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <HP ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
random: unblocking device.
Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20201113/tbfadt-748)
ioapic0 <Version 2.1> irqs 0-23
Launching APs: 1
Timecounter "TSC-low" frequency 1098192980 Hz quality 800
KTLS: Initialized 2 threads
random: entropy device external interface
[ath_hal] loaded
WARNING: Device "kbd" is Giant locked and may be deleted before FreeBSD 14.0.
kbd1 at kbdmux0
000.000052 [4350] netmap_init netmap: loaded module
nexus0
vtvga0: <VT VGA driver>
cryptosoft0: <software crypto>
aesni0: No AES or SHA support.
acpi0: <HP ProLiant>
acpi0: Power Button (fixed)
acpi0: _OSC failed: AE_BUFFER_OVERFLOW
cpu0: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
Event timer "HPET1" frequency 14318180 Hz quality 450
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
apei0: <ACPI Platform Error Interface> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> port 0xe000-0xe0ff mem 0xfa000000-0xfbffffff,0xfe7f0000-0xfe7fffff,0xfe600000-0xfe6fffff irq 18 at device 5.0 on pci1
vgapci0: Boot video device
pcib2: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pci2: <serial bus> at device 0.0 (no driver attached)
pcib3: <ACPI PCI-PCI bridge> irq 18 at device 6.0 on pci0
pci3: <ACPI PCI bus> on pcib3
bge0: <HP NC107i PCIe Gigabit Server Adapter, ASIC rev. 0x5784100> mem 0xfe9f0000-0xfe9fffff irq 18 at device 0.0 on pci3
bge0: CHIP ID 0x05784100; ASIC REV 0x5784; CHIP REV 0x57841; PCI-E
miibus0: <MII bus> on bge0
brgphy0: <BCM5784 10/100/1000baseT PHY> PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Using defaults for TSO: 65518/35/2048
bge0: Ethernet address: fc:15:b4:90:34:f3
ahci0: <AMD SB7x0/SB8x0/SB9x0 AHCI SATA controller> port 0xd000-0xd007,0xc000-0xc003,0xb000-0xb007,0xa000-0xa003,0x9000-0x900f mem 0xfe5ffc00-0xfe5fffff irq 19 at device 17.0 on pci0
ahci0: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported
ahci0: quirks=0x22000<ATI_PMP_BUG,1MSI>
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ohci0: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe5fe000-0xfe5fefff irq 18 at device 18.0 on pci0
usbus0 on ohci0
usbus0: 12Mbps Full Speed USB v1.0
ehci0: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe5ff800-0xfe5ff8ff irq 17 at device 18.2 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
usbus1: 480Mbps High Speed USB v2.0
ohci1: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe5fd000-0xfe5fdfff irq 18 at device 19.0 on pci0
usbus2 on ohci1
usbus2: 12Mbps Full Speed USB v1.0
ehci1: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe5ff400-0xfe5ff4ff irq 17 at device 19.2 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci1
usbus3: 480Mbps High Speed USB v2.0
isab0: <PCI-ISA bridge> at device 20.3 on pci0
isa0: <ISA bus> on isab0
pcib4: <ACPI PCI-PCI bridge> at device 20.4 on pci0
pci4: <ACPI PCI bus> on pcib4
ohci2: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe5fc000-0xfe5fcfff irq 18 at device 22.0 on pci0
usbus4 on ohci2
usbus4: 12Mbps Full Speed USB v1.0
ehci2: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe5ff000-0xfe5ff0ff irq 17 at device 22.2 on pci0
usbus5: EHCI version 1.0
usbus5 on ehci2
usbus5: 480Mbps High Speed USB v2.0
acpi_button0: <Power Button> on acpi0
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
Timecounters tick every 1.000 msec
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
ugen2.1: <ATI OHCI root HUB> at usbus2
ugen4.1: <ATI OHCI root HUB> at usbus4
uhub0 on usbus2
uhub1 on usbus4
uhub0: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
uhub1: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen1.1: <ATI EHCI root HUB> at usbus1
ugen0.1: <ATI OHCI root HUB> at usbus0
uhub2 on usbus1
uhub3 on usbus0
uhub2: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
uhub3: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
mlx4_core0: <mlx4_core> mem 0xfe800000-0xfe8fffff,0xfd800000-0xfdffffff irq 18 at device 0.0 on pci2
mlx4_core: Mellanox ConnectX core driver v3.6.0 (December 2020)
mlx4_core: Initializing mlx4_core
ugen5.1: <ATI EHCI root HUB> at usbus5
uhub4 on usbus5
uhub4: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus5
ugen3.1: <ATI EHCI root HUB> at usbus3
uhub5 on usbus3
uhub5: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
bge0: link state changed to UP
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD30EZRX-00D8PB0 80.00A80> ACS-2 ATA SATA 3.x device
ada0: Serial Number WD-WMC4N0D37EH5
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 2861588MB (5860533168 512 byte sectors)
ada0: quirks=0x1<4K>
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD30EZRX-00D8PB0 80.00A80> ACS-2 ATA SATA 3.x device
ada1: Serial Number WD-WMC4N0D7W637
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2861588MB (5860533168 512 byte sectors)
ada1: quirks=0x1<4K>
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD30EZRX-00D8PB0 80.00A80> ACS-2 ATA SATA 3.x device
ada2: Serial Number WD-WMC4N0D6EVLR
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 2861588MB (5860533168 512 byte sectors)
ada2: quirks=0x1<4K>
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD30EZRX-00D8PB0 80.00A80> ACS-2 ATA SATA 3.x device
ada3: Serial Number WD-WMC4N0DA7JCC
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 2861588MB (5860533168 512 byte sectors)
ada3: quirks=0x1<4K>
ada4 at ahcich5 bus 0 scbus5 target 0 lun 0
ada4: <WDC WD5000AAJS-55A8B2 01.03B01> ATA8-ACS SATA 2.x device
ada4: Serial Number WD-WCASY8895731
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 476940MB (976773168 512 byte sectors)
uhub1: 4 ports with 4 removable, self powered
uhub3: 5 ports with 5 removable, self powered
uhub0: 5 ports with 5 removable, self powered
mlx4_core0: Old device ETS support detected
mlx4_core0: Consider upgrading device FW.
mlx4_core0: Unable to determine PCI device chain minimum BW
<mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v3.6.0 (December 2020)
<mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0
<mlx4_ib> mlx4_ib_add: counter index 1 for port 2 allocated 0
ib0: link state changed to DOWN
ib0: post srq failed for buf 0 (-22)
ib0: ipoib_cm_post_receive_srq failed for buf 0

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x1f4bd438
fault code            = supervisor read data, page not present
instruction pointer   = 0x20:0xffffffff80ea7f03
stack pointer         = 0x28:0xffffffff829ba990
frame pointer         = 0x28:0xffffffff829ba9b0
code segment          = base 0x0, limit 0xfffff, type 0x1b
                      = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags      = interrupt enabled, resume, IOPL = 0
current process       = 0 (swapper)
trap number           = 12
panic: page fault
cpuid = 0
time = 5
KDB: stack backtrace:
#0 0xffffffff80c60b55 at kdb_backtrace+0x65
#1 0xffffffff80c13771 at vpanic+0x181
#2 0xffffffff80c135e3 at panic+0x43
#3 0xffffffff81135187 at trap_fatal+0x387
#4 0xffffffff811351df at trap_pfault+0x4f
#5 0xffffffff8113483d at trap+0x27d
#6 0xffffffff8110c028 at calltrap+0x8
#7 0xffffffff80ea7794 at ipoib_cm_dev_cleanup+0x94
#8 0xffffffff80ea6976 at ipoib_cm_dev_init+0x536
#9 0xffffffff80eaf242 at ipoib_transport_dev_init+0xf2
#10 0xffffffff80ea98d1 at ipoib_ib_dev_init+0x31
#11 0xffffffff80eaaf07 at ipoib_dev_init+0x97
#12 0xffffffff80eac812 at ipoib_add_one+0x312
#13 0xffffffff80e71848 at ib_register_device+0x768
#14 0xffffffff80ee2013 at mlx4_ib_add+0x1033
#15 0xffffffff80f00d40 at mlx4_add_device+0x40
#16 0xffffffff80f00c68 at mlx4_register_interface+0xb8

----- KERNCONF diff ----------
--- GENERIC 2021-03-21 03:48:03.373297000 +0900
+++ MICROSERVER-PR 2021-03-22 09:22:06.646143000 +0900
@@ -19,7 +19,7 @@
 # $FreeBSD$

 cpu             HAMMER
-ident           GENERIC
+ident           MICROSERVER-PR

 makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols
 makeoptions     WITH_CTF=1              # Run ctfconvert(1) for DTrace support
@@ -249,9 +249,23 @@

 # Nvidia/Mellanox Connect-X 4 and later, Ethernet only
 # mlx5ib requires ibcore infra and is not included by default
-device          mlx5                    # Base driver
-device          mlxfw                   # Firmware update
-device          mlx5en                  # Ethernet driver
+#device         mlx5                    # Base driver
+#device         mlxfw                   # Firmware update
+#device         mlx5en                  # Ethernet driver
+
+
+# Mellanox
+options         OFED
+options         SDP
+options         IPOIB_CM
+
+device          ipoib
+device          mlx4
+device          mlx4ib
+device          mlx4en
+device          mthca
+
+

 # PCI Ethernet NICs that use the common MII bus controller code.
 # NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!

-- 
You are receiving this mail because:
You are the assignee for the bug.