Date:      Sat, 23 Dec 2023 13:38:50 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 275897] mlx4en: Panic when mlx4en is loaded
Message-ID:  <bug-275897-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275897

            Bug ID: 275897
           Summary: mlx4en: Panic when mlx4en is loaded
           Product: Base System
           Version: 14.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: yuuzi41@hotmail.com

Created attachment 247214
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=247214&action=edit
core

A kernel panic (page fault) happens when I try to load mlx4en.

My machine has a Mellanox ConnectX-3 adapter.

----
% pciconf -vl
hostb0@pci0:0:0:0:      class=0x060000 rev=0x00 hdr=0x00 vendor=0x8086 device=0x4e24 subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = HOST-PCI
vgapci0@pci0:0:2:0:     class=0x030000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x4e61 subvendor=0x8086 subdevice=0x2212
    vendor     = 'Intel Corporation'
    device     = 'JasperLake [UHD Graphics]'
    class      = display
    subclass   = VGA
xhci0@pci0:0:20:0:      class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086 device=0x4ded subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    class      = serial bus
    subclass   = USB
none0@pci0:0:20:2:      class=0x050000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x4def subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    class      = memory
    subclass   = RAM
none1@pci0:0:22:0:      class=0x078000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x4de0 subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    device     = 'Management Engine Interface'
    class      = simple comms
sdhci_pci0@pci0:0:26:0: class=0x080501 rev=0x01 hdr=0x00 vendor=0x8086 device=0x4dc4 subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    class      = base peripheral
    subclass   = SD host controller
pcib1@pci0:0:28:0:      class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x4db8 subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
pcib2@pci0:0:28:1:      class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x4db9 subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:28:2:      class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x4dba subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:28:3:      class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x4dbb subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
pcib5@pci0:0:28:4:      class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x4dbc subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
isab0@pci0:0:31:0:      class=0x060100 rev=0x01 hdr=0x00 vendor=0x8086 device=0x4d87 subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-ISA
none2@pci0:0:31:3:      class=0x040300 rev=0x01 hdr=0x00 vendor=0x8086 device=0x4dc8 subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    device     = 'Jasper Lake HD Audio'
    class      = multimedia
    subclass   = HDA
none3@pci0:0:31:4:      class=0x0c0500 rev=0x01 hdr=0x00 vendor=0x8086 device=0x4da3 subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    device     = 'Jasper Lake SMBus'
    class      = serial bus
    subclass   = SMBus
none4@pci0:0:31:5:      class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x4da4 subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    device     = 'Jasper Lake SPI Controller'
    class      = serial bus
igc0@pci0:1:0:0:        class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I226-V'
    class      = network
    subclass   = ethernet
igc1@pci0:2:0:0:        class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I226-V'
    class      = network
    subclass   = ethernet
igc2@pci0:3:0:0:        class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I226-V'
    class      = network
    subclass   = ethernet
nvme0@pci0:4:0:0:       class=0x010802 rev=0x03 hdr=0x00 vendor=0x8086 device=0xf1a6 subvendor=0x8086 subdevice=0x390b
    vendor     = 'Intel Corporation'
    device     = 'SSD Pro 7600p/760p/E 6100p Series'
    class      = mass storage
    subclass   = NVM
mlx4_core0@pci0:5:0:0:  class=0x020000 rev=0x00 hdr=0x00 vendor=0x15b3 device=0x1003 subvendor=0x15b3 subdevice=0x0113
    vendor     = 'Mellanox Technologies'
    device     = 'MT27500 Family [ConnectX-3]'
    class      = network
    subclass   = ethernet
----

Steps to reproduce:
# kldload mlx4en

The core is attached.

Analysis:

Judging from the stack trace in the core, the kernel panicked because
"ifm->ifm_status" was NULL at
https://cgit.freebsd.org/src/tree/sys/net/if_media.c?h=releng/14.0#n293
and that statement was executed while mlx4en was calling the ether_ifattach()
function:
https://cgit.freebsd.org/src/tree/sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c?h=releng/14.0#n2296

The ifm_status callback appears to be set in the ifmedia_init() function
https://cgit.freebsd.org/src/tree/sys/net/if_media.c?h=releng/14.0#n87
but mlx4en calls ifmedia_init() only after it has called ether_ifattach():
https://cgit.freebsd.org/src/tree/sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c?h=releng/14.0#n2298

I think this is the root cause.

I'd like to propose the patch below to fix it. It reorders the statements so
that ether_ifattach() is called after ifmedia_init().
----
diff --git a/sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c b/sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c
index c26afc0099b5..583de1816d1b 100644
--- a/sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c
+++ b/sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c
@@ -2293,7 +2293,6 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
                dev_addr[ETHER_ADDR_LEN - 1 - i] = (u8) (priv->mac >> (8 * i));


-       ether_ifattach(dev, dev_addr);
        if_link_state_change(dev, LINK_STATE_DOWN);
        ifmedia_init(&priv->media, IFM_IMASK | IFM_ETH_FMASK,
            mlx4_en_media_change, mlx4_en_media_status);
@@ -2306,6 +2305,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,

        DEBUGNET_SET(dev, mlx4_en);

+       ether_ifattach(dev, dev_addr);
+
        en_warn(priv, "Using %d TX rings\n", prof->tx_ring_num);
        en_warn(priv, "Using %d RX rings\n", prof->rx_ring_num);
----

-- 
You are receiving this mail because:
You are the assignee for the bug.
