Date:      Thu, 13 Apr 2023 06:43:03 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 270813] kernel crashes when ena driver is unloaded
Message-ID:  <bug-270813-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270813

            Bug ID: 270813
           Summary: kernel crashes when ena driver is unloaded
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: akiyano@amazon.com

Reproduction steps:
-------------------
1. Create an AWS EC2 instance from FreeBSD 14.0-CURRENT-amd64-20230323 UEFI,
ami-02dbe14b26d93d722 in us-east-1 (or any newer ami that starts with
"FreeBSD 14.0-CURRENT-amd64-")

2. Run kldunload if_ena.ko

Result:
-------
The kernel crashes every time; the issue is 100% reproducible.

Core dump stack:

__curthread () at /root/freebsd-src/sys/amd64/include/pcpu_aux.h:59
59 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0 __curthread () at /root/freebsd-src/sys/amd64/include/pcpu_aux.h:59
#1 doadump (textdump=textdump@entry=1)
at /root/freebsd-src/sys/kern/kern_shutdown.c:407
#2 0xffffffff80bedc6c in kern_reboot (howto=260)
at /root/freebsd-src/sys/kern/kern_shutdown.c:528
#3 0xffffffff80bee18f in vpanic (fmt=<optimized out>,
ap=ap@entry=0xfffffe01fef62ae0)
at /root/freebsd-src/sys/kern/kern_shutdown.c:972
#4 0xffffffff80bedf13 in panic (fmt=<unavailable>)
at /root/freebsd-src/sys/kern/kern_shutdown.c:896
#5 0xffffffff810e2b39 in trap_fatal (frame=0xfffffe01fef62b70, eva=0)
at /root/freebsd-src/sys/amd64/amd64/trap.c:954
#6 <signal handler called>
#7 dump_sa (nw=nw@entry=0xfffffe01fef62d08, attr=attr@entry=1,
sa=0xdeadc0dedeadc0de) at /root/freebsd-src/sys/netlink/route/iface.c:210
#8 0xffffffff80e5659a in dump_iface (nw=nw@entry=0xfffffe01fef62d08,
ifp=ifp@entry=0xfffff80109bbe800, hdr=hdr@entry=0xfffffe01fef62d48,
if_flags_mask=if_flags_mask@entry=0)
at /root/freebsd-src/sys/netlink/route/iface.c:279
#9 0xffffffff80e55e7b in rtnl_handle_ifevent (ifp=0xfffff80109bbe800,
nlmsg_type=<optimized out>, if_flags_mask=0)
at /root/freebsd-src/sys/netlink/route/iface.c:943
#10 0xffffffff80d1fc1d in do_link_state_change (arg=0xfffff80109bbe800,
pending=1) at /root/freebsd-src/sys/net/if.c:2205
#11 0xffffffff80c5233a in taskqueue_run_locked (
queue=queue@entry=0xfffff80106ce7100)
at /root/freebsd-src/sys/kern/subr_taskqueue.c:514
#12 0xffffffff80c5224d in taskqueue_run (queue=0xfffff80106ce7100)
at /root/freebsd-src/sys/kern/subr_taskqueue.c:529
#13 0xffffffff80ba8126 in intr_event_execute_handlers (ie=0xfffff80106a9d300,
p=<optimized out>) at /root/freebsd-src/sys/kern/kern_intr.c:1207
#14 ithread_execute_handlers (ie=0xfffff80106a9d300, p=<optimized out>)
at /root/freebsd-src/sys/kern/kern_intr.c:1220
#15 ithread_loop (arg=arg@entry=0xfffff80106c951c0)
at /root/freebsd-src/sys/kern/kern_intr.c:1308
#16 0xffffffff80ba45c0 in fork_exit (
callout=0xffffffff80ba7eb0 <ithread_loop>, arg=0xfffff80106c951c0,
frame=0xfffffe01fef62f40) at /root/freebsd-src/sys/kern/kern_fork.c:1102
#17 <signal handler called>
(kgdb)

Initial investigation results:
------------------------------

1. I printed ifp->if_addr->ifa_addr inside do_link_state_change and its value
is 0xdeadc0dedeadc0de.

2. Initially I suspected that it is some kernel issue. I therefore tried to
find a kernel commit that caused this:

The last non-crashing instance is with ami (us-east-1):
FreeBSD 14.0-CURRENT-amd64-20230316 UEFI , ami-0d80d8baae9fea731
uname -a shows kernel commit hash cee09bda03c8


The first crashing instance is with ami (us-east-1):
FreeBSD 14.0-CURRENT-amd64-20230323 UEFI , ami-02dbe14b26d93d722
uname -a shows kernel commit hash b5d43972e394

However, I saw that if the ami was a crashing ami, then no matter which kernel
I built and installed from sources, the issue reproduced. And the other way
around: if I used a non-crashing ami, no matter which kernel I built and
installed from sources, the issue didn't reproduce.

So I figured it is a userland issue. I went on to build and install userland
without the kernel until I found the commit that caused the issue. (Command
used: make buildworld -j`sysctl -n hw.ncpu` && make installworld -j`sysctl -n
hw.ncpu` && reboot)

This commit proved to be: https://reviews.freebsd.org/D39048 (the commit
before it doesn't crash; this commit and later ones crash).

Relevant discussions:
---------------------
Initially I commented in https://reviews.freebsd.org/D39048, which created an
email thread where the following was written:

Zhenlei Huang <zlei@FreeBSD.org>:
=================================
iface.c:210 That might be line 214.
Also be aware that `sa == 0xdeadc0dedeadc0de`.

```
static bool
dump_iface(struct nl_writer *nw, struct ifnet *ifp, const struct nlmsghdr *hdr,
    int if_flags_mask)
{
...
    if ((ifp->if_addr != NULL)) {
        dump_sa(nw, IFLA_ADDRESS, ifp->if_addr->ifa_addr);
    }
...
}
```
There is probably a race between ifp destruction and interface status
event handling.
`ifp` might be freed before the event handler rtnl_handle_ifevent() runs.

So only checking `ifp->if_addr != NULL` is not enough.
=================================
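
The point made in the quoted message can be illustrated with a small userspace
model. This is a hypothetical sketch, not kernel code: all `model_*` names are
invented here, and it assumes (the report does not verify this) that
0xdeadc0dedeadc0de is a junk pattern written over freed memory by a debug
kernel, and that pointers are 64 bits wide as on amd64. It shows that once the
ifaddr storage has been overwritten, `ifp->if_addr` is still non-NULL, so the
guard passes while `ifa_addr` already holds junk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
struct model_sockaddr { unsigned char sa_len; unsigned char sa_family; };
struct model_ifaddr   { struct model_sockaddr *ifa_addr; };
struct model_ifnet    { struct model_ifaddr *if_addr; };

/* Assumed junk pattern, taken from the value seen in the backtrace. */
#define MODEL_JUNK UINT64_C(0xdeadc0dedeadc0de)

_Static_assert(sizeof(void *) == sizeof(uint64_t),
    "this model assumes 64-bit pointers");

/*
 * Model of a debug allocator trashing an object when it is freed:
 * every full 8-byte word of the object is overwritten with the pattern.
 * The storage itself stays valid so the model has no undefined behaviour.
 */
static void
model_trash(void *mem, size_t size)
{
	const uint64_t junk = MODEL_JUNK;
	unsigned char *p = mem;

	assert(mem != NULL);
	for (size_t off = 0; off + sizeof(junk) <= size; off += sizeof(junk))
		memcpy(p + off, &junk, sizeof(junk));
}

/* The same guard as in the quoted dump_iface() snippet. */
static bool
model_null_check_passes(const struct model_ifnet *ifp)
{
	return (ifp->if_addr != NULL);
}

/* Raw bits of ifa_addr, read without dereferencing the pointer. */
static uint64_t
model_ifa_addr_bits(const struct model_ifaddr *ifa)
{
	uint64_t bits;

	memcpy(&bits, &ifa->ifa_addr, sizeof(bits));
	return (bits);
}
```

In this model, calling model_trash() on the ifaddr while an ifnet still points
to it leaves model_null_check_passes() returning true and
model_ifa_addr_bits() returning the junk value, which matches the
`sa=0xdeadc0dedeadc0de` argument in frame #7.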

Fix thoughts:
-------------
My first thought was to alter the dump_iface code that @zlei pointed out and
to check whether "ifp->if_addr->ifa_addr != 0xdeadc0dedeadc0de".
But I didn't find any existing code that does that, or a #define for
0xdeadc0dedeadc0de that I could use.
So I guess this is not the right way to do this.
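
One general alternative to comparing against the junk value, sketched here as
a userspace model and not as a proposed patch, is to make sure that deferred
work referencing an object has finished before the object is released. All
`dm_*` names below are hypothetical, the queue is a single-threaded one-slot
model, and "release" only overwrites the contents with the assumed junk
pattern. Whether this ordering is what is actually missing in the ena unload
path is not established by this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed junk pattern, taken from the value seen in the backtrace. */
#define DM_JUNK UINT64_C(0xdeadc0dedeadc0de)

/* Hypothetical object that a deferred task will read. */
struct dm_obj {
	uint64_t addr_bits;
};

/* Hypothetical one-slot task queue: at most one pending task. */
struct dm_queue {
	bool		 pending;
	struct dm_obj	*arg;
	uint64_t	 last_seen;	/* what the task read when it ran */
};

static void
dm_enqueue(struct dm_queue *q, struct dm_obj *o)
{
	q->pending = true;
	q->arg = o;
}

/* Run the pending task, if any. Also serves as the model's "drain". */
static void
dm_run_pending(struct dm_queue *q)
{
	assert(q != NULL);
	if (!q->pending)
		return;
	q->pending = false;
	q->last_seen = q->arg->addr_bits;
}

/* Model of freeing under a debug allocator: contents become junk. */
static void
dm_release(struct dm_obj *o)
{
	o->addr_bits = DM_JUNK;
}

/* Release first; the task runs afterwards and reads trashed contents. */
static uint64_t
dm_teardown_unsafe(struct dm_queue *q, struct dm_obj *o)
{
	dm_release(o);
	dm_run_pending(q);
	return (q->last_seen);
}

/* Drain first; the task reads valid contents and nothing runs later. */
static uint64_t
dm_teardown_drained(struct dm_queue *q, struct dm_obj *o)
{
	dm_run_pending(q);
	dm_release(o);
	dm_run_pending(q);	/* no-op: nothing is pending any more */
	return (q->last_seen);
}
```

With a task enqueued in both cases, dm_teardown_unsafe() makes the task
observe the junk value, while dm_teardown_drained() makes it observe the
original contents. The check against a magic constant is not needed in the
second ordering because the task never runs against released memory.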

Would appreciate any suggestions you may have on how to tackle this.

-- 
You are receiving this mail because:
You are the assignee for the bug.


