Date:      Thu, 17 Oct 2019 23:22:12 +0000
From:      "Keller, Jacob E" <jacob.e.keller@intel.com>
To:        John Baldwin <jhb@FreeBSD.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Cc:        "shurd@llnw.com" <shurd@llnw.com>, "Joyner, Eric" <eric.joyner@intel.com>
Subject:   RE: panic on invalid ifp pointer in iflib drivers
Message-ID:  <02874ECE860811409154E81DA85FBB5896931470@ORSMSX121.amr.corp.intel.com>
In-Reply-To: <23f1e835-5dbb-055b-3768-f361311a9387@FreeBSD.org>
References:  <02874ECE860811409154E81DA85FBB589692E0D4@ORSMSX121.amr.corp.intel.com> <23f1e835-5dbb-055b-3768-f361311a9387@FreeBSD.org>

> -----Original Message-----
> From: John Baldwin <jhb@FreeBSD.org>
> Sent: Thursday, October 17, 2019 9:31 AM
> To: Keller, Jacob E <jacob.e.keller@intel.com>; freebsd-net@freebsd.org
> Cc: shurd@llnw.com; Joyner, Eric <eric.joyner@intel.com>
> Subject: Re: panic on invalid ifp pointer in iflib drivers
>
> Nominally, ifnet drivers should call ether_ifdetach first to remove public
> references to the ifnet and only call their stop routine after that has returned.
> This ensures any open if_ioctl invocations have completed, etc. before the
> stop routine is invoked.  Otherwise you are open to a race where the interface
> can be upped via an ioctl after you have stopped the hardware.
>
> Any other references to the ifnet via eventhandlers, etc. should also be
> deregistered before calling the stop routine.

Looks like iflib moved this much later when we refactored to add a shared function to deregister VLAN handlers...

>
> After the hardware is stopped, interrupt handlers should be torn down and
> callouts and tasks drained to ensure there are no other references to the ifp
> outside of the thread running detach.
>
> After that you can release device resources, destroy mutexes, free the ifp, etc.
> Note that drivers have to be prepared for ether_ifdetach to invoke if_ioctl (e.g.
> when detaching bpf), but of the drivers I've looked at this has generally been a
> non-issue.
>
> It sounds like iflib should be doing the detach before calling iflib_stop.
>

I tested a patch that moved ether_ifdetach above the call to iflib_stop.
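
For anyone following along, here is a rough userspace sketch of the ordering John describes above. It is not the patch I tested and it is not iflib code: every function and enum name in it is a hypothetical stand-in that only records which step ran, so the sequence can be checked.

```c
/*
 * Userspace sketch only. All names here are hypothetical stand-ins; the
 * comments say which real kernel step each one represents. Nothing in this
 * file touches an actual ifnet or iflib context.
 */
#include <assert.h>
#include <stdio.h>

enum step {
	STEP_IFDETACH,       /* stands in for ether_ifdetach() */
	STEP_EVENTHANDLERS,  /* deregister VLAN and other eventhandlers */
	STEP_STOP,           /* stands in for the stop routine (iflib_stop) */
	STEP_INTR_TEARDOWN,  /* tear down interrupt handlers */
	STEP_DRAIN,          /* drain callouts and tasks */
	STEP_RELEASE,        /* release resources, destroy mutexes, free ifp */
	STEP_COUNT
};

static enum step trace[STEP_COUNT];  /* order in which steps actually ran */
static int ntrace;

/* Record one step and print it so the ordering is visible. */
static void
record(enum step s, const char *what)
{
	assert(ntrace < STEP_COUNT);
	trace[ntrace++] = s;
	printf("%d: %s\n", ntrace, what);
}

/* Detach in the suggested order: drop public references before stopping. */
void
sketch_detach(void)
{
	ntrace = 0;
	record(STEP_IFDETACH, "detach ifnet (no new if_ioctl callers)");
	record(STEP_EVENTHANDLERS, "deregister eventhandlers");
	record(STEP_STOP, "stop hardware");
	record(STEP_INTR_TEARDOWN, "tear down interrupt handlers");
	record(STEP_DRAIN, "drain callouts and tasks");
	record(STEP_RELEASE, "release resources and free ifp");
}
```

The point of the ordering is that the stop step only runs once nothing outside the detaching thread can reach the ifp through ioctls or eventhandlers, and the free only happens after interrupts, callouts and tasks are gone.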

Moving ether_ifdetach seems to have made the issue significantly harder to reproduce, but after multiple attach/detach cycles with IPv6 traffic I still hit the panic below (INVARIANTS and WITNESS are enabled, and memguard is protecting ifnet):

Unread portion of the kernel message buffer:
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex ip6qlock (ip6qlock) r = 0 (0xfffffe00007aa848) locked @ /usr/src/sys/netinet6/frag6.c:849
shared rw vnet_rwlock (vnet_rwlock) r = 0 (0xffffffff820be700) locked @ /usr/src/sys/netinet6/frag6.c:845
stack backtrace:
#0 0xffffffff80bb6f83 at witness_debugger+0x73
#1 0xffffffff80bb7fa2 at witness_warn+0x442
#2 0xffffffff8108a0f3 at trap_pfault+0x53
#3 0xffffffff810896e4 at trap+0x2b4
#4 0xffffffff8106201c at calltrap+0x8
#5 0xffffffff80d8c07a at icmp6_error+0x4aa
#6 0xffffffff80d8b30e at frag6_freef+0x10e
#7 0xffffffff80d8b551 at frag6_slowtimo+0x111
#8 0xffffffff80bdcda4 at pfslowtimo+0x54
#9 0xffffffff80b65bdf at softclock_call_cc+0x13f
#10 0xffffffff80b65f9c at softclock+0x7c
#11 0xffffffff80b0f857 at ithread_loop+0x187
#12 0xffffffff80b0c4a4 at fork_exit+0x84
#13 0xffffffff8106305e at fork_trampoline+0xe

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xfffffe0000825dd8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80d8c5b2
stack pointer           = 0x28:0xfffffe1fc28c6ff0
frame pointer           = 0x28:0xfffffe1fc28c7090
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock (0))
trap number             = 12
panic: page fault
cpuid = 0
time = 1571354026
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe1fc28c6cb0
vpanic() at vpanic+0x19d/frame 0xfffffe1fc28c6d00
panic() at panic+0x43/frame 0xfffffe1fc28c6d60
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe1fc28c6dc0
trap_pfault() at trap_pfault+0x62/frame 0xfffffe1fc28c6e10
trap() at trap+0x2b4/frame 0xfffffe1fc28c6f20
calltrap() at calltrap+0x8/frame 0xfffffe1fc28c6f20
--- trap 0xc, rip = 0xffffffff80d8c5b2, rsp = 0xfffffe1fc28c6ff0, rbp = 0xfffffe1fc28c7090 ---
icmp6_reflect() at icmp6_reflect+0x242/frame 0xfffffe1fc28c7090
icmp6_error() at icmp6_error+0x4aa/frame 0xfffffe1fc28c70e0
frag6_freef() at frag6_freef+0x10e/frame 0xfffffe1fc28c7130
frag6_slowtimo() at frag6_slowtimo+0x111/frame 0xfffffe1fc28c7180
pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe1fc28c71b0
softclock_call_cc() at softclock_call_cc+0x13f/frame 0xfffffe1fc28c7260
softclock() at softclock+0x7c/frame 0xfffffe1fc28c7290
ithread_loop() at ithread_loop+0x187/frame 0xfffffe1fc28c72f0
fork_exit() at fork_exit+0x84/frame 0xfffffe1fc28c7330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe1fc28c7330
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic


Hmm.. now that I look at that more closely I think it's a separate issue.

> --
> John Baldwin


