Date: Thu, 17 Oct 2019 19:30:48 +0000 From: "Keller, Jacob E" <jacob.e.keller@intel.com> To: John Baldwin <jhb@FreeBSD.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org> Cc: "shurd@llnw.com" <shurd@llnw.com>, "Joyner, Eric" <eric.joyner@intel.com> Subject: RE: panic on invalid ifp pointer in iflib drivers Message-ID: <02874ECE860811409154E81DA85FBB5896931107@ORSMSX121.amr.corp.intel.com> In-Reply-To: <23f1e835-5dbb-055b-3768-f361311a9387@FreeBSD.org> References: <02874ECE860811409154E81DA85FBB589692E0D4@ORSMSX121.amr.corp.intel.com> <23f1e835-5dbb-055b-3768-f361311a9387@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
> -----Original Message----- > From: John Baldwin <jhb@FreeBSD.org> > Sent: Thursday, October 17, 2019 9:31 AM > To: Keller, Jacob E <jacob.e.keller@intel.com>; freebsd-net@freebsd.org > Cc: shurd@llnw.com; Joyner, Eric <eric.joyner@intel.com> > Subject: Re: panic on invalid ifp pointer in iflib drivers >=20 > On 10/16/19 2:16 PM, Keller, Jacob E wrote: > > Hi, > > > > I'm investigating an issue on the iflib ixl driver in 11.3-RELEASE as w= ell as 12- > RELEASE. We found a panic in that occurs if SCTP/IPv6 traffic is being tr= ansmitted > while the device is detached: > > > > Fatal trap 12: page fault while in kernel mode > > cpuid =3D 0; apic id =3D 00 > > fault virtual address =3D 0xfffffe0000411e38 > > fault code =3D supervisor read data, page not present > > instruction pointer =3D 0x20:0xffffffff80c84700 > > stack pointer =3D 0x28:0xfffffe2f4351b600 > > frame pointer =3D 0x28:0xfffffe2f4351b650 > > code segment =3D base 0x0, limit 0xfffff, type 0x1b > > =3D DPL 0, pres 1, long 1, def32 0, gran 1 > > processor eflags =3D interrupt enabled, resume, IOPL =3D 0 > > current process =3D 12 (swi4: clock (0)) > > trap number =3D 12 > > panic: page fault > > cpuid =3D 0 > > KDB: stack backtrace: > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame > 0xfffffe2f4351b2c0 > > vpanic() at vpanic+0x17e/frame 0xfffffe2f4351b320 > > panic() at panic+0x43/frame 0xfffffe2f4351b380 > > trap_fatal() at trap_fatal+0x369/frame 0xfffffe2f4351b3d0 > > trap_pfault() at trap_pfault+0x62/frame 0xfffffe2f4351b420 > > trap() at trap+0x2b3/frame 0xfffffe2f4351b530 > > calltrap() at calltrap+0x8/frame 0xfffffe2f4351b530 > > --- trap 0xc, rip =3D 0xffffffff80c84700, rsp =3D 0xfffffe2f4351b600, r= bp =3D > 0xfffffe2f4351b650 --- > > in6_selecthlim() at in6_selecthlim+0x20/frame 0xfffffe2f4351b650 > > sctp_lowlevel_chunk_output() at sctp_lowlevel_chunk_output+0xeb2/frame > 0xfffffe2f4351b790 > > sctp_chunk_output() at sctp_chunk_output+0x68c/frame 0xfffffe2f4351c110 > > sctp_timeout_handler() at sctp_timeout_handler+0x2d8/frame > 0xfffffe2f4351c180 > > softclock_call_cc() at softclock_call_cc+0x15b/frame 0xfffffe2f4351c230 > > softclock() at softclock+0x7c/frame 0xfffffe2f4351c260 > > intr_event_execute_handlers() at intr_event_execute_handlers+0x9a/frame > 0xfffffe2f4351c2a0 > > ithread_loop() at ithread_loop+0xb7/frame 0xfffffe2f4351c2f0 > > fork_exit() at fork_exit+0x84/frame 0xfffffe2f4351c330 > > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe2f4351c330 > > --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 --- > > KDB: enter: panic > > > > > > From what I've gathered so far, it appears that the issue is a use-afte= r-free > where the SCTP stack gets an ifp pointer that's no longer valid. We've re= produced > this issue on multiple iflib-based drivers, including ixl and the recentl= y published > ice driver code (available on phabricator). > > > > Additionally, we cannot reproduce it on legacy-stack drivers for ixl, o= r a > mellanox 100G board we have. This leads me to believe that it's an issue = in iflib > rather than in the specific device drivers. > > > > I am not sure exactly what's going wrong here... anyone have suggestion= s? I > thought it might be an issue of when ether_ifdetach is called. That funct= ion is > supposed to clear all of the pre-existing routes from the route entry lis= t. I'm > thinking maybe somehow a route gets added after ether_ifdetach is called. > > > > In the iflib_device_deregister function, ether_ifdetach is called just = after > iflib_stop, (which would call a device's if_stop routine), and then the t= ask queues > are shutdown, a driver's ifdi_detach handler is called, and the ifp is fr= ee'd at the > end. In the ixl legacy driver, ether_ifdetach is called prior to the stop= routine. > However, in the mlx5 driver, it's called after a call to close_locked()..= . > > > > So I'm really not sure exactly what could cause a stale ifp pointer to = get into the > route entry list. >=20 > Nominally, ifnet drivers should call ether_ifdetach first to remove publi= c > references to the ifnet and only call their stop routine after that has r= eturned. > This ensures any open if_ioctl invocations have completed, etc. before th= e > stop routine is invoked. Otherwise you are open to a race where the inte= face > can be upped via an ioctl after you have stopped the hardware. >=20 So we should (a) move ether_ifdetach before the stop, and... > Any other references to the ifnet via eventhandlers, etc. should also be > deregistered before calling the stop routine. >=20 (b) make sure all of the event handlers are moved before the stop too. > After the hardware is stopped, interrupt handlers should be torn down and > callouts > and tasks drained to ensure there are no other references to the ifp outs= ide of > the thread running detach. >=20 > After that you can release device resources, destroy mutexes, free the if= p, etc. > Note that drivers have to be prepared for ether_ifdetach to invoke if_ioc= tl (e.g. > when detaching bpf), but of the drivers I've looked at this has generally= been a > non-issue. >=20 > It sounds like iflib should be doing the detach before calling iflib_stop= . >=20 Right. I think also the VLAN event handlers need to be moved too. Let me give that a try out to see if it fixes things. Thanks, Jake > -- > John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?02874ECE860811409154E81DA85FBB5896931107>