Date: Wed, 16 Oct 2019 22:07:13 +0000
From: "Keller, Jacob E" <jacob.e.keller@intel.com>
To: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Cc: "shurd@llnw.com" <shurd@llnw.com>, "jhb@freebsd.org" <jhb@freebsd.org>, "Joyner, Eric" <eric.joyner@intel.com>
Subject: Re: panic on invalid ifp pointer in iflib drivers
Message-ID: <f9394e20ef7fe62440b0bf13df3f779d87ebe1ba.camel@intel.com>
In-Reply-To: <02874ECE860811409154E81DA85FBB589692E0D4@ORSMSX121.amr.corp.intel.com>
References: <02874ECE860811409154E81DA85FBB589692E0D4@ORSMSX121.amr.corp.intel.com>

On Wed, 2019-10-16 at 21:16 +0000, Keller, Jacob E wrote:
> Hi,
>
> I’m investigating an issue on the iflib ixl driver in 11.3-RELEASE as
> well as 12-RELEASE. We found a panic that occurs if SCTP/IPv6
> traffic is being transmitted while the device is detached:
>

I've just been told this has been reproduced on the latest 12-stable as well.

> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0xfffffe0000411e38
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff80c84700
> stack pointer           = 0x28:0xfffffe2f4351b600
> frame pointer           = 0x28:0xfffffe2f4351b650
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (swi4: clock (0))
> trap number             = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe2f4351b2c0
> vpanic() at vpanic+0x17e/frame 0xfffffe2f4351b320
> panic() at panic+0x43/frame 0xfffffe2f4351b380
> trap_fatal() at trap_fatal+0x369/frame 0xfffffe2f4351b3d0
> trap_pfault() at trap_pfault+0x62/frame 0xfffffe2f4351b420
> trap() at trap+0x2b3/frame 0xfffffe2f4351b530
> calltrap() at calltrap+0x8/frame 0xfffffe2f4351b530
> --- trap 0xc, rip = 0xffffffff80c84700, rsp = 0xfffffe2f4351b600, rbp = 0xfffffe2f4351b650 ---
> in6_selecthlim() at in6_selecthlim+0x20/frame 0xfffffe2f4351b650
> sctp_lowlevel_chunk_output() at sctp_lowlevel_chunk_output+0xeb2/frame 0xfffffe2f4351b790
> sctp_chunk_output() at sctp_chunk_output+0x68c/frame 0xfffffe2f4351c110
> sctp_timeout_handler() at sctp_timeout_handler+0x2d8/frame 0xfffffe2f4351c180
> softclock_call_cc() at softclock_call_cc+0x15b/frame 0xfffffe2f4351c230
> softclock() at softclock+0x7c/frame 0xfffffe2f4351c260
> intr_event_execute_handlers() at intr_event_execute_handlers+0x9a/frame 0xfffffe2f4351c2a0
> ithread_loop() at ithread_loop+0xb7/frame 0xfffffe2f4351c2f0
> fork_exit() at fork_exit+0x84/frame 0xfffffe2f4351c330
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe2f4351c330
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
>
>
> From what I’ve gathered so far, it appears that the issue is a
> use-after-free where the SCTP stack gets an ifp pointer that’s no
> longer valid. We’ve reproduced this issue on multiple iflib-based
> drivers, including ixl and the recently published ice driver code
> (available on Phabricator).
>
> Additionally, we cannot reproduce it on legacy-stack drivers for ixl,
> or on a Mellanox 100G board we have. This leads me to believe that it’s
> an issue in iflib rather than in the specific device drivers.
>
> I am not sure exactly what’s going wrong here... anyone have
> suggestions? I thought it might be an issue of when ether_ifdetach is
> called. That function is supposed to clear all of the pre-existing
> routes from the route entry list. I’m thinking maybe somehow a route
> gets added after ether_ifdetach is called.
>
> In the iflib_device_deregister function, ether_ifdetach is called
> just after iflib_stop (which would call a device’s if_stop routine),
> then the task queues are shut down, the driver’s ifdi_detach handler
> is called, and the ifp is freed at the end. In the ixl legacy
> driver, ether_ifdetach is called prior to the stop routine. However,
> in the mlx5 driver, it’s called after a call to close_locked()...
>
> So I’m really not sure exactly what could cause a stale ifp pointer
> to get into the route entry list.
>
> Thanks,
> Jake
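
The hypothesis in the quoted message (a route being added after ether_ifdetach has already purged the route list, leaving a stale ifp behind once the ifp is freed) can be illustrated with a toy model. This is only a sketch of the suspected ordering problem, not FreeBSD kernel code: every type and function here (toy_ifnet, toy_route_table, toy_route_add, toy_route_purge, toy_route_refs, toy_detach_with_late_add) is hypothetical, and the "attached" check is an assumed mitigation used for contrast, not something iflib or the routing code is known to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an interface; NOT the kernel's struct ifnet. */
struct toy_ifnet {
	bool attached;	/* cleared once the purge (detach) step has run */
};

#define TOY_MAX_ROUTES 8

/* Toy route list: each slot holds an ifp pointer, NULL means empty. */
struct toy_route_table {
	struct toy_ifnet *ifp[TOY_MAX_ROUTES];
};

/*
 * Install a route that references ifp. When check_attached is false this
 * models a stack that does not notice the interface is already detaching.
 * Returns true if a route was installed.
 */
static bool
toy_route_add(struct toy_route_table *rt, struct toy_ifnet *ifp,
    bool check_attached)
{
	if (check_attached && !ifp->attached)
		return (false);
	for (size_t i = 0; i < TOY_MAX_ROUTES; i++) {
		if (rt->ifp[i] == NULL) {
			rt->ifp[i] = ifp;
			return (true);
		}
	}
	return (false);
}

/* Models the purge step: mark ifp detached and drop its routes. */
static void
toy_route_purge(struct toy_route_table *rt, struct toy_ifnet *ifp)
{
	ifp->attached = false;
	for (size_t i = 0; i < TOY_MAX_ROUTES; i++) {
		if (rt->ifp[i] == ifp)
			rt->ifp[i] = NULL;
	}
}

/* Count routes still pointing at ifp (pointer compare only). */
static size_t
toy_route_refs(const struct toy_route_table *rt, const struct toy_ifnet *ifp)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_MAX_ROUTES; i++) {
		if (rt->ifp[i] == ifp)
			n++;
	}
	return (n);
}

/*
 * Scenario: a route exists, the purge runs, then a late route add arrives
 * before the ifp would be freed. Returns how many references would be left
 * dangling at the point of the free.
 */
static size_t
toy_detach_with_late_add(bool check_attached)
{
	struct toy_route_table rt = { { NULL } };
	struct toy_ifnet ifp = { .attached = true };
	bool ok;

	ok = toy_route_add(&rt, &ifp, check_attached);
	assert(ok);
	(void)ok;

	toy_route_purge(&rt, &ifp);
	assert(toy_route_refs(&rt, &ifp) == 0);

	/* Late add racing with the rest of the detach path. */
	(void)toy_route_add(&rt, &ifp, check_attached);

	return (toy_route_refs(&rt, &ifp));
}
```

In this model, toy_detach_with_late_add(false) leaves one reference behind after the purge, which is the kind of stale pointer that would later fault when dereferenced, while toy_detach_with_late_add(true) leaves none. It says nothing about where such a late add would actually come from in the SCTP or iflib code paths.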
