Date: Mon, 6 Apr 2020 14:19:25 -0700
From: Eric Joyner <erj@freebsd.org>
To: Mark Johnston <markj@freebsd.org>
Cc: Hans Petter Selasky <hps@selasky.org>, John Baldwin <jhb@freebsd.org>, Drew Gallatin <gallatin@netflix.com>, freebsd-net@freebsd.org, shurd <shurd@freebsd.org>
Subject: Re: Issue with epoch_drain_callbacks and unloading iavf(4) [using iflib]
Message-ID: <CA%2Bb0zg9z7srroWLtV_poedghXjCr0GvHv95cu4JzFrRdZoaeWw@mail.gmail.com>
In-Reply-To: <20200331192024.GE97238@raichu>
References: <CAKdFRZi3UoRuz=OXnBG=NVcJe605x9OwrLmdCyD98mDeTpbf0Q@mail.gmail.com>
    <a6523ed6-9d61-d1b4-5822-5787cf5c0e43@selasky.org>
    <20200130030911.GA15281@spy>
    <CA%2Bb0zg-1CQ81dsNGv_O3ebLLko6Piei0A1NCPZUT5JH8EOyntw@mail.gmail.com>
    <CA%2Bb0zg809EGMS1Ngr38BSb1yNpDqxbCnAv9eC%2BcDwbMQ5t%2BqXQ@mail.gmail.com>
    <20200212222219.GE83892@raichu>
    <CAKdFRZjdiz_axuweksNUHis7jPKXHqOmhQg%2BQWzpVnsKY%2Bcrmg@mail.gmail.com>
    <20200328225150.GA82767@raichu>
    <CAKdFRZgm43LmjJ9dYDBGM8EV0ePRMLPr4YW_tPELANXQGpqpCA@mail.gmail.com>
    <CA%2Bb0zg_k=8nMhapa=T=yTcSJcUrrnG=AfQB%2Be0gPcCrgkbWtCQ@mail.gmail.com>
    <20200331192024.GE97238@raichu>
On Tue, Mar 31, 2020 at 12:28 PM Mark Johnston <markj@freebsd.org> wrote:

> On Tue, Mar 31, 2020 at 12:14:20PM -0700, Eric Joyner wrote:
> > Mark,
> >
> > I tried out a kernel with the tip of CURRENT with both D24214 and D24215
> > applied, and I still see the problem. As well, after doing a "sysctl
> > debug.kdb.enter=1" and viewing the stack trace there for kldunload, it
> > appears to be similar to the one I posted in my last post.
>
> Can you show it? I don't see how it could be the same, since with the
> patch we are no longer calling sched_bind() from the epoch scan call
> back.
>
> >
> > - Eric
> >
> > On Mon, Mar 30, 2020 at 1:19 PM Eric Joyner <erj@freebsd.org> wrote:
> >
> > > On Sat, Mar 28, 2020 at 3:52 PM Mark Johnston <markj@freebsd.org> wrote:
> > >
> > >> On Wed, Mar 11, 2020 at 04:32:40PM -0700, Eric Joyner wrote:
> > >> > Mark,
> > >> >
> > >> > I did get some time to get back and retry this; however your second patch
> > >> > still doesn't solve the problem. Looking into it a bit, it looks like the
> > >> > kldunload process isn't hitting the code you've changed; it's hanging in
> > >> > epoch_wait_preempt() in if_detach_internal(), which is immediately before
> > >> > epoch_drain_callbacks().
> > >> >
> > >> > I did a kernel dump while it was hanging, and this is the backtrace for the
> > >> > kldunload process:
> > >>
> > >> I see. I think the callback can be made much simpler and avoid the
> > >> problematic sched_bind() calls. I wrote a patch that allows waiting
> > >> threads to lend scheduling priority to a preempted thread blocked in an
> > >> epoch section, based on some code I wrote to implement preemptible SMR
> > >> sections. If waiting for a running thread, the callback just spins.
> > >>
> > >> This might be enough to solve your problem, I posted the two lightly
> > >> tested patches here:
> > >> https://reviews.freebsd.org/D24214
> > >> https://reviews.freebsd.org/D24215
> > >>
> > >> If we hit a situation where a reader is preempted and then its CPU is
> > >> hogged by a high-priority kernel thread, this still won't be enough, but
> > >> I suspect it'll solve your case. Would you be able to test?
> > >>
> > >
> > > Yeah, I'll try them out.
> > >
> > > - Eric
> > >
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>

Mark,

I think I was mistaken about the backtrace looking the same. I was looking
at it from within ddb, and I think I focused on the
epoch_block_handler_preempt line and didn't notice that it only stopped
there this time.

Here's the new one I've got from kgdb:

#0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1448
#1  0xffffffff80ff2f79 in ipi_nmi_handler ()
    at /usr/src/sys/x86/x86/mp_x86.c:1405
#2  0xffffffff810294a6 in trap (frame=0xfffffe003b9b6f30)
    at /usr/src/sys/amd64/amd64/trap.c:201
#3  <signal handler called>
#4  epoch_block_handler_preempt (global=0xfffff80003de0100,
    cr=0xfffffe00dee85900, arg=0x0) at /usr/src/sys/kern/subr_epoch.c:507
#5  0xffffffff803b576d in epoch_block (global=0xfffff80003de0100,
    cr=0xfffffe00dee85900, cb=0xffffffff80bcf190 <epoch_block_handler_preempt>,
    ct=0x0) at /usr/src/sys/contrib/ck/src/ck_epoch.c:416
#6  ck_epoch_synchronize_wait (global=0xfffff80003de0100, cb=<optimized out>,
    ct=<optimized out>) at /usr/src/sys/contrib/ck/src/ck_epoch.c:465
#7  0xffffffff80bcf03c in epoch_wait_preempt (epoch=0xfffff80003de0100)
    at /usr/src/sys/kern/subr_epoch.c:529
#8  0xffffffff80c9410a in if_detach_internal (ifp=0xfffff80067ed4000, vmove=0,
    ifcp=0x0) at /usr/src/sys/net/if.c:1123
#9  0xffffffff80c93ebd in if_detach (ifp=0xfffff80003de0100)
    at /usr/src/sys/net/if.c:1063
#10 0xffffffff80cafa56 in iflib_device_deregister (ctx=0xfffff80088c91800)
    at /usr/src/sys/net/iflib.c:5104
#11 0xffffffff80bc1e2e in DEVICE_DETACH (dev=0xfffff80004706a00)
    at ./device_if.h:234
#12 device_detach (dev=0xfffff80004706a00)
    at /usr/src/sys/kern/subr_bus.c:3049
#13 0xffffffff80bc13fd in devclass_driver_deleted (busclass=0xfffff80004352900,
    dc=0xfffff80004385a00, driver=0xffffffff823329e0 <i40e_read_nvm_buffer_aq+352>)
    at /usr/src/sys/kern/subr_bus.c:1235
#14 0xffffffff80bc12ef in devclass_delete_driver (busclass=0xfffff80004352900,
    driver=0xffffffff823329e0 <i40e_read_nvm_buffer_aq+352>)
    at /usr/src/sys/kern/subr_bus.c:1310
#15 0xffffffff80bc721c in driver_module_handler (mod=0xfffff80015cd8680, what=1,
    arg=0xffffffff823329b0 <i40e_read_nvm_buffer_aq+304>)
    at /usr/src/sys/kern/subr_bus.c:5229
#16 0xffffffff80b67b82 in module_unload (mod=0xfffff80015cd8680)
    at /usr/src/sys/kern/kern_module.c:261
#17 0xffffffff80b5895b in linker_file_unload (file=0xfffff8016da69a00, flags=0)
    at /usr/src/sys/kern/kern_linker.c:700
#18 0xffffffff80b59dad in kern_kldunload (td=<optimized out>, fileid=5, flags=0)
    at /usr/src/sys/kern/kern_linker.c:1153
#19 0xffffffff8102aa40 in syscallenter (td=<optimized out>)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:162
#20 amd64_syscall (td=0xfffffe00e839f100, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1161
#21 <signal handler called>
#22 0x00000008002ddcba in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe188

- Eric