Date: Tue, 31 Mar 2020 15:20:24 -0400
From: Mark Johnston <markj@freebsd.org>
To: Eric Joyner <erj@freebsd.org>
Cc: freebsd-net@freebsd.org, Hans Petter Selasky <hps@selasky.org>,
    John Baldwin <jhb@freebsd.org>, shurd <shurd@freebsd.org>,
    Drew Gallatin <gallatin@netflix.com>, Gleb Smirnoff <glebius@freebsd.org>
Subject: Re: Issue with epoch_drain_callbacks and unloading iavf(4) [using iflib]
Message-ID: <20200331192024.GE97238@raichu>
In-Reply-To: <CA+b0zg_k=8nMhapa=T=yTcSJcUrrnG=AfQB+e0gPcCrgkbWtCQ@mail.gmail.com>
References: <CAKdFRZi3UoRuz=OXnBG=NVcJe605x9OwrLmdCyD98mDeTpbf0Q@mail.gmail.com>
    <a6523ed6-9d61-d1b4-5822-5787cf5c0e43@selasky.org>
    <20200130030911.GA15281@spy>
    <CA+b0zg-1CQ81dsNGv_O3ebLLko6Piei0A1NCPZUT5JH8EOyntw@mail.gmail.com>
    <CA+b0zg809EGMS1Ngr38BSb1yNpDqxbCnAv9eC+cDwbMQ5t+qXQ@mail.gmail.com>
    <20200212222219.GE83892@raichu>
    <CAKdFRZjdiz_axuweksNUHis7jPKXHqOmhQg+QWzpVnsKY+crmg@mail.gmail.com>
    <20200328225150.GA82767@raichu>
    <CAKdFRZgm43LmjJ9dYDBGM8EV0ePRMLPr4YW_tPELANXQGpqpCA@mail.gmail.com>
    <CA+b0zg_k=8nMhapa=T=yTcSJcUrrnG=AfQB+e0gPcCrgkbWtCQ@mail.gmail.com>

On Tue, Mar 31, 2020 at 12:14:20PM -0700, Eric Joyner wrote:
> Mark,
>
> I tried out a kernel with the tip of CURRENT with both D24214 and D24215
> applied, and I still see the problem. As well, after doing a "sysctl
> debug.kdb.enter=1" and viewing the stack trace there for kldunload, it
> appears to be similar to the one I posted in my last post.

Can you show it?  I don't see how it could be the same, since with the
patch we are no longer calling sched_bind() from the epoch scan call
back.

>
> - Eric
>
> On Mon, Mar 30, 2020 at 1:19 PM Eric Joyner <erj@freebsd.org> wrote:
>
> > On Sat, Mar 28, 2020 at 3:52 PM Mark Johnston <markj@freebsd.org> wrote:
> >
> >> On Wed, Mar 11, 2020 at 04:32:40PM -0700, Eric Joyner wrote:
> >> > Mark,
> >> >
> >> > I did get some time to get back and retry this; however your
> >> > second patch still doesn't solve the problem. Looking into it a
> >> > bit, it looks like the kldunload process isn't hitting the code
> >> > you've changed; it's hanging in epoch_wait_preempt() in
> >> > if_detach_internal(), which is immediately before
> >> > epoch_drain_callbacks().
> >> >
> >> > I did a kernel dump while it was hanging, and this is the
> >> > backtrace for the kldunload process:
> >>
> >> I see.  I think the callback can be made much simpler and avoid the
> >> problematic sched_bind() calls.  I wrote a patch that allows waiting
> >> threads to lend scheduling priority to a preempted thread blocked in an
> >> epoch section, based on some code I wrote to implement preemptible SMR
> >> sections.  If waiting for a running thread, the callback just spins.
> >>
> >> This might be enough to solve your problem, I posted the two lightly
> >> tested patches here:
> >> https://reviews.freebsd.org/D24214
> >> https://reviews.freebsd.org/D24215
> >>
> >> If we hit a situation where a reader is preempted and then its CPU is
> >> hogged by a high-priority kernel thread, this still won't be enough, but
> >> I suspect it'll solve your case.  Would you be able to test?
> >>
> >
> > Yeah, I'll try them out.
> >
> > - Eric
> >
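
Editor's sketch: the wait strategy described in the quoted Mar 28 message (spin while the reader is running, lend priority when it has been preempted) can be pictured with a small userspace toy model. This is only an illustration under stated assumptions, not the code in D24214 or D24215. Every name here (toy_thread, reader_state, wait_action, wait_for_reader) is hypothetical and does not exist in the FreeBSD kernel. The model follows the FreeBSD convention that a numerically lower priority value is more urgent.

```c
#include <assert.h>

/* Hypothetical states of a reader that entered an epoch section. */
enum reader_state {
	READER_RUNNING,		/* currently on a CPU */
	READER_PREEMPTED,	/* descheduled while inside the section */
	READER_DONE		/* has already left the section */
};

/* Hypothetical toy thread; a lower prio value means more urgent. */
struct toy_thread {
	enum reader_state state;
	int prio;
};

/* What the waiter decides to do about one reader. */
enum wait_action {
	WAIT_SPIN,		/* reader is running: just poll again */
	WAIT_LEND_PRIO,		/* reader is preempted: boost it */
	WAIT_NONE		/* nothing to wait for */
};

/*
 * Toy version of the decision made while waiting on a reader.
 * A preempted reader inherits the waiter's priority if the waiter
 * is more urgent; a running reader is simply spun on.
 */
static enum wait_action
wait_for_reader(const struct toy_thread *waiter, struct toy_thread *reader)
{
	switch (reader->state) {
	case READER_RUNNING:
		return (WAIT_SPIN);
	case READER_PREEMPTED:
		if (waiter->prio < reader->prio)
			reader->prio = waiter->prio;	/* lend priority */
		return (WAIT_LEND_PRIO);
	default:
		return (WAIT_NONE);
	}
}
```

The model also shows the limitation noted in the message: lending only raises the reader to the waiter's priority, so a kernel thread that is more urgent than the waiter and is hogging the reader's CPU would still keep the reader from running.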