Date:      Mon, 6 Apr 2020 14:19:25 -0700
From:      Eric Joyner <erj@freebsd.org>
To:        Mark Johnston <markj@freebsd.org>
Cc:        Hans Petter Selasky <hps@selasky.org>, John Baldwin <jhb@freebsd.org>,  Drew Gallatin <gallatin@netflix.com>, freebsd-net@freebsd.org, shurd <shurd@freebsd.org>
Subject:   Re: Issue with epoch_drain_callbacks and unloading iavf(4) [using iflib]
Message-ID:  <CA%2Bb0zg9z7srroWLtV_poedghXjCr0GvHv95cu4JzFrRdZoaeWw@mail.gmail.com>
In-Reply-To: <20200331192024.GE97238@raichu>
References:  <CAKdFRZi3UoRuz=OXnBG=NVcJe605x9OwrLmdCyD98mDeTpbf0Q@mail.gmail.com> <a6523ed6-9d61-d1b4-5822-5787cf5c0e43@selasky.org> <20200130030911.GA15281@spy> <CA%2Bb0zg-1CQ81dsNGv_O3ebLLko6Piei0A1NCPZUT5JH8EOyntw@mail.gmail.com> <CA%2Bb0zg809EGMS1Ngr38BSb1yNpDqxbCnAv9eC%2BcDwbMQ5t%2BqXQ@mail.gmail.com> <20200212222219.GE83892@raichu> <CAKdFRZjdiz_axuweksNUHis7jPKXHqOmhQg%2BQWzpVnsKY%2Bcrmg@mail.gmail.com> <20200328225150.GA82767@raichu> <CAKdFRZgm43LmjJ9dYDBGM8EV0ePRMLPr4YW_tPELANXQGpqpCA@mail.gmail.com> <CA%2Bb0zg_k=8nMhapa=T=yTcSJcUrrnG=AfQB%2Be0gPcCrgkbWtCQ@mail.gmail.com> <20200331192024.GE97238@raichu>

On Tue, Mar 31, 2020 at 12:28 PM Mark Johnston <markj@freebsd.org> wrote:

> On Tue, Mar 31, 2020 at 12:14:20PM -0700, Eric Joyner wrote:
> > Mark,
> >
> > I tried out a kernel built from the tip of CURRENT with both D24214 and
> > D24215 applied, and I still see the problem. Also, after doing a "sysctl
> > debug.kdb.enter=1" and viewing the stack trace there for kldunload, it
> > appears to be similar to the one I posted in my last message.
>
> Can you show it?  I don't see how it could be the same, since with the
> patch we are no longer calling sched_bind() from the epoch scan callback.
>
> >
> > - Eric
> >
> > On Mon, Mar 30, 2020 at 1:19 PM Eric Joyner <erj@freebsd.org> wrote:
> >
> > > On Sat, Mar 28, 2020 at 3:52 PM Mark Johnston <markj@freebsd.org> wrote:
> > >
> > >> On Wed, Mar 11, 2020 at 04:32:40PM -0700, Eric Joyner wrote:
> > >> > Mark,
> > >> >
> > >> > I did get some time to come back and retry this; however, your second
> > >> > patch still doesn't solve the problem. Looking into it a bit, it looks
> > >> > like the kldunload process isn't hitting the code you've changed; it's
> > >> > hanging in epoch_wait_preempt() in if_detach_internal(), which is
> > >> > called immediately before epoch_drain_callbacks().
> > >> >
> > >> > I did a kernel dump while it was hanging, and this is the backtrace
> > >> > for the kldunload process:
> > >>
> > >> I see.  I think the callback can be made much simpler and avoid the
> > >> problematic sched_bind() calls.  I wrote a patch that allows waiting
> > >> threads to lend scheduling priority to a preempted thread blocked in an
> > >> epoch section, based on some code I wrote to implement preemptible SMR
> > >> sections.  If waiting for a running thread, the callback just spins.
> > >>
> > >> This might be enough to solve your problem.  I posted the two lightly
> > >> tested patches here:
> > >> https://reviews.freebsd.org/D24214
> > >> https://reviews.freebsd.org/D24215
> > >>
> > >> If we hit a situation where a reader is preempted and then its CPU is
> > >> hogged by a high-priority kernel thread, this still won't be enough, but
> > >> I suspect it'll solve your case.  Would you be able to test?
> > >>
> > >
> > > Yeah, I'll try them out.
> > >
> > >  - Eric
> > >
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>

Mark,

I think I was mistaken about the backtrace looking the same. I was looking
at it from within ddb, focused on the epoch_block_handler_preempt line, and
didn't notice that this time it only stopped there. Here's the new one I got
from kgdb:

#0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1448
#1  0xffffffff80ff2f79 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1405
#2  0xffffffff810294a6 in trap (frame=0xfffffe003b9b6f30) at /usr/src/sys/amd64/amd64/trap.c:201
#3  <signal handler called>
#4  epoch_block_handler_preempt (global=0xfffff80003de0100, cr=0xfffffe00dee85900, arg=0x0) at /usr/src/sys/kern/subr_epoch.c:507
#5  0xffffffff803b576d in epoch_block (global=0xfffff80003de0100, cr=0xfffffe00dee85900, cb=0xffffffff80bcf190 <epoch_block_handler_preempt>, ct=0x0) at /usr/src/sys/contrib/ck/src/ck_epoch.c:416
#6  ck_epoch_synchronize_wait (global=0xfffff80003de0100, cb=<optimized out>, ct=<optimized out>) at /usr/src/sys/contrib/ck/src/ck_epoch.c:465
#7  0xffffffff80bcf03c in epoch_wait_preempt (epoch=0xfffff80003de0100) at /usr/src/sys/kern/subr_epoch.c:529
#8  0xffffffff80c9410a in if_detach_internal (ifp=0xfffff80067ed4000, vmove=0, ifcp=0x0) at /usr/src/sys/net/if.c:1123
#9  0xffffffff80c93ebd in if_detach (ifp=0xfffff80003de0100) at /usr/src/sys/net/if.c:1063
#10 0xffffffff80cafa56 in iflib_device_deregister (ctx=0xfffff80088c91800) at /usr/src/sys/net/iflib.c:5104
#11 0xffffffff80bc1e2e in DEVICE_DETACH (dev=0xfffff80004706a00) at ./device_if.h:234
#12 device_detach (dev=0xfffff80004706a00) at /usr/src/sys/kern/subr_bus.c:3049
#13 0xffffffff80bc13fd in devclass_driver_deleted (busclass=0xfffff80004352900, dc=0xfffff80004385a00, driver=0xffffffff823329e0 <i40e_read_nvm_buffer_aq+352>) at /usr/src/sys/kern/subr_bus.c:1235
#14 0xffffffff80bc12ef in devclass_delete_driver (busclass=0xfffff80004352900, driver=0xffffffff823329e0 <i40e_read_nvm_buffer_aq+352>) at /usr/src/sys/kern/subr_bus.c:1310
#15 0xffffffff80bc721c in driver_module_handler (mod=0xfffff80015cd8680, what=1, arg=0xffffffff823329b0 <i40e_read_nvm_buffer_aq+304>) at /usr/src/sys/kern/subr_bus.c:5229
#16 0xffffffff80b67b82 in module_unload (mod=0xfffff80015cd8680) at /usr/src/sys/kern/kern_module.c:261
#17 0xffffffff80b5895b in linker_file_unload (file=0xfffff8016da69a00, flags=0) at /usr/src/sys/kern/kern_linker.c:700
#18 0xffffffff80b59dad in kern_kldunload (td=<optimized out>, fileid=5, flags=0) at /usr/src/sys/kern/kern_linker.c:1153
#19 0xffffffff8102aa40 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:162
#20 amd64_syscall (td=0xfffffe00e839f100, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1161
#21 <signal handler called>
#22 0x00000008002ddcba in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe188
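
Frames #4 through #8 above show if_detach_internal() blocked in epoch_wait_preempt(): kldunload is on the writer side of the epoch, waiting for every thread that was inside an epoch section when the wait began to leave it, so a reader that never leaves keeps the unload stuck. The C sketch below is a minimal userspace illustration of that kind of wait, assuming a simple per-reader generation counter (odd while the reader is inside a section). It is not how ck_epoch or FreeBSD's epoch(9) are implemented, and it models only the "just spin" case for a running reader, not the priority lending in D24214/D24215. The names toy_enter(), toy_exit(), toy_epoch_wait() and the max_spins bound are hypothetical; the real epoch_wait_preempt() has no timeout.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy, not ck_epoch or epoch(9): a fixed number of reader slots. */
#define TOY_MAX_READERS 4

struct toy_epoch {
	/* Per-reader generation counter: odd while that reader is in a section. */
	atomic_ulong gen[TOY_MAX_READERS];
};

static void
toy_epoch_init(struct toy_epoch *e)
{
	for (int i = 0; i < TOY_MAX_READERS; i++)
		atomic_init(&e->gen[i], 0);
}

/* Reader entry: counter becomes odd (default seq_cst ordering). */
static void
toy_enter(struct toy_epoch *e, int id)
{
	atomic_fetch_add(&e->gen[id], 1);
}

/* Reader exit: counter becomes even again. */
static void
toy_exit(struct toy_epoch *e, int id)
{
	atomic_fetch_add(&e->gen[id], 1);
}

/*
 * Writer side: wait until every reader that was inside a section when its
 * counter was sampled has left that section. Returns true once that holds,
 * false if it gave up after max_spins (hypothetical diagnostic bound).
 */
static bool
toy_epoch_wait(struct toy_epoch *e, unsigned long max_spins)
{
	unsigned long snap[TOY_MAX_READERS];

	/* Sample every reader once; readers entering later are not waited for. */
	for (int i = 0; i < TOY_MAX_READERS; i++)
		snap[i] = atomic_load(&e->gen[i]);

	for (int i = 0; i < TOY_MAX_READERS; i++) {
		unsigned long spins = 0;

		if ((snap[i] & 1) == 0)
			continue;	/* reader i was outside any section */
		/* Reader i is inside: spin until its counter moves on. */
		while (atomic_load(&e->gen[i]) == snap[i]) {
			if (++spins > max_spins)
				return false;	/* the real wait would keep blocking */
			sched_yield();
		}
	}
	return true;
}
```

Calling toy_enter(&e, 0) and then toy_epoch_wait(&e, 1000) returns false, the toy's stand-in for the hang seen here; after toy_exit(&e, 0) the same wait returns true. The bound exists only so the sketch can report a stuck reader instead of spinning forever.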

- Eric


