Date: Fri, 31 Jan 2020 11:44:30 -0800
From: Eric Joyner <erj@freebsd.org>
To: Mark Johnston <markj@freebsd.org>
Cc: Hans Petter Selasky <hps@selasky.org>, freebsd-net@freebsd.org
Subject: Re: Issue with epoch_drain_callbacks and unloading iavf(4) [using iflib]
Message-ID: <CA%2Bb0zg-1CQ81dsNGv_O3ebLLko6Piei0A1NCPZUT5JH8EOyntw@mail.gmail.com>
In-Reply-To: <20200130030911.GA15281@spy>
References: <CAKdFRZjxp=mTkUzFU8qsacP86OQOC9vCDCQ%2BO2iF7svRRGDK8w@mail.gmail.com> <0e2e97f2-df75-3c6f-9bdd-e8c2ab7bf79e@selasky.org> <CAKdFRZi3UoRuz=OXnBG=NVcJe605x9OwrLmdCyD98mDeTpbf0Q@mail.gmail.com> <a6523ed6-9d61-d1b4-5822-5787cf5c0e43@selasky.org> <20200130030911.GA15281@spy>
On Wed, Jan 29, 2020 at 7:09 PM Mark Johnston <markj@freebsd.org> wrote:
> On Thu, Jan 30, 2020 at 02:12:05AM +0100, Hans Petter Selasky wrote:
> > On 2020-01-29 22:44, Eric Joyner wrote:
> > > On Wed, Jan 29, 2020 at 1:41 PM Hans Petter Selasky <hps@selasky.org> wrote:
> > > > On 2020-01-29 22:30, Eric Joyner wrote:
> > > > > Hi freebsd-net,
> > > > >
> > > > > We've encountered an issue with unloading the iavf(4) driver on FreeBSD
> > > > > 12.1 (and stable). On a VM with two iavf(4) interfaces, if we send heavy
> > > > > traffic to iavf1 and try to kldunload the driver, the kldunload process
> > > > > hangs on iavf0 until iavf1 stops receiving traffic.
> > > > >
> > > > > After some debugging, it looks like epoch_drain_callbacks() [via
> > > > > if_detach_internal()] tries to switch CPUs to run on one that iavf1 is
> > > > > using for RX processing, but since iavf1 is busy, it can't make the
> > > > > switch, so cpu_switch() just hangs and nothing happens until iavf1's
> > > > > RX thread stops being busy.
> > > > >
> > > > > I can work around this by inserting a kern_yield(PRI_USER) somewhere in
> > > > > one of the iavf txrx functions that iflib calls into (e.g.
> > > > > iavf_isc_rxd_available), but that's not a proper fix. Does anyone know
> > > > > what to do to prevent this from happening?
> > > > >
> > > > > Wildly guessing, does maybe epoch_drain_callbacks() need a higher
> > > > > priority than the PI_SOFT used in the group taskqueues used in iflib's
> > > > > RX processing?
> > > >
> > > > Hi,
> > > >
> > > > Which scheduler is this? ULE or BSD?
> > > >
> > > > EPOCH(9) expects some level of round-robin scheduling on the same
> > > > priority level. Setting a higher priority on EPOCH(9) might cause epoch
> > > > to start spinning w/o letting the lower priority thread which holds the
> > > > EPOCH() section finish.
> > > >
> > > > --HPS
> > >
> > > Hi Hans,
> > >
> > > kern.sched.name gives me "ULE"
> >
> > Hi Eric,
> >
> > epoch_drain_callbacks() depends on epoch_call_task() getting execution
> > time; it is executed from a GTASKQUEUE at PI_SOFT. epoch_drain_callbacks()
> > also runs at the priority of the calling thread, and if this is lower than
> > PI_SOFT, and a gtaskqueue is spinning heavily, then that won't work.
>
> I think we can fix this so that the epoch_drain_callbacks() caller's
> priority does not matter. Eric, can you try the patch below? It is a
> bit hacky, but the idea is to ensure that all pending callbacks can be
> invoked, and then wait for each cb task to run through at least once. In
> your case I think the callback task threads should still be able to run.
>
> > For a single CPU system you will be toast in this situation regardless,
> > if there is no free time on a CPU for EPOCH().
> >
> > In general, if epoch_call_task() doesn't get execution time, you will
> > have a problem.
>
> That's not the issue here, though there is at least one related problem:
> if a thread in an epoch read section is preempted and the CPU is
> consumed by a high-priority receive processing thread, for example due
> to a DOS, we have no mechanism to boost the priority of the preempted
> thread except by calling epoch_synchronize_wait(). In other words,
> nothing guarantees that deferred callbacks will eventually get executed
> in such a scenario.
>
> diff --git a/sys/kern/subr_epoch.c b/sys/kern/subr_epoch.c
> index 0a477e1d6f7b..4ed7e7e11a3e 100644
> --- a/sys/kern/subr_epoch.c
> +++ b/sys/kern/subr_epoch.c
> @@ -74,15 +74,19 @@ typedef struct epoch_record {
>  	volatile struct epoch_tdlist er_tdlist;
>  	volatile uint32_t er_gen;
>  	uint32_t er_cpuid;
> +	int er_drain_state;
>  } __aligned(EPOCH_ALIGN) *epoch_record_t;
>
> +#define EPOCH_DRAIN_START	2
> +#define EPOCH_DRAIN_RUNNING	1
> +#define EPOCH_DRAIN_DONE	0
> +
>  struct epoch {
>  	struct ck_epoch e_epoch __aligned(EPOCH_ALIGN);
>  	epoch_record_t e_pcpu_record;
>  	int e_idx;
>  	int e_flags;
>  	struct sx e_drain_sx;
> -	struct mtx e_drain_mtx;
>  	volatile int e_drain_count;
>  	const char *e_name;
>  };
> @@ -335,7 +339,6 @@ epoch_alloc(const char *name, int flags)
>  	epoch->e_idx = epoch_count;
>  	epoch->e_name = name;
>  	sx_init(&epoch->e_drain_sx, "epoch-drain-sx");
> -	mtx_init(&epoch->e_drain_mtx, "epoch-drain-mtx", NULL, MTX_DEF);
>  	allepochs[epoch_count++] = epoch;
>  	return (epoch);
>  }
> @@ -348,7 +351,6 @@ epoch_free(epoch_t epoch)
>  	allepochs[epoch->e_idx] = NULL;
>  	epoch_wait(global_epoch);
>  	uma_zfree_pcpu(pcpu_zone_record, epoch->e_pcpu_record);
> -	mtx_destroy(&epoch->e_drain_mtx);
>  	sx_destroy(&epoch->e_drain_sx);
>  	free(epoch, M_EPOCH);
>  }
> @@ -699,14 +701,24 @@ epoch_call_task(void *arg __unused)
>  	epoch_t epoch;
>  	ck_stack_t cb_stack;
>  	int i, npending, total;
> +	bool draining;
> +
> +	KASSERT(curthread->td_pinned > 0,
> +	    ("%s: callback task thread is not pinned", __func__));
>
>  	ck_stack_init(&cb_stack);
>  	critical_enter();
>  	epoch_enter(global_epoch);
> -	for (total = i = 0; i < epoch_count; i++) {
> +	for (total = i = 0, draining = false; i < epoch_count; i++) {
>  		if (__predict_false((epoch = allepochs[i]) == NULL))
>  			continue;
>  		er = epoch_currecord(epoch);
> +		if (atomic_load_int(&er->er_drain_state) == EPOCH_DRAIN_START) {
> +			atomic_store_int(&er->er_drain_state,
> +			    EPOCH_DRAIN_RUNNING);
> +			draining = true;
> +		}
> +
>  		record = &er->er_record;
>  		if ((npending = record->n_pending) == 0)
>  			continue;
> @@ -728,6 +740,20 @@
>  		next = CK_STACK_NEXT(cursor);
>  		entry->function(entry);
>  	}
> +
> +	if (__predict_false(draining)) {
> +		epoch_enter(global_epoch);
> +		for (i = 0; i < epoch_count; i++) {
> +			if (__predict_false((epoch = allepochs[i]) == NULL))
> +				continue;
> +			er = epoch_currecord(epoch);
> +			if (atomic_load_int(&er->er_drain_state) ==
> +			    EPOCH_DRAIN_RUNNING)
> +				atomic_store_int(&er->er_drain_state,
> +				    EPOCH_DRAIN_DONE);
> +		}
> +		epoch_exit(global_epoch);
> +	}
>  }
>
>  int
> @@ -769,27 +795,18 @@ in_epoch(epoch_t epoch)
>  }
>
>  static void
> -epoch_drain_cb(struct epoch_context *ctx)
> +epoch_drain_handler(struct ck_epoch *global __unused,
> +    ck_epoch_record_t *cr __unused, void *arg __unused)
>  {
> -	struct epoch *epoch =
> -	    __containerof(ctx, struct epoch_record, er_drain_ctx)->er_parent;
> -
> -	if (atomic_fetchadd_int(&epoch->e_drain_count, -1) == 1) {
> -		mtx_lock(&epoch->e_drain_mtx);
> -		wakeup(epoch);
> -		mtx_unlock(&epoch->e_drain_mtx);
> -	}
> +	maybe_yield();
>  }
>
>  void
>  epoch_drain_callbacks(epoch_t epoch)
>  {
>  	epoch_record_t er;
> -	struct thread *td;
> -	int was_bound;
> -	int old_pinned;
> -	int old_cpu;
> -	int cpu;
> +	int cpu, state;
> +	bool pending;
>
>  	WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL,
>  	    "epoch_drain_callbacks() may sleep!");
> @@ -802,45 +819,28 @@ epoch_drain_callbacks(epoch_t epoch)
>  		return;
>  #endif
>  	DROP_GIANT();
> -
>  	sx_xlock(&epoch->e_drain_sx);
> -	mtx_lock(&epoch->e_drain_mtx);
>
> -	td = curthread;
> -	thread_lock(td);
> -	old_cpu = PCPU_GET(cpuid);
> -	old_pinned = td->td_pinned;
> -	was_bound = sched_is_bound(td);
> -	sched_unbind(td);
> -	td->td_pinned = 0;
> +	/* Make sure that all pending callbacks are available. */
> +	ck_epoch_synchronize_wait(&epoch->e_epoch, epoch_drain_handler, NULL);
>
> -	CPU_FOREACH(cpu)
> -		epoch->e_drain_count++;
>  	CPU_FOREACH(cpu) {
>  		er = zpcpu_get_cpu(epoch->e_pcpu_record, cpu);
> -		sched_bind(td, cpu);
> -		epoch_call(epoch, &epoch_drain_cb, &er->er_drain_ctx);
> +		atomic_store_int(&er->er_drain_state, EPOCH_DRAIN_START);
> +		GROUPTASK_ENQUEUE(DPCPU_ID_PTR(cpu, epoch_cb_task));
>  	}
>
> -	/* restore CPU binding, if any */
> -	if (was_bound != 0) {
> -		sched_bind(td, old_cpu);
> -	} else {
> -		/* get thread back to initial CPU, if any */
> -		if (old_pinned != 0)
> -			sched_bind(td, old_cpu);
> -		sched_unbind(td);
> -	}
> -	/* restore pinned after bind */
> -	td->td_pinned = old_pinned;
> -
> -	thread_unlock(td);
> -
> -	while (epoch->e_drain_count != 0)
> -		msleep(epoch, &epoch->e_drain_mtx, PZERO, "EDRAIN", 0);
> +	do {
> +		pending = false;
> +		CPU_FOREACH(cpu) {
> +			er = zpcpu_get_cpu(epoch->e_pcpu_record, cpu);
> +			state = atomic_load_int(&er->er_drain_state);
> +			if (state != EPOCH_DRAIN_DONE)
> +				pending = true;
> +		}
> +		pause("edrain", 1);
> +	} while (pending);
>
> -	mtx_unlock(&epoch->e_drain_mtx);
>  	sx_xunlock(&epoch->e_drain_sx);
> -
>  	PICKUP_GIANT();
>  }

Hi Mark,

I applied your patch to a FreeBSD-current VM and reran the test I was
running originally, and I still see the same behavior. Your patch doesn't
appear to fix the problem.

- Eric