Date:      Wed, 29 Jan 2020 17:28:31 -0800
From:      Eric Joyner <erj@freebsd.org>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Issue with epoch_drain_callbacks and unloading iavf(4) [using iflib]
Message-ID:  <CA%2Bb0zg-8ADHy_MhZMG_A8v9mG%2Bs=t7mvrgAnd7t6DmXJdBYt0g@mail.gmail.com>
In-Reply-To: <a6523ed6-9d61-d1b4-5822-5787cf5c0e43@selasky.org>
References:  <CAKdFRZjxp=mTkUzFU8qsacP86OQOC9vCDCQ%2BO2iF7svRRGDK8w@mail.gmail.com> <0e2e97f2-df75-3c6f-9bdd-e8c2ab7bf79e@selasky.org> <CAKdFRZi3UoRuz=OXnBG=NVcJe605x9OwrLmdCyD98mDeTpbf0Q@mail.gmail.com> <a6523ed6-9d61-d1b4-5822-5787cf5c0e43@selasky.org>

On Wed, Jan 29, 2020 at 5:12 PM Hans Petter Selasky <hps@selasky.org> wrote:

> On 2020-01-29 22:44, Eric Joyner wrote:
> > On Wed, Jan 29, 2020 at 1:41 PM Hans Petter Selasky <hps@selasky.org> wrote:
> >
> >> On 2020-01-29 22:30, Eric Joyner wrote:
> >>> Hi freebsd-net,
> >>>
> >>> We've encountered an issue with unloading the iavf(4) driver on FreeBSD
> >>> 12.1 (and stable). On a VM with two iavf(4) interfaces, if we send heavy
> >>> traffic to iavf1 and try to kldunload the driver, the kldunload process
> >>> hangs on iavf0 until iavf1 stops receiving traffic.
> >>>
> >>> After some debugging, it looks like epoch_drain_callbacks() [via
> >>> if_detach_internal()] tries to switch CPUs to run on one that iavf1 is
> >>> using for RX processing, but since iavf1 is busy, it can't make the switch,
> >>> so cpu_switch() just hangs and nothing happens until iavf1's RX thread
> >>> stops being busy.
> >>>
> >>> I can work around this by inserting a kern_yield(PRI_USER) somewhere in one
> >>> of the iavf txrx functions that iflib calls into (e.g.
> >>> iavf_isc_rxd_available), but that's not a proper fix. Does anyone know what
> >>> to do to prevent this from happening?
> >>>
> >>> Wildly guessing, does maybe epoch_drain_callbacks() need a higher priority
> >>> than the PI_SOFT used in the group taskqueues used in iflib's RX processing?
> >>>
> >>
> >> Hi,
> >>
> >> Which scheduler is this? ULE or BSD?
> >>
> >> EPOCH(9) expects some level of round-robin scheduling on the same
> >> priority level. Setting a higher priority on EPOCH(9) might cause epoch
> >> to start spinning w/o letting the lower priority thread which holds the
> >> EPOCH() section to finish.
> >>
> >> --HPS
> >>
> >>
> > Hi Hans,
> >
> > kern.sched.name gives me "ULE"
> >
>
> Hi Eric,
>
> epoch_drain_callbacks() depends on that epoch_call_task() gets execution
> which is executed from a GTASKQUEUE at PI_SOFT. Also
> epoch_drain_callbacks() runs at the priority of the calling thread, and
> if this is lower than PI_SOFT, and a gtaskqueue is spinning heavily,
> then that won't work.
>
> For a single CPU system you will be toast in this situation regardless
> if there is no free time on a CPU for EPOCH().
>
> In general if epoch_call_task() doesn't get execution time, you will
> have a problem.
>
> Maybe add a flag to iflib which stops the grouptask's before detaching
> the network interface?
>
> --HPS

Hi Hans,

> Maybe add a flag to iflib which stops the grouptask's before detaching
> the network interface?


We considered that, but it seemed like it would only work for
kldunload.

This would result in undesired behavior if you used "devctl detach iavf0"
to just remove iavf0, and didn't expect that to affect iavf1.

But, if stopping traffic on iavf1 in order to unload iavf0 is acceptable,
then that might be a solution.

- Eric
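
To make the trade-off above concrete, here is a toy userland model of the
situation as described in this thread. It is not kernel or iflib code, and
every name in it (struct model_iface, model_drain_completes) is hypothetical.
It assumes, per the debugging notes quoted above, that the drain needs a turn
on every CPU and cannot get one on a CPU where a busy RX task is still running.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one interface's RX task; not an iflib structure. */
struct model_iface {
	int  rx_cpu;      /* CPU the RX task is bound to */
	bool rx_busy;     /* RX task always has packets to process */
	bool rx_stopped;  /* RX task was stopped before the detach */
};

/*
 * Assumption: the drain must run once on every CPU, and a CPU is
 * unavailable while a busy, non-stopped RX task occupies it.
 * Returns true if the modeled drain could complete.
 */
bool
model_drain_completes(const struct model_iface *ifs, size_t n, int ncpus)
{
	for (int cpu = 0; cpu < ncpus; cpu++) {
		for (size_t i = 0; i < n; i++) {
			if (ifs[i].rx_cpu == cpu && ifs[i].rx_busy &&
			    !ifs[i].rx_stopped)
				return (false);
		}
	}
	return (true);
}
```

In this model, stopping only the RX task of the interface being detached
(iavf0) does not help while iavf1 is busy on another CPU; the drain completes
only once iavf1's RX task is stopped as well, which is the side effect on
iavf1 that a "devctl detach iavf0" user would not expect.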


