Date: Thu, 30 Jan 2020 02:12:05 +0100
From: Hans Petter Selasky <hps@selasky.org>
To: Eric Joyner <erj@freebsd.org>
Cc: freebsd-net@freebsd.org
Subject: Re: Issue with epoch_drain_callbacks and unloading iavf(4) [using iflib]
Message-ID: <a6523ed6-9d61-d1b4-5822-5787cf5c0e43@selasky.org>
In-Reply-To: <CAKdFRZi3UoRuz=OXnBG=NVcJe605x9OwrLmdCyD98mDeTpbf0Q@mail.gmail.com>
References: <CAKdFRZjxp=mTkUzFU8qsacP86OQOC9vCDCQ%2BO2iF7svRRGDK8w@mail.gmail.com> <0e2e97f2-df75-3c6f-9bdd-e8c2ab7bf79e@selasky.org> <CAKdFRZi3UoRuz=OXnBG=NVcJe605x9OwrLmdCyD98mDeTpbf0Q@mail.gmail.com>
On 2020-01-29 22:44, Eric Joyner wrote:
> On Wed, Jan 29, 2020 at 1:41 PM Hans Petter Selasky <hps@selasky.org> wrote:
>
>> On 2020-01-29 22:30, Eric Joyner wrote:
>>> Hi freebsd-net,
>>>
>>> We've encountered an issue with unloading the iavf(4) driver on FreeBSD
>>> 12.1 (and stable). On a VM with two iavf(4) interfaces, if we send heavy
>>> traffic to iavf1 and try to kldunload the driver, the kldunload process
>>> hangs on iavf0 until iavf1 stops receiving traffic.
>>>
>>> After some debugging, it looks like epoch_drain_callbacks() [via
>>> if_detach_internal()] tries to switch CPUs to run on one that iavf1 is
>>> using for RX processing, but since iavf1 is busy, it can't make the switch,
>>> so cpu_switch() just hangs and nothing happens until iavf1's RX thread
>>> stops being busy.
>>>
>>> I can work around this by inserting a kern_yield(PRI_USER) somewhere in one
>>> of the iavf txrx functions that iflib calls into (e.g.
>>> iavf_isc_rxd_available), but that's not a proper fix. Does anyone know what
>>> to do to prevent this from happening?
>>>
>>> Wildly guessing, does maybe epoch_drain_callbacks() need a higher priority
>>> than the PI_SOFT used in the group taskqueues used in iflib's RX processing?
>>>
>>
>> Hi,
>>
>> Which scheduler is this? ULE or BSD?
>>
>> EPOCH(9) expects some level of round-robin scheduling on the same
>> priority level. Setting a higher priority on EPOCH(9) might cause epoch
>> to start spinning w/o letting the lower priority thread which holds the
>> EPOCH() section to finish.
>>
>> --HPS
>>
>>
> Hi Hans,
>
> kern.sched.name gives me "ULE"
>

Hi Eric,

epoch_drain_callbacks() depends on epoch_call_task() getting execution time, and epoch_call_task() is executed from a GTASKQUEUE at PI_SOFT. Also, epoch_drain_callbacks() runs at the priority of the calling thread, and if that priority is lower than PI_SOFT and a gtaskqueue is spinning heavily, then this won't work.
On a single-CPU system you will be toast in this situation regardless, if there is no free CPU time for EPOCH(). In general, if epoch_call_task() doesn't get execution time, you will have a problem.

Maybe add a flag to iflib which stops the grouptasks before detaching the network interface?

--HPS