Date: Sat, 19 Jan 2013 11:14:29 -0500
From: John Baldwin <jhb@freebsd.org>
To: Barney Cordoba <barney_cordoba@yahoo.com>
Cc: freebsd-net@freebsd.org, Adrian Chadd <adrian@freebsd.org>, Luigi Rizzo <rizzo@iet.unipi.it>
Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
Message-ID: <201301191114.29959.jhb@freebsd.org>
In-Reply-To: <1358610450.75691.YahooMailClassic@web121604.mail.ne1.yahoo.com>
References: <1358610450.75691.YahooMailClassic@web121604.mail.ne1.yahoo.com>
On Saturday, January 19, 2013 10:47:30 AM Barney Cordoba wrote:
> --- On Fri, 1/18/13, John Baldwin <jhb@freebsd.org> wrote:
> > From: John Baldwin <jhb@freebsd.org>
> > Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
> > To: freebsd-net@freebsd.org
> > Cc: "Barney Cordoba" <barney_cordoba@yahoo.com>, "Adrian Chadd"
> > <adrian@freebsd.org>, "Luigi Rizzo" <rizzo@iet.unipi.it>
> > Date: Friday, January 18, 2013, 11:49 AM
> > On Friday, January 18, 2013 9:30:40 am Barney Cordoba wrote:
> > > --- On Thu, 1/17/13, Adrian Chadd <adrian@freebsd.org> wrote:
> > > > From: Adrian Chadd <adrian@freebsd.org>
> > > > Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
> > > > To: "Barney Cordoba" <barney_cordoba@yahoo.com>
> > > > Cc: "Luigi Rizzo" <rizzo@iet.unipi.it>, freebsd-net@freebsd.org
> > > > Date: Thursday, January 17, 2013, 11:48 AM
> > > >
> > > > There's also the subtle race condition in TX and RX handling that
> > > > re-queuing the taskqueue gets around. Which is:
> > > >
> > > > * The hardware is constantly receiving frames, right until you blow
> > > >   the FIFO away by filling it up;
> > > > * The RX thread receives a bunch of frames;
> > > > * .. and processes them;
> > > > * .. once it's done processing, the hardware may have read some more
> > > >   frames in the meantime;
> > > > * .. and the hardware may have generated a mitigated interrupt which
> > > >   you're ignoring, since you're processing frames;
> > > > * So if your architecture isn't 100% paranoid, you may end up having
> > > >   to wait for the next interrupt to handle what's currently in the
> > > >   queue.
> > > >
> > > > Now if things are done correctly:
> > > >
> > > > * The hardware generates a mitigated interrupt;
> > > > * The mask register has that bit disabled, so you don't end up
> > > >   receiving it;
> > > > * You finish your RX queue processing, and there's more stuff that
> > > >   has appeared in the FIFO (hence why the hardware has generated
> > > >   another mitigated interrupt);
> > > > * You unmask the interrupt;
> > > > * .. and the hardware immediately sends you the MSI or signals an
> > > >   interrupt;
> > > > * .. thus you re-enter the RX processing thread almost(!)
> > > >   immediately.
> > > >
> > > > However, as the poster(s) have said, the interrupt mask/unmask in
> > > > the Intel driver(s) may not be 100% correct, so you're going to end
> > > > up with situations where interrupts are missed.
> > > >
> > > > The reason why this wasn't a big deal in the deep/distant past is
> > > > that we didn't used to have kernel preemption, multiple kernel
> > > > threads running, or an overly aggressive scheduler trying to
> > > > parallelise things as much as possible. A lot of net80211/ath bugs
> > > > have popped out of the woodwork specifically because of the above
> > > > changes to the kernel. They were bugs before, but people didn't hit
> > > > them.
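For concreteness, the pattern Adrian describes maps onto code along the
following lines. This is only a minimal sketch, not the actual if_lem.c
source: the mydrv_* softc, helpers, and field names are invented for
illustration, while taskqueue_enqueue() and FILTER_HANDLED are the real
taskqueue(9) and interrupt-filter KPI.

    #include <sys/param.h>
    #include <sys/bus.h>
    #include <sys/taskqueue.h>

    struct mydrv_softc {
            struct taskqueue *sc_tq;       /* private per-driver taskqueue */
            struct task      sc_rxtx_task; /* deferred rx/tx work */
            int              sc_rx_limit;  /* work limit: frames per pass */
    };

    /* Hypothetical helpers standing in for the driver's own routines. */
    static void mydrv_disable_intr(struct mydrv_softc *);
    static void mydrv_enable_intr(struct mydrv_softc *);
    static int  mydrv_rxeof(struct mydrv_softc *, int limit); /* !0 = more */
    static void mydrv_txeof(struct mydrv_softc *);

    /* Filter handler: mask further interrupts and defer the real work. */
    static int
    mydrv_intr_filter(void *arg)
    {
            struct mydrv_softc *sc = arg;

            mydrv_disable_intr(sc);
            taskqueue_enqueue(sc->sc_tq, &sc->sc_rxtx_task);
            return (FILTER_HANDLED);
    }

    /* Taskqueue handler: bounded work, then requeue or unmask. */
    static void
    mydrv_handle_rxtx(void *arg, int pending)
    {
            struct mydrv_softc *sc = arg;

            mydrv_txeof(sc);
            if (mydrv_rxeof(sc, sc->sc_rx_limit)) {
                    /* More frames arrived while we worked: requeue. */
                    taskqueue_enqueue(sc->sc_tq, &sc->sc_rxtx_task);
                    return;
            }
            /*
             * Unmask.  If the hardware latched another mitigated
             * interrupt while we were masked, it is delivered now and
             * we re-enter almost(!) immediately, rather than stranding
             * frames until some later interrupt.
             */
            mydrv_enable_intr(sc);
    }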
> > > I don't see the distinction between the rx thread getting
> > > re-scheduled "immediately" vs. introducing another thread. In fact
> > > you increase missed interrupts by this method. The entire point of
> > > interrupt moderation is to tune the intervals at which a driver is
> > > processed.
> > >
> > > You might as well just not have a work limit and process until you're
> > > done. The idea that "gee, I've been taking up too much cpu, I'd
> > > better yield" just to queue a task and continue soon after doesn't
> > > make much sense to me.
> >
> > If there are multiple threads with the same priority, then batching the
> > work up into chunks allows the scheduler to round-robin among them.
> > However, when a task requeues itself that doesn't actually work, since
> > the taskqueue thread will see the requeued task before it yields the
> > CPU. Alternatively, if you force all the relevant interrupt handlers
> > to use the same thread pool, and instead of requeueing a separate task
> > you requeue your handler in the ithread pool, then you can get the
> > desired round-robin behavior. (I have changes to the ithread stuff
> > that get us part of the way there, in that handlers can reschedule
> > themselves, and much of the plumbing is in place for shared thread
> > pools among different interrupts.)
>
> I don't see any "round robin" effect here. You have:
>
>     Repeat:
>         - Process 100 frames
>         if (more)
>             - Queue a Task
>
> There's only one task at a time. All it's really doing is yielding and
> rescheduling itself to resume the loop.

As I said above, in the current e1000 drivers, which use private taskqueues
whose taskqueue thread priority is the same as the ithread priority, the
round-robin doesn't really work: the taskqueue thread doesn't yield when the
task is rescheduled, since it will see the new task and go run it instead of
yielding.

However, I did describe an alternate setup where you can fix this. Part of
the key is to get the various NICs to share a single logical queue of tasks.
You could simulate this now by having all the deferred tasks share a single
taskqueue backed by a pool of threads, but that will still not fully
cooperate with ithreads. To do that you have to get the interrupt handlers
themselves into the shared taskqueue. Some changes I have in a p4 branch
allow you to do that, by letting interrupt handlers reschedule themselves
(avoiding the need for a separate task and preventing the task from running
concurrently with the interrupt handler) and by providing some (but not yet
all) of the framework needed to allow multiple devices to share a single
work queue backed by a shared pool of threads.
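Concretely, that stopgap could look something like the sketch below. The
net_shared_tq name and the driver glue are invented for illustration; the
taskqueue(9) calls themselves are the existing KPI. Note that this only
pools the deferred tasks: the interrupt handlers still run in their own
ithreads, which is the part the p4 work is meant to address.

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/malloc.h>
    #include <sys/priority.h>
    #include <sys/taskqueue.h>

    static struct taskqueue *net_shared_tq;

    /*
     * One queue serviced by a small pool of threads at network ithread
     * priority.  Called once at boot or module load (SYSINIT glue
     * omitted here).
     */
    static void
    net_shared_tq_init(void)
    {
            net_shared_tq = taskqueue_create("net_shared", M_WAITOK,
                taskqueue_thread_enqueue, &net_shared_tq);
            taskqueue_start_threads(&net_shared_tq, 2, PI_NET,
                "net_shared taskq");
    }

    /*
     * Each driver then points its deferred task at the shared queue
     * instead of creating a private one, e.g. in its attach routine:
     *
     *      TASK_INIT(&sc->sc_rxtx_task, 0, mydrv_handle_rxtx, sc);
     *      sc->sc_tq = net_shared_tq;
     *
     * Tasks from different NICs now round-robin across the shared pool
     * of threads when they requeue themselves.
     */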
-- 
John Baldwin