Date:      Wed, 21 Nov 2012 10:26:01 -0800
From:      Adrian Chadd <adrian@freebsd.org>
To:        Andre Oppermann <andre@freebsd.org>
Cc:        Barney Cordoba <barney_cordoba@yahoo.com>, Jim Thompson <jim@netgate.com>, Alfred Perlstein <bright@mu.org>, khatfield@socllc.net, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: FreeBSD boxes as a 'router'...
Message-ID:  <CAJ-VmonwRD1CuPCoLPLQBJQtOducoWy7giC5mbFJe2BsbrUx0w@mail.gmail.com>
In-Reply-To: <50AC910C.4030004@freebsd.org>
References:  <1353448328.76219.YahooMailClassic@web121602.mail.ne1.yahoo.com> <E1F4816E-676C-4630-9FA1-817F737D007D@netgate.com> <50AC08EC.8070107@mu.org> <832757660.33924.1353460119408@238ae4dab3b4454b88aea4d9f7c372c1.nuevasync.com> <CAJ-Vmok8Ybdi+Y8ZguMTKC7+F5=OxVDog27i4UgY-s3MCZkGcQ@mail.gmail.com> <250266404.35502.1353464214924@238ae4dab3b4454b88aea4d9f7c372c1.nuevasync.com> <50AC8393.3060001@freebsd.org> <CAJ-VmomCxSzTzwi8QxzW8_+aMT2DnmRxSGaau=1RWFGP8XBmMQ@mail.gmail.com> <50AC910C.4030004@freebsd.org>

On 21 November 2012 00:30, Andre Oppermann <andre@freebsd.org> wrote:
> On 21.11.2012 08:55, Adrian Chadd wrote:
>>
>> Something that has popped up a few times, even recently, is breaking
>> out of an RX loop after you service a number of frames.
>
> That is what I basically described.

Right, and this can be done right now without too much reworking,
right? I mean, people could begin by doing a drive-by on drivers for
this.
The RX path for a driver shouldn't be too difficult to do; the TX path
is the racy one.
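
To make that concrete, here's roughly the shape I mean, as a sketch only
(the foo_* names, RX_BUDGET and the softc layout are invented, not lifted
from any particular driver):

/*
 * Sketch: service at most RX_BUDGET frames per pass, and if the ring
 * still has work, punt the remainder to a taskqueue instead of sitting
 * in the interrupt path until the ring drains.
 */
#include <sys/param.h>
#include <sys/taskqueue.h>

#define RX_BUDGET       64      /* frames per pass; tune to taste */

struct foo_softc {
        struct taskqueue        *sc_tq;         /* driver-private taskqueue */
        struct task             sc_rx_task;     /* deferred RX work */
        /* ... a real softc carries much more ... */
};

/* Placeholder helpers; a real driver has these in some form already. */
static int      foo_rx_ring_has_work(struct foo_softc *);
static void     foo_rx_one_frame(struct foo_softc *);
static void     foo_enable_intr(struct foo_softc *);

/* Returns non-zero if descriptors remain after the budget was spent. */
static int
foo_rxeof(struct foo_softc *sc, int budget)
{
        int processed;

        for (processed = 0; processed < budget; processed++) {
                if (!foo_rx_ring_has_work(sc))
                        break;
                foo_rx_one_frame(sc);   /* pull the mbuf, pass it up */
        }
        return (foo_rx_ring_has_work(sc));
}

/*
 * Taskqueue handler: keep draining in bounded chunks.
 * In attach: TASK_INIT(&sc->sc_rx_task, 0, foo_rx_task, sc);
 */
static void
foo_rx_task(void *arg, int pending)
{
        struct foo_softc *sc = arg;

        if (foo_rxeof(sc, RX_BUDGET))
                taskqueue_enqueue(sc->sc_tq, &sc->sc_rx_task);
        else
                foo_enable_intr(sc);    /* ring drained; re-arm interrupts */
}

RX_BUDGET is the main knob: small enough that one busy queue can't hog
the CPU, large enough that you aren't paying scheduling overhead for
every handful of frames.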

>> During stupidly high levels of RX, you may find the NIC happily
>> receiving frames faster than you can service the RX queue. If this
>> occurs, you could end up just plain being stuck there.

> That's the live-lock.

And again, you can solve this without devolving into polling. To me,
polling feels like a bludgeon applied to a system that isn't really
designed for the extreme cases it's facing.
Maybe your work in the tcp_taskqueue branch addresses the larger-scale
issues here, but I've solved this relatively easily in the past.

>> So what I've done in the past is to loop over a certain number of
>> frames, then schedule a taskqueue to service whatever's left over.

> Taskqueues shouldn't be used anymore.  We've got ithreads now.
> Contrary to popular belief (and due to poor documentation), an
> ithread does not run at interrupt level.  Only the fast interrupt
> handler does that.  The ithread is a normal kernel thread tied to
> a fast interrupt handler, running after it whenever the handler
> returns INTR_SCHEDULE_ITHREAD.

Sure, but taskqueues are still useful if you want to serialise access
without relying on mutexes wrapped around large parts of the
packet-handling code to enforce that serialisation.

Yes, normal ithreads don't run at interrupt level.

And we can change the priority of taskqueues in each driver, right?
And/or we could change the behaviour of driver ithreads/taskqueues to
be automatically reniced?
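
For concreteness, here's the sort of thing I mean: a driver-private
taskqueue started at an explicit priority. Sketch only; the foo names
and softc fields (sc_tq as a struct taskqueue pointer, sc_dev as a
device_t) are invented, and PI_NET is just one choice of priority:

/*
 * Sketch: give the driver its own taskqueue thread at an explicit
 * priority.  PI_NET comes from <sys/priority.h>; it could be lowered,
 * or the thread reniced, on a box that's being flattened.
 */
#include <sys/param.h>
#include <sys/errno.h>
#include <sys/bus.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>

static int
foo_setup_taskq(struct foo_softc *sc)
{
        sc->sc_tq = taskqueue_create("foo_taskq", M_NOWAIT,
            taskqueue_thread_enqueue, &sc->sc_tq);
        if (sc->sc_tq == NULL)
                return (ENOMEM);
        taskqueue_start_threads(&sc->sc_tq, 1, PI_NET, "%s taskq",
            device_get_nameunit(sc->sc_dev));
        return (0);
}

A single thread like this also serialises everything enqueued to it,
which is the ordering property I mentioned above.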

I'm not knocking your work here; I'm just trying to understand whether
we can do this stuff as small, individual pieces of work rather than
one big subsystem overhaul.

And CoDel is interesting as a concept, but it's certainly not new.
Again, if you don't drop frames in the driver receive path (and
instead try to do it higher up in the stack, e.g. as part of some
firewall rule), you still risk reaching a stable state where the CPU
is 100% pinned, because you've wasted cycles pushing those frames into
a queue only to have them dropped.
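
The drop I'm describing happens right in the driver's RX path, before
the frame costs anything further up. Something like this, sketch only,
with foo_over_backlog() and the counter being invented names:

/*
 * Sketch: free the mbuf before any cycles are spent queueing it, so an
 * overloaded box converges on "drop cheaply" rather than "burn CPU,
 * then drop anyway".
 */
#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>

static int      foo_over_backlog(struct foo_softc *);   /* placeholder */

static void
foo_rx_input(struct foo_softc *sc, struct ifnet *ifp, struct mbuf *m)
{
        if (foo_over_backlog(sc)) {
                sc->sc_rx_early_drops++;        /* invented stats field */
                m_freem(m);             /* cheap drop, no stack traversal */
                return;
        }
        (*ifp->if_input)(ifp, m);       /* normal path up the stack */
}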

What _I_ had to do there was have a quick gate to look up whether a
frame was part of an active session in ipfw and, if it was, let it be
queued to the driver. I also had a second gate in the driver for new TCP
connections, but that was a separate hack. Anything else was dropped.
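
In spirit the first gate looked something like this; it's a sketch from
memory with invented names, not the actual code, and the real lookup
keyed off ipfw's dynamic rule state:

/*
 * Sketch: under overload, only frames that match an already-accepted
 * session get through; the separate gate for new TCP connections isn't
 * shown here.
 */
#include <sys/param.h>
#include <sys/mbuf.h>

static void *   foo_session_lookup(struct foo_softc *, struct mbuf *);

/* Returns non-zero if the frame should be processed, zero to drop it. */
static int
foo_rx_gate(struct foo_softc *sc, struct mbuf *m)
{
        if (!sc->sc_overloaded)                 /* invented flag */
                return (1);                     /* no pressure: let it in */

        if (foo_session_lookup(sc, m) != NULL)
                return (1);                     /* existing session */

        return (0);                             /* new flow under load: drop */
}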

In any case, what I'm trying to say is this: when I was last doing
this kind of stuff, I didn't just subscribe to "polling will fix all."
I spent a few months knee-deep in the public Intel e1000 documentation
and tuning guide, the em driver and the queue/firewall code, in order
to figure out how to attack this without using polling.

And yes, you've also just described NAPI. :-)




Adrian


