Date:      Thu, 22 Nov 2012 06:11:02 -0800 (PST)
From:      Barney Cordoba <barney_cordoba@yahoo.com>
To:        Andre Oppermann <andre@freebsd.org>, Adrian Chadd <adrian@freebsd.org>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Jim Thompson <jim@netgate.com>, Alfred Perlstein <bright@mu.org>, khatfield@socllc.net
Subject:   Re: FreeBSD boxes as a 'router'...
Message-ID:  <1353593462.18740.YahooMailClassic@web121601.mail.ne1.yahoo.com>
In-Reply-To: <CAJ-VmonwRD1CuPCoLPLQBJQtOducoWy7giC5mbFJe2BsbrUx0w@mail.gmail.com>

--- On Wed, 11/21/12, Adrian Chadd <adrian@freebsd.org> wrote:

> From: Adrian Chadd <adrian@freebsd.org>
> Subject: Re: FreeBSD boxes as a 'router'...
> To: "Andre Oppermann" <andre@freebsd.org>
> Cc: "Barney Cordoba" <barney_cordoba@yahoo.com>, "Jim Thompson" <jim@netgate.com>, "Alfred Perlstein" <bright@mu.org>, khatfield@socllc.net, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
> Date: Wednesday, November 21, 2012, 1:26 PM
> On 21 November 2012 00:30, Andre Oppermann <andre@freebsd.org> wrote:
> > On 21.11.2012 08:55, Adrian Chadd wrote:
> >>
> >> Something that has popped up a few times, even recently, is breaking
> >> out of an RX loop after you service a number of frames.
> >
> > That is what I basically described.
>
> Right, and this can be done right now without too much reworking,
> right? I mean, people could begin by doing a drive-by on drivers for
> this. The RX path for a driver shouldn't be too difficult to do;
> the TX path is the racy one.
>
> >> During stupidly high levels of RX, you may find the NIC happily
> >> receiving frames faster than you can service the RX queue. If this
> >> occurs, you could end up just plain being stuck there.
>
> > That's the live-lock.
>
> And again you can solve this without having to devolve into polling.
> Again, polling to me feels like a bludgeon beating around a system
> that isn't really designed for the extreme cases it's facing.
> Maybe your work in the tcp_taskqueue branch addresses the larger
> scale issues here, but I've solved this relatively easily in the past.
>
> >> So what I've done in the past is to loop over a certain number of
> >> frames, then schedule a taskqueue to service whatever's left over.
>
> > Taskqueue's shouldn't be used anymore.  We've got ithreads now.
> > Contrary to popular belief (and due to poor documentation) an
> > ithread does not run at interrupt level.  Only the fast interrupt
> > handler does that.  The ithread is a normal kernel thread tied to
> > an fast interrupt handler and trailing it whenever it said
> > INTR_SCHEDULE_ITHREAD.
>
> Sure, but taskqueues are still useful if you want to serialise access
> without relying on mutexes wrapping large parts of the packet
> handling code to enforce said order.
>
> Yes, normal ithreads don't run at interrupt level.
>
> And we can change the priority of taskqueues in each driver, right?
> And/or we could change the behaviour of driver ithreads/taskqueues to
> be automatically reniced?
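
For reference, the break-out-and-reschedule pattern described above
looks roughly like this in a FreeBSD-style driver. This is only a
sketch: foo_softc, foo_rxeof() and FOO_RX_BUDGET are made-up names,
not code from em or any real driver.

/*
 * Sketch only: bounded RX servicing with a taskqueue fallback.
 * sc->rx_task is assumed to have been TASK_INIT()ed against
 * foo_rx_task() at attach time.
 */
#include <sys/param.h>
#include <sys/taskqueue.h>

#define FOO_RX_BUDGET	64	/* frames serviced per pass; tunable */

struct foo_softc {
	struct taskqueue	*tq;
	struct task		 rx_task;
	/* ring pointers, ifnet, ... */
};

/* Drains up to 'budget' frames; returns non-zero if work remains. */
static int	foo_rxeof(struct foo_softc *sc, int budget);

static void
foo_rx_task(void *arg, int pending)
{
	struct foo_softc *sc = arg;

	/* Keep draining in thread context; requeue ourselves if needed. */
	if (foo_rxeof(sc, FOO_RX_BUDGET) != 0)
		taskqueue_enqueue(sc->tq, &sc->rx_task);
}

static void
foo_intr(void *arg)
{
	struct foo_softc *sc = arg;

	/* Service a bounded number of frames in the handler ... */
	if (foo_rxeof(sc, FOO_RX_BUDGET) != 0)
		/* ... and defer the rest rather than live-locking here. */
		taskqueue_enqueue(sc->tq, &sc->rx_task);

	/* re-enable the interrupt before returning */
}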

Why schedule a taskqueue? You're just adding more work to a system
that's already overloaded. You'll get another interrupt soon enough.
You can control the interrupt delay to simulate a "poll" without
having to add yet another task to the system.
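
To put numbers on "control the delay": on e1000-class hardware the
interrupt throttle register (ITR) sets a floor on the time between
interrupts in 256 ns units, so choosing a maximum interrupt rate fixes
how many frames get batched into each interrupt. A rough sketch (the
register name comes from Intel's shared e1000 code; the rate and the
surrounding glue are just an example):

/* Cap the NIC at roughly 8000 interrupts/sec; ITR counts 256 ns units. */
#define FOO_MAX_INTS_PER_SEC	8000
uint32_t itr = 1000000000 / (256 * FOO_MAX_INTS_PER_SEC);	/* ~488 */

E1000_WRITE_REG(&sc->hw, E1000_ITR, itr);

/*
 * At a sustained 1 Mpps that works out to ~125 frames per interrupt,
 * drained in one pass by the handler -- effectively a hardware-timed
 * poll, with no extra task or software timer involved.
 */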

The idea that you're getting so many packets that the system can't
handle them, and that you have to schedule a task because you might
not get another interrupt, is just bad thinking for anything other
than an end-user application, in which case this conversation isn't
relevant.

> I'm not knocking your work here, I'm just trying to understand
> whether we can do this stuff as small individual pieces of work
> rather than one big subsystem overhaul.
>
> And CoDel is interesting as a concept, but it's certainly not new.
> But again, if you don't drop the frames during the driver receive
> path (and try to do it higher up in the stack, eg as part of some
> firewall rule) you still risk reaching a stable state where the CPU
> is 100% pinned because you've wasted cycles pushing those frames
> into the queue only to be dropped.

Queue algorithms that assume that all network applications are the
same should be put on the heap with ISDN and ATM and other stupid
ideas designed by IETF "thinkers".

The design goal should be to avoid queuing; drop events are usually
not part of a normal flow, and the idea that you can have a nice
algorithm to handle them assumes you're trying to do too much with too
slow a CPU. Crap designed by Cisco exists only because their hardware
never had enough CPU to do the work that needed to be done.

> What _I_ had to do there was have a quick gate to look up if a frame
> was part of an active session in ipfw and if it was, let it be queued
> to the driver. I also had a second gate in the driver for new TCP
> connections, but that was a separate hack. Anything else was dropped.
>
> In any case, what I'm trying to say is this - when I was last doing
> this kind of stuff, I didn't just subscribe to "polling will fix
> all." I spent a few months knee deep in the public intel e1000
> documentation and tuning guide, the em driver and the queue/firewall
> code, in order to figure out how to attack this without using
> polling.
>
> And yes, you've also just described NAPI. :-)
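
The "gate" described above boils down to one cheap check in the driver
before any more cycles are spent on a frame. A sketch (the foo_*
names and foo_session_active() are hypothetical; the real version
poked at the ipfw dynamic-rule table):

#include <sys/param.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>

static void
foo_rx_input(struct foo_softc *sc, struct mbuf *m)
{
	struct ifnet *ifp = sc->ifp;

	/* Under pressure, drop anything that isn't an established session. */
	if (sc->overloaded && !foo_session_active(m)) {
		m_freem(m);		/* early drop, before the stack sees it */
		ifp->if_iqdrops++;	/* account for the drop */
		return;
	}

	(*ifp->if_input)(ifp, m);	/* normal path up the stack */
}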

I'm not sure what you're doing that would cause packets to come in
faster than you can service them, unless you're running on an old XT
or something. A modern $300 CPU can manage an awful lot of packets,
depending on your application.

Packets are like customers. Sometimes you have to let them go. It's
fairly easy to determine what a given system running a given
application can handle. If you get more than that, you have little
chance of figuring out a scheme to manage it.

If you're running an embedded app and you don't have the option of
simply getting a faster machine, then you just have to set a threshold
and deal with it. You can try to be "smart" and peek at packets and
drop "less important" packets, but in my experience the smarter you
try to be, the dumber you turn out to be.

With modern CPUs with big caches, the bottlenecks are almost always
locking, not queuing or memory shuffling, assuming you're not running
on a single-core system. So design accordingly.
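
As a toy illustration of that locking point: one shared counter means
every core serialises on the same lock and cache line, while one
counter per queue lets each core stay on its own line, and you only
aggregate when the stats are read. The struct below is just a sketch,
not from any driver:

struct foo_rxq_stats {
	uint64_t	packets;	/* touched only by this queue's thread */
	uint64_t	drops;
} __aligned(CACHE_LINE_SIZE);	/* one instance per RX queue, no shared lock */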

Unfortunately the multiqueue drivers in FreeBSD aren't usable, so
until someone figures out a proper design that doesn't just suck up
more cores with marginal (if any) gains in capacity, you're stuck.

BC


