Message-ID: <1353593462.18740.YahooMailClassic@web121601.mail.ne1.yahoo.com>
Date: Thu, 22 Nov 2012 06:11:02 -0800 (PST)
From: Barney Cordoba
Subject: Re: FreeBSD boxes as a 'router'...
To: Andre Oppermann, Adrian Chadd
Cc: "freebsd-net@freebsd.org", Jim Thompson, Alfred Perlstein, khatfield@socllc.net
List-Id: Networking and TCP/IP with FreeBSD

--- On Wed, 11/21/12, Adrian Chadd wrote:

> From: Adrian Chadd
> Subject: Re: FreeBSD boxes as a 'router'...
> To: "Andre Oppermann"
> Cc: "Barney Cordoba", "Jim Thompson", "Alfred Perlstein", khatfield@socllc.net, "freebsd-net@freebsd.org"
> Date: Wednesday, November 21, 2012, 1:26 PM
>
> On 21 November 2012 00:30, Andre Oppermann wrote:
> > On 21.11.2012 08:55, Adrian Chadd wrote:
> >>
> >> Something that has popped up a few times, even recently, is breaking
> >> out of an RX loop after you service a number of frames.
> >
> > That is what I basically described.
>
> Right, and this can be done right now without too much reworking,
> right? I mean, people could begin by doing a drive-by on drivers for
> this.
> The RX path for a driver shouldn't be too difficult to do; the TX path
> is the racy one.
>
> >> During stupidly high levels of RX, you may find the NIC happily
> >> receiving frames faster than you can service the RX queue.
> >> If this occurs, you could end up just plain being stuck there.
>
> > That's the live-lock.
>
> And again you can solve this without having to devolve into polling.
> Again, polling to me feels like a bludgeon beating around a system
> that isn't really designed for the extreme cases it's facing.
> Maybe your work in the tcp_taskqueue branch addresses the larger scale
> issues here, but I've solved this relatively easily in the past.
>
> >> So what I've done in the past is to loop over a certain number of
> >> frames, then schedule a taskqueue to service whatever's left over.
>
> > Taskqueues shouldn't be used anymore.  We've got ithreads now.
> > Contrary to popular belief (and due to poor documentation) an
> > ithread does not run at interrupt level.  Only the fast interrupt
> > handler does that.  The ithread is a normal kernel thread tied to
> > a fast interrupt handler and trailing it whenever it said
> > INTR_SCHEDULE_ITHREAD.
>
> Sure, but taskqueues are still useful if you want to serialise access
> without relying on mutexes wrapping large parts of the packet handling
> code to enforce said order.
>
> Yes, normal ithreads don't run at interrupt level.
>
> And we can change the priority of taskqueues in each driver, right?
> And/or we could change the behaviour of driver ithreads/taskqueues to
> be automatically reniced?

Why schedule a taskqueue? You're just adding more work to a system
that's already overloaded.
You'll get another interrupt soon enough.
You can control the delay, to simulate a "poll" without having to
add yet another task to the system.

The idea that you're getting so many packets that the system can't handle
it, and that you have to schedule a task because you might not get
another interrupt, is just bad thinking for anything other than an
end-user application, in which case this conversation isn't relevant.

> I'm not knocking your work here, I'm just trying to understand whether
> we can do this stuff as small individual pieces of work rather than
> one big subsystem overhaul.
>
> And CoDel is interesting as a concept, but it's certainly not new. But
> again, if you don't drop the frames during the driver receive path
> (and try to do it higher up in the stack, eg as part of some firewall
> rule) you still risk reaching a stable state where the CPU is 100%
> pinned because you've wasted cycles pushing those frames into the
> queue only to be dropped.

Queue algorithms that assume that all network applications are the
same are to be put on the heap with ISDN and ATM and other stupid ideas
designed by IETF "thinkers".

The design goal should be to avoid queuing, and drop events are usually
not part of a normal flow; the concept that you can have a nice algorithm
to handle it assumes that you are trying to do too much with a too-slow
CPU. Crap designed by Cisco exists only because their hardware never had
enough CPU to do the work needed to be done.

> What _I_ had to do there was have a quick gate to look up if a frame
> was part of an active session in ipfw and if it was, let it be queued
> to the driver. I also had a second gate in the driver for new TCP
> connections, but that was a separate hack.
> Anything else was dropped.
>
> In any case, what I'm trying to say is this - when I was last doing
> this kind of stuff, I didn't just subscribe to "polling will fix all."
> I spent a few months knee deep in the public intel e1000 documentation
> and tuning guide, the em driver and the queue/firewall code, in order
> to figure out how to attack this without using polling.
>
> And yes, you've also just described NAPI. :-)

I'm not sure what you're doing that would cause packets to come in faster
than you can service them, unless you're running on an old XT or something.
A modern $300 CPU can manage an awful lot of packets, depending on your
application.

Packets are like customers. Sometimes you have to let them go. It's fairly
easy to determine what a given system running a given application can
handle. If you get more than that, you have little chance of figuring out
a scheme to manage it.

If you're running an embedded app and you don't have the option of simply
getting a faster machine, then you just have to set a threshold and deal
with it. You can try to be "smart" and peek at packets and drop "less
important" packets, but in my experience the smarter you try to be, the
dumber you turn out to be.

With modern CPUs with big caches, the bottlenecks are almost always locking
and not queuing or memory shuffling, assuming you're not running on a
single-core system. So design accordingly.

Unfortunately the multiqueue drivers in FreeBSD aren't usable, so until
someone figures out a proper design that just doesn't suck up more cores
with marginal if any gains in capacity, you're stuck.

BC
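
To make the "set a threshold and deal with it" point concrete, here is a toy
Python model, not driver code. It assumes one interrupt per tick, a fixed-size
receive ring, and a per-interrupt budget; frames that don't fit in the ring are
dropped at the edge, and whatever is left over simply waits for the next
interrupt, with no extra task scheduled. The names (service_rx, simulate,
budget, ring_size) are made up for this sketch and don't correspond to any
FreeBSD interface.

```python
from collections import deque


def service_rx(ring, budget):
    """Handle at most `budget` frames from `ring`.

    Returns (handled, leftover). Leftover frames stay in the ring and are
    picked up on the next interrupt; nothing else is scheduled.
    """
    handled = 0
    while ring and handled < budget:
        ring.popleft()  # stand-in for passing the frame up the stack
        handled += 1
    return handled, len(ring)


def simulate(arrivals_per_tick, budget, ring_size, ticks):
    """Run `ticks` interrupts with a constant arrival rate.

    Returns (handled_total, dropped_total, still_queued).
    """
    ring = deque()
    handled_total = 0
    dropped_total = 0
    for _ in range(ticks):
        # Hardware side: frames that do not fit in the ring are dropped
        # before any CPU cycles are spent on them.
        free = ring_size - len(ring)
        accepted = min(arrivals_per_tick, free)
        dropped_total += arrivals_per_tick - accepted
        ring.extend(range(accepted))

        # Interrupt side: bounded amount of work, then return.
        handled, _ = service_rx(ring, budget)
        handled_total += handled
    return handled_total, dropped_total, len(ring)


if __name__ == "__main__":
    print(simulate(arrivals_per_tick=10, budget=4, ring_size=16, ticks=5))
    print(simulate(arrivals_per_tick=3, budget=4, ring_size=16, ticks=5))
```

In the overloaded run the box still handles its budget every tick and the
excess is dropped at the ring; in the underloaded run nothing queues and
nothing is dropped. The model says nothing about locking cost, which is where
I'd expect the real bottleneck to be.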