From owner-freebsd-net@FreeBSD.ORG Wed May  2 00:01:49 2012
Date: Tue, 1 May 2012 17:01:47 -0700 (PDT)
From: Barney Cordoba <barney_cordoba@yahoo.com>
To: Juli Mallett
Cc: "freebsd-net@freebsd.org", Sean Bruno
Subject: Re: igb(4) at peak in big purple

--- On Tue, 5/1/12, Juli Mallett wrote:

> From: Juli Mallett
> Subject: Re: igb(4) at peak in big purple
> To: "Barney Cordoba"
> Cc: "Sean Bruno", "freebsd-net@freebsd.org"
> Date: Tuesday, May 1, 2012, 5:50 PM
>
> Hey Barney,
>
> On Tue, May 1, 2012 at 11:13, Barney Cordoba wrote:
> > --- On Fri, 4/27/12, Juli Mallett wrote:
> > > [Tricking Intel's cards into giving something like round-robin
> > > packet delivery to multiple queues.]
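[For concreteness, a rough sketch of the trick being summarized above:
park the whole RSS redirection table (RETA) on one queue, then rotate
it to the next queue on a count-based watermark.  This is a
reconstruction, not Juli's actual patch.  E1000_RETA() and
E1000_WRITE_REG() are the macros the in-tree e1000 code uses; the rr_*
softc fields are invented here for illustration, and the per-family
encoding of queue indices in the table should be checked against the
datasheet for your MAC.]

    /*
     * Untested sketch, assuming an 82576-class MAC with a 128-entry
     * redirection table: make every RETA entry point at one queue.
     */
    static void
    igb_reta_point_at(struct e1000_hw *hw, uint32_t queue)
    {
            uint32_t reta;
            int i;

            /* Four one-byte entries per 32-bit RETA register. */
            reta = queue | (queue << 8) | (queue << 16) | (queue << 24);
            for (i = 0; i < 128 / 4; i++)
                    E1000_WRITE_REG(hw, E1000_RETA(i), reta);
    }

    /* Called per received packet; the rr_* fields are hypothetical. */
    static void
    igb_reta_rotate(struct adapter *sc)
    {
            if (++sc->rr_count >= sc->rr_watermark) {
                    sc->rr_count = 0;
                    sc->rr_cur = (sc->rr_cur + 1) % sc->num_queues;
                    igb_reta_point_at(&sc->hw, sc->rr_cur);
            }
    }

[Rewriting 32 registers per rotation is cheap next to the load it
spreads, which is presumably why a watermark rather than per-packet
rotation.]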
> >
> > That seems like a pretty naive approach. First, you want all of
> > the packets in the same flows/connections to use the same
> > channels; otherwise you'll be sending a lot of stuff out of
> > sequence.
>
> I wouldn't call it naive, I'd call it "oblivious".  I feel like I
> went to some lengths to indicate that it was not the right solution
> to many problems, but that it was a worthwhile approach in the case
> where one doesn't care about anything but evenly distributing
> packets by number (although size is also possible, by using a
> size-based watermark rather than a count-based one) to as many
> queues as possible.  Not every application requires in-sequence
> packets (indeed, out-of-sequence traffic can be a problem even with
> flow-affinity approaches.)
>
> My note was simply about the case where you need to evenly saturate
> queues to divide up the work as much as possible, on hardware that
> doesn't make it possible to get the behavior you want (round-robin
> by packet) for that case.  Intel's hardware has the redirection
> table, which makes it possible (with a very application-aware
> approach that is anything but naive) to get functionality from the
> hardware that isn't otherwise available at a low level.  Few of the
> things you assert are better are available from Intel's cards -- if
> you want to talk about optimal hardware multi-queue strategies, or
> queue-splitting in software, that's a good conversation to have and
> this may even be the right list, but I'd encourage you to just build
> your own silicon or use something with programmable firmware.  For
> those of us saddled with Intel NICs, it's useful to share
> information on how to get behavior that may be desirable (and I
> promise you it is for a large class of applications) but not
> marketed :)
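[And the size-based variant Juli mentions in passing, which speaks to
the bits-versus-packets objection below: account bytes rather than
packets before rotating.  rr_bytes and rr_watermark are again invented
names; igb_reta_point_at() is from the sketch above.]

    static void
    igb_reta_account(struct adapter *sc, u_int len)
    {
            /*
             * Rotate after rr_watermark bytes instead of after N
             * packets, so fifty 60-byte packets and fifty 1500-byte
             * packets no longer count as equal shares of work.
             */
            sc->rr_bytes += len;
            if (sc->rr_bytes >= sc->rr_watermark) {
                    sc->rr_bytes = 0;
                    sc->rr_cur = (sc->rr_cur + 1) % sc->num_queues;
                    igb_reta_point_at(&sc->hw, sc->rr_cur);
            }
    }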
> > You want to balance your flows, yes, but not balance based on
> > packets, unless all of your traffic is ICMP.  You also want to
> > balance bits, not packets; sending 50 60-byte packets to queue 1
> > and 50 1500-byte packets to queue 2 isn't balancing.  They'll be
> > wildly out of order as well.
>
> This is where the obliviousness is useful.  Traffic has its own
> statistical distributions in terms of inter-packet gaps, packet
> sizes, etc.  Assume your application just keeps very accurate
> counters of how many packets have been seen with each Ethernet
> protocol type.  This is a reasonable approximation of some real
> applications that are interesting and that people use FreeBSD for.
> You don't care how big the packets are, assuming your memory
> bandwidth is infinite (or at least greater than what you need) --
> you just want to be sure to see each one of them, and that means
> making the most of the resources you have to ensure that even under
> peak loads you cannot possibly drop any traffic.
>
> Again, not every application is like that, and there's a reason I
> didn't post a patch and encourage the premature-tuning crowd to give
> this sort of thing a try.  When you don't care about distributing
> packets evenly by size, you want an algorithm that doesn't factor
> them in.  Also, I've previously had the same concern that you have
> now, and in my experience it's mostly magical thinking.  With many
> kinds of application and many kinds of real-world traffic it really
> doesn't matter, even if in theory it's a possibility.  There's no
> universal solution to packet capture that's going to be optimal for
> every application.
>
> > Also, using as many cores as possible isn't necessarily what you
> > want to do, depending on your architecture.
>
> I think Sean and I, at least, know that, and it's a point that I
> have gone on about at great length when people endorse the go-faster
> stripes of using as many cores as possible, rather than as many
> cores as necessary.
>
> > If you have 8 cores on 2 CPUs, then you probably want to do all of
> > your networking on four cores on one CPU.  There's a big price to
> > pay to shuffle memory between the caches of separate CPUs;
> > splitting transactions that use the same memory space is
> > counterproductive.
>
> Not necessarily -- you may not need to split transactions with all
> kinds of applications.
>
> > More queues mean more locks, and in the end, lock contention is
> > your biggest enemy, not CPU cycles.
>
> Again, this depends on your application, and that's a very naive
> assertion :)  Lock contention may be your biggest enemy, but it's
> only occasionally mine :)
>
> > Splitting packets that use the same memory and code space among
> > CPUs isn't a very good idea; a better approach, assuming you can
>
> You're making an assumption that wasn't part of the conversation,
> though.  Who said anything about using the same memory?
>
> > micromanage, is to allocate X cores (as much as you need for your
> > peaks) to networking, and use other cores for user space to
> > minimize the interruptions.
>
> Who said anything about user space?  :)
>
> And actually, this is wrong even in those applications where it's
> right that you need to dedicate some cores to networking.  In my
> experience, it's much better to have the control-path stuff on the
> same cores you're handling interrupts on if you're using something
> like netmap.  Interrupts kill the cores that are doing real work
> with each packet.
>
> Thanks,
> Juli.
>
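[A userland sketch of the arrangement Juli describes: pin each capture
or control thread to the CPU whose queue interrupt it shares, rather
than fencing networking off on cores of its own.
cpuset_setaffinity(2) is the real FreeBSD interface; which CPUs
service which queue interrupts comes from vmstat -i plus cpuset(1)
-x, and the numbering here is hypothetical.]

    #include <sys/param.h>
    #include <sys/cpuset.h>

    #include <err.h>

    /*
     * Pin the calling thread to a single CPU -- e.g. the same CPU
     * that services the igb queue this thread drains.
     */
    static void
    pin_to_cpu(int cpu)
    {
            cpuset_t mask;

            CPU_ZERO(&mask);
            CPU_SET(cpu, &mask);
            /* id -1 with CPU_WHICH_TID means the current thread. */
            if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                sizeof(mask), &mask) != 0)
                    err(1, "cpuset_setaffinity");
    }

[So the thread reading queue 0's ring calls pin_to_cpu(0), and so on,
one thread per queue per CPU.]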
Well, he said it was not causing issues, so it seems to me that giving
him a hack that's likely to be less efficient overall isn't the right
answer.  His lopsidedness is not normal.

Make sure the interrupt moderation is tuned properly; it can make a
huge difference.  Interrupts on Intel devices are really just polls;
you can set the poll to any interval you want.

I'd be interested in seeing the usage numbers with and without the
hack.  Intel's hashing gives a pretty even distribution on a router or
bridge; the only time you'd see a really lopsided distribution would
be if you were running a traffic generator with a small number of
flows.  The answer is to use more flows in that case.  The same
client/server pair is always going to use the same queue.

BC
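[Archive footnote on Barney's last two points.  On moderation: the
igb(4) knob is, if memory of the 2012-era driver serves, the
hw.igb.max_interrupt_rate loader tunable, with adaptive moderation
under hw.igb.enable_aim.  On why the same client/server pair always
lands on the same queue: RSS hashes the connection 4-tuple with a
fixed Toeplitz key and indexes the redirection table with the low
bits, so the flow-to-queue mapping is deterministic.  A sketch of that
hash, assuming the standard Microsoft algorithm (the 40-byte key
itself is whatever the driver programmed):]

    #include <stdint.h>

    /*
     * Standard Toeplitz hash as used by RSS: slide a 32-bit window
     * down the key, XORing it into the hash for each set bit of the
     * input.  For IPv4/TCP the input is the 12-byte
     * src-ip/dst-ip/src-port/dst-port tuple, so the key must be at
     * least datalen + 4 bytes (the usual RSS key is 40).  With a
     * 128-entry table, queue = reta[hash & 127].
     */
    static uint32_t
    toeplitz_hash(const uint8_t *key, const uint8_t *data, int datalen)
    {
            uint32_t hash = 0, window;
            int i, b;

            window = (uint32_t)key[0] << 24 | key[1] << 16 |
                key[2] << 8 | key[3];
            for (i = 0; i < datalen; i++) {
                    for (b = 0; b < 8; b++) {
                            if (data[i] & (0x80 >> b))
                                    hash ^= window;
                            window <<= 1;
                            if (key[i + 4] & (0x80 >> b))
                                    window |= 1;
                    }
            }
            return (hash);
    }

[A fixed tuple gives a fixed hash, hence a fixed queue; more flows, as
Barney says, is what evens the distribution out.]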