From owner-freebsd-net@FreeBSD.ORG Wed May  2 00:01:49 2012
Date: Tue, 1 May 2012 17:01:47 -0700 (PDT)
From: Barney Cordoba <barney_cordoba@yahoo.com>
To: Juli Mallett
Cc: "freebsd-net@freebsd.org", Sean Bruno
Subject: Re: igb(4) at peak in big purple

--- On Tue, 5/1/12, Juli Mallett wrote:

> From: Juli Mallett
> Subject: Re: igb(4) at peak in big purple
> To: "Barney Cordoba"
> Cc: "Sean Bruno", "freebsd-net@freebsd.org"
> Date: Tuesday, May 1, 2012, 5:50 PM
>
> Hey Barney,
>
> On Tue, May 1, 2012 at 11:13, Barney Cordoba wrote:
> > --- On Fri, 4/27/12, Juli Mallett wrote:
> > > [Tricking Intel's cards into giving something like round-robin
> > > packet delivery to multiple queues.]
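[For concreteness, a rough sketch of the trick being summarized above:
park the whole RSS redirection table (RETA) on one queue, then rotate
it to the next queue on a count-based watermark.  This is a
reconstruction, not Juli's actual patch.  E1000_RETA() and
E1000_WRITE_REG() are the macros the in-tree e1000 code uses; the rr_*
softc fields are invented here for illustration, and the per-family
encoding of queue indices in the table should be checked against the
datasheet for your MAC.]

    /*
     * Untested sketch, assuming an 82576-class MAC with a 128-entry
     * redirection table: make every RETA entry point at one queue.
     */
    static void
    igb_reta_point_at(struct e1000_hw *hw, uint32_t queue)
    {
            uint32_t reta;
            int i;

            /* Four one-byte entries per 32-bit RETA register. */
            reta = queue | (queue << 8) | (queue << 16) | (queue << 24);
            for (i = 0; i < 128 / 4; i++)
                    E1000_WRITE_REG(hw, E1000_RETA(i), reta);
    }

    /* Called per received packet; the rr_* fields are hypothetical. */
    static void
    igb_reta_rotate(struct adapter *sc)
    {
            if (++sc->rr_count >= sc->rr_watermark) {
                    sc->rr_count = 0;
                    sc->rr_cur = (sc->rr_cur + 1) % sc->num_queues;
                    igb_reta_point_at(&sc->hw, sc->rr_cur);
            }
    }

[Rewriting 32 registers per rotation is cheap next to the load it
spreads, which is presumably why a watermark rather than per-packet
rotation.]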
> >
> > That seems like a pretty naive approach. First, you want all of
> > the packets in the same flows/connections to use the same
> > channels; otherwise you'll be sending a lot of stuff out of
> > sequence.
>
> I wouldn't call it naive, I'd call it "oblivious".  I feel like I
> went to some lengths to indicate that it was not the right solution
> to many problems, but that it was a worthwhile approach in the case
> where one doesn't care about anything but evenly distributing
> packets by number (although size is also possible, by using a
> size-based watermark rather than a count-based one) to as many
> queues as possible.  Not every application requires in-sequence
> packets (indeed, out-of-sequence traffic can be a problem even with
> flow-affinity approaches.)
>
> My note was simply about the case where you need to evenly saturate
> queues to divide up the work as much as possible, on hardware that
> doesn't make it possible to get the behavior you want (round-robin
> by packet) for that case.  Intel's hardware has the redirection
> table, which makes it possible (with a very application-aware
> approach that is anything but naive) to get functionality from the
> hardware that isn't otherwise available at a low level.  Few of the
> things you assert are better are available from Intel's cards -- if
> you want to talk about optimal hardware multi-queue strategies, or
> queue-splitting in software, that's a good conversation to have and
> this may even be the right list, but I'd encourage you to just build
> your own silicon or use something with programmable firmware.  For
> those of us saddled with Intel NICs, it's useful to share
> information on how to get behavior that may be desirable (and I
> promise you it is for a large class of applications) but not
> marketed :)
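[And the size-based variant Juli mentions in passing, which speaks to
the bits-versus-packets objection below: account bytes rather than
packets before rotating.  rr_bytes and rr_watermark are again invented
names; igb_reta_point_at() is from the sketch above.]

    static void
    igb_reta_account(struct adapter *sc, u_int len)
    {
            /*
             * Rotate after rr_watermark bytes instead of after N
             * packets, so fifty 60-byte packets and fifty 1500-byte
             * packets no longer count as equal shares of work.
             */
            sc->rr_bytes += len;
            if (sc->rr_bytes >= sc->rr_watermark) {
                    sc->rr_bytes = 0;
                    sc->rr_cur = (sc->rr_cur + 1) % sc->num_queues;
                    igb_reta_point_at(&sc->hw, sc->rr_cur);
            }
    }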
> > You want to balance your flows, yes, but not balance based on
> > packets, unless all of your traffic is ICMP.  You also want to
> > balance bits, not packets; sending 50 60-byte packets to queue 1
> > and 50 1500-byte packets to queue 2 isn't balancing.  They'll be
> > wildly out of order as well.
>
> This is where the obliviousness is useful.  Traffic has its own
> statistical distributions in terms of inter-packet gaps, packet
> sizes, etc.  Assume your application just keeps very accurate
> counters of how many packets have been seen with each Ethernet
> protocol type.  This is a reasonable approximation of some real
> applications that are interesting and that people use FreeBSD for.
> You don't care how big the packets are, assuming your memory
> bandwidth is infinite (or at least greater than what you need) --
> you just want to be sure to see each one of them, and that means
> making the most of the resources you have to ensure that even under
> peak loads you cannot possibly drop any traffic.
>
> Again, not every application is like that, and there's a reason I
> didn't post a patch and encourage the premature-tuning crowd to give
> this sort of thing a try.  When you don't care about distributing
> packets evenly by size, you want an algorithm that doesn't factor
> them in.  Also, I've previously had the same concern that you have
> now, and in my experience it's mostly magical thinking.  With many
> kinds of application and many kinds of real-world traffic it really
> doesn't matter, even if in theory it's a possibility.  There's no
> universal solution to packet capture that's going to be optimal for
> every application.
>
> > Also, using as many cores as possible isn't necessarily what you
> > want to do, depending on your architecture.
>
> I think Sean and I, at least, know that, and it's a point that I
> have gone on about at great length when people endorse the go-faster
> stripes of using as many cores as possible, rather than as many
> cores as necessary.
>
> > If you have 8 cores on 2 CPUs, then you probably want to do all of
> > your networking on four cores on one CPU.  There's a big price to
> > pay to shuffle memory between the caches of separate CPUs;
> > splitting transactions that use the same memory space is
> > counterproductive.
>
> Not necessarily -- you may not need to split transactions with all
> kinds of applications.
>
> > More queues mean more locks, and in the end, lock contention is
> > your biggest enemy, not CPU cycles.
>
> Again, this depends on your application, and that's a very naive
> assertion :)  Lock contention may be your biggest enemy, but it's
> only occasionally mine :)
>
> > Splitting packets that use the same memory and code space among
> > CPUs isn't a very good idea; a better approach, assuming you can
>
> You're making an assumption that wasn't part of the conversation,
> though.  Who said anything about using the same memory?
>
> > micromanage, is to allocate X cores (as much as you need for your
> > peaks) to networking, and use other cores for user space to
> > minimize the interruptions.
>
> Who said anything about user space?  :)
>
> And actually, this is wrong even in those applications where it's
> right that you need to dedicate some cores to networking.  In my
> experience, it's much better to have the control-path stuff on the
> same cores you're handling interrupts on if you're using something
> like netmap.  Interrupts kill the cores that are doing real work
> with each packet.
>
> Thanks,
> Juli.
>
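[A userland sketch of the arrangement Juli describes: pin each capture
or control thread to the CPU whose queue interrupt it shares, rather
than fencing networking off on cores of its own.
cpuset_setaffinity(2) is the real FreeBSD interface; which CPUs
service which queue interrupts comes from vmstat -i plus cpuset(1)
-x, and the numbering here is hypothetical.]

    #include <sys/param.h>
    #include <sys/cpuset.h>

    #include <err.h>

    /*
     * Pin the calling thread to a single CPU -- e.g. the same CPU
     * that services the igb queue this thread drains.
     */
    static void
    pin_to_cpu(int cpu)
    {
            cpuset_t mask;

            CPU_ZERO(&mask);
            CPU_SET(cpu, &mask);
            /* id -1 with CPU_WHICH_TID means the current thread. */
            if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                sizeof(mask), &mask) != 0)
                    err(1, "cpuset_setaffinity");
    }

[So the thread reading queue 0's ring calls pin_to_cpu(0), and so on,
one thread per queue per CPU.]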
Well, he said it was not causing issues, so it seems to me that giving
him a hack that's likely to be less efficient overall isn't the right
answer.  His lopsidedness is not normal.

Make sure the interrupt moderation is tuned properly; it can make a
huge difference.  Interrupts on Intel devices are really just polls;
you can set the poll to any interval you want.

I'd be interested in seeing the usage numbers with and without the
hack.  Intel's hashing gives a pretty even distribution on a router or
bridge; the only time you'd see a really lopsided distribution would
be if you were running a traffic generator with a small number of
flows.  The answer is to use more flows in that case.  The same
client/server pair is always going to use the same queue.

BC
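[Archive footnote on Barney's last two points.  On moderation: the
igb(4) knob is, if memory of the 2012-era driver serves, the
hw.igb.max_interrupt_rate loader tunable, with adaptive moderation
under hw.igb.enable_aim.  On why the same client/server pair always
lands on the same queue: RSS hashes the connection 4-tuple with a
fixed Toeplitz key and indexes the redirection table with the low
bits, so the flow-to-queue mapping is deterministic.  A sketch of that
hash, assuming the standard Microsoft algorithm (the 40-byte key
itself is whatever the driver programmed):]

    #include <stdint.h>

    /*
     * Standard Toeplitz hash as used by RSS: slide a 32-bit window
     * down the key, XORing it into the hash for each set bit of the
     * input.  For IPv4/TCP the input is the 12-byte
     * src-ip/dst-ip/src-port/dst-port tuple, so the key must be at
     * least datalen + 4 bytes (the usual RSS key is 40).  With a
     * 128-entry table, queue = reta[hash & 127].
     */
    static uint32_t
    toeplitz_hash(const uint8_t *key, const uint8_t *data, int datalen)
    {
            uint32_t hash = 0, window;
            int i, b;

            window = (uint32_t)key[0] << 24 | key[1] << 16 |
                key[2] << 8 | key[3];
            for (i = 0; i < datalen; i++) {
                    for (b = 0; b < 8; b++) {
                            if (data[i] & (0x80 >> b))
                                    hash ^= window;
                            window <<= 1;
                            if (key[i + 4] & (0x80 >> b))
                                    window |= 1;
                    }
            }
            return (hash);
    }

[A fixed tuple gives a fixed hash, hence a fixed queue; more flows, as
Barney says, is what evens the distribution out.]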