From owner-freebsd-net@FreeBSD.ORG Tue May  1 18:13:10 2012
From: Barney Cordoba <barney_cordoba@yahoo.com>
Date: Tue, 1 May 2012 11:13:03 -0700 (PDT)
To: Sean Bruno, Juli Mallett
Cc: "freebsd-net@freebsd.org"
Subject: Re: igb(4) at peak in big purple
Message-ID: <1335895983.68943.YahooMailClassic@web126001.mail.ne1.yahoo.com>
List-Id: Networking and TCP/IP with FreeBSD

--- On Fri, 4/27/12, Juli Mallett wrote:

> From: Juli Mallett
> Subject: Re: igb(4) at peak in big purple
> To: "Sean Bruno"
> Cc: "freebsd-net@freebsd.org"
> Date: Friday, April 27, 2012, 4:00 PM
>
> On Fri, Apr 27, 2012 at 12:29, Sean Bruno wrote:
> > On Thu, 2012-04-26 at 11:13 -0700, Juli Mallett wrote:
> >> Queue splitting in Intel cards is done using a hash of protocol
> >> headers, so this is expected behavior.  This also helps with TCP
> >> and UDP performance, in terms of keeping packets for the same
> >> protocol control block on the same core, but for other
> >> applications it's not ideal.  If your application does not require
> >> that kind of locality, there are things that can be done in the
> >> driver to make it easier to balance packets between all queues
> >> about-evenly.
> >
> > Oh? :-)
> >
> > What should I be looking at to balance more evenly?
>
> Dirty hacks are involved :)  I've sent some code to Luigi that I
> think would make sense in netmap (since for many tasks one's going to
> do with netmap, you want to use as many cores as possible, and maybe
> don't care about locality so much), but it could be useful in
> conjunction with the network stack, too, for tasks that don't need a
> lot of locality.
>
> Basically this is the deal: the Intel NICs hash various header
> fields.  Then, some bits from that hash are used to index a table.
> That table indicates what queue the received packet should go to.
> Ideally you'd want to use some sort of counter to index that table
> and get round-robin queue usage if you wanted to evenly saturate all
> cores.  Unfortunately there doesn't seem to be a way to do that.
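[For concreteness, the dispatch path described above can be sketched in
C.  This is an editorial illustration, not code from igb(4): the
128-bucket table size matches e1000-class hardware's redirection table,
and the names are invented.]

    /*
     * Sketch of RSS dispatch: the NIC hashes the protocol headers,
     * uses the low bits of the hash to index a small redirection
     * table, and the table entry names the RX queue.  The hardware
     * does the equivalent of this for every received frame.
     */
    #include <stdint.h>

    #define RETA_BUCKETS 128            /* 82575/82576-class parts */

    static uint8_t reta[RETA_BUCKETS];  /* bucket -> queue, set by driver */

    static unsigned int
    rx_queue_for(uint32_t rss_hash)
    {
            return (reta[rss_hash & (RETA_BUCKETS - 1)]);
    }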
> What you can do, though, is regularly update the table that is
> indexed by hash.  Very frequently, in fact; it's a pretty fast
> operation.  So what I've done, for example, is to go through and
> rotate all of the entries every N packets, where N is something like
> the number of receive descriptors per queue divided by the number of
> queues.  So bucket 0 goes to queue 0 and bucket 1 goes to queue 1 at
> first.  Then a few hundred packets are received, and the table is
> reprogrammed, so now bucket 0 goes to queue 1 and bucket 1 goes to
> queue 0.
>
> I can provide code to do this, but I don't want to post it publicly
> (unless it is actually going to become an option for netmap) for fear
> that people will use it in scenarios where it's harmful and then
> complain.  It's potentially one more painful variation for the Intel
> drivers that Intel can't support, and that just makes everyone
> miserable.
>
> Thanks,
> Juli.
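[A minimal sketch of the rotation Juli describes, assuming the
igb(4)/e1000 driver environment (struct e1000_hw and the
E1000_WRITE_REG()/E1000_RETA() accessors) and e1000-class hardware
where the 128-entry redirection table is packed four one-byte entries
per 32-bit register.  The function name and the policy of calling it
every N received packets come from the description above; the rest is
illustrative.]

    /* Shift every RSS bucket to the next queue; call every N packets,
     * where N ~ RX descriptors per queue / number of queues. */
    static void
    rotate_reta(struct e1000_hw *hw, int nqueues)
    {
            static int offset;
            uint32_t reg = 0;
            int i;

            offset = (offset + 1) % nqueues;
            for (i = 0; i < 128; i++) {
                    /* Bucket i now drains to queue (i + offset) % nqueues. */
                    reg |= (uint32_t)((i + offset) % nqueues) << ((i & 3) * 8);
                    if ((i & 3) == 3) {
                            /* Four entries packed per RETA register. */
                            E1000_WRITE_REG(hw, E1000_RETA(i >> 2), reg);
                            reg = 0;
                    }
            }
    }

[Note that a given bucket changes queues at every rotation, so packets
of a single flow migrate between queues; that is the reordering hazard
raised in the reply below.]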
That seems like a pretty naive approach. First, you want all of the
packets in the same flows/connections to use the same queues; otherwise
you'll be sending a lot of stuff out of sequence. You want to balance
your flows, yes, but not balance based on packets, unless all of your
traffic is ICMP. You also want to balance bits, not packets: sending
fifty 60-byte packets to queue 1 and fifty 1500-byte packets to queue 2
isn't balancing. They'll be wildly out of order as well.

Also, using as many cores as possible isn't necessarily what you want
to do; it depends on your architecture. If you have eight cores on two
CPUs, then you probably want to do all of your networking on four cores
of one CPU. There's a big price to pay for shuffling memory between the
caches of separate CPUs, so splitting transactions that use the same
memory space across them is counterproductive. More queues mean more
locks, and in the end lock contention is your biggest enemy, not CPU
cycles.

Splitting packets that use the same memory and code space among CPUs
isn't a very good idea. A better approach, assuming you can
micromanage, is to allocate X cores (as many as you need for your
peaks) to networking, and use the other cores for user space to
minimize the interruptions.

BC
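[For reference, the core partitioning described in the last paragraph
can be expressed with FreeBSD's cpuset(2) API.  A minimal sketch: pin
the user-space side of an application to CPUs 4-7, leaving CPUs 0-3 for
the network stack.  The CPU numbers are illustrative, and binding the
NIC's interrupt threads to the other cores would be done separately,
e.g. with cpuset(1).]

    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <err.h>

    int
    main(void)
    {
            cpuset_t set;
            int cpu;

            /* Build a mask covering CPUs 4-7 only. */
            CPU_ZERO(&set);
            for (cpu = 4; cpu < 8; cpu++)
                    CPU_SET(cpu, &set);

            /* id -1 means the calling process. */
            if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                sizeof(set), &set) != 0)
                    err(1, "cpuset_setaffinity");

            /* ... run the user-space side of the application here ... */
            return (0);
    }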