From owner-freebsd-net@FreeBSD.ORG Tue May  1 18:13:10 2012
From: Barney Cordoba <barney_cordoba@yahoo.com>
Date: Tue, 1 May 2012 11:13:03 -0700 (PDT)
To: Sean Bruno, Juli Mallett
Cc: "freebsd-net@freebsd.org"
Subject: Re: igb(4) at peak in big purple
Message-ID: <1335895983.68943.YahooMailClassic@web126001.mail.ne1.yahoo.com>
List-Id: Networking and TCP/IP with FreeBSD

--- On Fri, 4/27/12, Juli Mallett wrote:

> From: Juli Mallett
> Subject: Re: igb(4) at peak in big purple
> To: "Sean Bruno"
> Cc: "freebsd-net@freebsd.org"
> Date: Friday, April 27, 2012, 4:00 PM
>
> On Fri, Apr 27, 2012 at 12:29, Sean Bruno wrote:
> > On Thu, 2012-04-26 at 11:13 -0700, Juli Mallett wrote:
> >> Queue splitting in Intel cards is done using a hash of protocol
> >> headers, so this is expected behavior.  This also helps with TCP
> >> and UDP performance, in terms of keeping packets for the same
> >> protocol control block on the same core, but for other
> >> applications it's not ideal.  If your application does not require
> >> that kind of locality, there are things that can be done in the
> >> driver to make it easier to balance packets between all queues
> >> about-evenly.
> >
> > Oh? :-)
> >
> > What should I be looking at to balance more evenly?
>
> Dirty hacks are involved :)  I've sent some code to Luigi that I
> think would make sense in netmap (since for many tasks one's going to
> do with netmap, you want to use as many cores as possible, and maybe
> don't care about locality so much), but it could be useful in
> conjunction with the network stack, too, for tasks that don't need a
> lot of locality.
>
> Basically this is the deal: the Intel NICs hash various header
> fields.  Then, some bits from that hash are used to index a table.
> That table indicates what queue the received packet should go to.
> Ideally you'd want to use some sort of counter to index that table
> and get round-robin queue usage if you wanted to evenly saturate all
> cores.  Unfortunately there doesn't seem to be a way to do that.
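[For concreteness, the dispatch path described above can be sketched in
C.  This is an editorial illustration, not code from igb(4): the
128-bucket table size matches e1000-class hardware's redirection table,
and the names are invented.]

    /*
     * Sketch of RSS dispatch: the NIC hashes the protocol headers,
     * uses the low bits of the hash to index a small redirection
     * table, and the table entry names the RX queue.  The hardware
     * does the equivalent of this for every received frame.
     */
    #include <stdint.h>

    #define RETA_BUCKETS 128            /* 82575/82576-class parts */

    static uint8_t reta[RETA_BUCKETS];  /* bucket -> queue, set by driver */

    static unsigned int
    rx_queue_for(uint32_t rss_hash)
    {
            return (reta[rss_hash & (RETA_BUCKETS - 1)]);
    }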
> What you can do, though, is regularly update the table that is
> indexed by hash.  Very frequently, in fact; it's a pretty fast
> operation.  So what I've done, for example, is to go through and
> rotate all of the entries every N packets, where N is something like
> the number of receive descriptors per queue divided by the number of
> queues.  So bucket 0 goes to queue 0 and bucket 1 goes to queue 1 at
> first.  Then a few hundred packets are received, and the table is
> reprogrammed, so now bucket 0 goes to queue 1 and bucket 1 goes to
> queue 0.
>
> I can provide code to do this, but I don't want to post it publicly
> (unless it is actually going to become an option for netmap) for fear
> that people will use it in scenarios where it's harmful and then
> complain.  It's potentially one more painful variation for the Intel
> drivers that Intel can't support, and that just makes everyone
> miserable.
>
> Thanks,
> Juli.
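[A minimal sketch of the rotation Juli describes, assuming the
igb(4)/e1000 driver environment (struct e1000_hw and the
E1000_WRITE_REG()/E1000_RETA() accessors) and e1000-class hardware
where the 128-entry redirection table is packed four one-byte entries
per 32-bit register.  The function name and the policy of calling it
every N received packets come from the description above; the rest is
illustrative.]

    /* Shift every RSS bucket to the next queue; call every N packets,
     * where N ~ RX descriptors per queue / number of queues. */
    static void
    rotate_reta(struct e1000_hw *hw, int nqueues)
    {
            static int offset;
            uint32_t reg = 0;
            int i;

            offset = (offset + 1) % nqueues;
            for (i = 0; i < 128; i++) {
                    /* Bucket i now drains to queue (i + offset) % nqueues. */
                    reg |= (uint32_t)((i + offset) % nqueues) << ((i & 3) * 8);
                    if ((i & 3) == 3) {
                            /* Four entries packed per RETA register. */
                            E1000_WRITE_REG(hw, E1000_RETA(i >> 2), reg);
                            reg = 0;
                    }
            }
    }

[Note that a given bucket changes queues at every rotation, so packets
of a single flow migrate between queues; that is the reordering hazard
raised in the reply below.]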
That seems like a pretty naive approach. First, you want all of the
packets in the same flows/connections to use the same queues; otherwise
you'll be sending a lot of stuff out of sequence. You want to balance
your flows, yes, but not balance based on packets, unless all of your
traffic is ICMP. You also want to balance bits, not packets: sending
fifty 60-byte packets to queue 1 and fifty 1500-byte packets to queue 2
isn't balancing. They'll be wildly out of order as well.

Also, using as many cores as possible isn't necessarily what you want
to do; it depends on your architecture. If you have eight cores on two
CPUs, then you probably want to do all of your networking on four cores
of one CPU. There's a big price to pay for shuffling memory between the
caches of separate CPUs, so splitting transactions that use the same
memory space across them is counterproductive. More queues mean more
locks, and in the end lock contention is your biggest enemy, not CPU
cycles.

Splitting packets that use the same memory and code space among CPUs
isn't a very good idea. A better approach, assuming you can
micromanage, is to allocate X cores (as many as you need for your
peaks) to networking, and use the other cores for user space to
minimize the interruptions.

BC
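[For reference, the core partitioning described in the last paragraph
can be expressed with FreeBSD's cpuset(2) API.  A minimal sketch: pin
the user-space side of an application to CPUs 4-7, leaving CPUs 0-3 for
the network stack.  The CPU numbers are illustrative, and binding the
NIC's interrupt threads to the other cores would be done separately,
e.g. with cpuset(1).]

    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <err.h>

    int
    main(void)
    {
            cpuset_t set;
            int cpu;

            /* Build a mask covering CPUs 4-7 only. */
            CPU_ZERO(&set);
            for (cpu = 4; cpu < 8; cpu++)
                    CPU_SET(cpu, &set);

            /* id -1 means the calling process. */
            if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                sizeof(set), &set) != 0)
                    err(1, "cpuset_setaffinity");

            /* ... run the user-space side of the application here ... */
            return (0);
    }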