Date:      Tue, 1 May 2012 17:01:47 -0700 (PDT)
From:      Barney Cordoba <barney_cordoba@yahoo.com>
To:        Juli Mallett <jmallett@FreeBSD.org>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Sean Bruno <seanbru@yahoo-inc.com>
Subject:   Re: igb(4) at peak in big purple
Message-ID:  <1335916907.11454.YahooMailClassic@web126006.mail.ne1.yahoo.com>

--- On Tue, 5/1/12, Juli Mallett <jmallett@FreeBSD.org> wrote:

> From: Juli Mallett <jmallett@FreeBSD.org>
> Subject: Re: igb(4) at peak in big purple
> To: "Barney Cordoba" <barney_cordoba@yahoo.com>
> Cc: "Sean Bruno" <seanbru@yahoo-inc.com>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
> Date: Tuesday, May 1, 2012, 5:50 PM
> Hey Barney,
>
> On Tue, May 1, 2012 at 11:13, Barney Cordoba <barney_cordoba@yahoo.com> wrote:
> > --- On Fri, 4/27/12, Juli Mallett <jmallett@FreeBSD.org> wrote:
> > > [Tricking Intel's cards into giving something like round-robin
> > > packet delivery to multiple queues.]
> >
> > That seems like a pretty naive approach. First, you want all of the
> > packets in the same flows/connections to use the same channels,
> > otherwise you'll be sending a lot of stuff out of sequence.
>
> I wouldn't call it naive, I'd call it "oblivious".  I feel like I went
> to some lengths to indicate that it was not the right solution to many
> problems, but that it was a worthwhile approach in the case where one
> doesn't care about anything but evenly distributing packets by number
> (although size is also possible, by using a size-based watermark
> rather than a count-based one) to as many queues as possible.  Not
> every application requires in-sequence packets (indeed,
> out-of-sequence traffic can be a problem even with flow affinity
> approaches.)
>
> My note was simply about the case where you need to evenly saturate
> queues to divide up the work as much as possible, on hardware that
> doesn't make it possible to get the behavior you want (round-robin by
> packet) for that case.  Intel's hardware has the redirection table,
> which makes it possible (with a very application-aware approach that
> is anything but naive) to get functionality from the hardware that
> isn't otherwise available at a low level.  Few of the things you
> assert are better are available from Intel's cards -- if you want to
> talk about optimal hardware multi-queue strategies, or queue-splitting
> in software, that's a good conversation to have and this may even be
> the right list, but I'd encourage you to just build your own silicon
> or use something with programmable firmware.  For those of us saddled
> with Intel NICs, it's useful to share information on how to get
> behavior that may be desirable (and I promise you it is for a large
> class of applications) but not marketed :)
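
(For concreteness, a rough sketch of what a count-watermark
redirection-table scheme along these lines could look like.  The table
size, queue count, and the reta_write() helper here are illustrative
assumptions, not actual driver code; a real implementation would
program the NIC's RETA registers from inside the driver.)

    /*
     * Sketch: point the whole redirection table at one RX queue, and
     * once that queue has absorbed WATERMARK packets, repoint the table
     * at the next queue.  Over time this approximates round-robin by
     * packet count rather than by flow hash.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define RETA_ENTRIES 128     /* assumed table size */
    #define NQUEUES      4       /* assumed RX queue count */
    #define WATERMARK    10000   /* packets per queue before rotating */

    static uint8_t reta[RETA_ENTRIES];

    /* Stand-in for programming the NIC's redirection table. */
    static void
    reta_write(const uint8_t *table)
    {
            (void)table;    /* a real driver would write registers here */
    }

    /* Point every table bucket at a single queue. */
    static void
    reta_point_all(int queue)
    {
            for (int i = 0; i < RETA_ENTRIES; i++)
                    reta[i] = (uint8_t)queue;
            reta_write(reta);
    }

    int
    main(void)
    {
            uint64_t since_rotate = 0;
            int cur_queue = 0;

            reta_point_all(cur_queue);

            /* Pretend we're polling a per-queue packet counter. */
            for (uint64_t pkt = 0; pkt < 100000; pkt++) {
                    if (++since_rotate >= WATERMARK) {
                            cur_queue = (cur_queue + 1) % NQUEUES;
                            reta_point_all(cur_queue);
                            since_rotate = 0;
                    }
            }
            printf("ended on queue %d\n", cur_queue);
            return (0);
    }
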
>
> > You want to balance your flows, yes, but not balance based on
> > packets, unless all of your traffic is icmp.  You also want to
> > balance bits, not packets; sending 50 60-byte packets to queue 1
> > and 50 1500-byte packets to queue 2 isn't balancing.  They'll be
> > wildly out of order as well.
>
> This is where the obliviousness is useful.  Traffic has its own
> statistical distributions in terms of inter-packet gaps, packet
> sizes, etc.  Assume your application just keeps very accurate
> counters of how many packets have been seen with each Ethernet
> protocol type.  This is a reasonable approximation of some real
> applications that are interesting and that people use FreeBSD for.
> You don't care how big the packets are, assuming your memory
> bandwidth is infinite (or at least greater than what you need) -- you
> just want to be sure to see each one of them, and that means making
> the most of the resources you have to ensure that even under peak
> loads you cannot possibly drop any traffic.
>
> Again, not every application is like that, and there's a reason I
> didn't post a patch and encourage the premature-tuning crowd to give
> this sort of thing a try.  When you don't care about distributing
> packets evenly by size, you want an algorithm that doesn't factor
> them in.  Also, I've previously had the same concern that you have
> now, and in my experience it's mostly magical thinking.  With many
> kinds of application and many kinds of real-world traffic it really
> doesn't matter, even if in theory it's a possibility.  There's no
> universal solution to packet capture that's going to be optimal for
> every application.
>
> > Also, using as many cores as possible isn't necessarily what you
> > want to do, depending on your architecture.
>
> I think Sean and I, at least, know that, and it's a point that I have
> gone on about at great length when people endorse the go-faster
> stripes of using as many cores as possible, rather than as many cores
> as necessary.
>
> > If you have 8 cores on 2 cpus, then you probably want to do all of
> > your networking on four cores on one cpu.  There's a big price to
> > pay to shuffle memory between caches of separate cpus; splitting
> > transactions that use the same memory space is counterproductive.
>
> Not necessarily -- you may not need to split transactions with all
> kinds of applications.
>
> > More queues mean more locks, and in the end, lock contention is
> > your biggest enemy, not cpu cycles.
>
> Again, this depends on your application, and that's a very naive
> assertion :)  Lock contention may be your biggest enemy, but it's
> only occasionally mine :)
>
> > The idea of splitting packets that use the same memory and code
> > space among cpus isn't a very good one; a better approach, assuming
> > you can
>
> You're making an assumption that wasn't part of the conversation,
> though.  Who said anything about using the same memory?
>
> > micromanage, is to allocate X cores (as much as you need for your
> > peaks) to networking, and use other cores for user space to
> > minimize the interruptions.
>
> Who said anything about user space?  :)
>
> And actually, this is wrong even in those applications where it's
> right that you need to dedicate some cores to networking, too.  In my
> experience, it's much better to have the control-path stuff on the
> same cores you're handling interrupts on if you're using something
> like netmap.  Interrupts kill the cores that are doing real work with
> each packet.
>
> Thanks,
> Juli.

Well, he said it was not causing issues, so it seems to me that giving
him a hack that's likely to be less efficient overall isn't the right
answer.  His lopsidedness is not normal.

Make sure the interrupt moderation is tuned properly; it can make a
huge difference.  Interrupts on Intel devices are really just polls;
you can set the poll to any interval you want.
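
(If you'd rather poke at the moderation knobs programmatically than
from the shell, something like the sketch below works through
sysctlbyname(3).  The OID names are assumptions -- they vary with the
driver version, so check what `sysctl dev.igb.0` actually exposes on
the box first.)

    /*
     * Sketch: read and adjust an igb interrupt-moderation knob via
     * sysctl.  "dev.igb.0.enable_aim" (adaptive interrupt moderation)
     * is what I'd expect on a box of this era, but treat the name as
     * an assumption and verify it locally.
     */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
            int aim = 0;
            size_t len = sizeof(aim);

            /* Is adaptive interrupt moderation currently enabled? */
            if (sysctlbyname("dev.igb.0.enable_aim", &aim, &len,
                NULL, 0) == 0)
                    printf("enable_aim = %d\n", aim);
            else
                    perror("dev.igb.0.enable_aim");

            /* Example: turn AIM off so the interrupt interval stays fixed. */
            int off = 0;
            if (sysctlbyname("dev.igb.0.enable_aim", NULL, NULL,
                &off, sizeof(off)) != 0)
                    perror("set enable_aim");

            return (0);
    }
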
I'd be interested in seeing the usage numbers with and without the
hack.  Intel's hashing gives pretty even distribution on a router or
bridge; the only time you'd see a really lopsided distribution would
be if you were running a traffic generator with a small number of
flows.  The answer is to use more flows in that case.  The same
client/server pair is always going to use the same queue.

BC


