Date:      Tue, 1 May 2012 14:50:58 -0700
From:      Juli Mallett <jmallett@FreeBSD.org>
To:        Barney Cordoba <barney_cordoba@yahoo.com>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Sean Bruno <seanbru@yahoo-inc.com>
Subject:   Re: igb(4) at peak in big purple
Message-ID:  <CACVs6=9YgrvNiEjF7BV8z5DKrWLJsShnNg5m6bJ142zDtiHx_Q@mail.gmail.com>
In-Reply-To: <1335895983.68943.YahooMailClassic@web126001.mail.ne1.yahoo.com>
References:  <CACVs6=9RzaZAHx6RC4AGywTzpuc8hNrY4eD-e-AJoV32OEMVgg@mail.gmail.com> <1335895983.68943.YahooMailClassic@web126001.mail.ne1.yahoo.com>

Hey Barney,

On Tue, May 1, 2012 at 11:13, Barney Cordoba <barney_cordoba@yahoo.com> wrote:
> --- On Fri, 4/27/12, Juli Mallett <jmallett@FreeBSD.org> wrote:
> > [Tricking Intel's cards into giving something like round-robin packet
> >  delivery to multiple queues.  ]
>
> That seems like a pretty naive approach. First, you want all of the
> packets in the same flows/connections to use the same channels,
> otherwise you'll be sending a lot of stuff out of sequence.

I wouldn't call it naive, I'd call it "oblivious".  I feel like I went
to some lengths to indicate that it was not the right solution to many
problems, but that it was a worthwhile approach in the case where one
doesn't care about anything but evenly distributing packets by number
(although size is also possible, by using a size-based watermark
rather than a count-based one) to as many queues as possible.  Not
every application requires in-sequence packets (indeed,
out-of-sequence traffic can be a problem even with flow affinity
approaches).

My note was simply about the case where you need to evenly saturate
queues to divide up the work as much as possible, on hardware that
doesn't make it possible to get the behavior you want (round-robin by
packet) for that case.  Intel's hardware has the redirection table,
which makes it possible (with a very application-aware approach that
is anything but naive) to get functionality from the hardware that
isn't otherwise available at a low-level.  Few of the things you
assert are better are available from Intel's cards -- if you want to
talk about optimal hardware multi-queue strategies, or queue-splitting
in software, that's a good conversation to have and this may even be
the right list, but I'd encourage you to just build your own silicon
or use something with programmable firmware.  For those of us saddled
with Intel NICs, it's useful to share information on how to get
behavior that may be desirable (and I promise you it is for a large
class of applications) but not marketed :)
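
Just to make the shape of the trick concrete, something like the
following is what I have in mind -- purely an illustration in the
style of if_igb.c, not a patch; PKT_WATERMARK and the
rr_queue/pkts_since_rotate fields are made up for the example:

    /*
     * Point every redirection-table bucket at a single queue.  On the
     * 82576 the RETA is 128 byte-wide entries, packed four to a 32-bit
     * register, each naming the RX queue a hash bucket delivers to.
     */
    static void
    igb_reta_point_at(struct adapter *adapter, u32 queue)
    {
            u32 reta, i;

            reta = queue | (queue << 8) | (queue << 16) | (queue << 24);
            for (i = 0; i < 128 / 4; i++)
                    E1000_WRITE_REG(&adapter->hw, E1000_RETA(i), reta);
    }

    /*
     * Called from the RX path of whichever queue is currently taking
     * traffic: after PKT_WATERMARK packets, rotate to the next queue.
     */
    static void
    igb_maybe_rotate(struct adapter *adapter, struct rx_ring *rxr)
    {
            if (++rxr->pkts_since_rotate < PKT_WATERMARK)
                    return;
            rxr->pkts_since_rotate = 0;
            adapter->rr_queue = (adapter->rr_queue + 1) % adapter->num_queues;
            igb_reta_point_at(adapter, adapter->rr_queue);
    }

Keeping a running byte total instead of a packet count gives you the
size-based variant I mentioned above.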

> You want to balance your flows,
> yes, but not balance based on packets, unless all of your traffic is icmp.
> You also want to balance bits, not packets; sending 50 60-byte packets
> to queue 1 and 50 1500-byte packets to queue 2 isn't balancing. They'll
> be wildly out of order as well.

This is where the obliviousness is useful.  Traffic has its own
statistical distributions in terms of inter-packet gaps, packet sizes,
etc.  Assume your application just keeps very accurate counters of how
many packets have been seen with each Ethernet protocol type.  This is
a reasonable approximation of some real applications that are
interesting and that people use FreeBSD for.  You don't care how big
the packets are, assuming your memory bandwidth is infinite (or at
least greater than what you need) -- you just want to be sure to see
each one of them, and that means making the most of the resources you
have to ensure that even under peak loads you cannot possibly drop any
traffic.
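
In case it helps to picture it, that kind of application really can be
as dumb as the sketch below (hypothetical, and per-queue so no counter
is ever shared between cores; how the frames arrive -- netmap, BPF,
whatever -- doesn't matter here):

    #include <sys/types.h>
    #include <netinet/in.h>
    #include <net/ethernet.h>
    #include <stddef.h>
    #include <stdint.h>

    /* One table per RX queue/core, so counters never bounce between
     * caches. */
    struct ethertype_counters {
            uint64_t count[65536];
    };

    /*
     * Count one frame.  All that matters is that every frame is seen
     * exactly once; its length, and which core saw it, don't change
     * the answer.
     */
    static void
    count_frame(struct ethertype_counters *c, const void *frame, size_t len)
    {
            const struct ether_header *eh = frame;

            if (len >= sizeof(*eh))
                    c->count[ntohs(eh->ether_type)]++;
    }

Drop one packet under load and the counts are simply wrong, which is
why evenly saturating every queue matters more here than preserving
flow order.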

Again, not every application is like that, and there's a reason I
didn't post a patch and encourage the premature-tuning crowd to give
this sort of thing a try.  When you don't care about distributing
packets evenly by size, you want an algorithm that doesn't factor them
in.  Also, I've previously had the same concern that you have now, and
in my experience it's mostly magical thinking.  With many kinds of
application and many kinds of real-world traffic it really doesn't
matter, even if in theory it's a possibility.  There's no universal
solution to packet capture that's going to be optimal for every
application.

> Also, using as many cores as possible isn't necessarily what you want to
> do, depending on your architecture.

I think Sean and I, at least, know that, and it's a point that I have
gone on about at great length when people endorse the go-faster
stripes of using as many cores as possible, rather than as many cores
as necessary.

> If you have 8 cores on 2 cpus, then you
> probably want to do all of your networking on four cores on one cpu.
> There's a big price to pay to shuffle memory between caches of separate
> cpus; splitting transactions that use the same memory space is
> counterproductive.

Not necessarily -- you may not need to split transactions with all
kinds of applications.

> More queues mean more locks, and in the end, lock contention is your
> biggest enemy, not cpu cycles.

Again, this depends on your application, and that's a very naive
assertion :)  Lock contention may be your biggest enemy, but it's only
occasionally mine :)

> The idea of splitting packets that use the same memory and code space
> among cpus isn't a very good one; a better approach, assuming you can

You're making an assumption that wasn't part of the conversation,
though.  Who said anything about using the same memory?

> micromanage, is to allocate X cores (as much as you need for your peaks)
> to networking, and use other cores for user space to minimize the
> interruptions.

Who said anything about user space?  :)

And actually, this is wrong even in those applications where you
really do need to dedicate some cores to networking.  In my
experience, it's much better to have the control-path stuff on the
same cores you're handling interrupts on if you're using something
like netmap.  Interrupts kill the cores that are doing real work with
each packet.
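
For what it's worth, the mechanics of that are simple on FreeBSD: the
queue interrupts can be steered from the shell with cpuset(1) (the -x
flag takes an IRQ number), and a netmap poll/control thread can pin
itself with one call.  A sketch, with the core number left up to you:

    #include <sys/param.h>
    #include <sys/cpuset.h>

    /* Pin the calling thread to a single CPU; returns 0 or -1/errno. */
    static int
    pin_self_to_cpu(int cpu)
    {
            cpuset_t mask;

            CPU_ZERO(&mask);
            CPU_SET(cpu, &mask);
            /* id -1 with CPU_WHICH_TID means "the calling thread". */
            return (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID,
                -1, sizeof(mask), &mask));
    }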

Thanks,
Juli.
