Date:      Sat, 24 Mar 2012 14:02:00 -0700
From:      Juli Mallett <jmallett@FreeBSD.org>
To:        Jack Vogel <jfvogel@gmail.com>
Cc:        freebsd-net@freebsd.org, Ivan Voras <ivoras@freebsd.org>
Subject:   Re: nmbclusters: how do we want to fix this for 8.3 ?
Message-ID:  <CACVs6=_kGtQX05baYdi2xqG380uLpcmn9WWo4NeGZ%2BvrXEnXZw@mail.gmail.com>
In-Reply-To: <CAFOYbcm_UySny1pUq2hYBcLDpCq6-BwBZLYVEnwAwcy6vtcvng@mail.gmail.com>
References:  <CAFOYbc=oU5DxZDZQZZe4wJhVDoP=ocVOnpDq7bT=HbVkAjffLQ@mail.gmail.com> <20120222205231.GA81949@onelab2.iet.unipi.it> <1329944986.2621.46.camel@bwh-desktop> <20120222214433.GA82582@onelab2.iet.unipi.it> <CAFOYbc=BWkvGuqAOVehaYEVc7R_4b1Cq1i7Ged=-YEpCekNvfA@mail.gmail.com> <134564BB-676B-49BB-8BDA-6B8EB8965969@netasq.com> <ji5ldg$8tl$1@dough.gmane.org> <CACVs6=_avBzUm0mJd%2BkNvPuBodmc56wHmdg_pCrAODfztVnamw@mail.gmail.com> <20120324200853.GE2253@funkthat.com> <CAFOYbcm_UySny1pUq2hYBcLDpCq6-BwBZLYVEnwAwcy6vtcvng@mail.gmail.com>

On Sat, Mar 24, 2012 at 13:33, Jack Vogel <jfvogel@gmail.com> wrote:
> On Sat, Mar 24, 2012 at 1:08 PM, John-Mark Gurney <jmg@funkthat.com> wrote:
>> If we had some sort of tuning algorithm that would keep track of the
>> current receive queue usage depth, and always keep enough mbufs on the
>> queue to handle the largest expected burst of packets (either historical,
>> or by looking at largest tcp window size, etc), this would both improve
>> memory usage, and in general reduce the number of required mbufs on the
>> system...  If you have fast processors, you might be able to get away with
>> fewer mbufs since you can drain the receive queue faster, but on slower
>> systems, you would use more mbufs.
>
> These are the days when machines might have 64 GIGABYTES of main storage,
> so having sufficient memory to run high performance networking seems little
> to ask.

I think the suggestion is that this should be configurable.  FreeBSD
is also being used in production, on systems doing networking-related
tasks with <128MB of RAM.  And it works fine, more or less.
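
Just to put rough numbers on that, here's a back-of-the-envelope
sketch (standalone C, example values only: 65536 clusters, 2K cluster
size, ignoring the jumbo pools entirely):

/*
 * Back-of-the-envelope only: how much memory a given nmbclusters
 * setting can pin if the pool actually fills up.  65536 is just an
 * example value; 2048 is the 2K cluster size on common platforms.
 */
#include <stdio.h>

int
main(void)
{
	long nmbclusters = 65536;	/* example setting, not a default */
	long clbytes = 2048;		/* 2K cluster size */
	long wired_mb = nmbclusters * clbytes / (1024 * 1024);

	printf("%ld clusters can pin %ld MB of wired memory:\n",
	    nmbclusters, wired_mb);
	printf("  %.2f%% of a 64 GB box, %.0f%% of a 128 MB box\n",
	    100.0 * wired_mb / (64 * 1024), 100.0 * wired_mb / 128);
	return (0);
}

The same cluster count that is a rounding error on a 64 GB machine is
the entire physical memory of the small box.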

>> This tuning would also fix the problem of interfaces not coming up since
>> at boot, each interface might only allocate 128 or so mbufs, and then
>> dynamically grow as necessary...
>
> You want modern fast networked servers but only giving them 128 mbufs,
> ya right, allocating memory takes time, so when you do this people will
> whine about latency :)

Allocating memory doesn't have to take much time.  A multi-queue
driver could steal mbufs from an underutilized queue.  It could grow
the number of descriptors based on load.  Some of those things are
hard to implement in the first place, and harder still to get the
corner cases right, but not all of them are.
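
The "grow based on load" part is not much code, either.  Here's the
shape of it (a sketch only -- hypothetical structure and field names,
not from any real driver):

struct rx_ring_sketch {
	int	nposted;	/* mbufs currently posted to the ring */
	int	nposted_max;	/* hard cap set by the administrator */
	int	starved;	/* times the ring ran dry recently */
};

static void
rx_ring_adjust(struct rx_ring_sketch *rxr)
{
	/*
	 * If the ring keeps getting caught empty, post twice as many
	 * buffers, up to the cap.  The refill path allocates them
	 * with M_NOWAIT as usual, so nothing here ever sleeps.
	 */
	if (rxr->starved > 4 && rxr->nposted < rxr->nposted_max) {
		rxr->nposted *= 2;
		if (rxr->nposted > rxr->nposted_max)
			rxr->nposted = rxr->nposted_max;
		rxr->starved = 0;
	}
}

The hard part is the bookkeeping around it -- resizing descriptor
rings safely, stealing from an idle queue -- not the policy itself.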

> When you start pumping 10G...40G...100G ...the scale of the system
> is different, thinking in terms of the old 10Mb or 100Mb days just doesn't
> work.

This is a red herring.  Yes, some systems need to do 40/100G.  They
require special tuning.  The default shouldn't assume that everyone's
getting maximum pps.  This seems an especially silly argument when
much of the available silicon can't even keep up with maximum packet
rates at minimum frame size, at 10G or even at 1G.

But again, 1G NICs are the default now.  Does every FreeBSD system
with a 1G NIC have loads of memory?  No.  I have an Atheros system
with 2 1G NICs and 256MB of RAM.  It can't do anything at 1gbps.  Not
even drop packets.  Why should its memory usage model be tuned for
something it can't do?

I'm not saying it should be impossible to allocate a bajillion
gigaquads of memory to receive rings; I certainly do it myself all the
time.  But choosing defaults is a tricky thing, and systems that are
"pumping 10G" need other tweaks anyway, whether that's enabling
forwarding or something else.  Because they have to be configured for
the task that they are to do.  If part of that is increasing the
number of receive descriptors (as the Intel drivers already allow us
to do -- thanks, Jack) and the number of queues, is that such a bad
thing?  I really don't think it makes sense for my 8-core system or my
16-core system to come up with 8 or 16 queues *per interface*.  That
just doesn't make sense.  8/N or 16/N queues where N is the number of
interfaces makes more sense under heavy load.  1 queue per port is
*ideal* if a single core can handle the load of that interface.
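
The arithmetic I have in mind is nothing fancier than this
(standalone illustration; the core and port counts are made-up
example numbers):

/* Split the cores across the ports instead of giving every port a
 * queue per core; one queue per port is the floor. */
#include <stdio.h>

int
main(void)
{
	int ncpu = 8;		/* cores in the box */
	int nports = 4;		/* interfaces carrying real load */
	int nq = ncpu / nports;

	if (nq < 1)
		nq = 1;
	printf("%d queues per port, not %d\n", nq, ncpu);
	return (0);
}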

> Sorry but the direction is to scale everything, not pare back on the network
> IMHO.

There is not just one direction.  There is not just one point of
scaling.  Relatively-new defaults do not necessarily have to be
increased in the future.  I mean, should a 1G NIC use 64 queues on a
64-core system that can do 100gbps @ 64 bytes on one core?  It's
actively harmful to performance.  The answer to "what's the most
sensible default?" is not "what does a system that just forwards
packets need?"  A system that just forwards packets already needs IPs
configured and a sysctl set.  If we make it easier to change the
tuning of the system for that scenario, then nobody's going to care
what our defaults are, or think us "slow" for them.


