From owner-freebsd-net@FreeBSD.ORG Sat Mar 24 21:18:04 2012
Date: Sat, 24 Mar 2012 14:17:57 -0700
From: Jack Vogel
To: Juli Mallett
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: nmbclusters: how do we want to fix this for 8.3 ?

This whole issue only came up on a system with 10G devices, and only igb
does anything like you're talking about, not a device/driver on most low
end systems. So, we are trading red herrings it would seem.

I'm not opposed to economizing things in a sensible way, it was I that
brought the issue up after all :)

Jack

On Sat, Mar 24, 2012 at 2:02 PM, Juli Mallett wrote:
> On Sat, Mar 24, 2012 at 13:33, Jack Vogel wrote:
> > On Sat, Mar 24, 2012 at 1:08 PM, John-Mark Gurney wrote:
> >> If we had some sort of tuning algorithm that would keep track of the
> >> current receive queue usage depth, and always keep enough mbufs on the
> >> queue to handle the largest expected burst of packets (either
> >> historical, or by looking at the largest tcp window size, etc), this
> >> would both improve memory usage and in general reduce the number of
> >> required mbufs on the system...  If you have fast processors, you
> >> might be able to get away with fewer mbufs since you can drain the
> >> receive queue faster, but on slower systems, you would use more mbufs.
> >
> > These are the days when machines might have 64 GIGABYTES of main
> > storage, so having sufficient memory to run high performance
> > networking seems little to ask.
>
> I think the suggestion is that this should be configurable.  FreeBSD is
> also being used on systems, in production, doing networking-related
> tasks, with <128MB of RAM.  And it works fine, more or less.
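
For concreteness, the knobs this thread keeps circling around already
exist as loader tunables on 8.x/9.x; a sketch like the one below (igb(4)
tunable names, purely illustrative values) is roughly what "configurable"
looks like today:

    # /boot/loader.conf
    # Size of the mbuf cluster pool (also exposed as the
    # kern.ipc.nmbclusters sysctl); the debate is over its default.
    kern.ipc.nmbclusters="131072"

    # Per-driver knobs for igb(4): number of queues (0 lets the driver
    # decide) and RX/TX descriptors per ring.
    hw.igb.num_queues="2"
    hw.igb.rxd="1024"
    hw.igb.txd="1024"
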
>
> >> This tuning would also fix the problem of interfaces not coming up
> >> since at boot, each interface might only allocate 128 or so mbufs,
> >> and then dynamically grow as necessary...
> >
> > You want modern fast networked servers but only giving them 128 mbufs,
> > ya right, allocating memory takes time, so when you do this people will
> > whine about latency :)
>
> Allocating memory doesn't have to take much time.  A multi-queue
> driver could steal mbufs from an underutilized queue.  It could grow
> the number of descriptors based on load.  Some of those things are
> hard to implement in the first place and harder to cover the corner
> cases of, but not all.
>
> > When you start pumping 10G...40G...100G ...the scale of the system
> > is different, thinking in terms of the old 10Mb or 100Mb days just
> > doesn't work.
>
> This is a red herring.  Yes, some systems need to do 40/100G.  They
> require special tuning.  The default shouldn't assume that everyone's
> getting maximum pps.  This seems an especially silly argument when
> much of the silicon available can't even keep up with maximum packet
> rates with minimally sized frames, at 10G or even at 1G.
>
> But again, 1G NICs are the default now.  Does every FreeBSD system
> with a 1G NIC have loads of memory?  No.  I have an Atheros system
> with two 1G NICs and 256MB of RAM.  It can't do anything at 1Gbps.
> Not even drop packets.  Why should its memory usage model be tuned
> for something it can't do?
>
> I'm not saying it should be impossible to allocate a bajillion
> gigaquads of memory to receive rings; I certainly do it myself all the
> time.  But choosing defaults is a tricky thing, and systems that are
> "pumping 10G" need other tweaks anyway, whether that's enabling
> forwarding or something else, because they have to be configured for
> the task that they are to do.  If part of that is increasing the
> number of receive descriptors (as the Intel drivers already allow us
> to do -- thanks, Jack) and the number of queues, is that such a bad
> thing?  I really don't think it makes sense for my 8-core system or my
> 16-core system to come up with 8 or 16 queues *per interface*.  That
> just doesn't make sense.  8/N or 16/N queues, where N is the number of
> interfaces, makes more sense under heavy load.  One queue per port is
> *ideal* if a single core can handle the load of that interface.
>
> > Sorry but the direction is to scale everything, not pare back on the
> > network IMHO.
>
> There is not just one direction.  There is not just one point of
> scaling.  Relatively new defaults do not necessarily have to be
> increased in the future.  I mean, should a 1G NIC use 64 queues on a
> 64-core system that can do 100Gbps @ 64 bytes on one core?  It's
> actively harmful to performance.  The answer to "what's the most
> sensible default?" is not "what does a system that just forwards
> packets need?"  A system that just forwards packets already needs IPs
> configured and a sysctl set.  If we make it easier to change the
> tuning of the system for that scenario, then nobody's going to care
> what our defaults are, or think us "slow" for them.
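
The "IPs configured and a sysctl set" case Juli describes really is that
small; a plain forwarding box needs roughly the following (interface names
and addresses here are only placeholders):

    # /etc/rc.conf
    gateway_enable="YES"              # sets net.inet.ip.forwarding=1 at boot
    ifconfig_igb0="inet 192.0.2.1/24"
    ifconfig_igb1="inet 198.51.100.1/24"

    # or, at runtime:
    #   sysctl net.inet.ip.forwarding=1
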