From: Alfred Perlstein <bright@mu.org>
To: Peter Wemm
Cc: svn-src-head@freebsd.org, Alexey Dokuchaev, src-committers@freebsd.org, svn-src-all@freebsd.org
Subject: Re: svn commit: r242847 - in head/sys: i386/include kern
Date: Sun, 11 Nov 2012 08:47:24 -0800

I think there are two issues here.

One: you have a much better idea of how to tune nmbclusters than I do. Cool!
Please put that into the code. I really think that's great, and the time
you've put into giving it serious thought is helpful to all.

Two: you want to divorce nmbclusters (and therefore maxsockets and some other
tunables) from maxusers, even though that has been the way to flip a big
switch for ages now. This, I think, is very wrong.

"oh you only have to change 1 thing!"

Wait... what was that sound? Oh, it was a toilet flushing away 15 years of
mailing list information, FAQs, and user knowledge, because the word
"maxusers" is no longer hip to the community. That is bad. Please don't do
that.

On Nov 11, 2012, at 2:53 AM, Peter Wemm wrote:

> On Sun, Nov 11, 2012 at 1:41 AM, Alfred Perlstein wrote:
>> The real conversation goes like this:
>>
>> user: "Why is my box seeing terrible network performance?"
>> bsdguy: "Increase nmbclusters."
>> user: "what is that?"
>> bsdguy: "Oh those are the mbufs, just tell me your current value."
>> user: "oh it's like 128000"
>> bsdguy: "hmm try doubling that, go sysctl kern.ipc.nmbclusters=512000 on
>> the command line."
>> user: "ok"
>> .... an hour passes ...
>> user: "hmm now I can't fork any more copies of apache.."
>> bsdguy: "oh, ok, you need to increase maxproc for that."
>> user: "so sysctl kern.ipc.maxproc=10000?"
>> bsdguy: "no... one second..."
>> ....
>> bsdguy: "ok, so that's sysctl kern.maxproc=10000"
>> user: "ok... bbiaf"
>> ....
>> user: "so now i'm getting log messages about can't open sockets..."
>> bsdguy: "oh you need to increase sockets bro... one second..."
>> user: "sysctl kern.maxsockets?"
>> bsdguy: "oh no.. it's actually back to kern.ipc.maxsockets"
>> user: "alrighty then.."
>> ....
>> ....
>> bsdguy: "so how is freebsd since I helped you tune it?" >> user: "well i kept hitting other resource limits, boss made me switch to >> Linux, it works out of the box and doesn't require an expert tuner to run= a >> large scale server. Y'know as a last ditch effort I looked around for th= is >> 'maxusers' thing but it seems like some eggheads retired it and instead o= f >> putting my job at risk, I just went with Linux, no one gets fired for usi= ng >> Linux." >> bsdguy: "managers are lame!" >> user: "yeah! managers..." >>=20 >> -Alfred >=20 > Now Albert.. I know that deliberately playing dumb is fun, but there > is no network difference between doubling "kern.maxusers" in > loader.conf (the only place it can be set, it isn't runtime tuneable) > and doubling "kern.ipc.nmbclusters" in the same place. We've always > allowed people to fine-tune derived settings at runtime where it is > possible. >=20 > My position still is that instead of trying to dick around with > maxusers curve slopes to try and somehow get the scaling right, we > should instead be setting sensibly right from the start, by default. >=20 > The current scaling was written when we had severe kva constraints, > did reservations, etc. Now they're a cap on dynamic allocators on > most platforms. >=20 > "Sensible" defaults would be *way* higher than the current maxusers > derived scaling curves. >=20 > My quick survey: > 8G ram -> 65088 clusters -> clusters capped at 6.2% of physical ram > (running head) > 3.5G ram -> 25600 clusters -> clusters capped at 5.0% of physical ram > (running an old head) > 32G ram -> 25600 clusters -> clusters capped at 1.5% of physical ram > (running 9.1-stable) > 72G ram -> 25600 clusters -> clusters capped at 0.06% of physical ram > (9.1-stable again) >=20 > As I've been saying from the beginning.. As these are limits on > dynamic allocators, not reservations, they should be as high as we can > comfortably set them without risking running out of other resources. >=20 > As the code stands now.. the derived limits for 4k, 9k and 16k jumbo > clusters is approximately the same space as 2K clusters. (ie: 1 x 4k > cluster per 2 x 2k clusters, 1 x 16k cluster per 8 2k clusters, and so > on). If we set a constant 6% for nmbclusters (since that's roughly > where we're at now for smaller machines after albert's changes), then > the worse case scenarios for 4k, 9k and 16k clusters are 6% each. ie: > 24% of wired, physical ram. >=20 > Plus all the other values derived from the nmbclusters tunable at boot. >=20 > I started writing this with the intention of suggesting 10% but that > might be a bit high given that: > kern_mbuf.c: nmbjumbop =3D nmbclusters / 2; > kern_mbuf.c: nmbjumbo9 =3D nmbclusters / 4; > kern_mbuf.c: nmbjumbo16 =3D nmbclusters / 8; > .. basically quadruples the worst case limits. >=20 > Out of the box, 6% is infinitely better than we 0.06% we currently get > on a 9-stable machine with 72G ram. >=20 > But I object to dicking around with "maxusers" to derive network > buffer space default limits. If we settle on something like 6%, then > it should be 6%. That's easy to document and explain the meaning of > the tunable. > --=20 > Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJ= V > "All of this is for nothing if we don't go to the stars" - JMS/B5 > "If Java had true garbage collection, most programs would delete > themselves upon execution." -- Robert Sewell