From: Alfred Perlstein <bright@mu.org>
To: Peter Wemm
Cc: src-committers@freebsd.org, John Baldwin, svn-src-all@freebsd.org, svn-src-head@freebsd.org, Konstantin Belousov
Subject: Re: svn commit: r242029 - head/sys/kern
Date: Thu, 08 Nov 2012 01:13:36 -0800
Message-ID: <509B77C0.9060202@mu.org>

Peter, I agree. It's certainly not perfect, but it's not nearly as bogus as what was there previously. I know "maxusers" is "wrong"; what it really means, if you think about it, is "give me a scaling factor that is relative to physical ram, BUT capped at some value so as not to exhaust KVA."
Yes, I grok that on certain architectures mbuf clusters aren't pulled from KVA, but that seems much less important than how broken things are currently. This fix is "good enough" for the general case, and a far greater improvement over what was there previously, which would make FreeBSD blow chunks under any sort of 10gigE load.

I think what needs to happen here is that the people requiring perfection consider what a mess it was before, and if they themselves do not have time to make it 100% perfect, allow someone to step in and move things in the right direction without overly complicating them. What is there is crap; it's old, crufty and broken, it really is. It needs to be fixed, it needs to be given a nice fat band-aid now, and when someone interested in perfection comes along, they can make it even more awesome.

I am not saying that my fix is PERFECT or the be-all and end-all, but it serves as a good step in the right direction on our tier 1 platforms and is easily modifiable (just replace "VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS" with some form of MD magic sauce).

Would you like me to do that? Replace the hardline calculation with some constant that each platform can configure? I'm thinking this might suffice to make purists a bit happier:

#if defined(i386) || defined(amd64)
#define MAX_KERNEL_ADDRESS_SPACE (VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS)
#else
#define MAX_KERNEL_ADDRESS_SPACE (1024*1024*1024)
#endif

Given my algorithm, this should produce pretty much the same result for platforms other than amd64, which will then be able to grow maxusers some. I'm basically running out of time on this and I'm worried that I'll have to back it out indefinitely, leaving FreeBSD unable to do 10gigE out of the box.
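For concreteness, here is a rough user-space sketch of the clamping idea described above. The 2MB-of-RAM-per-maxusers-unit ratio, the 384 cap, the 32 floor, and the page size are illustrative assumptions, not the exact committed constants:

```c
#include <stdint.h>

/* Illustrative constants; the real values are machine-dependent. */
#define PGSIZE          4096UL  /* assumed page size */
#define MAXUSERS_MIN    32      /* assumed floor */
#define MAXUSERS_CAP    384     /* historical clamp for KVA-limited machines */

/*
 * Sketch of the proposed scaling: derive maxusers from physical pages
 * (one unit per ~2MB of RAM here, an assumption), but keep the old 384
 * clamp only when physical memory exceeds kernel-addressable memory.
 */
int
scale_maxusers(uint64_t physpages, uint64_t kva_pages)
{
        int mu;

        mu = (int)(physpages / (2UL * 1024 * 1024 / PGSIZE));
        if (mu < MAXUSERS_MIN)
                mu = MAXUSERS_MIN;
        if (physpages > kva_pages && mu > MAXUSERS_CAP)
                mu = MAXUSERS_CAP;      /* KVA-limited: clamp down */
        return (mu);
}
```

Under these assumed constants, 1GB of RAM (262144 4K pages) on a large-KVA machine yields maxusers 512, while 4GB of RAM on a machine with only 1GB of KVA clamps back to 384.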
-Alfred

On 11/7/12 11:46 PM, Peter Wemm wrote:
> On Wed, Nov 7, 2012 at 10:24 PM, Alfred Perlstein wrote:
>> [[ + peter ]]
>>
>> Folks, I spent quite a bit of time trying to figure out how to resolve
>> maxusers scaling in a happy way for all.
>>
>> I think I came up with a solution.
>>
>> This solution should work for i386 and other 32-bit platforms, and it
>> should scale well both for 64-bit platforms with virtually unlimited
>> kernel address space and for 64-bit platforms with limited kernel
>> address space.
>>
>> Here is how it works:
>>
>> We calculate the maxusers value based on physical memory, and then clamp
>> it down if physical memory exceeds kernel addressable memory.
>>
>> The algorithm actually remains the same for all architectures, with the
>> exception that on machines with large kernel address space it is no
>> longer clamped at 384.
>>
>> I've attached a test program that lets you play with various values for
>> VM_MIN_KERNEL_ADDRESS, VM_MAX_KERNEL_ADDRESS and physpages (argv[1, 2, 3]
>> respectively).
>>
>> Please give me your feedback.
> This is still bogus.  VM_MIN_KERNEL_ADDRESS and VM_MAX_KERNEL_ADDRESS
> have no bearing on how much space should be allocated for mbuf
> clusters on amd64.  If anything, you want dmapbase / dmapend if you
> want a practical cap for amd64.  Even then, jumbo clusters are >4K, so
> they come out of kva rather than the direct map.
>
> maxusers is the wrong thing for this.  maxusers should, if anything,
> be used to set things like kern.maxproc.  Preferably it should be
> deleted entirely and sysctl.conf should be used to change
> kern.maxproc.
>
> Setting limits for the mbuf / cluster pool should be an MD parameter.
>
> Trying to scale maxusers based on physical ram in order to get mbuf
> cluster limits set as a side effect is just plain wrong.
>
> It makes no more sense than trying to set nmbclusters based on
> PRINTF_BUFR_SIZE, and then trying to scale PRINTF_BUFR_SIZE in order
> to get desirable second- and third-order side effects.
>
> Scale nmbclusters based on physical ram, with an MD method for capping
> it when there are MD limits (eg: disproportionately small kva on an
> i386 PAE machine).  Don't use maxusers.
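[What Peter describes here, scaling nmbclusters directly from physical memory with an MD cap rather than going through maxusers, might be sketched as below. The function name, the 1/16-of-RAM ratio, and the cap parameter are hypothetical, not code from the tree:]

```c
#include <stdint.h>

#define MCLBYTES 2048   /* standard mbuf cluster size */

/*
 * Hypothetical sketch of the suggested alternative: size the cluster
 * pool from physical memory, then apply a machine-dependent cap
 * (md_cap in clusters; 0 means the platform imposes no extra limit).
 * Letting clusters consume up to 1/16 of RAM is an assumed ratio.
 */
uint64_t
scale_nmbclusters(uint64_t physmem_bytes, uint64_t md_cap)
{
        uint64_t n;

        n = physmem_bytes / 16 / MCLBYTES;
        if (md_cap != 0 && n > md_cap)
                n = md_cap;     /* e.g. small kva on an i386 PAE machine */
        return (n);
}
```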