From owner-svn-src-user@FreeBSD.ORG Mon Nov 12 10:49:08 2012
Message-ID: <50A0D420.4030106@freebsd.org>
Date: Mon, 12 Nov 2012 11:49:04 +0100
From: Andre Oppermann <andre@freebsd.org>
To: Andre Oppermann
Cc: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: Re: svn commit: r242910 - in user/andre/tcp_workqueue/sys: kern sys
References: <201211120847.qAC8lEAM086331@svn.freebsd.org>
In-Reply-To: <201211120847.qAC8lEAM086331@svn.freebsd.org>

On 12.11.2012 09:47, Andre Oppermann wrote:
> Author: andre
> Date: Mon Nov 12 08:47:13 2012
> New Revision: 242910
> URL: http://svnweb.freebsd.org/changeset/base/242910
>
> Log:
>   Base the mbuf related limits on the available physical memory or
>   kernel memory, whichever is lower.

The commit message is a bit terse, so I'm going to explain in more detail:

The overall mbuf-related memory limit must be set so that mbufs (and
clusters of the various sizes) can't exhaust physical RAM or KVM.

I've chosen half of the physical RAM or KVM (whichever is lower) as the
baseline. In any normal scenario we want to leave at least half of
physmem/KVM for other kernel functions and for userspace, to keep the
system from swapping like hell. Via a tunable the limit can be raised to
at most 3/4 of physmem/KVM.

Out of the overall mbuf memory limit, 2K clusters and 4K (page size)
clusters get 1/4 each, because these are the most heavily used mbuf
sizes. 2K clusters are used for MTU 1500 ethernet inbound packets; 4K
clusters are used whenever possible for socket sends and thus for
outbound packets.

The larger cluster sizes of 9K and 16K are each limited to 1/6 of the
overall mbuf memory limit. Again, when jumbo MTUs are used these large
clusters end up only on the inbound path. They are not used on the
outbound path; there it's still 4K clusters. Yes, that will stay that
way, because otherwise we run into lots of complications in the stack.
And it really isn't a problem, so don't make a scene.

Previously the normal mbufs (256B) weren't limited at all. That was
wrong, because certain places in the kernel try to piece a packet
together from smaller mbufs when cluster allocation fails. The mbuf
limit is the count of all the other mbuf sizes together, plus some more
to allow for standalone mbufs (ACKs, for example) and for sending off a
copy of a cluster. FYI: every cluster eventually also has an mbuf
associated with it.
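To make the arithmetic concrete, here is a rough user-space sketch of the
derivation described above. It is illustrative, not the actual kernel
code from the commit: the constants mirror FreeBSD's MCLBYTES,
MJUMPAGESIZE, MJUM9BYTES and MJUM16BYTES on common configurations, while
the variable names and the headroom factor for plain mbufs are only
illustrative.

  /*
   * Illustrative sketch of the mbuf limit derivation (not kernel code).
   * Compile with: cc -o mbuflimits mbuflimits.c
   */
  #include <stdio.h>
  #include <stdint.h>

  #define MCLBYTES     2048          /* 2K cluster */
  #define MJUMPAGESIZE 4096          /* page-sized (4K) cluster */
  #define MJUM9BYTES   (9 * 1024)    /* 9K jumbo cluster */
  #define MJUM16BYTES  (16 * 1024)   /* 16K jumbo cluster */

  static uint64_t
  min64(uint64_t a, uint64_t b)
  {
          return (a < b ? a : b);
  }

  int
  main(void)
  {
          uint64_t physmem = 1ULL << 34;  /* 16GB RAM, example value */
          uint64_t kvm     = 1ULL << 39;  /* ample KVM, example value */
          uint64_t tunable = 0;           /* optional override, in bytes */
          uint64_t maxmbufmem, n2k, n4k, n9k, n16k, nmbufs;

          /* Baseline: half of the smaller of physical RAM and KVM. */
          maxmbufmem = min64(physmem, kvm) / 2;

          /* A tunable may raise the limit, but never above 3/4. */
          if (tunable > maxmbufmem)
                  maxmbufmem = min64(tunable, min64(physmem, kvm) * 3 / 4);

          /* 2K and 4K clusters each get a quarter of the total ... */
          n2k  = maxmbufmem / 4 / MCLBYTES;
          n4k  = maxmbufmem / 4 / MJUMPAGESIZE;
          /* ... 9K and 16K clusters a sixth each. */
          n9k  = maxmbufmem / 6 / MJUM9BYTES;
          n16k = maxmbufmem / 6 / MJUM16BYTES;

          /*
           * Plain mbufs: at least one per cluster plus headroom for
           * standalone mbufs (pure ACKs) and copies of cluster data.
           * The extra 50% here is purely illustrative.
           */
          nmbufs = (n2k + n4k + n9k + n16k) * 3 / 2;

          printf("maxmbufmem %ju\n2K %ju\n4K %ju\n9K %ju\n16K %ju\n"
              "mbufs %ju\n", (uintmax_t)maxmbufmem, (uintmax_t)n2k,
              (uintmax_t)n4k, (uintmax_t)n9k, (uintmax_t)n16k,
              (uintmax_t)nmbufs);
          return (0);
  }

With physmem set to 16GB as above, this reproduces the cluster counts of
the 16GB example below; the plain-mbuf counts in the examples come from
the kernel's own formula rather than the illustrative headroom factor.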
Unfortunately there is no way to set one overall limit for all mbuf
memory together, because UMA doesn't support that kind of limit.

Let's work out a few sizing examples:

1GB KVM:
      512MB limit for mbuf memory
    419,430 mbufs
     65,536 2K mbuf clusters
     32,768 4K mbuf clusters
      9,709 9K mbuf clusters
      5,461 16K mbuf clusters

16GB RAM:
        8GB limit for mbuf memory
 33,554,432 mbufs
  1,048,576 2K mbuf clusters
    524,288 4K mbuf clusters
    155,344 9K mbuf clusters
     87,381 16K mbuf clusters

These defaults should be sufficient for even the most demanding network
loads. If you do run into these limits you probably know exactly what you
are doing, and you are expected to tune the values for your particular
purpose.

There is a side issue with maxfiles, as it limits the maximum number of
sockets that can be open at the same time. With today's web servers and
proxy caches there may be 100K or more sockets open. Hence I've divorced
maxfiles from maxusers as well. There is a relationship between maxfiles
and the callout callwheel, though, which has to be investigated some more
to prevent ridiculous values from being chosen.

-- 
Andre