From: Andre Oppermann <andre@freebsd.org>
Date: Fri, 18 Jan 2013 16:29:43 +0100
To: freebsd-current@freebsd.org, freebsd-hackers
Subject: kmem_map auto-sizing and size dependencies

The autotuning work is reaching into many places of the kernel, and
while trying to tie up all loose ends I've gotten stuck on the
kmem_map: how it works and what its limitations are.

During startup the VM is initialized and an initial kernel virtual
memory map covering the entire KVM address range is set up in
kmem_init().  Only the kernel itself is actually allocated within
that map.  A bit later on a number of other submaps are carved out
of it (clean_map, buffer_map, pager_map, exec_map); a sketch of how
such a submap is created follows below.

Also in kmeminit() (in kern_malloc.c; not to be confused with
kmem_init()) the kmem_map is allocated.  The (initial?) size of the
kmem_map is determined by some voodoo magic: a sprinkle of
nmbclusters * PAGE_SIZE as an increment and lots of tunables (also
sketched below).  It seems to work out to an effective
kmem_map_size of about 58MB on my 16GB amd64 dev machine:

vm.kvm_size: 549755809792
vm.kvm_free: 530233421824
vm.kmem_size: 16594300928
vm.kmem_size_min: 0
vm.kmem_size_max: 329853485875
vm.kmem_size_scale: 1
vm.kmem_map_size: 59518976
vm.kmem_map_free: 16534777856

The kmem_map serves kernel malloc(9) (via UMA), contigmalloc(9) and
everything else that uses UMA for memory allocation.  Mbuf memory
too is managed by UMA, which obtains the backing kernel memory from
the kmem_map (see the page_alloc() sketch below).  The limits of
the various mbuf memory types have been raised considerably
recently and may make use of 50-75% of all physically present
memory, or of the available KVM space, whichever is smaller.
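For reference, this is roughly how those submaps get carved out of
the parent map.  The sketch below is modeled on kmem_suballoc() in
sys/vm/vm_kern.c; the alignment handling and some arguments are
simplified, and the function name here is just for illustration:

/*
 * Sketch of submap creation, modeled on kmem_suballoc() in
 * sys/vm/vm_kern.c.  Alignment and error details simplified.
 */
vm_map_t
kmem_suballoc_sketch(vm_map_t parent, vm_offset_t *min,
    vm_offset_t *max, vm_size_t size)
{
	vm_map_t result;

	size = round_page(size);

	/* Reserve a range of KVA in the parent map... */
	*min = vm_map_min(parent);
	if (vm_map_find(parent, NULL, 0, min, size, VMFS_ANY_SPACE,
	    VM_PROT_ALL, VM_PROT_ALL, MAP_ACC_NO_CHARGE) != KERN_SUCCESS)
		panic("kmem_suballoc: unable to reserve KVA");
	*max = *min + size;

	/* ...create a new map covering exactly that range... */
	result = vm_map_create(vm_map_pmap(parent), *min, *max);
	if (result == NULL)
		panic("kmem_suballoc: cannot create submap");

	/* ...and hand the range in the parent over to the submap. */
	if (vm_map_submap(parent, *min, *max, result) != KERN_SUCCESS)
		panic("kmem_suballoc: unable to change range to submap");
	return (result);
}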
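The sizing voodoo condenses to roughly the following.  This is a
simplified sketch of the logic in kmeminit() (sys/kern/kern_malloc.c)
as I read it; the per-knob TUNABLE fetches and #ifdefs are elided
and the ordering is approximate:

/*
 * Condensed sketch of the kmem_map sizing in kmeminit(),
 * sys/kern/kern_malloc.c.  Tunable fetches and #ifdefs elided.
 */
static void
kmeminit_sketch(void)
{
	u_long mem_size, vm_kmem_size;
	vm_offset_t kmembase, kmemlimit;

	/* Static floor plus a sprinkle of nmbclusters. */
	vm_kmem_size = VM_KMEM_SIZE + nmbclusters * PAGE_SIZE;
	mem_size = cnt.v_page_count;

	/*
	 * Scale with RAM: one kmem page per vm_kmem_size_scale
	 * physical pages.
	 */
	if (vm_kmem_size_scale > 0 &&
	    mem_size / vm_kmem_size_scale > vm_kmem_size / PAGE_SIZE)
		vm_kmem_size = (mem_size / vm_kmem_size_scale) *
		    PAGE_SIZE;

	/* Clamp against the (tunable) floor and ceiling. */
	if (vm_kmem_size_min > 0 && vm_kmem_size < vm_kmem_size_min)
		vm_kmem_size = vm_kmem_size_min;
	if (vm_kmem_size_max > 0 && vm_kmem_size >= vm_kmem_size_max)
		vm_kmem_size = vm_kmem_size_max;

	/* Allow a final explicit override from the loader... */
	TUNABLE_ULONG_FETCH("vm.kmem_size", &vm_kmem_size);

	/*
	 * ...but never more than twice the physical memory; this
	 * allows for map sparseness while keeping the size sane.
	 */
	if (vm_kmem_size / 2 / PAGE_SIZE > mem_size)
		vm_kmem_size = 2 * mem_size * PAGE_SIZE;

	/* Finally carve the kmem_map out of the kernel map. */
	kmem_map = kmem_suballoc(kernel_map, &kmembase, &kmemlimit,
	    vm_kmem_size, TRUE);
}

If I read the sysctl handlers right, vm.kmem_size is the
reservation for the whole submap, while vm.kmem_map_size only
reports what is currently allocated inside it, which would explain
the 58MB figure above.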
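To illustrate how everything funnels into the kmem_map, here is a
minimal sketch of UMA's default backing allocator, after
page_alloc() in sys/vm/uma_core.c:

/*
 * Minimal sketch of UMA's default backing allocator, after
 * page_alloc() in sys/vm/uma_core.c: every slab is ultimately
 * allocated from the kmem_map via kmem_malloc().
 */
static void *
page_alloc(uma_zone_t zone, int bytes, u_int8_t *pflag, int wait)
{
	void *p;

	*pflag = UMA_SLAB_KMEM;		/* slab backed by kmem_map */
	p = (void *)kmem_malloc(kmem_map, bytes, wait);
	return (p);
}

Every UMA consumer therefore competes for the same kmem_map range.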
Now my questions/comments:

Does the kmem_map automatically extend itself if more memory is
requested?  Or should it be set to a larger initial value based on
min(physical memory, KVM space) available?

The use of nmbclusters for the initial kmem_map size calculation is
no longer appropriate: nmbclusters is now set up later, and regular
clusters aren't the only relevant mbuf type; we make significant
use of page-sized mbuf clusters too.

The naming and output of the various vm.kmem_* and vm.kvm_* sysctls
is confusing and not easy to reconcile.  We should either detail
more aspects or fewer, and perhaps introduce sysctl subtrees to
better describe the hierarchy of the maps.

Why are separate kmem submaps being used at all?  Is it to limit
the memory usage of certain subsystems?  And are those limits
actually enforced?

--
Andre