From owner-freebsd-current@FreeBSD.ORG  Fri Jun 13 10:08:51 2003
Message-ID: <3EEA04BD.9E06ED0@mindspring.com>
Date: Fri, 13 Jun 2003 10:07:09 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: John Hay <jhay@icomtek.csir.co.za>
References: <20030613125156.GA8733@zibbi.icomtek.csir.co.za>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: current@FreeBSD.org
Subject: panic: kmem_map too small: the downside of FreeBSD 5

John Hay wrote:
> On a 5.1-RELEASE machine I have been able to cause a panic like this:
> panic: kmem_malloc(4096): kmem_map too small: 28610560 total allocated

Manually tune your system.  This panic results from the fact
that zone allocations with fixed limits don't really do the
right thing any more, now that it's possible to perform the
map entry allocations at interrupt time.

Before the new memory allocator, there was an allocator entry
point called zalloci() that differed from zalloc() in that it
pre-allocated its map entries at declaration time, so that
there was always backing in KVA for yet-to-be-allocated pages.

With the new memory allocator, this is no longer the case.

Because of this, the kmem_map must be extended whenever a
memory request would require a KVA mapping that does not yet
exist.

Normally this is only a problem if you have a huge amount of
memory, and there's not enough KVA to create the page mappings.

This can happen on auto-tuned systems using PAE, PSE36, or a
similar method, with more than 3-4G of physical RAM, since the
KVA is still limited to "4G - UVA size" on these systems.
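
As a rough illustration of the "4G - UVA size" arithmetic, here
is a minimal sketch.  The helper names and the 3G UVA figure used
in the usage note are assumptions made for the example, not values
taken from any particular kernel configuration, and comparing RAM
against KVA size is only a crude stand-in for the real mapping
pressure.

```c
#include <assert.h>
#include <stdint.h>

#define GIB	(UINT64_C(1) << 30)

/* A 32-bit virtual address space is 4G, with or without PAE/PSE36. */
#define VA_SPACE	(4 * GIB)

/* KVA is whatever is left once the user address space is carved out. */
static uint64_t
kva_size(uint64_t uva_size)
{
	assert(uva_size <= VA_SPACE);
	return (VA_SPACE - uva_size);
}

/* Nonzero if physical RAM is larger than the KVA available to map it. */
static int
ram_exceeds_kva(uint64_t phys_ram, uint64_t uva_size)
{
	return (phys_ram > kva_size(uva_size));
}
```

Assuming a 3G UVA, kva_size(3 * GIB) comes out to 1G, so a PAE
machine with 4G or more of RAM has far more physical memory than
KVA in which to map it.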

It can also happen on any system that runs out of physical
memory before filling in all the page mappings that would have
been statically mapped with zalloci(), but aren't with plain
zalloc() because of the new memory allocator.

Really, as part of the switch to the new memory allocator, and
the deprecation of the zalloci() interface that accompanied it,
an audit should have been done of the system to go through all
previous places zalloci() was used, and make them robust in case
of a NULL return value (allocation failure), since those places
were effectively promised by zalloci() that allocations would
never fail for this reason.
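
To make the kind of robustness meant here concrete, the following
is a userland sketch only.  pool_alloc(), pool_free() and
store_value() are hypothetical names invented for the example, not
kernel interfaces; the point is simply that the caller checks for
NULL and backs out with an error instead of assuming the
allocation cannot fail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define POOL_ITEMS	4	/* fixed limit, chosen arbitrarily */

struct item {
	int	payload;
	int	in_use;
};

static struct item pool[POOL_ITEMS];

/* Hypothetical allocator: returns NULL at the limit, never panics. */
static struct item *
pool_alloc(void)
{
	for (size_t i = 0; i < POOL_ITEMS; i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = 1;
			return (&pool[i]);
		}
	}
	return (NULL);
}

static void
pool_free(struct item *it)
{
	assert(it != NULL);
	it->in_use = 0;
}

/* A caller that tolerates failure: *out is untouched on error. */
static int
store_value(int value, struct item **out)
{
	struct item *it = pool_alloc();

	if (it == NULL)
		return (ENOMEM);
	it->payload = value;
	*out = it;
	return (0);
}
```

A caller written this way degrades to an ENOMEM return when the
pool is exhausted, and succeeds again once something is freed.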

The "panic" call in the attempt to grow the kmem_map should
probably be eliminated, to expose the places which are in
error.  The real problem here is that when you take a trap
fault on a page-not-present, you can't return to the program
that initiated the fault and cause it to block waiting for
memory (for interrupts, this is just not possible).

About the only code that used to allocate at interrupt that's
robust in the face of the new memory allocator and kmem_map
pressure is the mbuf code, since it has historically been
prepared for a NULL return on an allocation request, and the
intervening trap fault on the reserved KVA page for the page
mapping doesn't bother it.

IMO, the new memory allocator code needs to be refactored, in
addition to an audit, as does the auto-tuning.  Specifically,
kernel memory is non-pageable, with rare exceptions like the
uarea (which people who don't understand it are trying to kill
off).
A strategy that suggests itself is to provide page mappings for
all of physical memory, before doing anything else.  What memory
remains is then available for use by the kernel.  This would
need a free-pool on top of everything else, since having a KVA
mapping and owning a corresponding physical mapping would be two
different things.

Right now, your only option is to disable auto-tuning (set the
value of "maxusers" to something other than 0), and manually
tune the system, such that the total amount of prereserved and
not-prereserved-but-allocable kernel memory can't exceed the
physical memory size.  You will have to take kernel size and
kernel modules into account if you want to get close to full
utilization of physical memory.

-- Terry