Message-ID: <3EEA04BD.9E06ED0@mindspring.com>
Date: Fri, 13 Jun 2003 10:07:09 -0700
From: Terry Lambert
To: John Hay
cc: current@FreeBSD.org
References: <20030613125156.GA8733@zibbi.icomtek.csir.co.za>
Subject: panic: kmem_map too small: the downside of FreeBSD 5
List-Id: Discussions about the use of FreeBSD-current

John Hay wrote:
> On a 5.1-RELEASE machine I have been able to cause a panic like this:
> panic: kmem_malloc(4096): kmem_map too small: 28610560 total allocated

Manually tune your system.

This panic results from the fact that zone allocations with fixed limits don't really do the right thing any more, now that it's possible to implement the map entry allocations at interrupt.
Before the new memory allocator, there was an allocator entry point called zalloci() that differed from zalloc() in that it pre-allocated its map entries at declaration time, so that there was always backing in KVA for yet-to-be-allocated pages. With the new memory allocator, this is no longer the case.

Because of this, the kmem_map must be extended when the amount of memory requested would require a KVA mapping that does not yet exist. Normally this is only a problem if you have a huge amount of memory, and there's not enough KVA to create the page mappings. This can happen on auto-tuned systems using PAE, PSE36, or a similar method, with more than 3-4G of physical RAM, since the KVA is still limited to "4G - UVA size" on these systems. It can also happen on any system that runs out of physical memory before filling in all the page mappings that would have been statically mapped with zalloci(), but aren't with plain zalloc() because of the new memory allocator.

Really, as part of the switch to the new memory allocator, and the deprecation of the zalloci() interface that accompanied it, an audit should have been done of the system to go through all previous places zalloci() was used, and make them robust in case of a NULL return value (allocation failure), since those places were effectively promised by zalloci() that allocations would never fail for this reason. The "panic" call in the attempt to grow the kmem_map should probably be eliminated, to expose the places which are in error.

The real problem here is that when you take a trap fault on a page-not-present, you can't return to the program that initiated the fault and cause it to block waiting for memory (for interrupts, this is just not possible).
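As a rough illustration of what such an audit would be checking for, here is a user-space sketch in C of a caller that tolerates a failed allocation from a fixed-limit zone. It is not FreeBSD kernel code: struct zone, zone_alloc_nowait(), zone_free(), rx_enqueue() and struct rx_stats are all hypothetical names made up for the example, and malloc() merely stands in for the real backing allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-limit zone: at most `limit` items outstanding. */
struct zone {
    size_t item_size;   /* size of each item in bytes */
    size_t limit;       /* hard cap on outstanding items */
    size_t in_use;      /* items currently allocated */
};

/*
 * Non-sleeping allocation. Returns NULL when the zone is exhausted
 * (or the backing allocator fails) instead of panicking or blocking.
 */
static void *zone_alloc_nowait(struct zone *z)
{
    void *p;

    if (z->in_use >= z->limit)
        return NULL;
    p = malloc(z->item_size);   /* stand-in for the real backing store */
    if (p != NULL)
        z->in_use++;
    return p;
}

/* Return an item to the zone; a NULL pointer is ignored. */
static void zone_free(struct zone *z, void *p)
{
    if (p == NULL)
        return;
    free(p);
    z->in_use--;
}

/* Hypothetical per-caller counters. */
struct rx_stats {
    unsigned long accepted;
    unsigned long dropped;
};

/*
 * A caller that cannot sleep: on allocation failure it drops the
 * work item, counts the drop, and reports the failure to its caller.
 */
static int rx_enqueue(struct zone *z, struct rx_stats *st, void **out)
{
    void *buf = zone_alloc_nowait(z);

    if (buf == NULL) {
        st->dropped++;
        *out = NULL;
        return -1;
    }
    st->accepted++;
    *out = buf;
    return 0;
}
```

With a zone whose limit is 2, the third rx_enqueue() call returns -1 and bumps the dropped counter rather than taking the system down; after a zone_free(), the next call succeeds again.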
About the only code that used to allocate at interrupt that's robust in the face of the new memory allocator and kmem_map pressure is the mbuf code, since it has historically been prepared for a NULL return on an allocation request, and the intervening trap fault on the reserved KVA page for the page mapping doesn't bother it.

IMO, the new memory allocator code needs to be refactored, in addition to an audit, as does the auto-tuning. Specifically, kernel memory is, with rare exceptions like the uarea, which people who don't understand are trying to kill off, non-pageable. A strategy that suggests itself is to provide page mappings for all of physical memory, before doing anything else. What memory remains is then available for use by the kernel. This would need a free-pool on top of everything else, since having a KVA mapping and owning a corresponding physical mapping would be two different things.

Right now, your only option is to disable auto-tuning (set the value of "maxusers" to something other than 0), and manually tune the system, such that the total amount of prereserved and not-prereserved-but-allocable kernel memory can't exceed the physical memory size. If you do this and want to get close to full utilization of physical memory, you will also have to take kernel size and kernel modules into account.

-- Terry