From owner-freebsd-current@FreeBSD.ORG Wed May 2 14:53:51 2007
Date: Wed, 2 May 2007 15:53:50 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Rick Macklem
Cc: Craig Boston, Pawel Jakub Dawidek, freebsd-fs@freebsd.org, freebsd-current@freebsd.org, Kris Kennaway, Peter Schuller
Message-ID: <20070502154934.E30345@fledge.watson.org>
Subject: Re: ZFS committed to the FreeBSD base.
On Tue, 1 May 2007, Rick Macklem wrote:

> On Tue, 1 May 2007, Kris Kennaway wrote:
>
>>> I don't know if it is relevant, but I've seen "kmem_map: too small"
>>> panics when testing my NFSv4 server, ever since about FreeBSD 5.4.
>>> There is no problem running the same server code on FreeBSD 4 (which is
>>> what I still run in production mode) or OpenBSD 3 or 4. If I increase
>>> the size of the map, I can delay the panic for up to about two weeks of
>>> hard testing, but it never goes away. I don't see any evidence of a
>>> memory leak during the several days of testing leading up to the panic.
>>> (NFSv4 uses MALLOC/FREE extensively for state related structures.)
>>
>> Sounds exactly like a memory leak to me. How did you rule it out?
>
> Well, I had a little program running on the server that grabbed the
> mti_stats[] out of the kernel and logged them. I had one client mounted
> running thousands of passes of the Connectathon basic tests (one client,
> same activity over and over and over again). For a week, the stats don't
> show any increase in allocation for any type (alloc - free doesn't get
> unreasonably big), then..."panic: kmem_map too small". How many days it
> took to happen would vary with the size of the kernel map, but there was
> no evidence of a leak prior to the crash. It seemed to be based on the
> number of times MALLOC and FREE were called.
>
> Also, the same server code (except for the port changes, which have
> nothing to do with the state handling where MALLOC/FREE get called a
> lot), works fine for months on FreeBSD 4 and OpenBSD 3.9.
>
> So, I won't say a "memory leak is ruled out", but if there was a leak,
> why wouldn't it bite FreeBSD 4 or show up in mti_stats[]?
>
> I first saw it on FreeBSD 6.0, but went back to FreeBSD 5.4, tried the
> same test, and got the same result.

Historically, such panics have been the result of one of two things:

(1) An immediate resource leak in UMA(9)- or malloc(9)-allocated memory.

(2) Mis-tuning of a resource limit, perhaps because the limit was sized
    based solely on physical memory size, without taking available kernel
    address space into account.

mti_stats reports only on malloc(9); you also need to look at uma(9),
since many frequently allocated types are allocated directly from the slab
allocator rather than from kernel malloc. Take a look at the output of
"show uma" or "show malloc" in DDB, or respectively "vmstat -z" and
"vmstat -m" on a core or on a live system. malloc(9) is actually
implemented using two different back-ends: UMA-managed fixed-size memory
buckets for small allocations, and direct page allocation for large
allocations.

The most frequent example of (2) is mis-tuning of the system's maximum
vnode limit, resulting in the vnode cache exceeding the available address
space. Try tuning that limit down. Notice that vnodes, inodes, and most
other frequently used file system data types are allocated using uma(9),
not malloc(9).

Robert N M Watson
Computer Laboratory
University of Cambridge
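[For anyone chasing a similar panic: a rough sketch of the kind of
post-processing one might do on "vmstat -m"-style output to spot the
biggest consumers. The sample listing, its column layout, and the type
names below are synthetic illustrations, not output from a real system.]

```python
# Parse a "vmstat -m"-style listing and report the malloc types using the
# most kernel memory. SAMPLE is synthetic data for illustration only; on a
# live FreeBSD box you would feed in the real output of "vmstat -m".
SAMPLE = """\
         Type InUse MemUse HighUse Requests
       vnodes  4210  1054K       -    4210
     NFS srvr   812   203K       -   91233
       devbuf  1200   600K       -    1500
"""

def top_memuse(text, n=2):
    """Return the n (type, MemUse-in-KB) pairs with the largest MemUse."""
    rows = []
    for line in text.splitlines()[1:]:        # skip the header row
        parts = line.split()
        # Last four columns are InUse, MemUse, HighUse, Requests; the type
        # name may contain spaces, so everything before them is the name.
        name = " ".join(parts[:-4])
        mem_kb = int(parts[-3].rstrip("K"))   # MemUse column, in KB
        rows.append((name, mem_kb))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]

print(top_memuse(SAMPLE))   # -> [('vnodes', 1054), ('devbuf', 600)]
```

A steadily growing MemUse for one type across such snapshots would point
at a leak; flat numbers followed by a panic point instead at mis-tuning,
as described above.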
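[A hedged note on the tuning suggestion above: on FreeBSD the vnode limit
is exposed as the kern.maxvnodes sysctl, and the kernel memory map size
as the vm.kmem_size loader tunable. The numeric values below are
placeholders only, not recommendations; this is a config fragment, not a
script to run as-is.]

```shell
# Lower the system-wide vnode limit at runtime (100000 is a placeholder):
sysctl kern.maxvnodes=100000

# Or set it, and the kernel memory map size, at boot via /boot/loader.conf:
#   kern.maxvnodes="100000"
#   vm.kmem_size="512M"
```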