From: "Kevin Oberman"
To: Robert Watson
Cc: current@freebsd.org
Date: Thu, 06 Nov 2003 15:56:41 -0800
Subject: Re: Kernel memory leak in ATAPI/CAM or ATAng?
Message-Id: <20031106235641.7F5555D07@ptavv.es.net>
In-Reply-To: Message from Robert Watson

> Date: Thu, 6 Nov 2003 11:23:30 -0500 (EST)
> From: Robert Watson
>
> On Thu, 6 Nov 2003, Kevin Oberman wrote:
>
> > I have learned a bit more about the problems I have been having with
> > the DVD drive on my T30 laptop. When I have run the drive for an
> > extended time (like 2 or 3 hours), I invariably have my system lock up
> > because it can't malloc kernel memory for the ATAPI/CAM or ATA
> > device. (Usually it's both.)
> >
> > The only recovery seems to be to reboot the system.
>
> Is it possible to drop to DDB and generate a coredump at that point? If
> so, you can run vmstat on the core to look at memory use statistics in a
> post-mortem way.
> As to what to look for: "big numbers" is about the limit
> of what I can suggest, I'm afraid :-). Usually the activity of choice is
> to compare vmstat statistics (with -m and -z) during normal operation and
> when the leak has occurred, and look for any marked differences. It's
> worth observing that there are two failure modes here that appear almost
> identical: (1) a memory leak resulting in address space exhaustion for the
> kernel, and (2) a tunable maximum allocation being too high for the
> available address space. Note that (2) isn't a leak, simply a poorly
> tuned value. We've noticed a number of tuned memory limits were set when
> memory sizes on systems were much lower, and so we've had to readjust the
> tuning parameters for large memory systems. Likewise, a number of
> problems were observed when PAE was introduced, as some of the tuning
> parameters scaled with the amount of physical memory, not with the
> addressable space for the kernel. So we probably want to be on the look
> out for both of these possibilities.

Well, I have no details at this point, but 'vmstat -m' makes the problem
obvious. The amount of kernel memory allocated to ATA requests climbs
forever, and after enough data is transferred, the system runs out of KVM.
This is a continual leak, and monitoring it on the running system makes it
pretty clear that something is leaking. I don't think (2) is the issue.

Because the fields vmstat allocates for these columns are not wide enough,
the output is a bit hard to read: the fields all merge into some REALLY
large numbers. After a reboot, the allocation is <5K. When running
mencoder, I see it increasing at a rate of a bit under 1.9 MB per minute.
It does not look like a tuning issue. No matter how big KVM is allowed to
grow, it's only a matter of time until it is gone.

I am going to do some testing to see what operations seem to cause this.
I assume it does not happen all of the time, or everyone would have seen
it.
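The before/after comparison of 'vmstat -m' output can be sketched in a
few lines of Python. This is a minimal sketch under stated assumptions:
each snapshot is assumed to have already been reduced to a mapping of
malloc type name to memory in use (in KB), the real vmstat -m column
layout (including the merged fields) is not parsed, and the function
names are hypothetical. It also does the linear extrapolation behind "it's
only a matter of time": at a bit under 1.9 MB per minute, two to three
hours of use accounts for roughly 228 to 342 MB of kernel memory.

```python
# Hypothetical helpers for comparing two `vmstat -m` snapshots.
# Assumption: each snapshot is a dict {malloc type name: memory in use, KB};
# the actual vmstat -m output format is NOT parsed here.


def growth_by_type(before, after):
    """Return {type: KB gained} for types that grew, largest growth first."""
    deltas = {t: kb - before.get(t, 0) for t, kb in after.items()}
    grown = {t: d for t, d in deltas.items() if d > 0}
    return dict(sorted(grown.items(), key=lambda item: item[1], reverse=True))


def minutes_until_exhausted(free_kb, leak_kb_per_min):
    """Linear extrapolation: any constant positive leak eventually
    consumes a finite amount of free kernel memory."""
    if leak_kb_per_min <= 0:
        return float("inf")
    return free_kb / leak_kb_per_min


if __name__ == "__main__":
    # Illustrative numbers only: <5K after boot, then ~1.9 MB/min for
    # 10 minutes. "some_other_type" is a hypothetical, non-leaking entry.
    boot = {"ATA request": 5, "some_other_type": 100}
    later = {"ATA request": 5 + 19 * 1024, "some_other_type": 100}
    for name, kb in growth_by_type(boot, later).items():
        print(f"{name}: +{kb} KB")
    # Hours for a 1.9 MB/min leak to consume 342 MB.
    hours = minutes_until_exhausted(342 * 1024, 1.9 * 1024) / 60
    print(f"{hours:.1f} hours")
```

The extrapolation is deliberately linear: if the growth rate stays
constant regardless of how much KVM is configured, raising the limit only
delays the failure, which is the distinction between failure modes (1)
and (2) above.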
I suspect it only happens with ATAPI/CAM activity, possibly only with
simultaneous ATA and ATAPI/CAM activity.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634