Message-ID: <3FAEACED.9040804@freebsd.org>
Date: Sun, 09 Nov 2003 14:09:01 -0700
From: Scott Long
To: Kevin Oberman
cc: Robert Watson
cc: current@freebsd.org
Subject: Re: Kernel memory leak in ATAPI/CAM or ATAng?
References: <20031106235641.7F5555D07@ptavv.es.net> <3FAB4DAB.5060901@freebsd.org>
In-Reply-To: <3FAB4DAB.5060901@freebsd.org>

Fixed. Please retest.

Scott Long wrote:
> Kevin Oberman wrote:
>
>>> Date: Thu, 6 Nov 2003 11:23:30 -0500 (EST)
>>> From: Robert Watson
>>>
>>>
>>> On Thu, 6 Nov 2003, Kevin Oberman wrote:
>>>
>>>
>>>> I have learned a bit more about the problems I have been having with
>>>> the DVD drive on my T30 laptop. When I have run the drive for an
>>>> extended time (like 2 or 3 hours), I invariably have my system lock up
>>>> because it can't malloc kernel memory for the ATAPI/CAM or ATA
>>>> device. (Usually it's both.)
>>>>
>>>> The only recovery seems to be to reboot the system.
>>>
>>>
>>> Is it possible to drop to DDB and generate a coredump at that point? If
>>> so, you can run vmstat on the core to look at memory use statistics in a
>>> post-mortem way. As to what to look for: "big numbers" is about the
>>> limit of what I can suggest, I'm afraid :-). Usually the activity of
>>> choice is to compare vmstat statistics (with -m and -z) during normal
>>> operation and when the leak has occurred, and look for any marked
>>> differences. It's worth observing that there are two failure modes here
>>> that appear almost identical: (1) a memory leak resulting in address
>>> space exhaustion for the kernel, and (2) a tunable maximum allocation
>>> being too high for the available address space. Note that (2) isn't a
>>> leak, simply a poorly tuned value. We've noticed a number of tuned
>>> memory limits were set when memory sizes on systems were much lower,
>>> and so we've had to readjust the tuning parameters for large memory
>>> systems. Likewise, a number of problems were observed when PAE was
>>> introduced, as some of the tuning parameters scaled with the amount of
>>> physical memory, not with the addressable space for the kernel. So we
>>> probably want to be on the lookout for both of these possibilities.
>>
>>
>>
>> Well, I have no details to this point, but 'vmstat -m' makes the
>> problem obvious. The amount of kernel memory allocated to ATA request
>> climbs forever and after enough data is transferred, it runs out of
>> KVM. This is a continual leak, and monitoring it on the running system
>> makes it pretty clear that something is leaking. I don't think (2) is
>> the issue. Because the fields allocated in vmstat are not large enough,
>> this is a bit hard to read. The fields all merge into some REALLY large
>> numbers. After reboot, it is <5K. When running mencode I see this
>> increasing at a rate of a bit under 1.9 MB per minute.
>>
>> It does not look like a tuning issue. No matter how big KVM is allowed
>> to grow, it's only a matter of time until it is gone.
>>
>> I am going to do some testing to see what operations seem to cause
>> this. I assume it does not happen all of the time or everyone would
>> have seen it. I suspect it only happens with ATAPI/CAM activity,
>> possibly only with simultaneous ATA and ATAPI/CAM activity.
>
>
> Does vmstat -m show which malloc type is growing? Knowing this will
> greatly speed up the debugging process.
>
> Thanks!
>
> Scott
>
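
For anyone following the thread, the comparison suggested above (per-type memory use during normal operation versus after the leak has occurred) can be sketched roughly as below. This is only an illustration, not anything from the ATA code or from vmstat itself: the snapshots are assumed to have already been reduced by hand to "type name -> bytes in use", the type names and numbers are made up, and both helper functions are hypothetical.

```python
def growth_by_type(before, after):
    """Return (type, bytes_grown) pairs for every type whose usage
    increased between two snapshots, largest growth first."""
    grown = []
    for name, after_bytes in after.items():
        delta = after_bytes - before.get(name, 0)
        if delta > 0:
            grown.append((name, delta))
    grown.sort(key=lambda item: item[1], reverse=True)
    return grown


def minutes_until_exhausted(free_bytes, bytes_per_minute):
    """Rough time left at a steady leak rate; None if nothing is leaking."""
    if bytes_per_minute <= 0:
        return None
    return free_bytes / bytes_per_minute


# Hypothetical snapshots (made-up type names and byte counts).
baseline = {"type_a": 5_000, "type_b": 120_000, "type_c": 64_000}
later = {"type_a": 19_005_000, "type_b": 121_000, "type_c": 64_000}

report = growth_by_type(baseline, later)
for name, delta in report:
    print(f"{name}: +{delta} bytes")
```

A type that dominates the report and keeps growing across repeated snapshots points to a leak rather than a tuning limit, since a limit that is merely set too high would not produce unbounded growth in one type.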