From owner-freebsd-current@FreeBSD.ORG  Sun Nov  9 13:09:50 2003
Message-ID: <3FAEACED.9040804@freebsd.org>
Date: Sun, 09 Nov 2003 14:09:01 -0700
From: Scott Long <scottl@freebsd.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.5) Gecko/20031103
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Kevin Oberman <oberman@es.net>
References: <20031106235641.7F5555D07@ptavv.es.net>
	<3FAB4DAB.5060901@freebsd.org>
In-Reply-To: <3FAB4DAB.5060901@freebsd.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: Robert Watson <rwatson@freebsd.org>
cc: current@freebsd.org
Subject: Re: Kernel memory leak in ATAPI/CAM or ATAng?

Fixed.  Please retest.

Scott Long wrote:
> Kevin Oberman wrote:
> 
>>> Date: Thu, 6 Nov 2003 11:23:30 -0500 (EST)
>>> From: Robert Watson <rwatson@freebsd.org>
>>>
>>>
>>> On Thu, 6 Nov 2003, Kevin Oberman wrote:
>>>
>>>
>>>> I have learned a bit more about the problems I have been having with
>>>> the DVD drive on my T30 laptop. When I have run the drive for an
>>>> extended time (like 2 or 3 hours), I invariably have my system lock up
>>>> because it can't malloc kernel memory for the ATAPI/CAM or ATA
>>>> device. (Usually it's both.)
>>>>
>>>> The only recovery seems to be to reboot the system.
>>>
>>>
>>> Is it possible to drop to DDB and generate a coredump at that point?  If
>>> so, you can run vmstat on the core to look at memory use statistics in a
>>> post-mortem way.  As to what to look for: "big numbers" is about the
>>> limit of what I can suggest, I'm afraid :-).  Usually the activity of
>>> choice is to compare vmstat statistics (with -m and -z) during normal
>>> operation and when the leak has occurred, and look for any marked
>>> differences.  It's worth observing that there are two failure modes here
>>> that appear almost identical: (1) a memory leak resulting in address
>>> space exhaustion for the kernel, and (2) a tunable maximum allocation
>>> being too high for the available address space.  Note that (2) isn't a
>>> leak, simply a poorly tuned value.  We've noticed a number of tuned
>>> memory limits were set when memory sizes on systems were much lower, and
>>> so we've had to readjust the tuning parameters for large memory systems.
>>> Likewise, a number of problems were observed when PAE was introduced, as
>>> some of the tuning parameters scaled with the amount of physical memory,
>>> not with the addressable space for the kernel.  So we probably want to be
>>> on the lookout for both of these possibilities.
>>
>>
>>
>> Well, I have no details yet, but 'vmstat -m' makes the problem
>> obvious. The amount of kernel memory allocated to ATA request
>> climbs forever and, after enough data is transferred, the system runs
>> out of KVM. This is a continual leak, and monitoring it on the running
>> system makes it pretty clear that something is leaking. I don't think
>> (2) is the issue. Because the columns in the vmstat output are not wide
>> enough, this is a bit hard to read: the fields all merge into some
>> REALLY large numbers. After reboot, it is <5K. When running mencode I
>> see this increasing at a rate of a bit under 1.9 MB per minute.
>>
>> It does not look like a tuning issue. No matter how big KVM is allowed
>> to grow, it's only a matter of time until it is gone.
>>
>> I am going to do some testing to see what operations seem to cause
>> this. I assume it does not happen all of the time or everyone would
>> have seen it. I suspect it only happens with ATAPI/CAM activity,
>> possibly only with simultaneous ATA and ATAPI/CAM activity.
> 
> 
> Does vmstat -m show which malloc type is growing?  Knowing this will
> greatly speed up the debugging process.
> 
> Thanks!
> 
> Scott
>
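
For anyone retesting, the before/after comparison of 'vmstat -m' output
suggested earlier in the thread can be roughly automated. The following is
only a sketch, not something posted in this thread: it assumes each data
line looks like "<type name> <InUse> <MemUse>K ...", which may not match
your vmstat version (and will not cope with the merged columns mentioned
above). The names parse_snapshot and growth are made up for illustration.

```python
import re

# ASSUMPTION: each data line looks like "<type name> <InUse> <MemUse>K ...".
# Real 'vmstat -m' layouts differ between FreeBSD versions; adjust as needed.
LINE_RE = re.compile(r"^\s*(?P<type>\S.*?)\s+(?P<inuse>\d+)\s+(?P<memuse>\d+)K\b")


def parse_snapshot(text):
    """Map malloc type name -> memory in use (in KB) for one saved snapshot."""
    usage = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # header lines and unparseable lines are skipped
            usage[match.group("type")] = int(match.group("memuse"))
    return usage


def growth(before, after):
    """Return (type, KB grown) pairs for types that grew, largest first."""
    deltas = ((name, after[name] - before.get(name, 0)) for name in after)
    grown = [(name, delta) for name, delta in deltas if delta > 0]
    return sorted(grown, key=lambda pair: pair[1], reverse=True)
```

Save one snapshot shortly after boot and another after the drive has been
busy for a while, then pass both texts through parse_snapshot() and hand
the results to growth(). A malloc type that keeps rising between
snapshots is the one worth reporting.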