Date:      Thu, 06 Nov 2003 15:56:41 -0800
From:      "Kevin Oberman" <oberman@es.net>
To:        Robert Watson <rwatson@freebsd.org>
Cc:        current@freebsd.org
Subject:   Re: Kernel memory leak in ATAPI/CAM or ATAng? 
Message-ID:  <20031106235641.7F5555D07@ptavv.es.net>
In-Reply-To: Message from Robert Watson <rwatson@freebsd.org>  <Pine.NEB.3.96L.1031106111952.3700A-100000@fledge.watson.org> 

> Date: Thu, 6 Nov 2003 11:23:30 -0500 (EST)
> From: Robert Watson <rwatson@freebsd.org>
> 
> 
> On Thu, 6 Nov 2003, Kevin Oberman wrote:
> 
> > I have learned a bit more about the problems I have been having with
> > the DVD drive on my T30 laptop. When I have run the drive for an
> > extended time (like 2 or 3 hours), I invariably have my system lock up
> > because it can't malloc kernel memory for the ATAPI/CAM or ATA
> > device. (Usually it's both.)
> > 
> > The only recovery seems to be to reboot the system.
> 
> Is it possible to drop to DDB and generate a coredump at that point?  If
> so, you can run vmstat on the core to look at memory use statistics in a
> post-mortem way.  As to what to look for: "big numbers" is about the limit
> of what I can suggest, I'm afraid :-).  Usually the activity of choice is
> to compare vmstat statistics (with -m and -z) during normal operation and
> when the leak has occurred, and look for any marked differences.  It's
> worth observing that there are two failure modes here that appear almost
> identical: (1) a memory leak resulting in address space exhaustion for the
> kernel, and (2) a tunable maximum allocation being too high for the
> available address space.  Note that (2) isn't a leak, simply a poorly
> tuned value.  We've noticed a number of tuned memory limits were set when
> memory sizes on systems were much lower, and so we've had to readjust the
> tuning parameters for large memory systems.  Likewise, a number of
> problems were observed when PAE was introduced, as some of the tuning
> parameters scaled with the amount of physical memory, not with the
> addressable space for the kernel.  So we probably want to be on the look
> out for both of these possibilities.

Well, I have no details at this point, but 'vmstat -m' makes the
problem obvious. The amount of kernel memory allocated to ATA request
climbs forever and, after enough data is transferred, the system runs
out of KVM. This is a continuous leak, and monitoring it on the running
system makes it pretty clear that something is leaking. I don't think
(2) is the issue. Because the fields in the vmstat output are not wide
enough, this is a bit hard to read: the fields all merge into some
REALLY large numbers. After reboot, it is <5K. When running mencode I
see this increasing at a rate of a bit under 1.9 MB per minute.
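
As a rough illustration of that kind of monitoring, here is a minimal
Python sketch: given two snapshots of per-type memory use taken some
minutes apart, it ranks the types by growth per minute. The function
name, the snapshot dictionaries, and every number in them are
hypothetical and chosen only for illustration. Extracting the values
from 'vmstat -m' output is left out, since the column layout differs
between releases.

```python
def growth_per_minute(before, after, minutes):
    """Return (type, bytes_per_minute) pairs, fastest-growing first.

    before/after map a malloc type name to bytes in use (hypothetical
    input; fill it from whatever 'vmstat -m' reports on your system).
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    rates = []
    for name, used_after in after.items():
        used_before = before.get(name, 0)  # type absent earlier: treat as 0
        rates.append((name, (used_after - used_before) / minutes))
    rates.sort(key=lambda item: item[1], reverse=True)
    return rates


# Illustrative numbers only, not measurements from the affected system.
snap_a = {"ATA request": 4_000, "devbuf": 900_000}
snap_b = {"ATA request": 19_004_000, "devbuf": 900_000}

for name, rate in growth_per_minute(snap_a, snap_b, minutes=10):
    print(f"{name:12s} {rate / 1e6:6.2f} MB/min")
```

A type whose rate stays positive across several sampling intervals is
the likely suspect; types that level off are probably just caches
filling up.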

It does not look like a tuning issue. No matter how big KVM is allowed
to grow, it's only a matter of time until it is gone.
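
To put numbers on that: with a constant leak, the time to exhaustion
grows only linearly with the space available. The Python sketch below
uses the roughly 1.9 MB per minute mentioned above. The limits are
hypothetical round figures, not the actual kernel memory limits on
this machine, and the function name is made up for the example.

```python
LEAK_MB_PER_MIN = 1.9  # approximate rate reported above


def hours_until_exhausted(limit_mb, leak_mb_per_min=LEAK_MB_PER_MIN):
    """Hours until a constant leak consumes limit_mb of kernel memory."""
    if leak_mb_per_min <= 0:
        raise ValueError("leak rate must be positive")
    return limit_mb / leak_mb_per_min / 60.0


# Hypothetical limits in MB; doubling the limit only doubles the time.
for limit_mb in (128, 256, 512):
    hours = hours_until_exhausted(limit_mb)
    print(f"{limit_mb:4d} MB lasts about {hours:.1f} hours")
```

Raising the limit buys proportionally more run time, but the leak
still consumes all of it eventually.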

I am going to do some testing to see what operations seem to cause
this. I assume it does not happen all of the time or everyone would
have seen it. I suspect it only happens with ATAPI/CAM activity,
possibly only with simultaneous ATA and ATAPI/CAM activity.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634


