Date: Thu, 06 Nov 2003 15:56:41 -0800
From: "Kevin Oberman" <oberman@es.net>
To: Robert Watson <rwatson@freebsd.org>
Cc: current@freebsd.org
Subject: Re: Kernel memory leak in ATAPI/CAM or ATAng?
Message-ID: <20031106235641.7F5555D07@ptavv.es.net>
In-Reply-To: Message from Robert Watson <rwatson@freebsd.org> <Pine.NEB.3.96L.1031106111952.3700A-100000@fledge.watson.org>
> Date: Thu, 6 Nov 2003 11:23:30 -0500 (EST)
> From: Robert Watson <rwatson@freebsd.org>
>
>
> On Thu, 6 Nov 2003, Kevin Oberman wrote:
>
> > I have learned a bit more about the problems I have been having with
> > the DVD drive on my T30 laptop. When I have run the drive for an
> > extended time (like 2 or 3 hours), I invariably have my system lock up
> > because it can't malloc kernel memory for the ATAPI/CAM or ATA
> > device. (Usually it's both.)
> >
> > The only recovery seems to be to reboot the system.
>
> Is it possible to drop to DDB and generate a coredump at that point? If
> so, you can run vmstat on the core to look at memory use statistics in a
> post-mortem way. As to what to look for: "big numbers" is about the limit
> of what I can suggest, I'm afraid :-). Usually the activity of choice is
> to compare vmstat statistics (with -m and -z) during normal operation and
> when the leak has occurred, and look for any marked differences. It's
> worth observing that there are two failure modes here that appear almost
> identical: (1) a memory leak resulting in address space exhaustion for the
> kernel, and (2) a tunable maximum allocation being too high for the
> available address space. Note that (2) isn't a leak, simply a poorly
> tuned value. We've noticed a number of tuned memory limits were set when
> memory sizes on systems were much lower, and so we've had to readjust the
> tuning parameters for large memory systems. Likewise, a number of
> problems were observed when PAE was introduced, as some of the tuning
> parameters scaled with the amount of physical memory, not with the
> addressable space for the kernel. So we probably want to be on the look
> out for both of these possibilities.

Well, I have no details yet, but 'vmstat -m' makes the problem
obvious. The amount of kernel memory allocated to ATA request climbs
forever, and after enough data is transferred the system runs out of
KVM.
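For anyone who wants to put numbers on this kind of comparison, here is
a rough sketch (not anything from the tree) of how a few readings of one
malloc type's memory use, taken by hand from 'vmstat -m' at known times,
could be turned into a growth rate and a crude time-to-exhaustion
estimate. The function names, the sample values and the 200 MB limit are
all hypothetical and only there for illustration; the script does not
parse vmstat output itself.

```python
from typing import List, Tuple

# One reading: (seconds since the first reading, bytes in use for one malloc type).
Sample = Tuple[float, int]


def growth_rate_per_minute(samples: List[Sample]) -> float:
    """Average growth in bytes per minute between the first and last reading."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (b1 - b0) * 60.0 / (t1 - t0)


def looks_like_leak(samples: List[Sample]) -> bool:
    """True if usage never shrinks between readings and grows overall.

    A mistuned limit (case 2) would be expected to level off at some
    point; steady growth that never comes back down points at case (1).
    """
    values = [b for _, b in samples]
    never_shrinks = all(later >= earlier
                        for earlier, later in zip(values, values[1:]))
    return never_shrinks and values[-1] > values[0]


def minutes_until_exhausted(rate_per_minute: float,
                            bytes_in_use: int,
                            limit_bytes: int) -> float:
    """Minutes left before usage reaches limit_bytes at the given rate."""
    if rate_per_minute <= 0:
        return float("inf")
    return max(limit_bytes - bytes_in_use, 0) / rate_per_minute


if __name__ == "__main__":
    # Hypothetical readings taken one minute apart.
    readings = [(0.0, 4_000), (60.0, 2_004_000), (120.0, 4_004_000)]
    rate = growth_rate_per_minute(readings)
    print("growth (bytes/min):", rate)
    print("looks like a leak:", looks_like_leak(readings))
    # 200 MB is an assumed limit, not a real kernel default.
    print("minutes left:",
          minutes_until_exhausted(rate, readings[-1][1], 200_000_000))
```

The point of the last function is only that, with a steady leak, raising
the limit changes the time until failure and nothing else.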
This is a continual leak, and monitoring it on the running system makes
it pretty clear that something is leaking. I don't think (2) is the
issue. Because the fields in the vmstat output are not wide enough, it
is a bit hard to read: the fields all merge into some REALLY large
numbers.

After reboot, it is <5K. When running mencode I see this increasing at
a rate of a bit under 1.9 MB per minute. It does not look like a tuning
issue. No matter how big KVM is allowed to grow, it's only a matter of
time until it is gone.

I am going to do some testing to see what operations seem to cause
this. I assume it does not happen all of the time or everyone would
have seen it. I suspect it only happens with ATAPI/CAM activity,
possibly only with simultaneous ATA and ATAPI/CAM activity.

-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net    Phone: +1 510 486-8634