From owner-freebsd-stable@FreeBSD.ORG  Mon Nov 29 12:55:28 2010
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Date: Mon, 29 Nov 2010 23:55:18 +1100 (EST)
From: Ian Smith <smithi@nimnet.asn.au>
To: Kevin Oberman <oberman@es.net>
In-Reply-To: <20101127171554.B40951CC0C@ptavv.es.net>
Message-ID: <20101128224112.I47536@sola.nimnet.asn.au>
References: <20101127171554.B40951CC0C@ptavv.es.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: freebsd-stable@freebsd.org, Jeremy Chadwick <freebsd@jdc.parodius.com>
Subject: Re: memory leak and swapfile 

On Sat, 27 Nov 2010, Kevin Oberman wrote:
 > > Date: Sat, 27 Nov 2010 04:17:43 -0800
 > > From: Jeremy Chadwick <freebsd@jdc.parodius.com>
 > > 
 > > On Fri, Nov 26, 2010 at 07:12:59PM -0800, Kevin Oberman wrote:
 > > > > From: "Jack Raats" <jack@jarasoft.net>
 > > > > Date: Fri, 26 Nov 2010 19:17:05 +0100
 > > > > Sender: owner-freebsd-stable@freebsd.org
 > > > > 
 > > > > It looks like there may be a memory leak consuming my swap space,
 > > > > caused by one of the processes that is running.
 > > > > Big question: how can I determine which process is responsible?
 > > > > 
 > > > > Any suggestions?
 > > > 
 > > > ps -aux. Look for processes with large values in the VSZ column.
 > > > 
 > > > I'm sure that there are other ways to see this, but that's an easy
 > > > one. You can, of course, pipe the output to sort and use the -k 5 -n
 > > > options. 
 > > 
 > > I believe he should be looking for a process that has a large value in
 > > RSS ("RES" in top), not VSZ ("SIZE" in top).
 > 
 > I believe it's not that simple, but I think my answer is more likely to
 > point to the culprit than yours.

I think so too, given Jack suggested growing swap use was the issue.
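
For the archives, a minimal sketch of Kevin's sort suggestion.  top_vsz 
is a made-up helper name, and it assumes VSZ is column 5 of ps -aux 
output, as it is on the systems mentioned here:

```sh
#!/bin/sh
# Sketch only: keep the N rows of ps -aux output with the largest VSZ.
# top_vsz is a hypothetical helper; it reads ps -aux output on stdin.
top_vsz() {
	# drop the header, sort numerically on column 5 (VSZ), largest first
	sed 1d | sort -k 5,5nr | head -n "${1:-10}"
}

ps -aux | top_vsz 5
```

Feeding it from a pipe rather than calling ps inside makes it easy to 
try against saved output as well.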

 > I am not terribly familiar with the details of the FreeBSD virtual
 > memory system, but I assume that it is similar to others I have known very
 > well over the years, primarily VMS and OSF/1 where I did a lot of kernel
 > programming.
 > 
 > FreeBSD does not do "greedy" swap allocation. Some systems will
 > "reserve" space in swap for all active memory. FreeBSD only uses swap
 > space when it is needed. RES shows the amount of physical memory a
 > process is using, while VSZ shows its total virtual size. VSZ is in KB
 > while RES is in pages (I think), so the numbers wind up looking "odd".

top's SIZE and RES are given in KB (suffixed 'K', or 'M' over 100M) and 
ps' VSZ and RSS are both given in unadorned KB.  I expect Jeremy's right 
about procstat RES being in pages, and that might be a useful view once 
processes are under pressure to swap unused (or leaked!) pages out.
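
If some tool does report resident size in pages, converting to KB makes 
it comparable with ps' columns.  A minimal sketch; pages_to_kb is a 
hypothetical helper, and the 4096 fallback is only my assumption for 
when getconf isn't available:

```sh
#!/bin/sh
# Sketch only: convert a page count to KB.
# Usage: pages_to_kb PAGES [PAGESIZE_IN_BYTES]
pages_to_kb() {
	# page size from getconf unless given; 4096 is an assumed fallback
	pagesize=${2:-$(getconf PAGESIZE 2>/dev/null || echo 4096)}
	echo $(( $1 * pagesize / 1024 ))
}

pages_to_kb 256
```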

 > If VSZ is bigger than (RES * page-size in KB), then the entire process
 > memory space is not in physical memory. It is in one of three other
 > places: 
 > 1. Imaginary memory (demand-zero pages)
 > 2. Unmapped space (any pages of the image that have not been loaded into
 > physical memory)
 > 3. Swap
 > 
 > It's very hard to determine how much is where, though unread image pages
 > are not likely to be significant. Some applications set up huge buffers
 > of demand-zero memory which may never be used. This is the virtual memory
 > equivalent of a sparse file. Until a demand-zero page is written to, it
 > takes a page table slot, but uses neither physical memory nor swap
 > space.
 > 
 > That all said, memory leakage is memory that has been used, but not
 > freed. It is never accessed, so drops into swap space when memory
 > pressure triggers the system to look for pages not recently
 > accessed. It goes to swap and stays there until the process exits, and
 > VSZ just keeps growing.

Yep, think I demonstrated just that with my lil'-iron example?

 > If you monitor VSZ and it just keeps growing when the process is not
 > doing anything that should require ever increasing memory, it's probably
 > a memory leak.
 > 
 > While RES alone tells you only what is in memory and nothing about swap
 > use, a leaky process will start by growing RES, then eventually have
 > old pages swapped out, so RES stops growing while VSZ keeps growing. If
 > some process grows with pages that are being actively accessed (not a
 > leak), RES may get large but, unless memory pressure is great enough,
 > it will use little swap.
 >
 > Bottom line is that, if the system works the way I believe it does, VSZ
 > is the best, if not ideal, check for memory leaks that fill swap.
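
Watching one process's VSZ over time, as Kevin describes, might be 
sketched as below.  watch_vsz and vsz_growth are hypothetical names, 
and the sample count and interval are arbitrary; it assumes ps accepts 
-o vsz= and -p, which I believe holds on FreeBSD:

```sh
#!/bin/sh
# Sketch only: sample a process's VSZ and report how much it grew.

# watch_vsz PID [COUNT] [INTERVAL]: print "epoch-seconds VSZ-in-KB" lines
watch_vsz() {
	pid=$1; count=${2:-5}; interval=${3:-60}
	i=0
	while [ "$i" -lt "$count" ]; do
		vsz=$(ps -o vsz= -p "$pid" 2>/dev/null | tr -d ' ')
		[ -n "$vsz" ] || break		# process gone, or ps failed
		echo "$(date +%s) $vsz"
		i=$((i + 1))
		if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
	done
}

# vsz_growth: read those lines on stdin, print last VSZ minus first (KB)
vsz_growth() {
	awk 'NR == 1 { first = $2 } { last = $2 } END { if (NR) print last - first }'
}

# example: two samples of this shell, one second apart
watch_vsz $$ 2 1 | vsz_growth
```

A steadily positive figure across long intervals, from a process that 
has no reason to need more memory, is the pattern worth chasing; a 
single sample pair proves nothing.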

Agreed, in my experience anyway.  All this prompted me to write a little 
script, below.  Tested on 8.1-S and (cough) 5.5-S.  Not really sure when 
top's default display changed.  Things vary over the time it's running 
of course, but it confirms that top and ps are seeing the same virtual 
and resident sizes, of which I'd never been sure.

% grep memory /var/run/dmesg.boot
real memory  = 167772160 (160 MB)
avail memory = 154341376 (147 MB)
% swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/ad0s2b        393216   173128   220088    44%
% procsize
From ps -aux: 157 procs,  virtual 1022784K (998M), resident 95932K (93M)
From top -S : 157 procs,  virtual 1022820K (998M), resident 95976K (93M)

cheers, Ian

=======
#!/bin/sh
# procsize 0.5 smithi 27,29/11/10 .. compare ps -aux VSZ,RSS vs top SIZE,RES
tempfile=`mktemp /tmp/procsize.XXXXXX` || exit 1	# unpredictable name
vszall=0; rssall=0; procnt=0

ps -aux >$tempfile			# can't update parent vars in pipe
while read user pid cpu mem vsz rss tt stat started time command; do
	[ "$user" = USER ] && continue		# skip header; quoted in case of blank line
	[ "$command" = "ps -aux" ] && continue	# vs top -t
	vszall=$((vszall + $vsz))
	rssall=$((rssall + $rss))		# both KB
	procnt=$((procnt + 1))
done <$tempfile
echo "From ps -aux: $procnt procs, " \
	"virtual ${vszall}K ($((vszall / 1024))M)," \
	"resident ${rssall}K ($((rssall / 1024))M)"

oldtop='pid user pri nice size res state time wcpu cpu command' # <= 6.x?
newtop='pid user thr pri nice size res state time wcpu command' # w/out -H
ver=`uname -r`; [ "${ver%.*}" -lt 7 ] && varlist=$oldtop || varlist=$newtop
vszall=0; rssall=0; procnt=0; start=''

top -tS all >$tempfile
while eval read $varlist; do
	[ "$pid" ] || continue 			# skip blanks
	[ "$pid" = PID ] && start=y && continue
	[ "$start" ] || continue		# and headers
	# assume top uses G also (not testable here, let alone T :)
	[ ${size%G} != $size ] && size=$((${size%G} * 1024))M
	[ ${size%M} != $size ] && size=$((${size%M} * 1024))K
	[ ${res%G} != $res ] && res=$((${res%G} * 1024))M
	[ ${res%M} != $res ] && res=$((${res%M} * 1024))K
	# temp: check eval results
	[ ${size%K} = $size ] && echo "error: SIZE=$size" && break
	[ ${res%K} = $res ] && echo "error: RES=$res" && break
	vszall=$((vszall + ${size%K}))
	rssall=$((rssall + ${res%K}))		# both now KB
	procnt=$((procnt + 1))
done <$tempfile
rm $tempfile
echo "From top -S : $procnt procs, " \
	"virtual ${vszall}K ($((vszall / 1024))M)," \
	"resident ${rssall}K ($((rssall / 1024))M)"
=======
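
The suffix handling in the top loop could be pulled out into a helper.  
A minimal sketch; to_kb is a hypothetical name, and like the script it 
assumes integer values and that top uses a G suffix:

```sh
#!/bin/sh
# Sketch only: normalise a top-style size (1234K, 56M, 2G) to plain KB.
to_kb() {
	case $1 in
	*G) echo $(( ${1%G} * 1024 * 1024 )) ;;
	*M) echo $(( ${1%M} * 1024 )) ;;
	*K) echo "${1%K}" ;;
	*)  echo "to_kb: unexpected size '$1'" >&2; return 1 ;;
	esac
}

to_kb 998M
```

Returning non-zero on an unrecognised suffix plays the same role as the 
"temp: check eval results" lines in the script.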