Date:      Sat, 2 Apr 2011 08:31:03 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Boris Kochergin <spawk@acm.poly.edu>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: Kernel memory leak in 8.2-PRERELEASE?
Message-ID:  <20110402153103.GA10283@icarus.home.lan>
In-Reply-To: <4D972FF7.6010901@acm.poly.edu>
References:  <4D972FF7.6010901@acm.poly.edu>

On Sat, Apr 02, 2011 at 10:17:27AM -0400, Boris Kochergin wrote:
> Ahoy. This morning, I awoke to the following on one of my servers:
> 
> pid 59630 (httpd), uid 80, was killed: out of swap space
> pid 59341 (find), uid 0, was killed: out of swap space
> pid 23134 (irssi), uid 1001, was killed: out of swap space
> pid 49332 (sshd), uid 1001, was killed: out of swap space
> pid 69074 (httpd), uid 0, was killed: out of swap space
> pid 11879 (eggdrop-1.6.19), uid 1001, was killed: out of swap space
> ...
> 
> And so on.
>
> The machine is:
> 
> FreeBSD exodus.poly.edu 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #2:
> Thu Dec  2 11:39:21 EST 2010
> spawk@exodus.poly.edu:/usr/obj/usr/src/sys/EXODUS  amd64
> 
> 10:13AM  up 120 days, 20:06, 2 users, load averages: 0.00, 0.01, 0.00
> 
> The memory line from top intrigued me:
> 
> Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, 828M Buf, 605M Free
> 
> The machine has 8 gigs of memory, and I don't know what all that
> wired memory is being used for. There is a large-ish (6 x 1.5-TB)
> ZFS RAID-Z2 on it which has had a disk in the UNAVAIL state for a
> few months:

The ZFS ARC is what's responsible for your large wired count.
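To put numbers on that, here is a small, illustrative Python sketch (the function names are hypothetical, not part of any FreeBSD tool) that parses the top Mem line quoted above and computes what share of memory is wired. It assumes Buf overlaps the other counters; that assumption is consistent with the remaining five fields summing to roughly the machine's 8 GB.

```python
import re
from typing import Dict

# Mem line quoted above from top(1) on the 8 GB machine.
MEM_LINE = ("Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, "
            "828M Buf, 605M Free")

# Multipliers converting top's K/M/G suffixes into megabytes.
_UNIT_MB = {"K": 1 / 1024, "M": 1, "G": 1024}

# Page queues assumed to be disjoint; Buf is assumed to overlap them.
QUEUES = ("Active", "Inact", "Wired", "Cache", "Free")


def parse_top_mem(line: str) -> Dict[str, float]:
    """Return {category: megabytes} for each 'NNN<unit> Name' field."""
    fields: Dict[str, float] = {}
    for amount, unit, name in re.findall(r"(\d+)([KMG]) (\w+)", line):
        fields[name] = int(amount) * _UNIT_MB[unit]
    return fields


def wired_share(fields: Dict[str, float]) -> float:
    """Fraction of the page-queue total that is wired (Buf excluded)."""
    total = sum(fields.get(q, 0) for q in QUEUES)
    return fields.get("Wired", 0) / total


if __name__ == "__main__":
    mem = parse_top_mem(MEM_LINE)
    print(f"queues total: {sum(mem[q] for q in QUEUES):.0f} MB")
    print(f"wired share:  {wired_share(mem):.1%}")
```

For the quoted line this reports about 7894 MB across the five queues, with roughly 89% of it wired.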

How much swap space do you have?  You excluded that line from top.
Output from "swapinfo" would also be helpful, though it would show the
same thing.

If you have no swap (which is a bad idea for a lot of reasons), then the
machine running out of memory available to userspace would make sense:
one process grew too large, and other processes that were trying to
malloc() at the time were affected.
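Checking this boils down to summing swapinfo's columns. The Python sketch below is illustrative only: it assumes swapinfo(8)'s default layout with a "1K-blocks" column (the BLOCKSIZE environment variable can change the unit), and it skips the "Total" line that appears when more than one swap device is configured.

```python
from typing import Tuple


def swap_totals(swapinfo_text: str) -> Tuple[int, int]:
    """Return (total_kb, used_kb) summed over all swap devices.

    Assumes swapinfo's default layout: a header line, then one line per
    device with Device, 1K-blocks, Used, Avail and Capacity columns.
    """
    total_kb = used_kb = 0
    for line in swapinfo_text.splitlines()[1:]:  # skip the header line
        cols = line.split()
        # Ignore blank lines and the summary line so devices aren't counted twice.
        if len(cols) < 5 or cols[0] == "Total":
            continue
        total_kb += int(cols[1])
        used_kb += int(cols[2])
    return total_kb, used_kb
```

A result of (0, 0) would mean no swap is configured at all, which fits the scenario above.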

Can you please provide /boot/loader.conf and /etc/sysctl.conf?
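The settings in those files most relevant to a large wired count are presumably the kernel memory and ARC tunables. The Python sketch below pulls them out of a loader.conf-style file. The list of names (vm.kmem_size, vm.kmem_size_max, vfs.zfs.arc_max, vfs.zfs.arc_min) is an assumption about what matters here, and the parser only handles simple name="value" lines.

```python
import re
from typing import Dict

# Tunables assumed relevant to ARC sizing and kernel memory on FreeBSD 8.x.
MEMORY_TUNABLES = (
    "vm.kmem_size",
    "vm.kmem_size_max",
    "vfs.zfs.arc_max",
    "vfs.zfs.arc_min",
)

# name = value, with the value optionally wrapped in double quotes.
_ASSIGN = re.compile(r'^\s*([\w.]+)\s*=\s*"?([^"#]*)"?')


def memory_tunables(conf_text: str) -> Dict[str, str]:
    """Return {tunable: value} for the memory tunables set in conf_text."""
    found: Dict[str, str] = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments, including commented-out settings
        m = _ASSIGN.match(line)
        if m and m.group(1) in MEMORY_TUNABLES:
            found[m.group(1)] = m.group(2).strip()
    return found
```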

> # zpool status
>   pool: home
>  state: DEGRADED
> status: One or more devices could not be used because the label is missing or
>         invalid.  Sufficient replicas exist for the pool to continue
>         functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: none requested
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         home        DEGRADED     0     0     0
>           raidz2    DEGRADED     0     0     0
>             ada0    ONLINE       0     0     0
>             ada1    ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>             ada5    UNAVAIL      0    85    11  experienced I/O failures
> 
> errors: No known data errors

I would also recommend replacing ada5; I'm not sure why any system
administrator would let a bad disk sit in a machine for "a few months".
Hopefully it isn't causing extra memory usage or something odd behind
the scenes in the kernel, but I'm going to assume the two issues are
completely unrelated.
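Spotting the failed disk is easy by eye here. For a scripted check, a minimal Python sketch follows. It is not an official parser: it assumes the config section layout quoted above, a NAME/STATE header followed by one vdev per line and ending at a blank line.

```python
from typing import List, Tuple


def unhealthy_vdevs(status_text: str) -> List[Tuple[str, str]]:
    """Return (name, state) for config entries whose state isn't ONLINE."""
    bad: List[Tuple[str, str]] = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.lstrip("> ").strip()  # tolerate email quoting
        if stripped.startswith("NAME") and "STATE" in stripped:
            in_config = True  # the vdev table starts after this header
            continue
        if not in_config:
            continue
        if not stripped:
            break  # a blank line ends the vdev table
        cols = stripped.split()
        if len(cols) >= 2 and cols[1] != "ONLINE":
            bad.append((cols[0], cols[1]))
    return bad
```

On the output above, this would flag home and raidz2 (DEGRADED only because of their child) and ada5 (UNAVAIL). ada5 is the leaf device to pass to "zpool replace".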

> "vmstat -m" and "vmstat -z" output:
> 
> http://acm.poly.edu/~spawk/vmstat-m.txt
> http://acm.poly.edu/~spawk/vmstat-z.txt
> 
> Anyone have a clue? I know it's just going to happen again if I
> reboot the machine. It is still up in case there are diagnostics for
> me to run.

The above vmstat data won't be too helpful, since you need to see what's
going on "over time" rather than what the values are right now.  One of
them may, however, indicate available userspace memory vs. available
kmem.
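To compare "vmstat -z" snapshots taken some hours apart, a small Python sketch like this one can rank the UMA zones whose in-use memory grew. It assumes the FreeBSD 8.x layout (zone name, a colon, then comma-separated SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES). It treats SIZE * USED as a zone's in-use bytes, which ignores per-zone overhead and cached free items. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_vmstat_z(text: str) -> Dict[str, int]:
    """Map each UMA zone name to SIZE * USED bytes.

    Lines that don't fit the assumed layout (header, blanks) are skipped.
    """
    zones: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.rpartition(":")  # values never contain ':'
        if not sep:
            continue
        cols = [c.strip() for c in rest.split(",")]
        if len(cols) < 3 or not all(c.isdigit() for c in cols):
            continue
        size, used = int(cols[0]), int(cols[2])
        zones[name.strip()] = size * used
    return zones


def top_growth(before: Dict[str, int], after: Dict[str, int],
               n: int = 5) -> List[Tuple[str, int]]:
    """Zones whose in-use bytes grew the most between two snapshots."""
    deltas = [(zone, after[zone] - before.get(zone, 0)) for zone in after]
    grown = [d for d in deltas if d[1] > 0]
    return sorted(grown, key=lambda d: d[1], reverse=True)[:n]
```

Saving "vmstat -z" output to a file every hour or so, then passing consecutive files through parse_vmstat_z() and top_growth(), should point at any zone that keeps climbing.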

Basically what you need is the equivalent of Solaris sar(1), so that you
can see memory usage of processes, etc. over time and find out whether
something went malloc-crazy.
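A crude stand-in is to poll a few sysctls at a fixed interval and log them. The Python sketch below does that. The OIDs listed are assumed to exist on a FreeBSD 8.x box with ZFS loaded: vm.stats.vm.v_wire_count and vm.stats.vm.v_free_count (in pages) and kstat.zfs.misc.arcstats.size (in bytes). The reader is injectable so the logic can be exercised on other systems.

```python
import subprocess
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed FreeBSD OIDs: wired/free counts are in pages, ARC size in bytes.
DEFAULT_OIDS = (
    "vm.stats.vm.v_wire_count",
    "vm.stats.vm.v_free_count",
    "kstat.zfs.misc.arcstats.size",
)

Sample = Tuple[float, Dict[str, int]]  # (unix timestamp, {oid: value})


def read_sysctl(oid: str) -> int:
    """Read one integer sysctl via `sysctl -n` (FreeBSD only)."""
    out = subprocess.run(["sysctl", "-n", oid],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


def sample(oids: Sequence[str], interval: float, count: int,
           reader: Callable[[str], int] = read_sysctl,
           log: Callable[[str], None] = print) -> List[Sample]:
    """Take `count` samples `interval` seconds apart, logging each one."""
    history: List[Sample] = []
    for i in range(count):
        stamp = time.time()
        values = {oid: reader(oid) for oid in oids}
        history.append((stamp, values))
        log(time.strftime("%H:%M:%S ", time.localtime(stamp)) +
            " ".join(f"{oid}={val}" for oid, val in values.items()))
        if i < count - 1:
            time.sleep(interval)
    return history


def growth(history: List[Sample]) -> Dict[str, int]:
    """Change in each OID between the first and last sample."""
    first, last = history[0][1], history[-1][1]
    return {oid: last[oid] - first[oid] for oid in first}
```

For example, sample(DEFAULT_OIDS, interval=300, count=288) would cover one day at five-minute intervals. A steadily rising wired count with a flat ARC size would point away from ZFS.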

If the kernel itself had run out of memory, you'd be seeing a panic.

Sorry if these ideas/comments seem like a ramble; I've been up all night
trying to decode a circa-1992 font routine in 65816 assembly, heh.  :-)

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
