Date: Wed, 1 Oct 2008 05:49:55 -0700
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: Stephen Clark <sclark46@earthlink.net>
Cc: FreeBSD Stable <freebsd-stable@freebsd.org>
Subject: Re: resource leak
Message-ID: <20081001124955.GA21577@icarus.home.lan>
In-Reply-To: <48E36D62.6090001@earthlink.net>
References: <48E36204.5090108@earthlink.net> <20081001115046.GA20384@icarus.home.lan> <48E36D62.6090001@earthlink.net>

On Wed, Oct 01, 2008 at 08:30:26AM -0400, Stephen Clark wrote:
> Jeremy Chadwick wrote:
>> On Wed, Oct 01, 2008 at 07:41:56AM -0400, Stephen Clark wrote:
>>> Hello List,
>>>
>>> I am running into a strange problem that points to a resource leak.
>>> The problem manifests itself after one of our remote systems has been
>>> up around 100 days.
>>> The symptom is that it appears no new processes can be spawned. If I try to
>>> ssh to the unit, I can see the 3-way tcp handshake and then no more traffic.
>>> Examining log files, like cron, etc. shows that when this happens no more
>>> entries are written into the cron log. The unit is acting as a firewall,
>>> router and vpn appliance; these functions continue to work. We have a
>>> C application that is periodically started out of a shell script that
>>> reports various information about the system; it stops reporting,
>>> while vpns, ospf routing, and ipfilter firewalling continue to work
>>> and write into their logfiles.
>>>
>>> My question is how do I monitor the various resources in the system that could
>>> prevent the spawning of a new process?
>>
>> Periodically logging "ps -auxw" output to a file would be useful, as
>> ideally you'd gradually see the list get longer and longer over time;
>> it's possible you have many zombie processes as a result of a parent
>> which is not reaping its children (calling waitpid(2) or its friends).
>>
>> Other things that might come in useful are "fstat" and "vmstat -s".
>>
>> It sounds like your C program relies heavily on system() or execl() and
>> fork(), which is why it's affected -- while the other programs are
>> likely kernel-level.
>>
> Thanks Jeremy,
>
> I have added those commands to a periodic daily script.
>
> Another thing I have noticed is that quite often the problem seems to
> start at 2am, right when the periodic daily script runs.
>
> But I think it is coincidence and that we have reached the edge of the
> resource limit and all the jobs that get spawned by the periodic daily
> scripts push us over the limit.
>
> The other thing is that having logged into some of the systems that have
> been up in the 80 day range, I don't see a lot/any zombies. I just wonder
> if it is an fd leak; the fstat should point that out.

You might find the below thread beneficial -- an individual came to the
lists stating that they were running out of fds as a result of some Java
software running amok on their systems.

http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/thread.html#45383
http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045383.html

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
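
For anyone wanting to act on the "log ps output periodically" suggestion in this message, here is a rough sketch of how one snapshot could be summarized so that growth in the process count, or zombies piling up under one parent, stands out between runs. Everything named here is an assumption for illustration and not something from the thread: the helper names summarize_ps and snapshot, the particular ps column list, and the SAMPLE text, which is made up and is not output from the affected systems.

```python
import subprocess
from collections import Counter

# Assumed ps invocation (not from the thread): pid, parent pid, state, command.
PS_COMMAND = ["ps", "-ax", "-o", "pid,ppid,stat,comm"]


def summarize_ps(ps_output):
    """Return (total_processes, zombies_per_parent) for one ps snapshot."""
    total = 0
    zombies = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # blank or truncated line
        total += 1
        ppid, stat = fields[1], fields[2]
        if stat.startswith("Z"):  # "Z" in the state column marks a zombie
            zombies[int(ppid)] += 1
    return total, dict(zombies)


def snapshot():
    """Run ps once and summarize it; call this from a periodic job."""
    result = subprocess.run(PS_COMMAND, capture_output=True, text=True, check=True)
    return summarize_ps(result.stdout)


# Hypothetical sample in the assumed column layout, for illustration only.
SAMPLE = """\
  PID  PPID STAT COMMAND
    1     0 ILs  init
  612     1 Ss   cron
  845   612 Z    sh
  846   612 Z    sh
  900     1 S    sshd
"""
```

With the sample above, summarize_ps(SAMPLE) returns (5, {612: 2}), i.e. five processes, two of them zombies whose parent is PID 612. Appending the result of snapshot() to a log with a timestamp on each run would give a trend to compare over the weeks before a hang. This only covers the process side; an fd leak would still need the fstat output looked at separately.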