Date: Wed, 01 Oct 2008 09:35:04 -0400
From: Stephen Clark <sclark46@earthlink.net>
To: Jeremy Chadwick <koitsu@freebsd.org>
Cc: FreeBSD Stable <freebsd-stable@freebsd.org>
Subject: Re: resource leak
Message-ID: <48E37C88.50805@earthlink.net>
In-Reply-To: <20081001124955.GA21577@icarus.home.lan>
References: <48E36204.5090108@earthlink.net> <20081001115046.GA20384@icarus.home.lan> <48E36D62.6090001@earthlink.net> <20081001124955.GA21577@icarus.home.lan>
Jeremy Chadwick wrote:
> On Wed, Oct 01, 2008 at 08:30:26AM -0400, Stephen Clark wrote:
>> Jeremy Chadwick wrote:
>>> On Wed, Oct 01, 2008 at 07:41:56AM -0400, Stephen Clark wrote:
>>>> Hello List,
>>>>
>>>> I am running into a strange problem that points to a resource leak.
>>>> The problem manifests itself after one of our remote systems has been
>>>> up around 100 days.
>>>> The symptom is that it appears no new processes can be spawned. If I try to
>>>> ssh to the unit, I can see the 3-way tcp handshake and then no more traffic.
>>>> Examining log files, like cron, etc., shows that when this happens no more entries
>>>> are written into the cron log. The unit is acting as a firewall,
>>>> router and vpn appliance; these functions continue to work. We have a
>>>> C application that is periodically started out of a shell script and
>>>> reports various information about the system; it stops reporting,
>>>> while vpns, ospf routing, and ipfilter firewalling continue to work
>>>> and write into their logfiles.
>>>>
>>>> My question is how do I monitor the various resources in the system that could
>>>> prevent the spawning of a new process?
>>> Periodically logging "ps -auxw" output to a file would be useful, as
>>> ideally you'd gradually see the list get longer and longer over time;
>>> it's possible you have many zombie processes as a result of a parent
>>> which is not reaping its children (calling waitpid(2) or its friends).
>>>
>>> Other things that might come in useful are "fstat" and "vmstat -s".
>>>
>>> It sounds like your C program relies heavily on system() or execl() and
>>> fork(), which is why it's affected -- while the other programs are
>>> likely kernel-level.
>>>
>> Thanks Jeremy,
>>
>> I have added those commands to a periodic daily script.
>>
>> Another thing I have noticed is that quite often the problem seems to
>> start at 2am, right when the periodic daily script runs.
>>
>> But I think it is coincidence and that we have reached the edge of the
>> resource limit and all the jobs that get spawned by the periodic daily
>> scripts push us over the limit.
>>
>> The other thing is that having logged into some of the systems that have
>> been up in the 80 day range, I don't see a lot/any zombies. I just wonder
>> if it is an fd leak; fstat should point that out.
>
> You might find the below thread beneficial -- an individual came to the
> lists stating that they were running out of fds as a result of some
> Java software running amok on their systems.
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/thread.html#45383
> http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045383.html
>
Thanks, but after reading the thread, is there a single place in the kernel
that reports how many fds are currently in use?

Does the "no more fds" message get logged in /var/log/messages, or only in
the kernel log buffer? I haven't seen that message in the messages file,
and since we are forced to have a remote user reboot the box, the kernel
buffer is gone.

Steve
-- 
"They that give up essential liberty to obtain temporary safety, deserve neither liberty nor safety." (Ben Franklin)
"The course of history shows that as a government grows, liberty decreases." (Thomas Jefferson)