From: Stephen Clark
Reply-To: sclark46@earthlink.net
Date: Wed, 01 Oct 2008 09:35:04 -0400
To: Jeremy Chadwick
Cc: FreeBSD Stable
Subject: Re: resource leak
Message-ID: <48E37C88.50805@earthlink.net>
In-Reply-To: <20081001124955.GA21577@icarus.home.lan>
List-Id: Production branch of FreeBSD source code

Jeremy Chadwick wrote:
> On Wed, Oct 01, 2008 at 08:30:26AM -0400, Stephen Clark wrote:
>> Jeremy Chadwick wrote:
>>> On Wed, Oct 01, 2008 at 07:41:56AM -0400, Stephen Clark wrote:
>>>> Hello List,
>>>>
>>>> I am running into a strange problem that points to a resource leak.
>>>> The problem manifests itself after one of our remote systems has been
>>>> up around 100 days.
>>>> The symptom is that it appears no new processes can be spawned. If I try to
>>>> ssh to the unit, I can see the 3-way tcp handshake and then no more traffic.
>>>> Examining log files, like cron, etc. shows that when this happens no more
>>>> entries are written into the cron log. The unit is acting as a firewall,
>>>> router and vpn appliance; these functions continue to work. We have a
>>>> C application that is periodically started out of a shell script and
>>>> reports various information about the system; it stops reporting,
>>>> while vpns, ospf routing, and ipfilter firewalling continue to work
>>>> and write into their logfiles.
>>>>
>>>> My question is how do I monitor the various resources in the system that could
>>>> prevent the spawning of a new process?
>>> Periodically logging "ps -auxw" output to a file would be useful, as
>>> ideally you'd gradually see the list get longer and longer over time;
>>> it's possible you have many zombie processes as a result of a parent
>>> which is not reaping its children (calling waitpid(2) or its friends).
>>>
>>> Other things that might come in useful are "fstat" and "vmstat -s".
>>>
>>> It sounds like your C program relies heavily on system() or execl() and
>>> fork(), which is why it's affected -- while the other programs are
>>> likely kernel-level.
>>>
>> Thanks Jeremy,
>>
>> I have added those commands to a periodic daily script.
>>
>> Another thing I have noticed is that quite often the problem seems to
>> start at 2am, right when the periodic daily script runs.
>>
>> But I think it is coincidence and that we have reached the edge of the
>> resource limit and all the jobs that get spawned by the periodic daily
>> scripts push us over the limit.
>>
>> The other thing is that having logged into some of the systems that have
>> been up in the 80 day range, I don't see a lot/any zombies. I just wonder
>> if it is an fd leak; the fstat output should point that out.
>
> You might find the below thread beneficial -- an individual came to the
> lists stating that they were running out of fds as a result of some
> Java software running amok on their systems.
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/thread.html#45383
> http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045383.html
>
Thanks, but after reading the thread: is there a single place in the kernel
that reports how many fds are currently in use?

Does the "no more fds" message get logged in /var/log/messages or only in the
kernel log buffer? I haven't seen that message in the messages file, and since
we are forced to have a remote user reboot the box, the kernel buffer is gone.

Steve

--

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety." (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases." (Thomas Jefferson)
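
On the question of counting fds in use, here is a minimal sketch. It assumes the suspected leak sits in the C reporting application itself, which the thread has not established. The function name count_open_fds and the 65536 scan cap are hypothetical choices for illustration and do not come from the thread. The function counts only the descriptors of the calling process; for a system-wide figure, FreeBSD should expose the kern.openfiles and kern.maxfiles sysctls.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Count the descriptors currently open in this process.
 * Each slot is probed with fcntl(F_GETFD), which fails with
 * EBADF (returning -1) when the slot is not an open descriptor.
 * Hypothetical helper: the name and the scan cap are assumptions.
 */
static int count_open_fds(void)
{
    long max = sysconf(_SC_OPEN_MAX);

    /* Cap the scan so an unlimited or very large limit stays cheap. */
    if (max < 0 || max > 65536)
        max = 65536;

    int n = 0;
    for (int fd = 0; fd < (int)max; fd++) {
        if (fcntl(fd, F_GETFD) != -1)
            n++;
    }
    return n;
}
```

Logging this value each time the application runs would show whether the count creeps upward over days, which is the pattern an fd leak in that program would produce. A flat count here would point the search elsewhere, for example at the fstat output for the other long-running processes.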