From: Jeremy Chadwick
To: Stephen Clark
Cc: FreeBSD Stable
Date: Wed, 1 Oct 2008 05:49:55 -0700
Subject: Re: resource leak
Message-ID: <20081001124955.GA21577@icarus.home.lan>
In-Reply-To: <48E36D62.6090001@earthlink.net>

On Wed, Oct 01, 2008 at 08:30:26AM -0400, Stephen Clark
wrote:
> Jeremy Chadwick wrote:
>> On Wed, Oct 01, 2008 at 07:41:56AM -0400, Stephen Clark wrote:
>>> Hello List,
>>>
>>> I am running into a strange problem that points to a resource leak.
>>> The problem manifests itself after one of our remote systems has been
>>> up around 100 days.
>>> The symptom is that it appears no new processes can be spawned. If I try to
>>> ssh to the unit, I can see the 3-way tcp handshake and then no more traffic.
>>> Examining log files (cron, etc.) shows that when this happens no more entries
>>> are written into the cron log. The unit is acting as a firewall,
>>> router and vpn appliance; these functions continue to work. We have a
>>> C application that is periodically started out of a shell script that
>>> reports various information about the system; it stops reporting,
>>> while vpns, ospf routing, and ipfilter firewalling continue to work
>>> and write into their logfiles.
>>>
>>> My question is how do I monitor the various resources in the system that could
>>> prevent the spawning of a new process?
>>
>> Periodically logging "ps -auxw" output to a file would be useful, as
>> ideally you'd gradually see the list get longer and longer over time;
>> it's possible you have many zombie processes as a result of a parent
>> which is not reaping its children (calling waitpid(2) or its friends).
>>
>> Other things that might come in useful are "fstat" and "vmstat -s".
>>
>> It sounds like your C program relies heavily on system() or execl() and
>> fork(), which is why it's affected -- while the other programs are
>> likely kernel-level.
>>
> Thanks Jeremy,
>
> I have added those commands to a periodic daily script.
>
> Another thing I have noticed is that quite often the problem seems to
> start at 2am, right when the periodic daily script runs.
>
> But I think it is coincidence and that we have reached the edge of the
> resource limit and all the jobs that get spawned by the periodic daily
> scripts push us over the limit.
>
> The other thing is that having logged into some of the systems that have
> been up in the 80 day range, I don't see a lot/any zombies. I just wonder
> if it is an fd leak; the fstat output should point that out.

You might find the below thread beneficial -- an individual came to the
lists stating that they were running out of fds as a result of some Java
software running amok on their systems.

http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/thread.html#45383
http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045383.html

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
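
A minimal sketch of the periodic logging discussed in this thread, in case
it is useful. It appends the output of "ps -auxw", "fstat" and "vmstat -s"
to a dated log and writes one trend line per run, so that a slow climb in
process or open-file counts is easy to spot. The names SNAP_DIR, snapshot
and count_lines, the log location, and the "run" argument are assumptions
made for illustration; nothing here comes from the original poster's
scripts.

```shell
#!/bin/sh
# Sketch only: SNAP_DIR, snapshot and count_lines are hypothetical names.
SNAP_DIR="${SNAP_DIR:-/var/tmp/ressnap}"   # assumed log location

# Print the number of output lines of a command (header line included),
# or "n/a" if the command is not installed on this host.
count_lines() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>/dev/null | wc -l | tr -d ' '
    else
        echo "n/a"
    fi
}

# Append full command output to a per-day log and a one-line summary
# to trend.log.
snapshot() {
    mkdir -p "$SNAP_DIR" || return 1
    log="$SNAP_DIR/snap-$(date +%Y%m%d).log"
    {
        echo "=== $(date) ==="
        for cmd in "ps -auxw" "fstat" "vmstat -s"; do
            echo "--- $cmd ---"
            if command -v "${cmd%% *}" >/dev/null 2>&1; then
                # Intentionally unquoted so the options are split off.
                $cmd 2>&1 || echo "(exit status $?)"
            else
                echo "(not available on this host)"
            fi
        done
    } >> "$log"
    echo "$(date +%Y-%m-%dT%H:%M:%S) procs=$(count_lines ps -ax) fds=$(count_lines fstat)" >> "$SNAP_DIR/trend.log"
}

# Only take a snapshot when explicitly asked to, e.g. from cron.
if [ "${1:-}" = "run" ]; then
    snapshot
fi
```

Running it more often than once a day (for example hourly from cron, with
"run" as the argument) should give a finer picture of when the counts start
to climb; comparing the first and last lines of trend.log is usually enough
to tell a process leak from an fd leak.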