Date: Wed, 1 Oct 2008 12:48:56 -0400
From: Gary Palmer
To: Jeremy Chadwick
Cc: Stephen Clark, FreeBSD Stable
Subject: Re: resource leak

On Wed, Oct 01, 2008 at 04:50:46AM -0700, Jeremy Chadwick wrote:
> On Wed, Oct 01, 2008 at 07:41:56AM -0400, Stephen Clark wrote:
> > Hello List,
> >
> > I am running into a strange problem that points to a resource leak. The
> > problem manifests itself after one of our remote systems has been up
> > around 100 days.
> > The symptom is that it appears no new processes can be spawned. If I try to
> > ssh to the unit, I can see the 3-way tcp handshake and then no more traffic.
> > Examining log files, like cron, etc show that when this happens no more entries
> > are written into the cron log. The unit is acting as a firewall, router
> > and vpn appliance these functions continue to work. We have a C
> > application that is periodically started out of a shell script that
> > reports various information about the system, it stops reporting, while
> > vpns, ospf routing, and ipfilter firewalling continue to work and write
> > into their logfiles.
> >
> > My question is how do I monitor the various resources in the system that could
> > prevent the spawning of a new process?
>
> Periodically logging "ps -auxw" output to a file would be useful, as
> ideally you'd gradually see the list get longer and longer over time;
> it's possible you have many zombie processes as a result of a parent
> which is not reaping its children (calling waitpid(2) or its friends).

"ps alxw" may be of interest in addition to "ps auxw" as it displays
what the processes are waiting on.

It could conceivably be a problem of some kind at the filesystem level.
I've seen situations before where a problem escalates to the point where
"ls /" hangs, and at that point you're stuck with an unresponsive box.

Regards,

Gary
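
P.S. A rough sketch of how the ps logging suggested above could be automated, in case it helps. It is only an illustration, not something tested on the affected box: the column list passed to ps (pid, ppid, state, wchan, comm) and the names PS_CMD and summarize are my own assumptions, so check them against ps(1) on your release. It counts zombie processes (state starting with "Z") and tallies the wait channels, which is the same information you would read off "ps alxw" by eye.

```python
import subprocess
import time
from collections import Counter

# Assumed column selection; verify the keywords against ps(1) on your system.
PS_CMD = ["ps", "-axo", "pid,ppid,state,wchan,comm"]


def summarize(ps_output):
    """Parse ps output text and return (zombies, wchan_counts).

    zombies is a list of (pid, ppid, command) for processes whose state
    starts with "Z"; wchan_counts tallies how many processes sit on each
    wait channel.
    """
    zombies = []
    wchans = Counter()
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # malformed or truncated row
        pid, ppid, state, wchan, comm = fields
        if state.startswith("Z"):
            zombies.append((int(pid), int(ppid), comm))
        wchans[wchan] += 1
    return zombies, wchans


if __name__ == "__main__":
    out = subprocess.run(PS_CMD, capture_output=True, text=True, check=True).stdout
    zombies, wchans = summarize(out)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"{stamp} processes={sum(wchans.values())} zombies={len(zombies)}")
    for pid, ppid, comm in zombies:
        print(f"  zombie pid={pid} ppid={ppid} comm={comm}")
    for wchan, count in wchans.most_common(5):
        print(f"  wchan={wchan} count={count}")
```

Redirecting the output to a file at regular intervals gives a history to compare once the box starts misbehaving. A steadily growing zombie count under one ppid would point at a parent that is not reaping its children, while a pile-up on a single wait channel would fit better with the filesystem-level theory.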