Date: Wed, 1 Oct 2008 18:56:29 +0100 (BST)
From: Robert Watson
To: Gary Palmer
Cc: Stephen Clark, Jeremy Chadwick, FreeBSD Stable
Subject: Re: resource leak

On Wed, 1 Oct 2008, Gary Palmer wrote:

>> Periodically logging "ps -auxw" output to a file would be useful, as
>> ideally you'd gradually see the list get longer and longer over time; it's
>> possible you have many zombie processes as a result of a parent which is
>> not reaping its children (calling waitpid(2) or its friends).
>
> "ps alxw" may be of interest in addition to "ps auxw" as it displays what
> the processes are waiting on.  It could conceivably be a problem of some
> kind at the filesystem level.
> I've seen situations before where a problem
> escalates to the point where "ls /" hangs, and at that point you're stuck
> with an unresponsive box.

If you want an even greater level of detail than ps -l, you can use
procstat -k to generate kernel stack traces for all user/kernel threads.
Wait channels are very useful, but they only tell you what the code that
invoked the wait thinks it is waiting for, not how that code was reached.
A classic example is waiting on an exhausted UMA zone -- you get a uma wait
channel, but no indication of what subsystem performed the memory
allocation.  This requires FreeBSD 7.1 or later, however.

(Obviously, the same can be done easily using DDB, but that's hard on a box
without a serial console, and requires interrupting the flow of the
operating system, compiling with DDB, etc.)

Robert N M Watson
Computer Laboratory
University of Cambridge
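
The periodic logging suggested in the thread, combined with procstat -k, might be scripted roughly as follows. This is only a sketch under a few assumptions: a POSIX sh, a hypothetical function name (snapshot), and a log directory of your choosing. procstat is invoked only if it is installed, since it needs FreeBSD 7.1 or later, and full output for other users' processes may require root.

```shell
#!/bin/sh
# snapshot DIR: write one timestamped process snapshot into DIR and print
# the name of the file written.  The function name and the file naming
# scheme are illustrative assumptions, not part of any FreeBSD tool.
snapshot() {
    dir=$1
    mkdir -p "$dir" || return 1
    out="$dir/ps-$(date +%Y%m%d-%H%M%S).log"
    {
        echo "== ps auxw =="
        ps auxw
        echo "== ps alxw =="
        ps alxw
        # Kernel stack traces for all processes; skipped where procstat
        # is not available (it needs FreeBSD 7.1 or later).
        if command -v procstat >/dev/null 2>&1; then
            echo "== procstat -k -a =="
            procstat -k -a
        fi
    } >"$out" 2>&1
    echo "$out"
}

# Example loop, commented out so that sourcing this file has no side effects:
# while :; do snapshot /var/tmp/pslog; sleep 300; done
```

Comparing successive log files (for instance with diff, or simply by line count) should show whether the process list is growing and which wait channels or kernel stacks keep recurring.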