From owner-freebsd-stable  Tue Feb 25 22: 8:57 2003
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3D05337B401
	for <freebsd-stable@FreeBSD.ORG>; Tue, 25 Feb 2003 22:08:56 -0800 (PST)
Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 90F8C43F93
	for <freebsd-stable@FreeBSD.ORG>; Tue, 25 Feb 2003 22:08:55 -0800 (PST)
	(envelope-from das@FreeBSD.ORG)
Received: from HAL9000.homeunix.com (localhost [127.0.0.1])
	by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h1Q68s8Y006740;
	Tue, 25 Feb 2003 22:08:54 -0800 (PST)
	(envelope-from das@FreeBSD.ORG)
Received: (from das@localhost)
	by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h1Q68sGf006739;
	Tue, 25 Feb 2003 22:08:54 -0800 (PST)
	(envelope-from das@FreeBSD.ORG)
Date: Tue, 25 Feb 2003 22:08:54 -0800
From: David Schultz <das@FreeBSD.ORG>
To: "Marc G. Fournier" <scrappy@hub.org>
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: 4.8-PRERELEASE 'hangs' nightly like clockwork ...
Message-ID: <20030226060854.GA6637@HAL9000.homeunix.com>
Mail-Followup-To: "Marc G. Fournier" <scrappy@hub.org>,
	freebsd-stable@FreeBSD.ORG
References: <20030225125414.P90059@hub.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030225125414.P90059@hub.org>
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-stable.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-stable>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-stable>
X-Loop: FreeBSD.ORG

Thus spake Marc G. Fournier <scrappy@hub.org>:
> For the past few nights, since I "fixed" the KVA_PAGES issue, the server
> seems to be hanging almost like clockwork ... plus or minus a bit, but is
> around 23hrs or so since the last hang (or, around 9pm CST, not sure which
> one is the 'trigger') ...
> 
> top, from last night, shows:
> 
> last pid: 44187;  load averages:  0.29, 11.36, 19.195   up 1+00:11:55  22:04:00
> 3173 processes:1 running, 3150 sleeping, 22 zombie
> CPU states:  0.0% user,  0.0% nice,  8.6% system,  0.6% interrupt, 90.8% idle
> Mem: 2335M Active, 426M Inact, 595M Wired, 205M Cache, 199M Buf, 5860K Free
> Swap: 2048M Total, 495M Used, 1553M Free, 24% Inuse
> 
> now, I got the folks down at Rackspace to do a ctl-alt-esc and 'panic',
> and it dumps core, if that helps any ... a gdb on the core file just tells
> me that a panic was issued from the keyboard ... the top session above
> continued to run up until they issued the ctl-alt-esc, as does a ping to
> the server, so it looks like those processes resident in memory do continue
> to run ...

It sounds like processes are blocking forever on I/O.  Once you
have a crash dump, you can run ps(1) on the image to see what
state processes were in when the dump was taken.  I think you want
something like
	ps -alxww -M/path/to/core -N/path/to/kernel
If you notice a bunch of them stuck in a suspicious state, load
the dump into kgdb and type
	proc N
where N is the PID of one of the stuck processes.  Then type bt
as usual and you'll get a backtrace of that process's stack.  If
any vnodes are involved, it might be useful to display those.  My
fu is probably too weak to debug your problem, but I've had two
experiences trying to debug other problems.  Where the filesystem
has been concerned, Kirk has been VERY adept at finding and fixing
the problem right away.  Matt has also been extremely helpful.
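
For the "notice a bunch of them stuck" step, one rough way to
summarize a long ps listing is to count processes per state and
wait channel.  The sketch below is only an illustration: it assumes
the listing has a header row with STAT and WCHAN (or MWCHAN) columns
and that the columns are cleanly whitespace-separated, which may not
hold if wide values run together.  The function name and the sample
listing are made up and are not taken from the dump discussed here.

```python
from collections import Counter


def wait_channel_counts(ps_output):
    """Count processes per (STAT, WCHAN) pair in `ps -l`-style output."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return Counter()
    header = lines[0].split()
    # Assumption: the wait-channel column is called WCHAN or MWCHAN,
    # depending on the ps version that produced the listing.
    wchan_col = next(i for i, name in enumerate(header)
                     if name in ("WCHAN", "MWCHAN"))
    stat_col = header.index("STAT")
    counts = Counter()
    for line in lines[1:]:
        # Split into at most len(header) fields so that the command,
        # which may contain spaces, stays in one piece at the end.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip lines that do not match the header layout
        counts[(fields[stat_col], fields[wchan_col])] += 1
    return counts


# Made-up sample listing for illustration only.
SAMPLE = """\
  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
    0     1     0   0  10  0   552  320 wait   Is    ??    0:00.05 /sbin/init --
 1001   501     1   0  -4  0  1200  800 inode  D     ??    0:01.00 httpd -k start
 1001   502     1   0  -4  0  1200  800 inode  D     ??    0:01.00 httpd -k start
 1001   503     1   0   2  0   900  400 select S     p0    0:00.10 sshd: user
"""

if __name__ == "__main__":
    for (stat, wchan), n in wait_channel_counts(SAMPLE).most_common():
        print(n, stat, wchan)
```

Saving the ps output to a file and passing its contents to
wait_channel_counts() puts the most common state and wait channel
pairs first; a large group in an uninterruptible state on the same
wait channel would be a reasonable place to start picking PIDs for
the backtrace.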
