Date: Tue, 25 Feb 2003 22:08:54 -0800
From: David Schultz
To: "Marc G. Fournier"
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: 4.8-PRERELEASE 'hangs' nightly like clockwork ...
Message-ID: <20030226060854.GA6637@HAL9000.homeunix.com>

Thus spake Marc G. Fournier:
> For the past few nights, since I "fixed" the KVA_PAGES issue, the server
> seems to be hanging almost like clockwork ... plus or minus a bit, but is
> around 23hrs or so since the last hang (or, around 9pm CST, not sure which
> one is the 'trigger') ...
>
> top, from last night's, shows:
>
> last pid: 44187;  load averages:  0.29, 11.36, 19.195  up 1+00:11:55  22:04:00
> 3173 processes: 1 running, 3150 sleeping, 22 zombie
> CPU states:  0.0% user,  0.0% nice,  8.6% system,  0.6% interrupt, 90.8% idle
> Mem: 2335M Active, 426M Inact, 595M Wired, 205M Cache, 199M Buf, 5860K Free
> Swap: 2048M Total, 495M Used, 1553M Free, 24% Inuse
>
> now, I got the folks down at Rackspace to do a ctl-alt-esc and 'panic',
> and it dumps core, if that helps any ... a gdb on the core file just tells
> me that a panic was issued from the keyboard ... the top session above
> continued to run up until they issued the ctl-alt-esc, as did a ping to
> the server, so it looks like those processes resident in memory do continue
> to run ...

It sounds like processes are blocking forever on I/O.  Once you have a
crash dump, you can run ps(1) on the image to see what state processes
were in when the dump was taken.  I think you want something like

	ps -alxww -M/path/to/core -N/path/to/kernel

If you notice a bunch of them stuck in a suspicious state, load the dump
into kgdb and type

	proc N

where N is the process ID of one of the stuck processes.  Then type

	bt

as usual, and you'll get a backtrace of that process's stack.  If any
vnodes are involved, it might be useful to print those as well.

My kernel-debugging fu is probably too weak to debug your problem, but
I've had two experiences trying to debug other problems.  Where the
filesystem has been concerned, Kirk has been VERY adept at finding and
fixing the problem right away.  Matt has also been extremely helpful.
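With over 3000 processes, eyeballing the ps(1) output from the dump is tedious.  One way to spot "a bunch of them stuck in a suspicious state" is to count how many processes are sleeping on each wait channel and look for a pile-up.  The following is only a minimal Python sketch, not a FreeBSD tool.  It assumes you saved the ps -alxww output to a text file, that the header line names the wait-channel column WCHAN (MWCHAN on some newer releases), that ps prints "-" for a process with no wait channel, and that COMMAND is the last column.  The function name tally_wait_channels and the sample rows are hypothetical.

```python
from collections import Counter


def tally_wait_channels(ps_output: str) -> Counter:
    """Count processes per wait channel in saved `ps -alxww` output.

    Assumptions (hypothetical helper, not part of FreeBSD):
      * the first non-empty line is the column header;
      * the wait-channel column is named WCHAN or MWCHAN;
      * ps prints "-" when a process has no wait channel;
      * COMMAND is the last column and may contain spaces.
    """
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    # Locate the wait-channel column by name instead of hard-coding it.
    wchan_name = "MWCHAN" if "MWCHAN" in header else "WCHAN"
    col = header.index(wchan_name)
    counts = Counter()
    for line in lines[1:]:
        if not line.strip():
            continue
        # At most len(header) fields, so COMMAND keeps its embedded spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) > col and fields[col] != "-":
            counts[fields[col]] += 1
    return counts


# Hypothetical sample rows, made up for illustration only.
sample = """UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
  0   101     1   0  -18  0  1000  500 inode  D     ??    0:00.01 /usr/sbin/cron
  0   102     1   0  -18  0  1000  500 inode  D     ??    0:00.02 sh -c backup
  0   103     1   0    2  0  1000  500 select Ss    ??    0:00.03 sshd
  0   104     1   0   28  0  1000  500 -      R     p0    0:00.04 top
"""

print(tally_wait_channels(sample).most_common(2))
# prints [('inode', 2), ('select', 1)]
```

On the real output, something like tally_wait_channels(open("ps.txt").read()).most_common(10) lists the busiest wait channels first; the PIDs on those lines are the ones worth handing to "proc N" in kgdb.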