From owner-freebsd-stable Thu Feb 27  7: 9: 1 2003
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4FB8237B401; Thu, 27 Feb 2003 07:08:59 -0800 (PST)
Received: from hub.org (hub.org [64.49.215.141])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id BD2E343FB1; Thu, 27 Feb 2003 07:08:58 -0800 (PST)
	(envelope-from scrappy@hub.org)
Received: from hub.org (hub.org [64.49.215.141])
	by hub.org (Postfix) with ESMTP
	id 2A095950324; Thu, 27 Feb 2003 11:08:52 -0400 (AST)
Date: Thu, 27 Feb 2003 11:08:52 -0400 (AST)
From: "Marc G. Fournier"
To: David Schultz
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: 4.8-PRERELEASE 'hangs' nightly like clockwork ...
In-Reply-To: <20030226060854.GA6637@HAL9000.homeunix.com>
Message-ID: <20030227110726.J17399@hub.org>
References: <20030225125414.P90059@hub.org> <20030226060854.GA6637@HAL9000.homeunix.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Tue, 25 Feb 2003, David Schultz wrote:

> Thus spake Marc G. Fournier:
> > For the past few nights, since I "fixed" the KVA_PAGES issue, the server
> > seems to be hanging almost like clockwork ... plus or minus a bit, but is
> > around 23hrs or so since the last hang (or, around 9pm CST, not sure which
> > one is the 'trigger') ...
> >
> > top, from last night's, shows:
> >
> > last pid: 44187;  load averages:  0.29, 11.36, 19.195   up 1+00:11:55  22:04:00
> > 3173 processes:1 running, 3150 sleeping, 22 zombie
> > CPU states:  0.0% user,  0.0% nice,  8.6% system,  0.6% interrupt, 90.8% idle
> > Mem: 2335M Active, 426M Inact, 595M Wired, 205M Cache, 199M Buf, 5860K Free
> > Swap: 2048M Total, 495M Used, 1553M Free, 24% Inuse
> >
> > now, I got the folks down at Rackspace to do a ctl-alt-esc and 'panic',
> > and it dumps core, if that helps any ... a gdb on the core file just tells
> > me that a panic was issued from the keyboard ... the top session above
> > continued to run up until they issued the ctl-alt-esc, as does a ping to
> > the server, so it looks like those processes resident in memory do continue
> > to run ...
>
> It sounds like processes are blocking forever on I/O.  Once you
> have a crash dump, you can run ps(1) on the image to see what
> state processes were in when the dump was taken.  I think you want
> something like
>
>	ps -alxww -M/path/to/core -N/path/to/kernel
>
> If you notice a bunch of them stuck in a suspicious state, load
> the dump into kgdb and type [...]

'K, first question is ... what would I consider a "suspicious state":

jupiter# awk '{print $9}' ps.1 | sort | uniq -c
 978 -
   1 FFS
   1 WCHAN
 239 accept
 324 ffsvgt
 382 inode
 558 lockf
   4 nfsd
  26 pause
 236 piperd
   1 pipewr
   3 poll
   1 ppwait
   1 psleep
  97 sbwait
  32 select
  14 ttyin
 283 wait
jupiter# wc -l ps.1
    3181 ps.1
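
The awk/sort/uniq tally above lists wait channels alphabetically, which makes the largest groups harder to spot. As a rough sketch only (not something from the original thread), the same count could be done in Python and ordered by frequency. It assumes, as the awk command does, that the wait channel is the 9th whitespace-separated field of each line; the function names tally_wchan and report are hypothetical. Splitting on whitespace can misalign when ps runs two columns together, which may be where the stray "FFS" entry above comes from.

```python
from collections import Counter


def tally_wchan(lines, column=9, header="WCHAN"):
    """Count the values found in a 1-based whitespace-separated column.

    Assumption: the wait channel sits in field 9, mirroring
    awk '{print $9}' from the message; adjust `column` if your
    ps output is laid out differently.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < column:
            continue  # blank or truncated line, nothing to count
        value = fields[column - 1]
        if value == header:
            continue  # skip the ps header row
        counts[value] += 1
    return counts


def report(counts):
    """Return (wchan, count) pairs, most frequent first, ties by name."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

To use it on the saved listing, pass an open file, for example report(tally_wchan(open("ps.1"))), and print the pairs. On the numbers shown above this ordering would put "-" (978), lockf (558) and inode (382) at the top.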