Date: Tue, 2 Jan 2007 10:04:38 -0800
From: Jeremy Chadwick
To: Gavin Atkinson
Cc: freebsd-stable@freebsd.org
Subject: Re: Interrupt (SCSI?) hang on 4.x
Message-ID: <20070102180438.GA81454@icarus.home.lan>
In-Reply-To: <1167755991.84652.6.camel@buffy.york.ac.uk>
References: <20070102153608.GA78405@icarus.home.lan> <1167755991.84652.6.camel@buffy.york.ac.uk>

On Tue, Jan 02, 2007 at 04:39:51PM +0000, Gavin Atkinson wrote:
> On Tue, 2007-01-02 at 07:36 -0800, Jeremy Chadwick wrote:
> > # vmstat -i
> > ata0 irq14                 6          0
> > fxp0 irq10             14874         28
> > mux irq11              65028        125
> > fdc0 irq6                  1          0
> > sio0 irq4                948          1
> > clk irq0              516187        998
> > rtc irq8               66071        127
> > Total                 663115       1282
>
> Do any of these numbers continue to increase after the hang? You may
> find that if you are already logged in over the serial port before the
> hang and have run vmstat recently, it'll still be runnable due to it
> being cached.

When this problem is happening, at the login: prompt (via serial console),
once one types "root" and hits enter, one never gets a Password: prompt.
This is likely because getpwent(3) and friends attempt to read
passwd/master.passwd from the disk, which obviously hangs due to the SCSI
controller. Therefore, one cannot log in and run any commands.

> If the serial port is dead, you will probably still find you can get
> output from the serial port, so start "date; vmstat -i" in a loop over
> the serial port before it hangs, and watch the output once it wedges.

Once the machine is hung as described, this won't work, since running
shell commands (date, vmstat, even spawning sh itself) involves disk I/O.
If date and vmstat could be cached in memory somewhere, this might work,
but I don't know how one would do that. (A memory filesystem could work,
but pretty much all of / would have to be there for this to work...)

The best I could do would be to have a cronjob or a process running in a
screen session which runs date && vmstat -i over and over, appending to a
log file, and examine that log once the machine has hung as described.
This wouldn't tell us if the numbers were increasing/fluctuating *after*
the hang, though. :-(

-- 
| Jeremy Chadwick                                  jdc at parodius.com |
| Parodius Networking                         http://www.parodius.com/ |
| UNIX Systems Administrator                    Mountain View, CA, USA |
| Making life hard for others since 1977.                PGP: 4BD6C0CB |
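
For reference, the logging approach mentioned in the message (running date && vmstat -i over and over into a log file) could be sketched roughly as below. This is only an illustration: the names LOGFILE, INTERVAL, COUNT, SAMPLE_CMD and log_samples are hypothetical, as are the log path and the 5-second default, and the sync call is an assumption added to reduce how much buffered log data is lost on reset. As noted in the message, the log would stop at the point disk I/O stalls, so it cannot show what the counters do after the hang.

```shell
#!/bin/sh
# Hypothetical settings; none of these names come from the original message.
LOGFILE="${LOGFILE:-/var/log/vmstat-i.log}"   # where samples are appended
INTERVAL="${INTERVAL:-5}"                     # seconds between samples
COUNT="${COUNT:-0}"                           # number of samples; 0 = run until killed
SAMPLE_CMD="${SAMPLE_CMD:-vmstat -i}"         # command whose output is logged

# Append a timestamp plus the interrupt counters to LOGFILE, repeatedly.
log_samples() {
    i=0
    while [ "$COUNT" -eq 0 ] || [ "$i" -lt "$COUNT" ]; do
        # SAMPLE_CMD is left unquoted on purpose so "vmstat -i" splits into words.
        { date && $SAMPLE_CMD; } >> "$LOGFILE" 2>&1
        # Assumption: ask the kernel to flush buffers so recent samples reach disk.
        sync
        i=$((i + 1))
        sleep "$INTERVAL"
    done
}
```

Saved as a script with a final log_samples line, this could be left running in a screen session, and the tail of the log file examined after the machine has been reset.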