From owner-freebsd-hackers Sat Feb 17 00:22:57 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id AAA00482 for hackers-outgoing; Sat, 17 Feb 1996 00:22:57 -0800 (PST)
Received: from irz301.inf.tu-dresden.de (irz301.inf.tu-dresden.de [141.76.1.11]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id AAA00476 for ; Sat, 17 Feb 1996 00:22:54 -0800 (PST)
Received: from sax.sax.de by irz301.inf.tu-dresden.de (8.6.12/8.6.12-s1) with ESMTP id JAA20040; Sat, 17 Feb 1996 09:21:22 +0100
Received: by sax.sax.de (8.6.11/8.6.12-s1) with UUCP id JAA22464; Sat, 17 Feb 1996 09:21:21 +0100
Received: (from j@localhost) by uriah.heep.sax.de (8.7.3/8.6.9) id JAA08102; Sat, 17 Feb 1996 09:04:13 +0100 (MET)
From: J Wunsch
Message-Id: <199602170804.JAA08102@uriah.heep.sax.de>
Subject: Re: Web server locks up... but not quite. (?)
To: jgreco@brasil.moneng.mei.com (Joe Greco)
Date: Sat, 17 Feb 1996 09:04:13 +0100 (MET)
Cc: taob@io.org, freebsd-hackers@FreeBSD.ORG
Reply-To: joerg_wunsch@uriah.heep.sax.de (Joerg Wunsch)
In-Reply-To: <199602170638.AAA06538@brasil.moneng.mei.com> from "Joe Greco" at Feb 17, 96 00:38:01 am
X-Phone: +49-351-2012 669
X-Mailer: ELM [version 2.4 PL24 ME8a]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@FreeBSD.ORG
Precedence: bulk

As Joe Greco wrote:

> > Since there is no indication as to the source of the hang, is
> > there anything I can run periodically from cron to help track down the
> > problem?
> I've seen similar hangs occasionally under both 2.0.5R and 2.1.0R and one
> additional "thing" I've noticed is that processes that are completely
> in-core appear to keep running (i.e. I had a "vmstat 1" running for a few
> weeks and when the box I am thinking of locked up, the vmstat 1 was still
> scrolling output, the box was ping-able, but any services that were not
> entirely in-core or required other disk accesses were not available).

Hmm, we've also experienced these symptoms at sax.sax.de (small local
non-commercial ISP), and i admit that i've basically been suspecting
hardware in the first place. Your reports, however, make me nervous
that it might be software. The system is plain 2.0.5R.

Brian, if you have physical access to the box, try placing a simple
card into the PC that hooks ISA pins A1/B1 to a pushbutton. Pushing it
will cause an NMI (``IO channel check condition''), hopefully leaving
you a coredump.

Our machine is located in a mostly operator-less machine room at the
University, and i've already been playing with the idea of building a
watchdog card that lowers the IOCHCK signal (and finally gives up 5
minutes later and issues a RESET).

-- 
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)