From owner-freebsd-hackers Sun Feb 18 15:21:58 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id PAA11381
          for hackers-outgoing; Sun, 18 Feb 1996 15:21:58 -0800 (PST)
Received: from irz301.inf.tu-dresden.de (irz301.inf.tu-dresden.de [141.76.1.11])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id PAA11369
          for ; Sun, 18 Feb 1996 15:21:49 -0800 (PST)
Received: from sax.sax.de
          by irz301.inf.tu-dresden.de (8.6.12/8.6.12-s1) with ESMTP id AAA29226
          for ; Mon, 19 Feb 1996 00:21:45 +0100
Received: by sax.sax.de (8.6.11/8.6.12-s1) with UUCP id AAA09985
          for freebsd-hackers@freebsd.org; Mon, 19 Feb 1996 00:21:45 +0100
Received: (from j@localhost)
          by uriah.heep.sax.de (8.7.3/8.6.9) id AAA09217
          for freebsd-hackers@freebsd.org; Mon, 19 Feb 1996 00:17:07 +0100 (MET)
From: J Wunsch
Message-Id: <199602182317.AAA09217@uriah.heep.sax.de>
Subject: Re: Web server locks up... but not quite. (?)
To: freebsd-hackers@freebsd.org (FreeBSD hackers)
Date: Mon, 19 Feb 1996 00:17:07 +0100 (MET)
Reply-To: joerg_wunsch@uriah.heep.sax.de (Joerg Wunsch)
In-Reply-To: <199602182041.WAA01110@newzetor.clinet.fi> from "Heikki Suonsivu" at Feb 18, 96 10:41:17 pm
X-Phone: +49-351-2012 669
X-Mailer: ELM [version 2.4 PL24 ME8a]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
Precedence: bulk

As Heikki Suonsivu wrote:

>    Hmm, we've also experienced these symptoms at sax.sax.de (small local
>    non-commercial ISP), and i admit that i've basically been suspecting
>    hardware in the first place. Your reports make me nervous however
>    that it might be software. The system is plain 2.0.5R.
>
> If it were software, it would happen with specific set of hardware. We see
> it on all machines here.

We don't have a stack of hardware there. We have to buy our hardware
ourselves.
:)

This is one of the infamous 40-MHz-VLB boards (though the VLB is not
actually used), and the logs show sig 10's and sig 11's all over the
place, about twice a week. We didn't even notice this at first,
however, since nobody complained. I stumbled across it while looking
up other things in the kernel message logfile. (Well, i love
FreeBSD's logging of abnormal signals!)

So this explains why i've been thinking of hardware first...

(Watchdog issuing NMI)

> This might not help, we often find things being locked up when they try to
> write the coredump.

Well, if it's only a memory resource allocation problem, coredumping
should still work.

>    Our machine is located in a mostly operator-less machine room at the
>    University, i've already been playing with the idea to build a
>    watchdog card that lowers the IOCHCK signal (and finally gives up 5
>    minutes later and issues a RESET).
>
> Someone proposed an interrupt-level software watchdog. I proposed a crash
> button on keyboard (almost always the keyboard driver is still there).

The machine doesn't have a keyboard at all.

--
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)
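
The two-stage watchdog idea quoted in the message (lower IOCHCK to raise
an NMI first, then give up 5 minutes later and issue a RESET) can be
sketched as a small state machine. This is only an illustrative
simulation, not a design from the thread: the class and method names
and the 60-second heartbeat timeout are assumptions, and only the
5-minute grace period comes from the message.

```python
from enum import Enum

NMI_TIMEOUT = 60       # assumed: seconds without a heartbeat before the NMI stage
RESET_GRACE = 5 * 60   # from the message: give up 5 minutes after the NMI


class Action(Enum):
    NONE = "none"
    NMI = "nmi"        # stands in for the card lowering IOCHCK
    RESET = "reset"    # stands in for the card asserting RESET


class Watchdog:
    """Hypothetical two-stage watchdog: NMI first, RESET if that did not help."""

    def __init__(self, now=0.0):
        self.last_kick = now   # time of the last heartbeat
        self.nmi_at = None     # time the NMI stage fired, or None if not fired

    def kick(self, now):
        """Heartbeat from the monitored system; re-arms both stages."""
        self.last_kick = now
        self.nmi_at = None

    def poll(self, now):
        """Return the action the card would take at time `now` (seconds)."""
        if self.nmi_at is None:
            # Stage 1: no heartbeat for too long, so try an NMI first.
            if now - self.last_kick >= NMI_TIMEOUT:
                self.nmi_at = now
                return Action.NMI
            return Action.NONE
        # Stage 2: the NMI did not bring the heartbeat back in time.
        if now - self.nmi_at >= RESET_GRACE:
            self.last_kick = now   # start over after the reset
            self.nmi_at = None
            return Action.RESET
        return Action.NONE
```

Trying the NMI before the RESET gives the kernel a chance to log or
dump something before the machine is restarted; the hard reset is only
the fallback when the system is too wedged to respond.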