From: swear@blarg.net (Gary W. Swearingen)
To: Anthony Atkielski, FreeBSD Questions
Subject: Re: Mysterious boot during the night
Date: 17 Nov 2001 19:28:14 -0800
In-Reply-To: <20011117133336.B88359@xor.obsecurity.org>

> In general there's no reliable way for failing hardware to report its
> failure mode correctly. e.g. run one of the memory testers in the
> ports collection to check for failing RAM, but remember that if the
> tester doesn't find a memory problem it doesn't mean you don't have
> one.
IIRC, I found the "memtest" port to be undesirable and wound up going to the "memtest" web site (via freshmeat.net) and getting the standalone, all-on-one-floppy version which, if you read the documentation, gives you a real warm feeling that it is testing your memory well.

Some searching of the web for "ECC" a year or two back led me to believe that someone with lots of memory (1/4 GB?) could expect a bit error to happen once a year or so (?) from cosmic rays. I understand that most recent MBs support ECC; I plan to get it next time, even if it is a wee bit slower.

Also, stability is a random thing. Bell- (and other-) shaped curves and that sort of thing. Margins are important. Lower-quality parts and higher temperatures give you smaller margins and higher probabilities of random error.

As for software errors, keep track of how long your system has been running and when it crashes, etc., and look for trends if you get multiple crashes. Unfortunately for you, most of the software that has the ability to cause a crash doesn't depend on many external factors like other software or how long it's been running or how many times it has done something. Of course, you might have seen the exception.

If you want to try for more info at the next crash, you'll need to do some reading of the dumpon(8) man page, the "Kernel Debugging" section of the Handbook, and maybe some groups.google.com searching and learning of the kernel debugger. But, depending on the hardware error, it may do you no good, as the man said.
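
For the "keep track and look for trends" suggestion, here is a rough sketch of one way to do it, assuming you simply write down the time of each crash by hand. The function names, the timestamp format, and the sample times are all made up for illustration; nothing here reads real system logs. It reports the hours between consecutive crashes and the hour of day each one happened, which is about the least you need to spot a pattern.

```python
from datetime import datetime
from statistics import mean

# Assumed, hand-chosen timestamp format for the notes you keep yourself.
STAMP_FORMAT = "%Y-%m-%d %H:%M"


def parse_times(crash_times):
    """Parse the hand-recorded strings and return them oldest first."""
    return sorted(datetime.strptime(t, STAMP_FORMAT) for t in crash_times)


def crash_intervals(crash_times):
    """Return the hours between consecutive crashes, oldest first."""
    stamps = parse_times(crash_times)
    return [(b - a).total_seconds() / 3600.0 for a, b in zip(stamps, stamps[1:])]


def summarize(crash_times):
    """Collect a few numbers that make a trend easier to see."""
    intervals = crash_intervals(crash_times)
    return {
        "count": len(crash_times),
        # With fewer than two crashes there is no interval to average.
        "mean_hours_between": mean(intervals) if intervals else None,
        "hours_of_day": [s.hour for s in parse_times(crash_times)],
    }


if __name__ == "__main__":
    # Hypothetical crash times, not real data.
    log = ["2001-11-03 03:10", "2001-11-10 03:05", "2001-11-17 03:12"]
    print(summarize(log))
```

If the hours of day bunch together, that might point at something scheduled rather than at random hardware trouble, though a handful of crashes is too few to say much either way.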