Date:      17 Nov 2001 19:28:14 -0800
From:      swear@blarg.net (Gary W. Swearingen)
To:        Anthony Atkielski <anthony@atkielski.com>, FreeBSD Questions <freebsd-questions@FreeBSD.ORG>
Subject:   Re: Mysterious boot during the night
Message-ID:  <ns4rns69m9.rns@localhost.localdomain>
In-Reply-To: <20011117133336.B88359@xor.obsecurity.org>
References:  <020e01c16f42$14885c10$0a00000a@atkielski.com> <20011117015632.B87944@xor.obsecurity.org> <02a001c16f53$215323b0$0a00000a@atkielski.com> <20011117133336.B88359@xor.obsecurity.org>

> In general there's no reliable way for failing hardware to report its
> failure mode correctly.  e.g. run one of the memory testers in the
> ports collection to check for failing RAM, but remember that if the
> tester doesn't find a memory problem it doesn't mean you don't have
> one.

IIRC, I found the "memtest" port to be undesirable and wound up going
to the "memtest" web site (via freshmeat.net) and getting the
standalone, all-on-one-floppy version which, if you read the
documentation, gives you a real warm feeling that it is testing your
memory well.

Some searching of the web for "ECC" a year or two back led me to believe
that someone with lots of memory (1/4 GB?) could expect a bit error to
happen once a year or so (?) from cosmic rays.  I understand that most
recent MBs support ECC; I plan to get it next time, even if it is a
wee bit slower.
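
To put rough numbers on that, here is a minimal Python sketch.  It takes
the "one bit error per year per 1/4 GB" figure above at face value and
assumes, purely for illustration, that the rate scales linearly with
memory size and that errors arrive independently (a Poisson model).
The constants and function names are made up for the example; real
rates vary a lot with hardware and location.

```python
import math

# Assumption: the rough figure quoted above, about one bit error per year
# for 0.25 GB of RAM. This is a ballpark, not a measured rate.
BASELINE_GB = 0.25
BASELINE_ERRORS_PER_YEAR = 1.0


def expected_errors(memory_gb, years=1.0):
    """Expected bit errors, assuming the rate scales linearly with memory size."""
    return BASELINE_ERRORS_PER_YEAR * (memory_gb / BASELINE_GB) * years


def prob_at_least_one(memory_gb, years=1.0):
    """Chance of one or more errors, assuming independent (Poisson) events."""
    return 1.0 - math.exp(-expected_errors(memory_gb, years))


if __name__ == "__main__":
    for gb in (0.125, 0.25, 1.0):
        print(f"{gb:5.3f} GB: {expected_errors(gb):.1f} expected/yr, "
              f"P(>=1 in a year) = {prob_at_least_one(gb):.2f}")
```

Under those assumptions, even a machine that averages one error a year
has a fair chance of going a whole year without one, so a single clean
year tells you little either way.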

Also, stability is a random thing.  Bell- (and other-) shaped curves
and that sort of thing.  Margins are important.  Lower-quality parts
and higher temperatures give you smaller margins and higher probabilities
of random error.
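
As a sketch of why margins matter so much, suppose (an assumption for
illustration only) that the disturbance on a signal is zero-mean
Gaussian noise with standard deviation sigma, and that an error happens
whenever the noise exceeds the design margin.  The error probability is
then the tail of the bell curve, which can be computed with the
complementary error function:

```python
import math


def error_probability(margin, sigma):
    """P(noise > margin) for zero-mean Gaussian noise with std dev sigma."""
    return 0.5 * math.erfc(margin / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical margins, expressed in multiples of the noise sigma.
    for margin in (5.0, 4.0, 3.0):
        print(f"margin = {margin:.0f} sigma: "
              f"P(error) = {error_probability(margin, 1.0):.2e}")
```

The tail falls off very steeply, so shaving a little off the margin
(or widening the noise with heat or cheaper parts) raises the error
probability by far more than the change in margin might suggest.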

As for software errors, keep track of how long your system has been
running and when it crashes, etc. and look for trends if you get
multiple crashes.  Unfortunately for you, most of the software that has
the ability to cause a crash doesn't depend on many external factors
like other software or how long it's been running or how many times
it has done something.  Of course, you might have seen the exception.
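
One way to do that bookkeeping is sketched below in Python.  It assumes
you keep a hand-written log of (boot time, crash time) pairs; the log
format, the function names, and the sample entries are all invented for
the example.  It reports how long the system was up before each crash
and at what hour of the day each crash happened, which are the two
obvious places to look for a trend.

```python
from datetime import datetime
from statistics import mean

# Hypothetical log format: "YYYY-MM-DD HH:MM" for both boot and crash times.
FMT = "%Y-%m-%d %H:%M"


def uptimes_hours(records):
    """Return the hours of uptime that preceded each crash."""
    result = []
    for boot, crash in records:
        delta = datetime.strptime(crash, FMT) - datetime.strptime(boot, FMT)
        result.append(delta.total_seconds() / 3600.0)
    return result


def crash_hours(records):
    """Return the hour of day (0-23) at which each crash happened."""
    return [datetime.strptime(crash, FMT).hour for _, crash in records]


if __name__ == "__main__":
    # Made-up sample entries, not real data.
    log = [
        ("2001-11-01 08:00", "2001-11-05 03:10"),
        ("2001-11-05 03:15", "2001-11-12 02:50"),
        ("2001-11-12 02:55", "2001-11-17 03:05"),
    ]
    ups = uptimes_hours(log)
    print("uptime before each crash (h):", [round(u, 1) for u in ups])
    print("mean uptime (h):", round(mean(ups), 1))
    print("hour of day of each crash:", crash_hours(log))
```

Crashes that cluster at the same hour point toward something scheduled
(a cron job, a nightly load, a power event), while similar uptimes
before each crash point toward something that accumulates.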

If you want to try for more info at the next crash, you'll need to do
some reading of the dumpon(8) man page and the "Kernel Debugging" section
of the Handbook, and maybe some groups.google.com searching to learn
about the kernel debugger.  But, depending on the hardware error, it may
do you no good, as the man said.
