Date:      Wed, 1 Mar 2006 13:18:31 -0600
From:      "Nikolas Britton" <nikolas.britton@gmail.com>
To:        "Don O'Neil" <don@lizardhill.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: System Burn In
Message-ID:  <ef10de9a0603011118h71c0b487m6a90f8c3ecc584b4@mail.gmail.com>
In-Reply-To: <ef10de9a0602281609x9d3632fj6841d3869fd5a47d@mail.gmail.com>
References:  <058101c63ca5$585d0780$0300020a@mickey> <ef10de9a0602281609x9d3632fj6841d3869fd5a47d@mail.gmail.com>

On 2/28/06, Nikolas Britton <nikolas.britton@gmail.com> wrote:
> On 2/28/06, Don O'Neil <don@lizardhill.com> wrote:
> > What is the best way to 'burn in' or 'stress test' a new system w/
> > FreeBSD?
> > I'd like to stress test the CPU, Memory, Disk, etc.. To make sure the
> > hardware is 100% good before putting it in production.
> >
>
> Maybe try http://www.holm.cc/stress/ but this would be like
> forkbombing.  If the system locks it may be a kernel problem, not a
> hardware problem.
>
> Check out UBCD, just type it into google.
>

I did not have a lot of time on hand the first time I posted this so
let me go over it again.

The main problem that Kris and I pointed out with using kernel
regression testing tools is that, by design, the software is trying to
crash the kernel. This is not so great when you're testing for hardware
problems.

Solid state electronics will fail predictably, and it will often be a
complete failure in such a way that the system won't boot, you have
POST errors, or a card or peripheral device just won't function
correctly. This type of problem is tested for before the hardware
leaves the factory. If you had such a problem it would become apparent
in preliminary testing. If you can install FreeBSD and successfully
complete a make buildworld session I'll put money down that you don't
have this type of problem.

This leaves four things to test for: memory, PSU, overheating, and
mechanical failures.

Overheating (and PSU): In a controlled environment overheating
problems are predictable and can be directly measured. The primary
goal is to get all heat-producing components at maximum output all at
the same time and then use a DMM with a thermocouple to probe all the
hot spots. A "make -j4 buildworld" session will stress the CPU,
chipset, and RAM, but it won't stress the disk drives or the video card
GPUs. I don't know how to stress the GPUs (if you are using them for
vector computation). If you have a RAID array you will also need to
find a method[1] to stress it, because an 8-disk array can easily
produce just as much heat as the CPU, if not more.
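
As a rough illustration of "everything at maximum output at the same
time", here is a minimal Python sketch that starts several load
generators together and waits for all of them. The helper name
run_concurrently and the example command lines in the comment are my
own assumptions, not something from the original thread; substitute
whatever CPU and disk loads suit your hardware.

```python
import subprocess


def run_concurrently(commands):
    """Start every command at once, wait for all of them, and return
    their exit codes in the same order.

    `commands` is a list of (argv, cwd) tuples; cwd may be None.
    """
    procs = [
        subprocess.Popen(
            argv,
            cwd=cwd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for argv, cwd in commands
    ]
    return [p.wait() for p in procs]


# Hypothetical usage (paths and options are assumptions):
#   run_concurrently([
#       (["make", "-j4", "buildworld"], "/usr/src"),  # CPU, chipset, RAM
#       (["iozone", "-a"], "/mnt/array"),             # disks / RAID array
#   ])
```

Take the temperature readings while both loads are still running, not
after they finish.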

RAM failures: Problems with memory are random events. The only
surefire way to test RAM modules is a hardware-based RAM testing
device, but if you get random buildworld failures (use Kris's
-j(N) to max out main memory) it's a good bet that you have a bad RAM
module.
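
Because the failures are random, a single clean build proves little;
repeating the build and counting failures is more telling. The
following is a minimal Python sketch of that idea. The function names
repeat_command and summarize are hypothetical, and the example
invocation in the comment assumes the source tree lives in /usr/src.

```python
import subprocess
import sys


def repeat_command(argv, runs, cwd=None):
    """Run the same command `runs` times in sequence and return the
    list of exit codes, one per run."""
    codes = []
    for _ in range(runs):
        result = subprocess.run(
            argv,
            cwd=cwd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes.append(result.returncode)
    return codes


def summarize(codes):
    """Report how many runs failed and which ones (0-based index)."""
    failed = [i for i, code in enumerate(codes) if code != 0]
    return {"runs": len(codes), "failures": len(failed), "failed_runs": failed}


# Hypothetical usage:
#   codes = repeat_command(["make", "-j4", "buildworld"], 5, cwd="/usr/src")
#   print(summarize(codes), file=sys.stderr)
```

Failures that show up in different runs and at different points in the
build point toward hardware; a failure that repeats at the same place
every time is more likely a software problem.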

PSU problems: Like RAM failures, PSU failures are (semi-)random events
and it's often difficult to tell which problem you have because they
can both produce the same symptoms. <rant> Never skimp on a PSU!... I
just can't understand why people would skimp on the most important
part of a computer, without this device you have no computer and with
a crappy one you have a crappy computer. Always buy the best and
overspec it! </rant> Doing the thermal testing (above) will also
stress the PSU. If you have random reboots when thermal testing and
your system is not overheating you may have a PSU problem.

Mechanical failures (disk problems): Disk problems often have a
bathtub curve[2] failure rate. If a disk has not failed in the first
month of use and you protect it from overheating and G-force damage
there is a high probability that the drive will hit and surpass its
MTBF estimate. I stress test all my drives using MHDD32[3]. It has a
simple feature that will perform random seek access tests on the disk
till hell freezes over, or the drive fails. I often just connect the
drive, run the test, and let it do its thing for a few days. After
this is done I run a complete functional diagnostic test on the drive
and pop it into the computer if it passes. It's also useful for
producing random wear patterns on disks earmarked for RAID arrays.
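
To show what a random seek test amounts to, here is a minimal Python
sketch that reads blocks at random offsets. It is not how MHDD32 works
internally, only an illustration of the access pattern. The function
name random_seek_reads is hypothetical, and detecting the size by
seeking to the end is an assumption that may not hold for every device
node, so the size can also be passed in explicitly. Run against a
regular file it mostly exercises the OS cache, not the heads.

```python
import os
import random


def random_seek_reads(path, count, block_size=512, size=None, seed=None):
    """Read `count` blocks at random block-aligned offsets from `path`
    and return the total number of bytes read.

    Raises OSError on a short read, which on a raw disk would suggest
    a problem worth investigating.
    """
    rng = random.Random(seed)
    total = 0
    with open(path, "rb", buffering=0) as f:
        if size is None:
            # Assumption: seeking to the end reports the target's size.
            size = f.seek(0, os.SEEK_END)
        blocks = size // block_size
        if blocks == 0:
            raise ValueError("target is smaller than one block")
        for _ in range(count):
            f.seek(rng.randrange(blocks) * block_size)
            data = f.read(block_size)
            if len(data) != block_size:
                raise OSError("short read at offset %d" % (f.tell() - len(data)))
            total += len(data)
    return total


# Hypothetical usage (device name is an assumption, read-only access):
#   random_seek_reads("/dev/ad4", 100000, block_size=512)
```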

[1] Use something like iozone, in ports: benchmarks/iozone,
http://www.iozone.org/
[2] http://en.wikipedia.org/wiki/Bathtub_curve#Bathtub_curve
[3] You can find the MHDD32 utility on the Ultimate Boot CD (UBCD).



--
BSD Podcasts @ http://bsdtalk.blogspot.com/


