Date: Wed, 1 Mar 2006 13:18:31 -0600
From: "Nikolas Britton" <nikolas.britton@gmail.com>
To: "Don O'Neil" <don@lizardhill.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: System Burn In
Message-ID: <ef10de9a0603011118h71c0b487m6a90f8c3ecc584b4@mail.gmail.com>
In-Reply-To: <ef10de9a0602281609x9d3632fj6841d3869fd5a47d@mail.gmail.com>
References: <058101c63ca5$585d0780$0300020a@mickey> <ef10de9a0602281609x9d3632fj6841d3869fd5a47d@mail.gmail.com>
On 2/28/06, Nikolas Britton <nikolas.britton@gmail.com> wrote:
> On 2/28/06, Don O'Neil <don@lizardhill.com> wrote:
> > What is the best way to 'burn in' or 'stress test' a new system w/ FreeBSD?
> > I'd like to stress test the CPU, Memory, Disk, etc.. To make sure the
> > hardware is 100% good before putting it in production.
> >
>
> Maybe try http://www.holm.cc/stress/ but this would be like
> forkbombing. If the system locks it may be a kernel problem, not be a
> hardware problem.
>
> Check out UBCD, just type it into google.
>

I did not have a lot of time on hand the first time I posted this, so let me go over it again.

The main problem that Kris and I pointed out with using kernel regression testing tools is that by design the software is trying to crash the kernel. This is not so great when you're testing for hardware problems.

Solid state electronics will fail predictably, and it will often be a complete failure in such a way that the system won't boot, you have POST errors, or a card or peripheral device just won't function correctly. This type of problem is tested for before the hardware leaves the factory. If you had such a problem it would become apparent in preliminary testing. If you can install FreeBSD and successfully complete a make buildworld session, I'll put money down that you don't have this type of problem.

This leaves four things to test for: memory, PSU, overheating, and mechanical failures.

Overheating (and PSU): In a controlled environment overheating problems are predictable and can be directly measured. The primary goal is to get all heat-producing components at maximum output at the same time and then use a DMM with a thermocouple to probe all the hot spots. A "make -j4 buildworld" session will stress the CPU, chipset, and RAM, but it won't stress the disk drives or the video card GPUs. I don't know how to stress the GPUs (if you are using them for vector computation).
If you have a RAID array you will also need to find a method[1] to stress it, because an 8-disk array can easily produce just as much heat as the CPU, if not more.

RAM failures: Problems with memory are random events. The only surefire way to test RAM modules is a hardware-based RAM testing device, but if you get random buildworld failures (use Kris's -j(N) to max out main memory) it's a good bet that you have a bad RAM module.

PSU problems: Like RAM failures, PSU failures are (semi-)random events, and it's often difficult to tell which problem you have because both can produce the same symptoms. <rant> Never skimp on a PSU!... I just can't understand why people would skimp on the most important part of a computer. Without this device you have no computer, and with a crappy one you have a crappy computer. Always buy the best and overspec it! </rant> Doing the thermal testing (above) will also stress the PSU. If you have random reboots when thermal testing and your system is not overheating, you may have a PSU problem.

Mechanical failures (disk problems): Disk problems often have a bathtub curve[2] failure rate. If a disk has not failed in the first month of use and you protect it from overheating and G-force damage, there is a high probability that the drive will hit and surpass its MTBF estimate. I stress test all my drives using MHDD32[3]. It has a simple feature that will perform random seek access tests on the disk till hell freezes over, or the drive fails. I often just connect the drive, run the test, and let it do its thing for a few days. After this is done I run a complete functional diagnostic test on the drive and pop it into the computer if it passes. It's also useful for producing random wear patterns on disks earmarked for RAID arrays.
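
To give a rough idea of what a random seek access test does, here is a minimal Python sketch. This is not how MHDD32 is implemented; the function name random_seek_reads and its parameters are made up for illustration. It assumes the target can be opened and seeked like an ordinary file. Pointing it at a raw disk device instead of a file is an assumption that depends on your OS and permissions, and reads from a regular file may be served from the OS cache rather than the platters.

```python
import os
import random


def random_seek_reads(path, reads=1000, block_size=4096, seed=None):
    """Read `reads` blocks at random block-aligned offsets in `path`.

    Returns the total number of bytes read. Raises OSError on a read
    error or short read, which is the signal of interest in a burn-in.
    """
    rng = random.Random(seed)
    total = 0
    # buffering=0 gives a raw file object, so every read() is a real
    # read system call rather than a hit in Python's own buffer.
    with open(path, "rb", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new offset
        blocks = size // block_size
        if blocks == 0:
            raise ValueError("target is smaller than one block")
        for _ in range(reads):
            offset = rng.randrange(blocks) * block_size
            f.seek(offset)
            data = f.read(block_size)
            if len(data) != block_size:
                raise OSError("short read at offset %d" % offset)
            total += len(data)
    return total
```

Something like random_seek_reads("/path/to/big/file", reads=100000) run in a loop would keep the heads moving. The sketch only reads, so it should not damage data, but it also does none of the diagnostics a dedicated tool provides.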

[1] Use something like iozone, in ports: benchmarks/iozone, http://www.iozone.org/
[2] http://en.wikipedia.org/wiki/Bathtub_curve#Bathtub_curve
[3] You can find the MHDD32 utility on the Ultimate Boot CD (UBCD).

--
BSD Podcasts @ http://bsdtalk.blogspot.com/