Date:      Tue, 09 Mar 2004 12:59:26 -0800
From:      Todd Meister <todd@lmi.net>
To:        "FreeBSD Stable" <freebsd-stable@freebsd.org>
Subject:   Re: Unexplained reboots with 4.9 
Message-ID:  <200403092059.i29KxQPa001706@drtboi.rdsl.lmi.net>
In-Reply-To: Your message of "Mon, 01 Mar 2004 08:59:16 PST." <200403010859.16585.dsilver@urchin.com> 

Doug Silver writes:
>I recently brought up an old 700MHz P3 4.9 machine and added a 3ware IDE raid 
>card 7506-8 with 4 120Gb drives in a raid 5 array.  It will randomly reboot, 
>about once-per-day and even though I'm running a debug kernel, it does not 
>leave any crash information (which I assume just means that the kernel did 
>not panic and dump core).  After the first few times, I upgraded to a new 
>400W power supply.  The machine is not heavily loaded and its primary 
>function is for NFS/samba sharing.

This is an old thread (I'm playing catch-up with this list), but I just had 
a similar problem.  We had a new, 2U, P4 2.66GHz machine, all-SCSI, with an 
Adaptec 2100s RAID device (using the asr driver) doing RAID 5 with four 
drives, plus one spare, and a gigabyte of RAM.  This was our new mail server, 
and got about 150k to 200k connections a day.  We run MIMEDefang with 
ClamAV, and a lot of our users use SpamAssassin.  So it was used pretty 
thoroughly, though it rarely hit a load greater than .6, and swap was nearly 
un-utilized.

The first four days it ran, it was fine.  Then it spontaneously rebooted 
one morning.  Two days later, it did the same thing.  Within a week, it 
would barely stay up a full 24 hours (we were going through a lot of 
troubleshooting during this time, BTW, not just standing around, picking 
our noses).  We ended up taking the whole thing down and reinstating our 
old, barely-sufficient system, while we tested the box.

I could go through a list of everything we tested, but won't bother racking 
my memory, unless someone really wants to hear it.  We ended up replacing 
nearly every piece of hardware but the case - NIC, M/B, RAID card, RAM - 
but nothing worked.  I was always pretty sure it was hardware-related, as 
we could never capture a panic, and by the time it got really bad (the 
day we replaced the M/B), I could watch it reboot almost as soon as it 
finished booting.

In the end, the culprit was exactly what I suspected from the beginning, but 
was assured it couldn't be - the riser card in the 2U case.  We don't have 
anything but circumstantial evidence pointing to that, but we're pretty sure.  
If we took the riser card out of the case, and plugged everything directly 
into the M/B (which required leaving the top off the case, of course), we could 
bludgeon the system with SMTP connections while running a disk I/O benchmarker 
and FTPing large amounts of data in variously-sized files back and forth, and 
it stayed up.  If we put the card back in, it'd reboot in about three hours.

We switched to a 4U case, upgraded the system the Friday before last, and 
haven't had a problem yet (fingers crossed, knocking on wood, etc.).

So I guess all this was just to say "beware the riser card."

-Todd


