Date:      Mon, 9 Jan 2012 10:45:21 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Freddie Cash <fjwcash@gmail.com>
Cc:        FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: Upgrade from 8.2-STABLE to 9.0-RELEASE wedges on SuperMicro H8DGiF-based system
Message-ID:  <20120109184521.GA95985@icarus.home.lan>
In-Reply-To: <CAOjFWZ4PYci9QrABntFf33-ZPdO%2BuR%2Bz8j0bnkJdCqEV_VHHig@mail.gmail.com>
References:  <CAOjFWZ6PbXCBoOinZRvXKmHDM8xWsYU657yPh5-i9TsmnFpdVg@mail.gmail.com> <F9A87D68-27E4-4872-A2F2-CD3F0F4D1BE4@jnielsen.net> <CAOjFWZ4PYci9QrABntFf33-ZPdO%2BuR%2Bz8j0bnkJdCqEV_VHHig@mail.gmail.com>

On Mon, Jan 09, 2012 at 09:55:58AM -0800, Freddie Cash wrote:
> On Mon, Jan 9, 2012 at 9:50 AM, John Nielsen <lists@jnielsen.net> wrote:
> > From what you've said I strongly suspect that you have some kind of hardware issue. Dodgy RAM is my first guess, something cooling-related is my 2nd, and PSU is my 3rd. It is a little suspicious that you only started having problems after your upgrade but it could be coincidence or it could be something about the new software tickling the hardware differently than the old.
> 
> That's what we're leaning toward as well.  We're planning on doing a
> BIOS upgrade (betadrive is running v2.00 and alphadrive is v1.00),
> then a memtest86+ run, then check firmware on the SATA controllers.

For hardware/system troubleshooting advice:

1) BIOS upgrade -- since this is also what's responsible for ACPI bits
   and other "configuration model" pieces of a system,
2) BIOS settings -- make sure they're all 100% identical between both
   systems,
3) Controller firmware -- please make sure these are the same (your
   controllers between boxes appear to be the same model),
4) Flaky PSU -- possibly voltages drop or rise below/above levels which
   the mainboard can handle.  As someone who buys Supermicro exclusively
   for their systems, I can tell you that their PSUs ("Ablecom") are
   quite cheap/horrible.  It's worth purchasing a replacement -- if it
   doesn't turn out to be the problem, you now have a spare PSU (which
   is good to have -- our last systems failure was due to a blown PSU).
5) Flaky RAM -- memtest86+ can help here, mostly but not entirely.
6) Flaky mainboard -- it happens.  Really.  :-)

For OS advice:

Compare rc.conf, loader.conf, and so on.  For example, is one system
using powerd(8) while the other isn't?
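
A rough sketch of that comparison, in case it helps.  This is only an
illustration, not a FreeBSD tool: parse_conf and diff_conf are names I
made up, the sample contents are hypothetical, and it assumes simple
key="value" lines with no trailing comments or shell logic.

```python
def parse_conf(text):
    """Parse simple key="value" lines; skip blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line, ignore it
        settings[key.strip()] = value.strip().strip('"')
    return settings


def diff_conf(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs.

    A value of None means the key is not set in that file.
    """
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Hypothetical rc.conf contents for the two boxes.
alpha = parse_conf('hostname="alphadrive"\npowerd_enable="YES"\n')
beta = parse_conf('hostname="betadrive"\n# powerd_enable="YES"\n')

for key, (left, right) in diff_conf(alpha, beta).items():
    print(f"{key}: alphadrive={left!r} betadrive={right!r}")
```

In practice you would feed it the contents of /etc/rc.conf and
/boot/loader.conf from each machine.  A plain diff(1) works too, but a
key-by-key comparison ignores ordering and comment differences.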

> If none of the above helps, we're thinking of swapping the CPUs
> between the two systems to see if the problems stay with the box or
> follow the CPU.

I was helping out someone on a public forum earlier this week who
purchased a Dell desktop system that started behaving oddly.  memtest86+
claimed all his DIMMs were bad (regardless of slot), and replacement
DIMMs produced the same errors.  Dell kept insisting he reload the OS,
or else they could try a motherboard swap, blah blah blah.  What amused me
was that nobody looked at the CPU: Intel Core i3-550, which contains an
on-die MCH.  Chances are the MCH is going bad, which means time to
replace the CPU.

CPUs rarely go bad, but now with on-die MCHs, on-die VGA, etc. it's
becoming much more plausible that the physical CPU needs to be replaced.
They've become practically computers inside of a computer.  :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



