Date:      Wed, 14 Nov 2007 13:12:39 -0500
From:      "Tamouh H." <hakmi@rogers.com>
To:        "'Barnaby Scott'" <bds@waywood.co.uk>, "'Derek Ragona'" <derek@computinginnovations.com>
Cc:        freebsd-questions@freebsd.org
Subject:   RE: Dell PE4600 RAID5 server failing
Message-ID:  <039801c826e9$f1804550$6700a8c0@tamouh>
In-Reply-To: <473B34C5.4030300@waywood.co.uk>
References:  <473B0D70.7020307@waywood.co.uk><6.0.0.22.2.20071114094712.024bfe38@mail.computinginnovations.com> <473B34C5.4030300@waywood.co.uk>

> Derek Ragona wrote:
> > At 09:00 AM 11/14/2007, Barnaby Scott wrote:
> >> I suspect I already know the answer to this, which is that the
> >> trouble I am having is nothing to do with the OS at all, but I have
> >> to ask, because I am otherwise up against a total brick wall!
> >>
> >> I bought a second-hand Dell Poweredge 4600 and installed FreeBSD 6.2
> >> earlier this year. I had it set up with RAID5 using its PERC3/DC
> >> controller, with 7 x 73GB disks (+ 1 hot spare). So far so good, and
> >> it worked faultlessly as a Samba server for several months.
> >>
> >> At the beginning of October, it went down, reporting a mismatch
> >> between the configuration on the NVRAM and the disks. With help from
> >> Dell support, I managed to recreate the RAID array and it worked
> >> again for a month.
> >>
> >> In early November it happened again, and has kept happening since. At
> >> one point it appeared that the backplane was faulty, so I replaced
> >> that, but I cannot keep the server up for more than a day or so
> >> without this 'mismatch' problem.
> >>
> >> What about diagnostics on the hardware you may ask? I have run all
> >> the diagnostic tools that Dell can supply - several times - and the
> >> server declares itself to be totally fault-free.
> >>
> >> My specific questions therefore:
> >>
> >> Is there any way at all that FreeBSD could be involved with this
> >> problem? (I did notice for example that the Dell PERC3/DC controller
> >> was not in the list of supported hardware - but then again, why did
> >> it work for several months?)
> >>
> >> Can I use FreeBSD to tell me anything about the fault that Dell's
> >> diagnostic tools haven't found?
> >>
> >> (I do hope someone might be able to help - Dell are trying to get me
> >> to switch to a 'supported' OS!)
> >>
> >>
> >> Thanks
> >>
> >> Barnaby Scott
> >
> > It doesn't sound like any OS issue as you set up the RAID outside the
> > OS.  It may be a bad drive or drive(s).  Most RAID drives have RAID
> > information written to the drives, and if this becomes unreadable you
> > will have RAID faults.
> >
> > Another likely culprit is heat.  Overheating drives often fail.  Are
> > you sure the temperature in the drive enclosure is OK?
> >
> > If you can, run diagnostics on the drives; this usually requires
> > running them with the drives taken out of the RAID array, though.
> >
> >         -Derek
> >
>
> Thanks for replying - as I said, this is a long shot trying to see if
> there is any OS involvement.
>
> The drives are fine - I have used two different tools to analyse them
> while the computer is booted from a live CD and the RAID configuration
> cleared on the controller. Besides, you would expect one drive to fail
> at a time, and if this happened, the hot spare would surely be pressed
> into service. Nothing like this has happened though - the controller is
> reporting several drives (not always the same ones) failed
> simultaneously, but when the array is re-created from the disks,
> everything works fine. Problem is, it goes down again a day or so later.
>
> As for heat, there is nothing being reported there and the fans that
> cool that area are working.
>
> Any other ideas gratefully received!
>
> Barnaby Scott

This is very unlikely to be OS related, but here are a few pointers:

1) Check the make and model of the drives. A while ago, certain SCSI
drive models with a particular firmware had a glitch that made them
disconnect from a RAID. I have personal experience with these (Seagate
U320).

2) What happened in October? Did anything change in hardware, software,
or power?

3) For an NVRAM/disk mismatch, I'd check the controller. Is the backup
battery present but weak?

4) Unlikely to be the source, but test your physical RAM with
MEMTEST86+ and check that the power supply is sufficient and working
properly.
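
For point 2, and for the question of whether FreeBSD can show anything
the Dell tools missed, one rough approach is to pull the storage-related
lines out of the system log and compare their timestamps with the
failures. The following is only a minimal sketch under assumptions that
do not come from this thread: it guesses that the controller attaches as
amrN (with logical drives as amrdN), the other patterns are generic
examples, and the function name find_suspect_lines is made up for
illustration. Adjust the patterns to whatever your dmesg output shows.

```python
import re
from typing import Iterable, List, Sequence

# Illustrative patterns only (assumptions, not taken from the thread):
# "amrN"/"amrdN" is a guess at how the controller and its logical
# drives appear in the kernel log; the rest are generic trouble words.
DEFAULT_PATTERNS: Sequence[str] = (
    r"\bamrd?\d+\b",
    r"timeout",
    r"i/o error",
)


def find_suspect_lines(
    lines: Iterable[str],
    patterns: Sequence[str] = DEFAULT_PATTERNS,
) -> List[str]:
    """Return the log lines that match any pattern, case-insensitively."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [
        line.rstrip("\n")
        for line in lines
        if any(rx.search(line) for rx in compiled)
    ]


# Example use (the log path is an assumption about your system):
#
#     with open("/var/log/messages", errors="replace") as fh:
#         for hit in find_suspect_lines(fh):
#             print(hit)
```

If the matching lines cluster shortly before each 'mismatch', that gives
you a time window to compare against power events or other changes.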
