From: Barnaby Scott
To: Derek Ragona
Cc: freebsd-questions@freebsd.org
Date: Wed, 14 Nov 2007 17:47:49 +0000
Subject: Re: Dell PE4600 RAID5 server failing

Derek Ragona wrote:
> At 09:00 AM 11/14/2007, Barnaby Scott wrote:
>> I suspect I already know the answer to this, which is that the trouble
>> I am having is nothing to do with the OS at all, but I have to ask,
>> because I am otherwise up against a total brick wall!
>>
>> I bought a second-hand Dell PowerEdge 4600 and installed FreeBSD 6.2
>> earlier this year.
>> I had it set up with RAID5 using its PERC3/DC
>> controller, with 7 x 73GB disks (+ 1 hot spare). So far so good, and
>> it worked faultlessly as a Samba server for several months.
>>
>> At the beginning of October, it went down, reporting a mismatch
>> between the configuration on the NVRAM and the disks. With help from
>> Dell support, I managed to recreate the RAID array and it worked again
>> for a month.
>>
>> In early November it happened again, and has kept happening since. At
>> one point it appeared that the backplane was faulty, so I replaced
>> that, but I cannot keep the server up for more than a day or so
>> without this 'mismatch' problem.
>>
>> What about diagnostics on the hardware, you may ask? I have run all the
>> diagnostic tools that Dell can supply - several times - and the server
>> declares itself to be totally fault-free.
>>
>> My specific questions, therefore:
>>
>> Is there any way at all that FreeBSD could be involved in this
>> problem? (I did notice, for example, that the Dell PERC3/DC controller
>> was not in the list of supported hardware - but then again, why did it
>> work for several months?)
>>
>> Can I use FreeBSD to tell me anything about the fault that Dell's
>> diagnostic tools haven't found?
>>
>> (I do hope someone might be able to help - Dell are trying to get me
>> to switch to a 'supported' OS!)
>>
>>
>> Thanks
>>
>> Barnaby Scott
>
> It doesn't sound like an OS issue, as you set up the RAID outside the
> OS. It may be a bad drive or drives. Most RAID controllers write RAID
> configuration information to the drives, and if this becomes unreadable
> you will have RAID faults.
>
> Another likely culprit is heat. Overheating drives often fail. Are you
> sure the temperatures in the drive enclosure are OK?
>
> If you can, run diagnostics on the drives; this usually requires running
> them with the drives taken out of the RAID array, though.
>
> -Derek
>

Thanks for replying - as I said, this is a long shot, trying to see if
there is any OS involvement.

The drives are fine - I have used two different tools to analyse them
while the computer was booted from a live CD and the RAID configuration
was cleared on the controller. Besides, you would expect one drive to
fail at a time, and if that happened, the hot spare would surely be
pressed into service. Nothing like this has happened, though - the
controller reports several drives (not always the same ones) as failed
simultaneously, but when the array is re-created from the disks,
everything works fine. The problem is that it goes down again a day or
so later.

As for heat, nothing is being reported there, and the fans that cool
that area are working.

Any other ideas gratefully received!

Barnaby Scott