From: "Tamouh H."
To: "Barnaby Scott", "Derek Ragona"
Cc: freebsd-questions@freebsd.org
Subject: RE: Dell PE4600 RAID5 server failing
Date: Wed, 14 Nov 2007 13:12:39 -0500

> Derek Ragona wrote:
> > At 09:00 AM 11/14/2007, Barnaby Scott wrote:
> >> I suspect I already know the answer to this, which is that the
> >> trouble I am having is nothing to do with the OS at all, but I have
> >> to ask, because I am otherwise up against a total brick wall!
> >>
> >> I bought a second-hand Dell Poweredge 4600 and installed FreeBSD 6.2
> >> earlier this year. I had it set up with RAID5 using its PERC3/DC
> >> controller, with 7 x 73GB disks (+ 1 hot spare). So far so good, and
> >> it worked faultlessly as a Samba server for several months.
> >>
> >> At the beginning of October, it went down, reporting a mismatch
> >> between the configuration on the NVRAM and the disks. With help from
> >> Dell support, I managed to recreate the RAID array and it worked
> >> again for a month.
> >>
> >> In early November it happened again, and has kept happening since.
> >> At one point it appeared that the backplane was faulty, so I replaced
> >> that, but I cannot keep the server up for more than a day or so
> >> without this 'mismatch' problem.
> >>
> >> What about diagnostics on the hardware you may ask? I have run all
> >> the diagnostic tools that Dell can supply - several times - and the
> >> server declares itself to be totally fault-free.
> >>
> >> My specific questions therefore:
> >>
> >> Is there any way at all that FreeBSD could be involved with this
> >> problem? (I did notice for example that the Dell PERC3/DC controller
> >> was not in the list of supported hardware - but then again, why did
> >> it work for several months?)
> >>
> >> Can I use FreeBSD to tell me anything about the fault that Dell's
> >> diagnostic tools haven't found?
> >>
> >> (I do hope someone might be able to help - Dell are trying to get me
> >> to switch to a 'supported' OS!)
> >>
> >>
> >> Thanks
> >>
> >> Barnaby Scott
> >
> > It doesn't sound like any OS issue as you set up the RAID outside the
> > OS. It may be a bad drive or drives. Most RAID drives have RAID
> > information written to the drives, and if this becomes unreadable you
> > will have RAID faults.
> >
> > Another likely culprit is heat. Overheating drives often fail. Are
> > you sure the temperature in the drive enclosure is OK?
> >
> > If you can, run diagnostics on the drives, this usually requires
> > running these with the drives taken out of the RAID array though.
> >
> > -Derek
> >
>
> Thanks for replying - as I said, this is a long shot trying
> to see if there is any OS involvement.
>
> The drives are fine - I have used two different tools to
> analyse them while the computer is booted from a live CD and
> the RAID configuration cleared on the controller.
> Besides, you would expect one drive to fail at a time, and if this
> happened, the hot spare would surely be pressed into service.
> Nothing like this has happened though - the controller is
> reporting several drives (not always the same ones) failed
> simultaneously, but when the array is re-created from the
> disks, everything works fine. Problem is, it goes down again
> a day or so later.
>
> As for heat, there is nothing being reported there and the
> fans that cool that area are working.
>
> Any other ideas gratefully received!
>
> Barnaby Scott

This is very unlikely to be OS related, but here are a few pointers:

1) Check the make/model of the drives. Some SCSI drive models shipped
a while ago with a firmware glitch that made them disconnect from a
RAID array. I have personally experienced this with Seagate U320
drives.

2) What happened in October? Did anything change hardware-, software-,
or power-wise?

3) For an NVRAM/disk configuration mismatch, I'd check the controller:
is the backup battery present but weak?

4) Unlikely to be the source, but test your physical RAM with
MEMTEST86+, and check that the power supply is sufficient and working
properly.