From owner-freebsd-questions@FreeBSD.ORG Thu Jun 19 10:11:59 2008
From: "Daniel Eriksson" <daniel_k_eriksson@telia.com>
To: freebsd-questions@freebsd.org
Cc: ryan.coleman@cwis.biz
Subject: RE: "Fixing" a RAID
Date: Thu, 19 Jun 2008 11:02:14 +0200
Message-ID: <4F9C9299A10AE74E89EA580D14AA10A61A1947@royal64.emp.zapto.org>
In-Reply-To: <2812.71.63.150.244.1213842028.squirrel@www.pictureprints.net>

Ryan Coleman wrote:

> Jun  4 23:02:28 testserver kernel: ar0: 715425MB RAID5 (stripe 64 KB) status: READY
> Jun  4 23:02:28 testserver kernel: ar0: disk0 READY using ad13 at ata6-slave
> Jun  4 23:02:28 testserver kernel: ar0: disk1 READY using ad16 at ata8-master
> Jun  4 23:02:28 testserver kernel: ar0: disk2 READY using ad15 at ata7-slave
> Jun  4 23:02:28 testserver kernel: ar0: disk3 READY using ad17 at ata8-slave
> Jun  4 23:05:35 testserver kernel: g_vfs_done():ar0s1c[READ(offset=501963358208, length=16384)]error = 5
> ...

My guess is that the rebuild failure is due to unreadable sectors on one (or more) of the original three drives.

I recently had this happen to me on an 8 x 1 TB RAID-5 array on a Highpoint RocketRAID 2340 controller. For some unknown reason two drives developed unreadable sectors within hours of each other. To make a long story short, the way I "fixed" this was to:

1. Used a tool I got from Highpoint tech support to re-initialize the array information (so the array was no longer marked as broken).

2. Unplugged both drives and hooked them up to another computer using a regular SATA controller.

3. Put one of the drives through a complete "recondition" cycle (a).

4. Put the other drive through a partial "recondition" cycle (b).

5. Hooked both drives up to the 2340 controller again. The BIOS immediately marked the array as degraded (because it did not recognize the wiped drive as part of the array), and I could re-add the wiped drive so a rebuild of the array could start.

6. Finally ran a "zpool scrub" on the tank and restored the few files that had checksum errors.

(a) I tried to run a SMART long selftest, but it failed. I then completely wiped the drive by writing zeroes to the entire surface, allowing the firmware to remap the bad sectors. After this procedure the long selftest succeeded. I finally used a diagnostic program from the drive vendor (Western Digital) to verify once more that the drive was working properly.
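In practice the wipe-and-retest in (a) boils down to something like this (a rough sketch only; it assumes smartmontools from ports and uses /dev/ad13 purely as a placeholder for the disk being reconditioned):

  # WARNING: destroys all data on the disk; double-check the device name.
  # Zero the whole surface so the firmware can remap any bad sectors.
  dd if=/dev/zero of=/dev/ad13 bs=1m

  # Then run a SMART extended (long) selftest; it runs in the background,
  # so check the selftest log once it has finished.
  smartctl -t long /dev/ad13
  smartctl -l selftest /dev/ad13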
(b) The SMART long selftest failed the first time, but after running a surface scan using the diagnostic program from Western Digital the selftest passed. I'm pretty sure the diagnostic program remapped the bad sector, replacing it with a blank one; at least the program warned me to back up all data before starting the surface scan. Alternatively, I could have used dd (with an offset) to write to just the failed sector (the LBA is available in the SMART selftest log).

If I were you I would run all three drives through a SMART long selftest. I'm sure you'll find that at least one of them will fail. Use something like SpinRite 6 to recover the drive, or use dd / dd_rescue to copy the data to a fresh drive (a rough sketch of the commands is included below). Once all three of the original drives pass a long selftest, the array should be able to finish a rebuild using a fourth (blank) drive.

By the way, don't try to use SpinRite 6 on 1 TB drives; it will fail halfway through with a division-by-zero error. I haven't tried it on any 500 GB drives yet.

/Daniel Eriksson
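A minimal sketch of the selftest / dd commands mentioned above (assuming smartmontools from ports, 512-byte sectors, and purely illustrative device names and LBA values; adjust everything before running it):

  # Start a long selftest on each of the original drives; the test runs in
  # the background, so check the log once it has finished.
  smartctl -t long /dev/ad13
  smartctl -l selftest /dev/ad13   # "LBA_of_first_error" shows the failed sector

  # Option 1: overwrite just the failed sector so the firmware remaps it
  # (destroys the 512 bytes at that LBA; 1234567 is only an example value).
  dd if=/dev/zero of=/dev/ad13 bs=512 count=1 seek=1234567

  # Option 2: copy whatever is still readable onto a fresh drive, padding
  # unreadable blocks with zeroes; a smaller bs loses less data around bad
  # spots, and dd_rescue from ports handles read errors more gracefully.
  dd if=/dev/ad13 of=/dev/ad18 bs=64k conv=noerror,sync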