From owner-freebsd-stable@FreeBSD.ORG  Fri Jan 13 18:26:24 2006
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4AB5F16A41F
	for <freebsd-stable@freebsd.org>; Fri, 13 Jan 2006 18:26:24 +0000 (GMT)
	(envelope-from ambrisko@ambrisko.com)
Received: from mail.ambrisko.com (mail.ambrisko.com [64.174.51.43])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C585F43D48
	for <freebsd-stable@freebsd.org>; Fri, 13 Jan 2006 18:26:23 +0000 (GMT)
	(envelope-from ambrisko@ambrisko.com)
Received: from server2.ambrisko.com (HELO www.ambrisko.com) ([192.168.1.2])
	by mail.ambrisko.com with ESMTP; 13 Jan 2006 10:26:23 -0800
Received: from ambrisko.com (localhost [127.0.0.1])
	by www.ambrisko.com (8.12.11/8.12.9) with ESMTP id k0DIQNcr088342;
	Fri, 13 Jan 2006 10:26:23 -0800 (PST)
	(envelope-from ambrisko@ambrisko.com)
Received: (from ambrisko@localhost)
	by ambrisko.com (8.12.11/8.12.11/Submit) id k0DIQNeY088341;
	Fri, 13 Jan 2006 10:26:23 -0800 (PST) (envelope-from ambrisko)
From: Doug Ambrisko <ambrisko@ambrisko.com>
Message-Id: <200601131826.k0DIQNeY088341@ambrisko.com>
In-Reply-To: <6.2.3.4.0.20060113125258.045378d8@64.7.153.2>
To: Mike Tancsa <mike@sentex.net>
Date: Fri, 13 Jan 2006 10:26:23 -0800 (PST)
X-Mailer: ELM [version 2.4ME+ PL94b (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Cc: freebsd-stable@freebsd.org
Subject: Re: 6.0 on Dell 1850 with PERC4e/DC RAID?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 13 Jan 2006 18:26:24 -0000

Mike Tancsa writes:
| At 11:59 AM 13/01/2006, Doug Ambrisko wrote:
| >|
| >| That's lame.  Under what condition does it happen, do you know?
| >
| >Running RAID 10, a drive was swapped and the rebuild started on the
| >replacement drive.  The rebuild complained about the source drive
| >for the mirror rebuild having read errors that couldn't be recovered.
| >It continued on and finished re-creating the mirror.  Then the RAID
| >proceeded to a background init, as it normally does, and started
| >failing that and re-starting the background init over and over again.
| >The box changed the RAID from degraded to optimal when the rebuild
| >completed (with errors).  Doing a dd of the entire RAID logical device
| >returned an error at the bad sector since it couldn't recover that.
| >The RAID controller reported an I/O error and still left the RAID as
| >optimal.
| >
| >We reported this and were told that's the way it is designed :-(
| 
| Interesting timing as I ran into this sort of situation on the 
| weekend on a 3ware drive in RAID1. The card had complained for a week 
| about read errors on drive 1. We thought we would wait until the 
| weekend maintenance window to swap it out.  Sadly, before that 
| window, drive zero totally died a horrible death.  We popped in a new 
| drive on port zero, started the rebuild, and it crapped out saying 
| there was a read error on drive 1.  However, there is a check box 
| that says continue the build, even with errors on the source drive.

With Adaptec we used to do a verify of each disk before a swap
to increase our chances of a successful disk swap.  Adaptec was
a little heavy-handed: if you are running on the last disk of the
mirror and it has a read error, it will fail the drive.  If you have
a RAID 10 then you lose 1/2 the file system :-(  I'd rather just
get the read error back to the OS than lose the entire drive.
 
| This setup seems to give you the best of both worlds.  We did a quick 
| check of the resultant files compared to backups and only a couple 
| were toasted. (The box is going to be retired in a month, so if there 
| is other hidden fs corruption but it holds out for another 3 weeks, we 
| don't care too much). The correct approach would be to do a total 
| restore of course, but this was good enough for us in this 
| situation.  I guess the question is, is this RAID1 in a proper mirror 
| given that there are hard errors on the drive on port 1 ?

That sounds like a good controller, assuming it says the RAID is still
degraded and not optimal.  I assume "optimal" means everything
is fine and it is safe to read the entire volume.
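
As a rough illustration of the "read the entire volume" check (the same
idea as the dd of the whole logical device mentioned above), here is a
minimal sketch that reads a device from start to end and records the
offsets where the OS returns an I/O error.  The function name, block
size and error cap are my own assumptions for illustration, not
anything the controller or its tools provide.

```python
import os


def scan_for_read_errors(path, block_size=1024 * 1024, max_errors=100):
    """Read `path` sequentially; return (bytes_read, bad_offsets).

    Hypothetical helper: on a read error the offset is recorded and the
    scan skips ahead by one block instead of aborting, so one bad sector
    does not hide later ones.  The scan gives up after `max_errors`
    failures so a vanished device cannot loop forever.
    """
    bad_offsets = []
    bytes_read = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while len(bad_offsets) < max_errors:
            try:
                chunk = os.pread(fd, block_size, offset)
            except OSError:
                # The OS reported a read failure (e.g. EIO) for this block.
                bad_offsets.append(offset)
                offset += block_size
                continue
            if not chunk:
                break  # end of file or device
            bytes_read += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

You would point it at the logical disk's device node (whatever name
your driver gives it) and treat any non-empty bad_offsets list as
"not safe to read", regardless of what state the controller reports.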

Doug A.