From owner-freebsd-stable@FreeBSD.ORG  Mon Dec 13 19:21:53 2004
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 42E5F16A4CE
	for <freebsd-stable@freebsd.org>;
	Mon, 13 Dec 2004 19:21:53 +0000 (GMT)
Received: from outbound0.sv.meer.net (outbound0.sv.meer.net [205.217.152.13])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2408543D60
	for <freebsd-stable@freebsd.org>;
	Mon, 13 Dec 2004 19:21:53 +0000 (GMT)
	(envelope-from jrhett@mail.meer.net)
Received: from mail.meer.net (mail.meer.net [209.157.152.14])
	iBDJLbwR024172;	Mon, 13 Dec 2004 11:21:44 -0800 (PST)
	(envelope-from jrhett@mail.meer.net)
Received: from mail.meer.net (localhost [127.0.0.1])
	by mail.meer.net (8.12.10/8.12.10/meer) with ESMTP id iBDJLMFL013738;
	Mon, 13 Dec 2004 11:21:22 -0800 (PST)
	(envelope-from jrhett@mail.meer.net)
Received: (from jrhett@localhost)
	by mail.meer.net (8.12.1/8.12.10) id iBDJLK1J013730;
	Mon, 13 Dec 2004 11:21:20 -0800 (PST)
	(envelope-from jrhett)
Date: Mon, 13 Dec 2004 11:21:20 -0800
From: Joe Rhett <jrhett@meer.net>
To: Doug White <dwhite@gumbysoft.com>
Message-ID: <20041213192119.GB4781@meer.net>
Mail-Followup-To: Doug White <dwhite@gumbysoft.com>,
	freebsd-stable@FreeBSD.org,
	=?iso-8859-1?Q?S=F8ren?= Schmidt <sos@DeepCore.dk>
References: <20041213052628.GB78120@meer.net>
	<20041213054159.GC78120@meer.net> <20041212215841.X83257@carver.gumbysoft.com>
	<20041213060549.GE78120@meer.net> <20041213102333.V92964@carver.gumbysoft.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20041213102333.V92964@carver.gumbysoft.com>
User-Agent: Mutt/1.4i
Organization: Meer.net LLC
cc: freebsd-stable@freebsd.org
cc: =?iso-8859-1?Q?S=F8ren?= Schmidt <sos@DeepCore.dk>
Subject: Re: drive failure during rebuild causes page fault
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
X-List-Received-Date: Mon, 13 Dec 2004 19:21:53 -0000

> > On Sun, Dec 12, 2004 at 09:59:16PM -0800, Doug White wrote:
> > > Thats a nice shotgun you have there.

> On Sun, 12 Dec 2004, Joe Rhett wrote:
> > Yessir.  And that's what testing is designed to uncover.  The question is
> > why this works, and how do we prevent it?
 
On Mon, Dec 13, 2004 at 10:28:53AM -0800, Doug White wrote:
> I'm sure Soren appreciates you donating your feet to the cause :)
 
That's what sandbox feet are for ;-)

> Why it works: the system assumes the administrator is competent enough to
> not yank a disk that is being rebuilt to.
 
Yes, I and most others are.  But that's a bad assumption.  The issue is
fairly simple: what happens if the disk goes offline because of a hardware
failure?  For example, the SATA interface starts having problems.  We
replace the drive, assuming the drive is at fault.  The rebuild starts, and
the interface dies again.  Bam!  There goes the system.  Not good.

Or, perhaps it's a DOA drive and it fails during the rebuild?
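
To make the behaviour I would expect concrete, here is a rough sketch, in
Python rather than kernel C, of a mirror rebuild that treats a failure of
the rebuild target as a reason to abort and stay degraded rather than take
the system down.  Every name in it (Disk, DiskError, rebuild_mirror, the
state strings) is made up for illustration and has nothing to do with the
actual ATA RAID code.

```python
class DiskError(Exception):
    """Raised by the toy Disk when an I/O request fails."""


class Disk:
    """Hypothetical in-memory disk; fail_at simulates a drive dying mid-I/O."""

    def __init__(self, nblocks, fail_at=None):
        self.blocks = [b"\0"] * nblocks
        self.fail_at = fail_at  # first block number that fails, or None
        self.online = True

    def _check(self, lba):
        # Once a disk has failed it stays offline for all later requests.
        if not self.online or (self.fail_at is not None and lba >= self.fail_at):
            self.online = False
            raise DiskError(f"I/O error at block {lba}")

    def read(self, lba):
        self._check(lba)
        return self.blocks[lba]

    def write(self, lba, data):
        self._check(lba)
        self.blocks[lba] = data


def rebuild_mirror(source, target):
    """Copy source onto target; return (array_state, blocks_copied)."""
    copied = 0
    for lba in range(len(source.blocks)):
        try:
            target.write(lba, source.read(lba))
        except DiskError:
            if not source.online:
                # The only good copy is gone; nothing left to serve from.
                return "FAILED", copied
            # Target died (DOA drive, bad interface): drop it, keep running.
            return "DEGRADED", copied
        copied += 1
    return "READY", copied
```

With a healthy target this returns ("READY", n); with a target that dies
partway through it returns ("DEGRADED", k) and the source disk is still
online, which is the outcome I would want instead of a page fault.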

> > Is there a proper way to handle these sort of events?  If so, where is it
> > documented?
> >
> > And fyi just pulling the drives causes the same failure so that means that
> > RAID1 buys you nothing because your system will also crash.
> 
> This is why I don't trust ATA RAID for fault tolerance -- it'll save your
> data, but the system will tank.  Since the disk state is maintained by
> the OS and not abstracted by a separate processor, if a disk dies in a
> particularly bad way the system may not be able to cope.
 
Yes, but SATA isn't limited by this problem.  It does have a processor per
disk.  (This is all SATA, if I didn't make that clear.)

-- 
Joe Rhett
Senior Geek
Meer.net