Date:      Mon, 13 Dec 2004 16:03:06 -0500
From:      Paul Mather <paul@gromit.dlib.vt.edu>
To:        Doug White <dwhite@gumbysoft.com>
Cc:        Søren Schmidt <sos@DeepCore.dk>
Subject:   Re: drive failure during rebuild causes page fault
Message-ID:  <1102971786.7399.24.camel@zappa.Chelsea-Ct.Org>
In-Reply-To: <20041213102333.V92964@carver.gumbysoft.com>
References:  <20041213052628.GB78120@meer.net> <20041213054159.GC78120@meer.net> <20041213060549.GE78120@meer.net> <20041213102333.V92964@carver.gumbysoft.com>

On Mon, 2004-12-13 at 10:28 -0800, Doug White wrote:
> On Sun, 12 Dec 2004, Joe Rhett wrote:
> 
> > On Sun, Dec 12, 2004 at 09:59:16PM -0800, Doug White wrote:
> > > That's a nice shotgun you have there.
> >
> > Yessir.  And that's what testing is designed to uncover.  The question is
> > why this works, and how do we prevent it?
> 
> I'm sure Soren appreciates you donating your feet to the cause :)
> 
> Why it works: the system assumes the administrator is competent enough to
> not yank a disk that is being rebuilt to.

That's not quite fair.  He was obviously testing to see how resilient
ATA RAID is to drive failures during rebuilding, as part of a series of
tests.  (Obviously, it is not.)  If you look at his original message, he
did not even "yank" the disk.  He detached it in a somewhat orderly
fashion using "atacontrol detach."  (One can argue that physically
yanking it might have been a more accurate, if more severe failure
test.)  This makes the ensuing panic even sadder.  (Would the same
panic result if the disk being rebuilt fell victim to one of those
"TIMEOUT - WRITE_DMA" errors that are in vogue nowadays and was detached
by the system?  I get those errors occasionally [never used to under 5.1
on the exact same hardware] but my geom_mirror has coped with it so far,
thankfully.)

It's reasonable to conduct simulated failure testing of ATA RAID (or of
alternatives such as geom_mirror and geom_vinum) before adopting it on
your system.  I know I did in the case of ATA RAID, and I abandoned it
precisely because, for me, it turned out to be too flaky when it came to
error recovery.

Cheers,

Paul.
-- 
e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa


