Date: Mon, 13 Dec 2004 16:03:06 -0500
From: Paul Mather <paul@gromit.dlib.vt.edu>
To: Doug White <dwhite@gumbysoft.com>
Cc: Søren Schmidt <sos@DeepCore.dk>
Subject: Re: drive failure during rebuild causes page fault
Message-ID: <1102971786.7399.24.camel@zappa.Chelsea-Ct.Org>
In-Reply-To: <20041213102333.V92964@carver.gumbysoft.com>
References: <20041213052628.GB78120@meer.net> <20041213054159.GC78120@meer.net> <20041213060549.GE78120@meer.net> <20041213102333.V92964@carver.gumbysoft.com>

On Mon, 2004-12-13 at 10:28 -0800, Doug White wrote:
> On Sun, 12 Dec 2004, Joe Rhett wrote:
>
> > On Sun, Dec 12, 2004 at 09:59:16PM -0800, Doug White wrote:
> > > Thats a nice shotgun you have there.
> >
> > Yessir. And that's what testing is designed to uncover. The question is
> > why this works, and how do we prevent it?
>
> I'm sure Soren appreciates you donating your feet to the cause :)
>
> Why it works: the system assumes the administrator is competent enough to
> not yank a disk that is being rebuilt to.

That's not quite fair. He was obviously testing to see how resilient
ATA RAID is to drive failures during rebuilding, as part of a series of
tests. (Obviously, it is not.)

If you look at his original message, he did not even "yank" the disk.
He detached it in a somewhat orderly fashion using "atacontrol detach."
(One can argue that physically yanking it might have been a more
accurate, if more severe failure test.) This makes the ensuing panic
even more sad. (Would the same panic result if the disk being rebuilt
fell victim to one of those "TIMEOUT - WRITE_DMA" errors that are in
vogue nowadays and was detached by the system? I get those errors
occasionally [never used to under 5.1 on the exact same hardware] but my
geom_mirror has coped with it so far, thankfully.)

It's reasonable to conduct simulated failure testing of ATA RAID (or
others such as geom_mirror and geom_vinum) prior to adopting it on your
system. I know I did in the case of ATA RAID and abandoned it precisely
because it turned out for me to be too flaky when it came to error
recovery.

Cheers,

Paul.
-- 
e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa