From: Joe Rhett
To: Doug White
Cc: freebsd-stable@freebsd.org, Søren Schmidt
Date: Sun, 12 Dec 2004 22:05:50 -0800
Organization: Meer.net LLC
Subject: Re: drive failure during rebuild causes page fault

> On Sun, 12 Dec 2004, Joe Rhett wrote:
> > And another, I can now confirm that it is fairly easy to kill 5.3-release
> > during the rebuilding process. The following steps will cause a kernel
> > page fault consistently:
> >
> > atacontrol create RAID0 ad6 ad10
> > atacontrol detach 5
> > log: ad10 deleted from ar0 disk1
> > log: ad10 WARNING - removed from configuration
> > atacontrol addspare 0 ad8
> > log: ad8 inserted into ar0 disk1 as spare
> > atacontrol rebuild 0
> > atacontrol detach 4
> > log: ad8 deleted from ar0 disk1
> > log: ad8 WARNING - removed from configuration
> >
> > Fatal trap 12: page fault while in kernel mode
> > fault virtual address = 0x10

On Sun, Dec 12, 2004 at 09:59:16PM -0800, Doug White wrote:
> Thats a nice shotgun you have there.

Yessir. And that's what testing is designed to uncover. The question is
why this works, and how do we prevent it?

Is there a proper way to handle these sorts of events? If so, where is it
documented?

And FYI, just pulling the drives causes the same failure, so RAID1 buys you
nothing: your system will crash anyway.

-- 
Joe Rhett
Senior Geek
Meer.net