Date: Wed, 18 May 2005 17:20:16 -0700
From: Joe Rhett <jrhett@meer.net>
To: Søren Schmidt <sos@DeepCore.dk>, freebsd-stable@freebsd.org
Subject: Re: drive failure during rebuild causes page fault
Message-ID: <20050519002015.GA25329@meer.net>
In-Reply-To: <20041215005359.GK27283@meer.net>
References: <20041213052628.GB78120@meer.net>
 <20041213054159.GC78120@meer.net>
 <20041212215841.X83257@carver.gumbysoft.com>
 <20041213060549.GE78120@meer.net>
 <20041213102333.V92964@carver.gumbysoft.com>
 <20041213192119.GB4781@meer.net>
 <20041213183336.T97507@carver.gumbysoft.com>
 <41BE8F2D.8000407@DeepCore.dk>
 <20041215005359.GK27283@meer.net>

Soren,

I've just retested all of this with 5.4-REL and most of the problems listed
here are solved. The only problems appear to be related to these ghost
arrays that appear when it finds a drive that was taken offline earlier.
For example, pull a drive and then reboot the system.

1. If you reboot the system you can delete the array cleanly, but it
   returns next time. I can't figure out how to make this information go
   away, and I've tried low-level formatting the disks :-(

2. Removing the array using "atacontrol delete" after an "atacontrol reinit
   channel" will always produce a page fault.

For example, if you have only a single array in a system and you lose a
drive, and then it returns later..

# atacontrol status 1
atacontrol: ioctl(ATARAIDSTATUS): Device not configured
# atacontrol reinit 5
...finds disk
# atacontrol status 1
ar1: ATA RAID1 subdisks: DOWN DOWN status: DEGRADED
# atacontrol delete 1
*Page Fault*

We can't run -current, so I'm hoping to find options to work with this as
is. If you know for a fact that this has changed in the mkIII patches then
I'd be willing to investigate, but I will need to be certain.

I know that you have no desire to work on this older code, but could you at
least clue me in on how to get atacontrol to drop these ghost arrays?

On Tue, Dec 14, 2004 at 04:53:59PM -0800, Joe Rhett wrote:
> Soren, do you have any thoughts on what I could do to alleviate or better
> debug this page fault? I've found three ways to cause this:
> in all cases "pull" is either physical pull or "atacontrol detach <channel>"
>
> 1. Pull a drive and rebuild onto hot spare. Pull hot spare *boom*
>
> 2. Pull a drive and rebuild onto hot spare. Pull good disk *boom*
>    ...should cause filesystem failure, but not page fault when it's not /
>
> 3. Pull a drive and then put it back. The system suddenly has a new array
>    with just that drive in it.
>    "atacontrol delete <new-array>" *boom*
>
> In particular, what's the story with the new array appearing when you
> insert a drive with array meta-data on it? That array appears to be
> half-there (no devices, etc) which is probably what causes #2...
>
> On Tue, Dec 14, 2004 at 07:58:53AM +0100, Søren Schmidt wrote:
> > Actually I'm in the process of rewriting the ATA RAID code, so things
> > are rolling, albeit slowly, time is a precious resource. I believe that
> > it can be made pretty robust, but the rest of the kernel still have
> > issues with disappearing devices etc that's out of ATA's realm.
> >
> > Anyhow. I can only test with the HW I have here in the lab, which by far
> > covers all possible permutations, so testing etc by the community is
> > very much needed here to get things sorted out...
>
> --
> Joe Rhett
> Senior Geek
> Meer.net
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

-- 
Joe Rhett
senior geek
meer.net