From owner-freebsd-stable@FreeBSD.ORG Wed Aug 6 15:28:15 2008
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2E047106566B;
	Wed, 6 Aug 2008 15:28:15 +0000 (UTC)
	(envelope-from jdc@parodius.com)
Received: from mx01.sc1.parodius.com (mx01.sc1.parodius.com [72.20.106.3])
	by mx1.freebsd.org (Postfix) with ESMTP id 1C0968FC16;
	Wed, 6 Aug 2008 15:28:15 +0000 (UTC)
	(envelope-from jdc@parodius.com)
Received: by mx01.sc1.parodius.com (Postfix, from userid 1000)
	id 00D6D1CC0B3; Wed, 6 Aug 2008 08:28:15 -0700 (PDT)
Date: Wed, 6 Aug 2008 08:28:14 -0700
From: Jeremy Chadwick
To: "Sean C. Farley"
Message-ID: <20080806152814.GA65023@eos.sc1.parodius.com>
References: <20080806033016.GA35921@eos.sc1.parodius.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.18 (2008-05-17)
Cc: freebsd-stable@FreeBSD.org
Subject: Re: Stuck in geli
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Wed, 06 Aug 2008 15:28:15 -0000

On Wed, Aug 06, 2008 at 09:24:20AM -0500, Sean C. Farley wrote:
> On Tue, 5 Aug 2008, Jeremy Chadwick wrote:
>> After reading my above Wiki page, I hope you consider disabling
>> MatrixRAID and avoiding it entirely on FreeBSD.  There are patches to
>> address major issues which have been sitting untouched, despite
>> patches included, for 2+ years.  Draw your own conclusions.
>
> Yuck.
>
> I used the on-board "RAID", so the system could dual-boot Windows XP and
> FreeBSD.
>
> Is there any way to use gmirror to mirror the entire disk with XP on one
> slice and FreeBSD on another?  ;)  OK.  I think I know that answer.
> Does XP have software RAID1?  I can setup XP on one slice and gmirror on
> another.

It seems unlikely that XP would natively have a form of software RAID-1
since it's deemed a desktop system, while the Server products (e.g. 2003
and 2008) probably do.

A multi-OS-friendly solution might be to pick up a cheap Promise RAID
controller and use that, since drivers are available for Windows XP and
the card works without hitches on FreeBSD (just install FreeBSD on the
ar0 device, voila).

Keep in mind that I haven't tested a failure scenario on those Promise
cards (e.g. 2 disks in a RAID1 config, pull the disk FreeBSD booted off
of while the OS is up and see what happens), but I can if someone is
curious.

I'd really like to see those Intel MatrixRAID bugs addressed.  It's
available on many server-class boxes (hello Supermicro), and it's
*incredibly* useful for admins with 2-disk servers that want a form of
failover in the case one of the disks dies -- and want that form of
failover nearly transparent to the underlying OS (most of us want this
because we don't want to deal with the "oh look, FreeBSD doesn't know
how to boot off this" situation).

Some Supermicro boxes even let you pick between using Intel MatrixRAID
or Adaptec HostRAID, via a BIOS option (yes really!).  FreeBSD has bugs
with MatrixRAID, and doesn't appear to support HostRAID at all.

> Is mirroring a slice any easier today?  I followed information from the
> following links to do this before on my server:
> http://lists.freebsd.org/pipermail/freebsd-stable/2005-February/011699.html

Not sure, I've not any experience with gmirror.

> I forget.  MatrixRAID does not destroy any data if RAID1 is disabled.
> Correct?

I remember playing with MatrixRAID on one of our PDSMi+ boxes.  After I
encountered aforementioned FreeBSD issues with MatrixRAID, I went into
the MatrixRAID BIOS and chose to delete the array.  If I remember right,
I was asked if while deleting the array I wanted to "delete the data".
The question was phrased in a way that made me wonder, "are they talking
about the MatrixRAID metadata, or do they mean wiping the first 512
bytes of each disk?"  The ambiguity of the question and the way the
on-screen details were written made me unsure how to answer it.  I
believe I chose "No" regardless.

When FreeBSD booted (off of one of the two disks), the bootloader worked
but claimed it couldn't find any filesystems.  Now, I'm not sure if
that's expected (maybe the C/H/S data looks different under MatrixRAID
than without?  I don't know).

The best answer to your question would be: "back everything you have up
before doing it".  Not the answer you want to hear, I'm sure, but it's
the safe one.

>> Also, you won't be able to kill -9 a process in that state.  The
>> kernel (or some piece of it) is hung, not the process.  The fact that
>> a reboot is required also does not surprise me.
>>
>> You *might* have been able to detach the ATA/SATA channel using
>> atacontrol to get access to the system, but then again it might result
>> in a system panic (see Wiki).
>
> I did not feel safe even without a possible panic to detach the channels
> and attach them again.  Would I not suffer data loss with everything
> mounted?

With MatrixRAID?  Oh yes, definitely, and probably worse than if you
weren't using it.  Without MatrixRAID?  Also definitely, but hopefully
fsck would fix the problems.

Take a look at PR 108924, specifically my step-by-step comments.  I
detach a channel (done automatically by yanking the disk; FreeBSD at
least knows when to detach the channel on its own) with filesystems
mounted via ar0.  The outcome is not pretty.

Ideally, if you did "atacontrol detach ata0" OR if the disk fell off the
bus, the array should go into degraded mode.  There should be no data
loss, because even though you just lost ata0 (ad0), you still have ata1
(ad1).  Ideally, you would address the problem, then do "atacontrol
attach ata0".
The array would start rebuilding, then eventually be fine.

But consider what happens when the kernel panics upon reattach -- you've
just guaranteed data loss on the other disk in the mirror, and you've
probably just horked whatever data was possibly written to the disk you
just reattached (looking at it from a MatrixRAID BIOS perspective, since
I have a feeling it does some stuff on its own).

And then, making matters even worse, consider PR 102210 -- since a
kernel panic induces a reboot.......

Now I'm sure you see how severe this situation is.  This is the exact
sort of situation people try to avoid by using RAID-1, yet by using it,
they're taking the exact risks they're trying to avoid.  Quite ironic,
isn't it?

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |