From: Bruce M Simpson
To: FreeBSD STABLE
Date: Fri, 23 Jan 2009 18:26:18 +0000
Subject: A nasty ataraid experience.
Message-ID: <497A0BCA.5070904@incunabulum.net>

Hi there,

I had a bit of a nasty experience this week with ataraid(4). I thought I would summarize the issues I ran into so that others can hopefully benefit from it, should they experience catastrophic failure of a Pseudo-RAID.
I was surprised that I was unable to find much in the way of similar experience in forums. Luckily, I didn't lose any data, thanks to mr ports/sysutils/scan_ffs.

Bog simple setup. Gigabyte VM900M motherboard, Intel Core Duo T4300. JMicron JMB363 controller, two SATA ports, RAID mode. Two Seagate 160GB drives.

I'll skip the stuff about the strange loop I got into with having to recreate the MBR and disklabels after they got trashed -- suffice to say, BACK THEM UP... BACK THEM UP...

---

1. atacontrol rebuild.

There are a few issues here. I'm partly to blame here for not reading the documentation in the handbook about the rebuild process -- however -- hilarity ensued.

Following the rebuild procedure in the Handbook, if you try to run "atacontrol rebuild" from the FreeBSD 7.1 LiveFS, it'll break. I ran it thinking that it had some kind of magic in it which I couldn't achieve using dd alone, which is partly true, but also partly not true.

It has a hardcoded path to /usr/bin/nice, which it runs using the system() libc call, and unfortunately, the LiveFS is rooted at /mnt2. It does this after it issues an ioctl() to tell the ATA driver to copy and rewrite the meta-data to the new "spare" drive. Ooops.

At this point the state of the array is inconsistent. "atacontrol status" will report the array as REBUILDING, despite the fact that the dd job never kicked off. Because the metadata has now been rewritten, and the ataraid driver has entered a REBUILDING state, you can't stop it, and it won't rebuild.

I also found that the default dd block size it uses, 1m, didn't work with my drives -- I had to dd manually with a 64KB block size to get things to work, otherwise I got lots and lots of ATA read/write errors related to trying to write beyond the last part of the disk. The drives themselves are fine, though.
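As a rough illustration of the manual copy described above, here is a minimal sketch of a block-by-block copy that never asks for data past a known size limit. It is not what atacontrol or dd actually do internally; the function name copy_blocks, the 64 KB default, and the use of plain file objects in place of real disk devices are all assumptions made for the example. It also assumes the errors came from the final block overshooting the end of the smaller disk, which the account above suggests but does not confirm.

```python
import io

BLOCK_SIZE = 64 * 1024  # 64 KB, matching the block size used for the manual dd


def copy_blocks(src, dst, total_bytes, block_size=BLOCK_SIZE):
    """Copy at most total_bytes from src to dst in block_size chunks.

    src and dst are binary file objects (hypothetical stand-ins for the
    source and target disks). Returns the number of bytes copied.
    """
    copied = 0
    while copied < total_bytes:
        # Clamp the last request so it cannot run past the size limit.
        want = min(block_size, total_bytes - copied)
        chunk = src.read(want)
        if not chunk:
            break  # source ended early; stop rather than pad with junk
        dst.write(chunk)
        copied += len(chunk)
    return copied


if __name__ == "__main__":
    # In-memory demo: the "source disk" is larger than the copy limit.
    source = io.BytesIO(b"\xAA" * (300 * 1024))
    target = io.BytesIO()
    limit = 200 * 1024  # pretend the smaller disk holds only this much
    done = copy_blocks(source, target, limit)
    print(done, len(target.getvalue()))
```

The point of the clamp is that the copy size is bounded by the smaller of the two devices, whatever block size is chosen; a smaller block size only reduces how far a single unclamped request could overshoot.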
HOMEWORK: "atacontrol rebuild" needs to be taught to play nice when run in a catastrophic recovery situation; the path stuff needs to be fixed, and perhaps some magic should be added to allow the metadata to be zapped when Really Bad Stuff happens.

---

2. raid metadata, and drive sizes.

OK, the tradeoff with ataraid is that it is pseudo-raid. That's understood, however, it's easy for the metadata to end up downright out of sync.

After my bad experience with "atacontrol rebuild" from the LiveFS, to trick FreeBSD back into understanding that the array was in fact degraded, I had to read the ataraid driver code to figure out which LBA it was expecting to see the metadata at, and then wipe that out with dd.

It doesn't help that the drives themselves are of different sizes. So. Imagine the hilarity when I just swap the drives and try to rebuild the array. Ooops.

HOMEWORK: Is there a way to use the system partition stuff e.g. ATA SET MAX ADDRESS to get around this? Obviously it would mean losing a few sectors at the end of the drive, but it's a small price to pay for sanity with Pseudo-RAID.

---

3. RAID BIOS.

I have been using a JMicron 36x PCI-e controller. Unfortunately, when stuff like the MBR is broken, it says nothing informative -- it will just skip to the next INT 19h handler.

This is more something which should be thrown at the BIOS vendors -- I don't believe there isn't enough space in there to print a message which says "The drive geometry is invalid".

HOMEWORK: Someone needs to throw a wobbly at the vendors.

---

4. fdisk and drive geometry.

The root cause of my boot failure, it turned out, was that the drive geometry was "wrong" for the JMicron SATA RAID BIOS. It turns out sysinstall's guess at the drive geometry is "right" for LBA mode (C/H/S n/255/63), and thus "right" for my SATA controller BIOS.

fdisk, however, wants to use a C/H/S of n/16/63 by default. Profanity ensues.

HOMEWORK: Why does fdisk still assume 16 heads?
Perhaps we should have a switch to tell it to use the LBA-style C/H/S converted geometry?

---

Redux

I understand why organisations pay good money for hardware RAID controllers, but given that time has moved on, this is kinda effed up. I shouldn't, in an ideal world, have to bust hours of ninja admin moves just to get a single RAID-ed home server back up and running.

I also now understand that I can't rely on RAID alone to keep the integrity of my own data -- there is no substitute for backups. I just wish there were realistic backup solutions for individuals trying to do things with technology right now, without paying over the odds, or being ripped off.

A "good enough" cheap solution is what individuals often end up using, to get things going in life with minimal resources or wastage.

I hope others benefit from my experience.

cheers
BMS

P.S. Big thanks to Doug White for helping me to get /etc recreated after bits of it got trashed.