Date: Sun, 01 Mar 2009 18:00:06 -0500
From: Alex Kirk <alex@schnarff.com>
To: questions@freebsd.org
Subject: RAID Gone Wild - One Array Split Into Two
Message-ID: <20090301180006.19402mvtopuv9go4@mail.schnarff.com>
First off, I realize that this may be more of a lower-level hardware
question than is appropriate to ask here, but I'm at a real loss, and
have no idea who else to ask...so I apologize in advance if I'm being
a pest. That said:

I've got a FreeBSD 7.0/stable box that is used as the development
server for a live system I administer. It recently crapped out on me
(the dev box), and I realized that its power supply had kicked the
bucket. After going out and replacing the power supply, it booted
right back up, I ssh'd in, and when I ran my first userland command -
"w", FWIW - it froze up solid. I got one more SSH session in
attempting to figure out WTF was going on before it wouldn't even log
me in any more.

After a couple of hard reboots, I decided to attach a monitor to it to
see what was going on. It turns out that the RAID5 array on the system
had really lost its mind - all four devices that were part of the
array were listed as being offline, which of course meant that the
system could no longer boot (as it was booting off of the RAID). The
controller is an integrated Intel Matrix DHC7R, built onto the
motherboard.

I looked around the web a bit to try to figure out how to fix this,
and ran across a couple of forum posts (which I can unfortunately no
longer seem to find) suggesting that this particular controller was
prone to an issue where hard power-downs would sometimes make the
drives go offline, and that I needed to boot from CD to re-initialize
them into their previous state. I tried first with an Ubuntu Linux CD
I had handy - which promptly freaked out and dropped me into an
emergency shell - and then the FreeBSD 7.0 boot-only disc.
The latter was a bit more helpful, because I got this diagnostic:

ar0: WARNING - parity protection lost, RAID5 array in DEGRADED mode
ar0: 715418MB <Intel MatrixRAID RAID5 (stripe 64KB)> status: DEGRADED
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad8 at ata4-master
ar0: disk2 READY using ad6 at ata3-master
ar0: disk3 DOWN no device found for this subdisk
ar1: 715418MB <Intel MatrixRAID RAID5 (stripe 64KB)> status: BROKEN
ar1: disk0 DOWN no device found for this subdisk
ar1: disk1 DOWN no device found for this subdisk
ar1: disk2 DOWN no device found for this subdisk
ar1: disk3 READY using ad10 at ata5-master

Now I can see that my problem is that I've somehow got *two* RAID
devices, both improperly configured, whereas I'd only had one before.

Does anyone have a clue how I can fix this, preferably while retaining
my data? I could wipe the box if necessary, but I'd really prefer not
to, as that would be a huge pain in the butt.

Thanks,
Alex Kirk

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
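The "one array split into two" reading of the diagnostic can be checked mechanically. The sketch below is only an illustration: it parses the subdisk lines quoted in the message and tests whether the READY members of ar0 and ar1 occupy distinct slots, which is what you would expect if a single four-disk array had been detected as two. The helper names `ready_slots` and `merged_view` are hypothetical, the script reads text only, and it does not touch or repair any disk.

```python
import re
from collections import defaultdict

# Subdisk lines copied from the diagnostic quoted in the message.
LOG = """\
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad8 at ata4-master
ar0: disk2 READY using ad6 at ata3-master
ar0: disk3 DOWN no device found for this subdisk
ar1: disk0 DOWN no device found for this subdisk
ar1: disk1 DOWN no device found for this subdisk
ar1: disk2 DOWN no device found for this subdisk
ar1: disk3 READY using ad10 at ata5-master
"""

# Matches "arN: diskM READY using adX at ..." or "arN: diskM DOWN ...".
LINE = re.compile(r"^(ar\d+): disk(\d+) (READY using (\w+) at \S+|DOWN .*)$")


def ready_slots(log):
    """Map each array name to {slot number: device} for its READY subdisks."""
    arrays = defaultdict(dict)
    for line in log.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(4):  # group 4 is set only on READY lines
            arrays[m.group(1)][int(m.group(2))] = m.group(4)
    return dict(arrays)


def merged_view(arrays):
    """Combine READY slots across arrays; None if two arrays claim one slot."""
    merged = {}
    for slots in arrays.values():
        for slot, dev in slots.items():
            if slot in merged:
                return None
            merged[slot] = dev
    return merged


arrays = ready_slots(LOG)
merged = merged_view(arrays)
print(arrays)
print(merged)
```

With the log above, the READY slots do not collide: ar0 holds slots 0 to 2 and ar1 holds slot 3, so between them the two arrays account for all four slots of one array. That is consistent with the split described in the message, though it says nothing about whether the on-disk metadata can be safely rejoined.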