Date: Fri, 12 Sep 2008 09:32:07 -0700
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: Zaphod Beeblebrox <zbeeble@gmail.com>
Cc: freebsd-hackers@freebsd.org, kpielorz_lst@tdx.co.uk
Subject: Re: ZFS w/failing drives - any equivalent of Solaris FMA?
Message-ID: <20080912163207.GE60094@icarus.home.lan>
In-Reply-To: <5f67a8c40809120904o49b6e410l5b65a20f5216202@mail.gmail.com>
References: <C984A6E7B1C6657CD8C4F79E@Slim64.dmpriest.net.uk> <200809121544.m8CFiRHQ099725@lurza.secnetix.de> <5f67a8c40809120904o49b6e410l5b65a20f5216202@mail.gmail.com>

On Fri, Sep 12, 2008 at 12:04:27PM -0400, Zaphod Beeblebrox wrote:
> On Fri, Sep 12, 2008 at 11:44 AM, Oliver Fromme <olli@lurza.secnetix.de> wrote:
> > Did you try "atacontrol detach" to remove the disk from
> > the bus?  I haven't tried that with ZFS, but gmirror
> > automatically detects when a disk has gone away, and
> > doesn't try to do anything with it anymore.  It certainly
> > should not hang the machine.  After all, what's the
> > purpose of a RAID when you have to reboot upon drive
> > failure.  ;-)
>
> To be fair, many "home" users run RAID without the expectation of being able
> to hot swap the drives.  While RAID can provide high availability, but it
> can also provide simple data security.

RAID only ensures a very, very tiny part of "data security", and it
depends greatly on what RAID implementation you use.  No RAID
implementation I know of protects against transparent data corruption
("bit-rot"), and many RAID controllers and RAID drivers have bugs that
induce corruption (to date, that includes very old ATA Highpoint chips,
nVidia/nForce chips, and JMicron and Silicon Image chips -- all of which
are used on consumer boards).

A big problem is also that end-users *still* think RAID is a replacement
for doing backups.  :-(

> To your point... I suppose you have to reboot at some point after the drive
> failure, but my experience has been that the reboot has been under my
> control some time after the failure (usually when I have the replacement
> drive).

For home use, sure.  Since most home/consumer systems do not include
hot-swappable drive bays, rebooting is required.  That said, more and
more consumer motherboards are offering AHCI, which is the only reliable
way you'll get hot-swap capability with SATA.

In my case, with servers in a co-lo, rebooting is not acceptable.  Our
systems contain SATA backplanes that support hot-swapping, and it works
how it should (yank the disk, replace with a new one) on Linux -- there
is no need to do a bunch of hoopla like on FreeBSD.
On FreeBSD, with that hoopla, you also take the risk of inducing a
kernel panic.  That risk does not sit well with me, but thankfully I've
only been in that situation (replacing a bad disk + using hot-swapping)
once -- and it did work.

At my home, I have a pseudo-NAS system running FreeBSD.  The case is a
Supermicro mid-tower with a SATA backplane that supports hot-swapping.
I use ZFS on this system, with 3 disks in the pool and one (non-ZFS)
disk for boot/OS.  But because I'm using ata(4) -- see above.

Individuals on -stable and other lists using ZFS have posted their
experiences with disk failures.  I believe to date I've seen one report
where it worked flawlessly; the others reported strange issues with
resilvering or, in a couple of cases, losing all their zpools
permanently.  Of course, it's very rare in this day and age for people
to mail a mailing list reporting *successes* with something -- people
usually only mail if something *fails*.  :-)

That said, pjd@'s dedication to getting ZFS working reliably on FreeBSD
is outstanding.  It's a great filesystem replacement, and even the Linux
folks are a bit jealous over how simple and painless it is.  I can
understand their jealousy -- I've looked at the LVM docs... never again.

> About the only real improvement I'd like to see in this setup is the ability
> to spin down idle drives.  That would be an ideal setup for the home RAID
> array.

There is a FreeBSD port which handles this, although such a feature
should ideally be part of the ata(4) system (as should TCQ/NCQ and a
slew of other things -- some of those are being worked on).

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |