From: Jeremy Chadwick
To: Zaphod Beeblebrox
Cc: freebsd-hackers@freebsd.org, kpielorz_lst@tdx.co.uk
Date: Fri, 12 Sep 2008 09:32:07 -0700
Subject: Re: ZFS w/failing drives - any equivalent of Solaris FMA?
Message-ID: <20080912163207.GE60094@icarus.home.lan>
In-Reply-To: <5f67a8c40809120904o49b6e410l5b65a20f5216202@mail.gmail.com>

On Fri, Sep 12, 2008 at 12:04:27PM -0400, Zaphod Beeblebrox wrote:
> On Fri, Sep 12, 2008 at 11:44 AM, Oliver Fromme wrote:
> > Did you try "atacontrol detach" to remove the disk from
> > the bus?  I haven't tried that with ZFS, but gmirror
> > automatically detects when a disk has gone away, and
> > doesn't try to do anything with it anymore.  It certainly
> > should not hang the machine.  After all, what's the
> > purpose of a RAID when you have to reboot upon drive
> > failure.  ;-)
>
> To be fair, many "home" users run RAID without the expectation of being able
> to hot swap the drives.  While RAID can provide high availability, but it
> can also provide simple data security.

RAID only ensures a very, very tiny part of "data security", and how
much depends greatly on which RAID implementation you use.  No RAID
implementation I know of protects against silent data corruption
("bit-rot"), and many RAID controllers and RAID drivers have bugs that
themselves cause corruption (so far: very old ATA HighPoint chips,
nVidia nForce chips, and JMicron and Silicon Image chips -- all of
which are used on consumer boards).

A big problem is also that end-users *still* think RAID is a
replacement for doing backups.  :-(

> To your point... I suppose you have to reboot at some point after the drive
> failure, but my experience has been that the reboot has been under my
> control some time after the failure (usually when I have the replacement
> drive).

For home use, sure.  Since most home/consumer systems do not include
hot-swappable drive bays, rebooting is required.  That said, more and
more consumer motherboards offer AHCI, which is the only reliable way
you'll get hot-swap capability with SATA.

In my case, with servers in a co-lo, it's not acceptable.  Our systems
contain SATA backplanes that support hot-swapping, and on Linux it works
the way it should (yank the disk, insert a new one) -- there is no need
for the bunch of hoopla required on FreeBSD.  And on FreeBSD, even with
that hoopla, you also take the risk of inducing a kernel panic.  That
risk does not sit well with me, but thankfully I've only been in that
situation (replacing a bad disk using hot-swapping) once -- and it did
work.

At home, I have a pseudo-NAS system running FreeBSD.  The case is a
Supermicro mid-tower with a SATA backplane that supports hot-swapping.
I use ZFS on this system across 3 disks, plus one (non-ZFS) disk for
boot/OS.  But because I'm using ata(4), the same caveats apply -- see
above.

Individuals on -stable and other lists using ZFS have posted their
experiences with disk failures.  I believe that to date I've seen one
report where it worked flawlessly; the others reported strange issues
with resilvering or, in a couple of cases, had permanently lost all of
their zpools.  Of course, it's very rare in this day and age for people
to mail a mailing list reporting *successes* with something -- people
usually only mail if something *fails*.  :-)

That said, pjd@'s dedication to getting ZFS working reliably on FreeBSD
is outstanding.  It's a great filesystem replacement, and even the Linux
folks are a bit jealous of how simple and painless it is.  I can
understand their jealousy -- I've looked at the LVM docs... never again.

> About the only real improvement I'd like to see in this setup is the ability
> to spin down idle drives.  That would be an ideal setup for the home RAID
> array.

There is a FreeBSD port that handles this, although such a feature
should ideally be part of the ata(4) subsystem (as should TCQ/NCQ and a
slew of other things -- some of those are being worked on).
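
To make the earlier bit-rot point concrete, here is a toy model in Python. Every function name in it is hypothetical, and it is not how any real RAID driver or ZFS is implemented. It assumes a per-block checksum stored separately from the data, which is the general idea behind ZFS's checksumming: a plain mirror that reads back a silently flipped bit has no way to know which copy is correct, whereas a stored checksum identifies the good copy so the bad one can be rewritten.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Checksum assumed to be stored separately from the data block itself."""
    return hashlib.sha256(data).digest()


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate silent corruption ("bit-rot"): one bit flips, no I/O error."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def plain_mirror_read(copies: list[bytes]) -> bytes:
    """Plain mirror (hypothetical): the read succeeds without error, so the
    copy that was read is trusted.  Even if the copies differ, nothing
    says which one is right."""
    return copies[0]


def checksummed_mirror_read(copies: list[bytes], expected: bytes) -> tuple[bytes, list[bytes]]:
    """Return the first copy that matches the stored checksum, plus the
    mirror with every copy rewritten from that good one (self-healing)."""
    for copy in copies:
        if checksum(copy) == expected:
            return copy, [copy] * len(copies)
    raise IOError("every copy fails its checksum; data is unrecoverable")


if __name__ == "__main__":
    block = b"important data"
    stored_sum = checksum(block)          # recorded when the block was written
    mirror = [flip_bit(block, 3), block]  # first disk silently rotted

    print(plain_mirror_read(mirror) == block)         # False: bad data returned
    good, repaired = checksummed_mirror_read(mirror, stored_sum)
    print(good == block, repaired == [block, block])  # True True
```

Note that the checksum only helps because it is kept apart from the data it covers; a checksum that rotted along with its block would prove nothing.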

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |