Date: Wed, 11 Jul 2012 11:16:22 -0500
From: Chris BeHanna <chris@behanna.org>
To: freebsd-fs@freebsd.org
Subject: Re: vdev/pool math with combined raidzX vdevs...
Message-ID: <1120F2CC-BFB2-401F-8114-58F3408DF1EF@behanna.org>
In-Reply-To: <1342020754.79202.YahooMailClassic@web122502.mail.ne1.yahoo.com>
References: <1342020754.79202.YahooMailClassic@web122502.mail.ne1.yahoo.com>

On Jul 11, 2012, at 10:32, Jason Usher wrote:

> Since (I think) a lot of raidz3 adoption is due to folks desiring
> "some overkill" as they attempt to overcome the "disks got really big
> but didn't get any faster (for rebuilds)"[1] ... but they are losing
> some of that by combining vdevs in a single pool.
>
> Not losing so much that they're back down to the failure rate of a
> single raidz*2* vdev, but they're not at the overkill level they
> thought they were at either.
>
> I think that's important, or at least worth noting...
>
>
> [1] http://storagegaga.com/4tb-disks-the-end-of-raid/

That, and unrecoverable read errors (UREs) during reconstruction, are
indeed the problem. Gibson, et al, have gone on to object storage to
get around this--RAID is done over the individual stored objects,
rather than over the volume itself. If you need to reconstruct, you
can reconstruct both on-demand and lazily in the background (i.e., you
start reconstructing the objects in a volume, and if a user attempts to
access an as-yet-unreconstructed object, that object gets inserted at
the head of the queue).

There aren't, however, to my knowledge, any
good-enough-to-use-at-work-without-hiring-a-pet-kernel-hacker
object-based file systems available for free[1]. CMU PDL did
raidframe, but that was a proof-of-concept and had not been
bulletproofed and optimized (though many of the concepts there found
their way into Panasas's PanFS).

In the absence of a ready-to-go (or at least ready-to-assemble)
object-based solution, ZFS is the next best thing. You at least can
get some warning from the parity scrub that objects are corrupted, and
can have some duplicates lying around to recover. That said, you're
going to want to keep your failure domains fairly small, if you can,
owing to the time-to-reconstruct and the inevitability of UREs[2] when
volumes get large enough.

-- 
Chris BeHanna
chris@behanna.org

[1] Because it's very, very hard. Panasas has been at it, full time,
for more than ten years. Spinnaker was at it for a long time, too,
prior to the NetApp acquisition. There's also Storage Tank and GFS,
and there was Zambeel, and a few others.

[2] Garth Gibson talks about UREs on page 2:
http://gcn.com/articles/2008/07/25/garth-gibson--faster-storage-systems-through-parallelism.aspx
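
The two points above (UREs becoming inevitable as the amount of data read during reconstruction grows, and a pool of several vdevs being less safe than any one of them) can be made concrete with some back-of-the-envelope arithmetic. What follows is only a rough sketch, not anything from the thread: it assumes bit errors are independent, it uses a URE rate of 1 per 10^14 bits purely as an illustrative figure of the kind quoted on drive datasheets, and the per-vdev loss probability is a made-up number. The function names are hypothetical.

```python
import math


def p_at_least_one_ure(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of hitting at least one URE while reading `bytes_read` bytes.

    Assumes every bit fails independently with probability `ure_per_bit`
    (an illustrative assumption, not a measured value).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


def p_pool_loss(p_vdev_loss: float, n_vdevs: int) -> float:
    """Probability that a pool of `n_vdevs` vdevs is lost.

    Assumes the pool is lost if any one vdev is lost, and that vdevs fail
    independently with the same probability `p_vdev_loss`.
    """
    return -math.expm1(n_vdevs * math.log1p(-p_vdev_loss))


if __name__ == "__main__":
    TB = 1e12  # decimal terabyte, in bytes

    # Larger failure domains mean more data to read during reconstruction.
    for size_tb in (1, 4, 12, 48):
        p = p_at_least_one_ure(size_tb * TB)
        print(f"read {size_tb:>2} TB: P(at least one URE) = {p:.3f}")

    # Hypothetical per-vdev loss probability, chosen only for illustration.
    p_vdev = 1e-6
    for n in (1, 2, 4, 8):
        print(f"{n} vdev(s): P(pool loss) = {p_pool_loss(p_vdev, n):.2e}")
```

Under these assumptions the chance of a URE climbs quickly with the amount of data that has to be read back, which is the argument for keeping failure domains small. The pool loss probability grows roughly linearly with the number of vdevs while the per-vdev probability stays small, which matches the observation that combining vdevs gives up some, but not all, of the margin.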