Date:      Wed, 11 Jul 2012 11:16:22 -0500
From:      Chris BeHanna <chris@behanna.org>
To:        freebsd-fs@freebsd.org
Subject:   Re: vdev/pool math with combined raidzX vdevs...
Message-ID:  <1120F2CC-BFB2-401F-8114-58F3408DF1EF@behanna.org>
In-Reply-To: <1342020754.79202.YahooMailClassic@web122502.mail.ne1.yahoo.com>
References:  <1342020754.79202.YahooMailClassic@web122502.mail.ne1.yahoo.com>

On Jul 11, 2012, at 10:32 , Jason Usher wrote:

> Since (I think) a lot of raidz3 adoption is due to folks desiring
> "some overkill" as they attempt to overcome the "disks got really big
> but didn't get any faster (for rebuilds)"[1] ... but they are losing
> some of that by combining vdevs in a single pool.
>
> Not losing so much that they're back down to the failure rate of a
> single raidz*2* vdev, but they're not at the overkill level they thought
> they were at either.
>
> I think that's important, or at least worth noting...
>
>
> [1] http://storagegaga.com/4tb-disks-the-end-of-raid/

	That, and unrecoverable read errors (UREs) during reconstruction,
are indeed the problem.  Gibson, et al, have gone on to object storage
to get around this--RAID is done over the individual stored objects,
rather than over the volume itself.  If you need to reconstruct, you can
reconstruct both on-demand and lazily in the background (i.e., you start
reconstructing the objects in a volume, and if a user attempts to access
an as-yet-unreconstructed object, that object gets inserted at the head
of the queue).
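
	The on-demand-plus-background idea can be sketched in a few lines
of Python.  This is only an illustration of the queueing behavior
described above, not how PanFS or any real object store does it; the
class and method names (LazyRebuilder, access, step) are made up for
the example, and the actual reconstruction work is left as a
placeholder.

```python
from collections import deque


class LazyRebuilder:
    """Background rebuild queue with on-demand promotion (illustrative only)."""

    def __init__(self, object_ids):
        self.pending = deque(object_ids)  # objects still awaiting reconstruction
        self.rebuilt = []                 # order in which objects were rebuilt

    def access(self, object_id):
        """A user touched object_id: if it is not rebuilt yet, move it to the head."""
        if object_id in self.pending:
            self.pending.remove(object_id)
            self.pending.appendleft(object_id)

    def step(self):
        """Rebuild the object at the head of the queue; return its id, or None."""
        if not self.pending:
            return None
        object_id = self.pending.popleft()
        # Placeholder: a real system would read the surviving pieces of the
        # object here and recompute the missing data from parity.
        self.rebuilt.append(object_id)
        return object_id


if __name__ == "__main__":
    r = LazyRebuilder(["a", "b", "c", "d"])
    r.step()          # background rebuild takes "a" first
    r.access("d")     # a user asks for "d" before it has been rebuilt
    while r.step() is not None:
        pass
    print(r.rebuilt)  # ['a', 'd', 'b', 'c']
```

	The point is that an accessed object jumps the queue while
everything else is still rebuilt in its original order.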

	There aren't, however, to my knowledge, any
good-enough-to-use-at-work-without-hiring-a-pet-kernel-hacker
object-based file systems available for free[1].  CMU PDL did RAIDframe,
but that was a proof-of-concept and had not been bulletproofed and
optimized (though many of the concepts there found their way into
Panasas's PanFS).

	In the absence of a ready-to-go (or at least ready-to-assemble)
object-based solution, ZFS is the next best thing.  You at least can get
some warning from the parity scrub that objects are corrupted, and can
have some duplicates lying around to recover.  That said, you're going
to want to keep your failure domains fairly small, if you can, owing to
the time-to-reconstruct and the inevitability of UREs[2] when volumes
get large enough.
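
	To put rough numbers on that, here is a back-of-the-envelope
sketch.  Everything in it is an assumption for illustration: bit errors
are treated as independent, the default error rate of 1e-14 per bit is
a commonly quoted spec-sheet figure rather than a measurement, the
100 MB/s rebuild rate is arbitrary, and the function names are
hypothetical.

```python
import math


def p_ure(bytes_read, ber=1e-14):
    """Probability of at least one URE while reading bytes_read bytes.

    ber is an ASSUMED unrecoverable-read-error rate per bit read, and bit
    errors are assumed independent (a simplification).
    """
    bits = bytes_read * 8
    # 1 - (1 - ber)**bits, computed in a numerically stable way for tiny ber.
    return -math.expm1(bits * math.log1p(-ber))


def rebuild_hours(bytes_to_write, mb_per_s=100.0):
    """Hours to rewrite one replacement disk at an ASSUMED sustained rate."""
    return bytes_to_write / (mb_per_s * 1e6) / 3600


if __name__ == "__main__":
    TB = 1e12
    # Hypothetical rebuilds that must read 4, 12, and 40 TB of surviving data.
    for tb in (4, 12, 40):
        print(f"read {tb:>2} TB: P(at least one URE) = {p_ure(tb * TB):.2f}")
    print(f"rewrite a 4 TB disk: {rebuild_hours(4 * TB):.1f} hours")
```

	Under those assumptions the chance of hitting at least one URE
climbs quickly with the amount of data a rebuild has to read, which is
the argument for keeping each failure domain small.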

-- 
Chris BeHanna
chris@behanna.org

[1] Because it's very, very hard.  Panasas has been at it, full time,
for more than ten years.  Spinnaker was at it for a long time, too,
prior to the NetApp acquisition.  There's also Storage Tank and GFS, and
there was Zambeel, and a few others.

[2] Garth Gibson talks about UREs on page 2:
http://gcn.com/articles/2008/07/25/garth-gibson--faster-storage-systems-through-parallelism.aspx


