From: Chris BeHanna <chris@behanna.org>
Date: Wed, 11 Jul 2012 11:16:22 -0500
To: freebsd-fs@freebsd.org
Subject: Re: vdev/pool math with combined raidzX vdevs...

On Jul 11, 2012, at 10:32, Jason Usher wrote:

> Since (I think) a lot of raidz3 adoption is due to folks desiring "some overkill" as they attempt to overcome the "disks got really big but didn't get any faster (for rebuilds)"[1] ... but they are losing some of that by combining vdevs in a single pool.
>
> Not losing so much that they're back down to the failure rate of a single raidz*2* vdev, but they're not at the overkill level they thought they were at either.
>
> I think that's important, or at least worth noting...
>
> [1] http://storagegaga.com/4tb-disks-the-end-of-raid/

That, and unrecoverable read errors (UREs) during reconstruction, are indeed the problem. Gibson et al. have gone on to object storage to get around this--RAID is done over the individual stored objects, rather than over the volume itself. If you need to reconstruct, you can reconstruct both on demand and lazily in the background (i.e., you start reconstructing the objects in a volume, and if a user attempts to access an as-yet-unreconstructed object, that object gets inserted at the head of the queue).
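A minimal sketch of that lazy-background-plus-on-demand scheme, in Python, purely for illustration -- the class, the names, and the queue discipline here are my own invention, not any particular file system's implementation:

import heapq
import itertools

class ReconstructionQueue:
    PRIORITY_ON_DEMAND = 0    # a client is waiting on this object
    PRIORITY_BACKGROUND = 1   # ordinary lazy rebuild

    def __init__(self, object_ids):
        ids = list(object_ids)
        self._counter = itertools.count()   # tie-breaker keeps FIFO order
        self._pending = set(ids)            # objects still needing rebuild
        self._heap = [(self.PRIORITY_BACKGROUND, next(self._counter), oid)
                      for oid in ids]
        heapq.heapify(self._heap)

    def promote(self, object_id):
        # A read hit an object that isn't rebuilt yet: queue it again at
        # on-demand priority; the stale background entry is skipped later.
        if object_id in self._pending:
            heapq.heappush(self._heap, (self.PRIORITY_ON_DEMAND,
                                        next(self._counter), object_id))

    def next_object(self):
        # Next object to reconstruct, or None when the volume is done.
        while self._heap:
            _prio, _seq, oid = heapq.heappop(self._heap)
            if oid in self._pending:        # skip duplicates left by promote()
                self._pending.remove(oid)
                return oid
        return None

def rebuild_volume(queue, rebuild_object):
    # rebuild_object() stands in for the per-object RAID repair.
    while (oid := queue.next_object()) is not None:
        rebuild_object(oid)

In real life the read path would call promote() and then block the client until its object is repaired; the unique counter just keeps the background pass in its original order.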
There aren't, however, to my knowledge, any good-enough-to-use-at-work-without-hiring-a-pet-kernel-hacker object-based file systems available for free[1]. CMU PDL did RAIDframe, but that was a proof of concept and had not been bulletproofed and optimized (though many of the concepts there found their way into Panasas's PanFS).

In the absence of a ready-to-go (or at least ready-to-assemble) object-based solution, ZFS is the next best thing. You at least can get some warning from the parity scrub that objects are corrupted, and can have some duplicates lying around to recover (a couple of example commands follow the footnotes below). That said, you're going to want to keep your failure domains fairly small, if you can, owing to the time-to-reconstruct and the inevitability of UREs[2] when volumes get large enough.

--
Chris BeHanna
chris@behanna.org

[1] Because it's very, very hard. Panasas has been at it, full time, for more than ten years. Spinnaker was at it for a long time, too, prior to the NetApp acquisition. There's also Storage Tank and GFS, and there was Zambeel, and a few others.

[2] Garth Gibson talks about UREs on page 2: http://gcn.com/articles/2008/07/25/garth-gibson--faster-storage-systems-through-parallelism.aspx
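For the scrub-and-extra-copies point above, a couple of stock ZFS commands make it concrete (the pool and dataset names are invented):

    # Kick off a parity scrub and see what it turned up:
    zpool scrub tank
    zpool status -v tank

    # Keep a second copy of blocks in a dataset you really care about
    # (only affects data written after the property is set):
    zfs set copies=2 tank/important

None of that substitutes for small failure domains or real backups; it just gives you earlier warning and a little more to fall back on.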