From: Doug Rabson <dfr@rabson.org>
To: Mark Powell
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek
Subject: Re: ZFS & GEOM with many odd drive sizes
Date: Thu, 26 Jul 2007 08:29:33 +0100
Message-Id: <1185434973.3698.18.camel@herring.rabson.org>
In-Reply-To: <20070726075607.W68220@rust.salford.ac.uk>
References: <20070725174715.9F47E5B3B@mail.bitblocks.com> <1185389856.3698.11.camel@herring.rabson.org> <20070726075607.W68220@rust.salford.ac.uk>
List-Id: Filesystems <freebsd-fs@freebsd.org>

On Thu, 2007-07-26 at 07:59 +0100, Mark Powell wrote:
> On Wed, 25 Jul 2007, Doug Rabson wrote:
> > On Wed, 2007-07-25 at 10:47 -0700, Bakul Shah wrote:
> >> Does it really do this?
> >> As I understood it, only one of the disks in a mirror will be read
> >> for a given block. If the checksum fails, the same block from the
> >> other disk is read and checksummed. If all the disks in a mirror
> >> were read for every block, ZFS read performance would get somewhat
> >> worse instead of scaling linearly with more disks in a mirror. In
> >> order to monitor the data on both disks one would need to
> >> periodically run "zpool scrub", no? But that is not *continuous*
> >> monitoring of the two sides.
> >
> > This is of course correct. I should have said "continuously checks
> > the data which you are actually looking at on a regular basis". The
> > consistency check is via the block checksum (not by comparing the
> > data from the two sides of the mirror).
>
> According to this:
>
> http://www.opensolaris.org/jive/thread.jspa?threadID=23093&tstart=0
>
> RAID-Z has to read every drive to be able to checksum a block.
> Isn't this the reason why RAID-Z random reads are so slow, and also
> the reason the pre-fetcher exists to speed up sequential reads?
> Cheers.

When it's reading, RAID-Z only has to read the blocks which contain
data - the parity block is only read if either the vdev is in degraded
mode after a drive failure or one (two for RAID-Z2) of the data block
reads fails. For pools which contain a single RAID-Z or RAID-Z2 group,
this is probably a performance issue. Larger pools containing multiple
RAID-Z groups can spread the load across groups to improve this.
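The mirror read path discussed above - read one side only, verify the block checksum, and consult the other side just when the checksum fails - can be sketched as a toy model. This is an illustration, not the actual ZFS code: the class and names are made up, and SHA-256 stands in for ZFS's real checksums (fletcher or SHA-256 depending on configuration).

```python
import hashlib

def checksum(data: bytes) -> str:
    # Stand-in checksum; ZFS stores the checksum in the parent block pointer.
    return hashlib.sha256(data).hexdigest()

class MirrorVdev:
    """Toy two-way mirror: a read normally touches one side only; the
    second side is read only if the block checksum does not verify."""

    def __init__(self, side_a: dict, side_b: dict, checksums: dict):
        self.sides = [side_a, side_b]
        self.checksums = checksums  # block id -> expected checksum

    def read(self, block: int) -> bytes:
        for side in self.sides:               # normally only sides[0] is read
            data = side[block]
            if checksum(data) == self.checksums[block]:
                return data
            # checksum mismatch: fall through and try the other side
            # (real ZFS would also rewrite the good copy to self-heal)
        raise IOError(f"unrecoverable checksum error on block {block}")

# One side has silent corruption on block 0; the read still succeeds.
good = b"hello zfs"
cs = {0: checksum(good)}
mirror = MirrorVdev({0: b"corrupted!"}, {0: good}, cs)
assert mirror.read(0) == good    # falls back to the healthy side
```

This also illustrates Bakul's scrub point: blocks that are never read are never verified, so only `zpool scrub` forces every copy through the checksum check.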
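The RAID-Z read behaviour described in the reply - healthy reads touch only the data columns, and parity is read only to reconstruct after a failure - can be sketched with a toy single-parity model. The function names are invented for illustration, and simple XOR parity stands in for the real RAID-Z layout:

```python
from functools import reduce

def xor(blocks):
    # Bytewise XOR of equal-length byte strings (single-parity reconstruction).
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def raidz_read(data_cols, parity, failed=None):
    """Toy single-parity RAID-Z read. The healthy path reads only the
    data columns; the parity column is read only to rebuild a failed one."""
    if failed is None:
        return b"".join(data_cols)            # parity never read
    survivors = [c for i, c in enumerate(data_cols) if i != failed]
    rebuilt = xor(survivors + [parity])       # XOR of parity and survivors
    cols = list(data_cols)
    cols[failed] = rebuilt
    return b"".join(cols)

d = [b"AA", b"BB", b"CC"]
p = xor(d)                                    # parity = XOR of data columns
assert raidz_read(d, p) == b"AABBCC"          # healthy: data columns only
assert raidz_read([b"AA", b"??", b"CC"], p, failed=1) == b"AABBCC"
```

Note that every data column of the stripe still has to be read to verify the block checksum, which is the random-read cost the linked thread describes: each logical read occupies one disk per data column in the RAID-Z group.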