From: Doug Rabson <dfr@rabson.org>
To: Mark Powell
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek
Subject: Re: ZFS & GEOM with many odd drive sizes
Date: Thu, 26 Jul 2007 08:29:33 +0100
Message-Id: <1185434973.3698.18.camel@herring.rabson.org>
In-Reply-To: <20070726075607.W68220@rust.salford.ac.uk>
References: <20070725174715.9F47E5B3B@mail.bitblocks.com> <1185389856.3698.11.camel@herring.rabson.org> <20070726075607.W68220@rust.salford.ac.uk>
List-Id: Filesystems <freebsd-fs@freebsd.org>

On Thu, 2007-07-26 at 07:59 +0100, Mark Powell wrote:
> On Wed, 25 Jul 2007, Doug Rabson wrote:
> > On Wed, 2007-07-25 at 10:47 -0700, Bakul Shah wrote:
> >> Does it really do this?
> >> As I understood it, only one of the disks in a mirror will be read
> >> for a given block. If the checksum fails, the same block from the
> >> other disk is read and checksummed. If all the disks in a mirror
> >> were read for every block, ZFS read performance would get somewhat
> >> worse instead of scaling linearly with more disks in a mirror. In
> >> order to monitor the data on both disks one would need to
> >> periodically run "zpool scrub", no? But that is not *continuous*
> >> monitoring of the two sides.
> >
> > This is of course correct. I should have said "continuously checks
> > the data which you are actually looking at on a regular basis". The
> > consistency check is via the block checksum (not by comparing the
> > data from the two sides of the mirror).
>
> According to this:
>
> http://www.opensolaris.org/jive/thread.jspa?threadID=23093&tstart=0
>
> RAID-Z has to read every drive to be able to checksum a block.
> Isn't this the reason why RAID-Z random reads are so slow, and also
> the reason the pre-fetcher exists to speed up sequential reads?
> Cheers.

When it's reading, RAID-Z only has to read the blocks which contain
data - the parity block is only read if either the vdev is in degraded
mode after a drive failure or one (two for RAID-Z2) of the data block
reads fails. For pools which contain a single RAID-Z or RAID-Z2 group,
this is probably a performance issue. Larger pools containing multiple
RAID-Z groups can spread the load across groups to improve this.
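The mirror read path discussed above - read one side only, verify the block checksum, and consult the other side just when the checksum fails - can be sketched as a toy model. This is an illustration, not the actual ZFS code: the class and names are made up, and SHA-256 stands in for ZFS's real checksums (fletcher or SHA-256 depending on configuration).

```python
import hashlib

def checksum(data: bytes) -> str:
    # Stand-in checksum; ZFS stores the checksum in the parent block pointer.
    return hashlib.sha256(data).hexdigest()

class MirrorVdev:
    """Toy two-way mirror: a read normally touches one side only; the
    second side is read only if the block checksum does not verify."""

    def __init__(self, side_a: dict, side_b: dict, checksums: dict):
        self.sides = [side_a, side_b]
        self.checksums = checksums  # block id -> expected checksum

    def read(self, block: int) -> bytes:
        for side in self.sides:               # normally only sides[0] is read
            data = side[block]
            if checksum(data) == self.checksums[block]:
                return data
            # checksum mismatch: fall through and try the other side
            # (real ZFS would also rewrite the good copy to self-heal)
        raise IOError(f"unrecoverable checksum error on block {block}")

# One side has silent corruption on block 0; the read still succeeds.
good = b"hello zfs"
cs = {0: checksum(good)}
mirror = MirrorVdev({0: b"corrupted!"}, {0: good}, cs)
assert mirror.read(0) == good    # falls back to the healthy side
```

This also illustrates Bakul's scrub point: blocks that are never read are never verified, so only `zpool scrub` forces every copy through the checksum check.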
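The RAID-Z read behaviour described in the reply - healthy reads touch only the data columns, and parity is read only to reconstruct after a failure - can be sketched with a toy single-parity model. The function names are invented for illustration, and simple XOR parity stands in for the real RAID-Z layout:

```python
from functools import reduce

def xor(blocks):
    # Bytewise XOR of equal-length byte strings (single-parity reconstruction).
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def raidz_read(data_cols, parity, failed=None):
    """Toy single-parity RAID-Z read. The healthy path reads only the
    data columns; the parity column is read only to rebuild a failed one."""
    if failed is None:
        return b"".join(data_cols)            # parity never read
    survivors = [c for i, c in enumerate(data_cols) if i != failed]
    rebuilt = xor(survivors + [parity])       # XOR of parity and survivors
    cols = list(data_cols)
    cols[failed] = rebuilt
    return b"".join(cols)

d = [b"AA", b"BB", b"CC"]
p = xor(d)                                    # parity = XOR of data columns
assert raidz_read(d, p) == b"AABBCC"          # healthy: data columns only
assert raidz_read([b"AA", b"??", b"CC"], p, failed=1) == b"AABBCC"
```

Note that every data column of the stripe still has to be read to verify the block checksum, which is the random-read cost the linked thread describes: each logical read occupies one disk per data column in the RAID-Z group.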