From: Doug Rabson <dfr@rabson.org>
Date: Wed, 25 Jul 2007 12:01:53 +0100
To: Mark Powell
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS & GEOM with many odd drive sizes

On 25 Jul 2007, at 11:13, Mark Powell wrote:

> On Wed, 25 Jul 2007, Doug Rabson wrote:
>
>> I'm not really sure why you are using gmirror, gconcat or gstripe at
>> all. Surely it would be easier to let ZFS manage the mirroring and
>> concatenation. If you do that, ZFS can use its checksums to
>> continually monitor the two sides of your mirrors for consistency
>> and will be able to notice as early as possible when one of the
>> drives goes flaky. For concats, ZFS will also spread redundant
>> copies of metadata (and regular data if you use 'zfs set copies=')
>> across the disks in the concat. If you have to replace one half of
>> a mirror, ZFS has enough information to know exactly which blocks
>> need to be copied to the new drive, which can make recovery much
>> quicker.
>
> gmirror is only going to be used for the UFS /boot partition and
> block device swap. (I'll ignore the smallish space used by that
> below.)

Just to muddy the waters a little - I'm working on ZFS native boot
code at the moment. It probably won't ship with 7.0 but should be
available shortly after.

> I thought gstripe was a solution because, as I mentioned in the
> original post, I have the following drives to play with: 1x400GB,
> 3x250GB, 3x200GB.
> If I make a straight zpool with all those drives I get a 7x200GB
> raidz with only an effective 6x200GB=1200GB of usable storage.
> Also, a 7-device raidz cries out to be a raidz2? That's a further
> 200GB of storage lost.
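
Just to be concrete about what I was suggesting earlier, the ZFS-only
version would look something like this. The adN names are only
placeholders for whatever your seven disks actually show up as, and
the sizes are taken from your figures above:

    # One raidz across all seven drives; usable space is roughly
    # 6 x 200GB, since the smallest member sets the per-disk size.
    # (Use 'raidz2' instead for double parity, at the cost of
    # another 200GB, as you say.)
    zpool create tank raidz ad4 ad5 ad6 ad7 ad8 ad9 ad10

    # Extra self-healing copies for the datasets you care most
    # about, on top of the raidz parity:
    zfs create tank/important
    zfs set copies=2 tank/important

    # Replacing a failed member later is a single resilver:
    zpool replace tank ad7 ad11
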
> My original plan (because the largest drive is a single 400GB) was
> to gconcat (now to gstripe) the smaller drives into 3 pairs of
> 250GB+200GB, making three new 450GB devices. This would make a
> zpool of 4 devices, i.e. 1x400GB+3x450GB, giving effective storage
> of 1200GB. Yes, it's the same as above (as long as raidz2 is not
> used there), but I was thinking about future expansion...
> The advantage this approach seems to give is that when drives fail,
> each device (which is either a single drive or a gstripe pair) can
> be replaced with a modern larger drive (500GB or 750GB depending on
> what's economical at the time).
> Once that replacement has been performed only 4 times, the zpool
> will increase in size (actually it will increase straight away by
> 4x50GB total if the 400GB drive fails 1st).
> In addition, once a couple of drives in a pair have failed and are
> replaced by a single large drive, there will also be smaller 250GB
> or 200GB drives spare which can be further added to the zpool as a
> zfs mirror.
> The alternative of using a zpool of 7 individual drives means that
> I need to replace many more drives to actually see an increase in
> zpool size.
> Yes, there are a large number of combinations here, but it seems
> that the zpool will increase in size sooner this way?
> I believe my reasoning is correct here? Let me know if your
> experience would suggest otherwise.
> Many thanks.

Your reasoning sounds fine now that I have the bigger picture in my
head. I don't have a lot of experience here - for my ZFS testing, I
just bought a couple of cheap 300GB drives which I'm using as a
simple mirror. From what I have read, mirrors and raidz2 are roughly
equivalent in 'mean time to data loss' terms, with raidz1 quite a bit
less safe due to the extra vulnerability window between a drive
failure and its replacement.
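
For what it's worth, if you do go the striped-pairs route, I imagine
the setup would look roughly like the sketch below. Again the adN
device names are just placeholders (ad4 for the 400GB disk, ad5-ad7
for the 250GB disks, ad8-ad10 for the 200GB disks):

    # Pair each 250GB disk with a 200GB disk into a ~450GB stripe;
    # the new devices appear under /dev/stripe/.
    gstripe label -v pair0 ad5 ad8
    gstripe label -v pair1 ad6 ad9
    gstripe label -v pair2 ad7 ad10

    # Build a 4-device raidz from the 400GB disk plus the three
    # pairs, for roughly 3x400GB of usable space.
    zpool create tank raidz ad4 stripe/pair0 stripe/pair1 stripe/pair2

    # When a pair (or the 400GB disk) dies, swap in one larger
    # drive; the vdev only grows once its smallest member has grown.
    zpool replace tank stripe/pair1 ad11
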