Date:      Wed, 25 Jul 2007 12:01:53 +0100
From:      Doug Rabson <dfr@rabson.org>
To:        Mark Powell <M.S.Powell@salford.ac.uk>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZfS & GEOM with many odd drive sizes
Message-ID:  <3A5D89E1-A7B1-4B10-ADB8-F58332306691@rabson.org>
In-Reply-To: <20070725103746.N57231@rust.salford.ac.uk>
References:  <20070719102302.R1534@rust.salford.ac.uk> <20070719135510.GE1194@garage.freebsd.pl> <20070719181313.G4923@rust.salford.ac.uk> <20070721065204.GA2044@garage.freebsd.pl> <20070725095723.T57231@rust.salford.ac.uk> <1185355848.3698.7.camel@herring.rabson.org> <20070725103746.N57231@rust.salford.ac.uk>


On 25 Jul 2007, at 11:13, Mark Powell wrote:

> On Wed, 25 Jul 2007, Doug Rabson wrote:
>
>> I'm not really sure why you are using gmirror, gconcat or gstripe at
>> all. Surely it would be easier to let ZFS manage the mirroring and
>> concatenation. If you do that, ZFS can use its checksums to
>> continually monitor the two sides of your mirrors for consistency and
>> will be able to notice as early as possible when one of the drives
>> goes flaky. For concats, ZFS will also spread redundant copies of
>> metadata (and regular data if you use 'zfs set copies=<N>') across
>> the disks in the concat. If you have to replace one half of a mirror,
>> ZFS has enough information to know exactly which blocks need to be
>> copied to the new drive, which can make recovery much quicker.
>
> gmirror is only going to be used for the ufs /boot partition and block  
> device swap. (I'll ignore the smallish space used by that below.)

Just to muddy the waters a little - I'm working on ZFS native boot  
code at the moment. It probably won't ship with 7.0 but should be  
available shortly after.

>   I thought gstripe was a solution cos I mentioned in the original  
> post that I have the following drives to play with; 1x400GB,  
> 3x250GB, 3x200GB.
>   If I make a straight zpool with all those drives I get a total  
> usable 7x200GB raidz with only an effective 6x200GB=1200GB of  
> usable storage. Also a 7 device raidz cries out for being a raidz2?  
> That's a further 200GB of storage lost.
>   My original plan (because the largest drive is a single  
> 400GB) was to gconcat (now to gstripe) the smaller drives into 3  
> pairs of 250GB+200GB, making three new 450GB devices. This would  
> make a zpool of 4 devices i.e. 1x400GB+3x450GB giving effective  
> storage of 1200GB. Yes, it's the same as above (as long as raidz2  
> is not used there), but I was thinking about future expansion...
>   The advantage this approach seems to give is that when drives fail  
> each device (which is either a single drive or a gstripe pair) can  
> be replaced with a modern larger drive (500GB or 750GB depending on  
> what's economical at the time).
>   Once that replacement has been performed only 4 times, the zpool  
> will increase in size (actually it will increase straight away by  
> 4x50GB total if the 400GB drive fails 1st).
>   In addition, once a couple of drives in a pair have failed and  
> are replaced by a single large drive, there will also be smaller  
> 250GB or 200GB drives spare which can be further added to the zpool  
> as a zfs mirror.
>   The alternative of using a zpool of 7 individual drives means  
> that I need to replace many more drives to actually see an increase  
> in zpool size.
>   Yes, there are a large number of combinations here, but it seems that  
> the zpool will increase in size sooner this way?
>   I believe my reasoning is correct here? Let me know if your  
> experience would suggest otherwise.
>   Many thanks.
>

Your reasoning sounds fine now that I have the bigger picture in my  
head. I don't have a lot of experience here - for my ZFS testing, I  
just bought a couple of cheap 300GB drives which I'm using as a  
simple mirror. From what I have read, mirrors and raidz2 are roughly  
equivalent in 'mean time to data loss' terms with raidz1 quite a bit  
less safe due to the extra vulnerability window between a drive  
failure and replacement.
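
For anyone following along, the capacity figures quoted above can be
checked with a rough sketch. It assumes a simplified model that is not
taken from the ZFS code: a single raidz vdev truncates every member to
the size of its smallest member and gives up `parity` members' worth
of space to redundancy. Metadata overhead and GB/GiB rounding are
ignored, and the function name and the 500GB replacement drive are
only illustrative.

```python
def raidz_usable_gb(device_sizes_gb, parity=1):
    """Approximate usable capacity of one raidz vdev, in GB.

    Simplified model (an assumption, not the ZFS allocator): every
    member counts as the smallest member, and `parity` members' worth
    of space is used for redundancy.
    """
    if len(device_sizes_gb) <= parity:
        raise ValueError("need more devices than the parity level")
    return (len(device_sizes_gb) - parity) * min(device_sizes_gb)


# The seven physical drives from the original post.
seven_drives = [400, 250, 250, 250, 200, 200, 200]

# The 400GB drive plus three gstripe/gconcat pairs of 250GB + 200GB.
four_devices = [400, 250 + 200, 250 + 200, 250 + 200]

# Hypothetical: the 400GB drive fails first and is replaced by a 500GB one.
after_first_swap = [500, 450, 450, 450]

print(raidz_usable_gb(seven_drives))            # 1200
print(raidz_usable_gb(seven_drives, parity=2))  # 1000
print(raidz_usable_gb(four_devices))            # 1200
print(raidz_usable_gb(after_first_swap))        # 1350
```

Under this model, replacing the 400GB drive first widens every member
by 50GB (4x50GB raw), of which 3x50GB = 150GB shows up as usable space
in a raidz1.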






