From: Doug Rabson <dfr@rabson.org>
Date: Wed, 25 Jul 2007 12:01:53 +0100
To: Mark Powell
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS & GEOM with many odd drive sizes

On 25 Jul 2007, at 11:13, Mark Powell wrote:

> On Wed, 25 Jul 2007, Doug Rabson wrote:
>
>> I'm not really sure why you are using gmirror, gconcat or gstripe at
>> all. Surely it would be easier to let ZFS manage the mirroring and
>> concatenation. If you do that, ZFS can use its checksums to
>> continually monitor the two sides of your mirrors for consistency
>> and will be able to notice as early as possible when one of the
>> drives goes flaky. For concats, ZFS will also spread redundant
>> copies of metadata (and regular data if you use 'zfs set copies=')
>> across the disks in the concat. If you have to replace one half of
>> a mirror, ZFS has enough information to know exactly which blocks
>> need to be copied to the new drive, which can make recovery much
>> quicker.
>
> gmirror is only going to be used for the UFS /boot partition and
> block device swap. (I'll ignore the smallish space used by that
> below.)

Just to muddy the waters a little - I'm working on ZFS native boot
code at the moment. It probably won't ship with 7.0 but should be
available shortly after.

> I thought gstripe was a solution because, as I mentioned in the
> original post, I have the following drives to play with: 1x400GB,
> 3x250GB, 3x200GB.
> If I make a straight zpool with all those drives I get a 7x200GB
> raidz with only an effective 6x200GB=1200GB of usable storage.
> Also, a 7-device raidz cries out to be a raidz2? That's a further
> 200GB of storage lost.
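
Just to be concrete about what I was suggesting earlier, the ZFS-only
version would look something like this. The adN names are only
placeholders for whatever your seven disks actually show up as, and
the sizes are taken from your figures above:

    # One raidz across all seven drives; usable space is roughly
    # 6 x 200GB, since the smallest member sets the per-disk size.
    # (Use 'raidz2' instead for double parity, at the cost of
    # another 200GB, as you say.)
    zpool create tank raidz ad4 ad5 ad6 ad7 ad8 ad9 ad10

    # Extra self-healing copies for the datasets you care most
    # about, on top of the raidz parity:
    zfs create tank/important
    zfs set copies=2 tank/important

    # Replacing a failed member later is a single resilver:
    zpool replace tank ad7 ad11
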
> My original plan (because the largest drive is a single 400GB) was
> to gconcat (now to gstripe) the smaller drives into 3 pairs of
> 250GB+200GB, making three new 450GB devices. This would make a
> zpool of 4 devices, i.e. 1x400GB+3x450GB, giving effective storage
> of 1200GB. Yes, it's the same as above (as long as raidz2 is not
> used there), but I was thinking about future expansion...
> The advantage this approach seems to give is that when drives fail,
> each device (which is either a single drive or a gstripe pair) can
> be replaced with a modern larger drive (500GB or 750GB depending on
> what's economical at the time).
> Once that replacement has been performed only 4 times, the zpool
> will increase in size (actually it will increase straight away by
> 4x50GB total if the 400GB drive fails 1st).
> In addition, once a couple of drives in a pair have failed and are
> replaced by a single large drive, there will also be smaller 250GB
> or 200GB drives spare which can be further added to the zpool as a
> zfs mirror.
> The alternative of using a zpool of 7 individual drives means that
> I need to replace many more drives to actually see an increase in
> zpool size.
> Yes, there are a large number of combinations here, but it seems
> that the zpool will increase in size sooner this way?
> I believe my reasoning is correct here? Let me know if your
> experience would suggest otherwise.
> Many thanks.

Your reasoning sounds fine now that I have the bigger picture in my
head. I don't have a lot of experience here - for my ZFS testing, I
just bought a couple of cheap 300GB drives which I'm using as a
simple mirror. From what I have read, mirrors and raidz2 are roughly
equivalent in 'mean time to data loss' terms, with raidz1 quite a bit
less safe due to the extra vulnerability window between a drive
failure and its replacement.
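
For what it's worth, if you do go the striped-pairs route, I imagine
the setup would look roughly like the sketch below. Again the adN
device names are just placeholders (ad4 for the 400GB disk, ad5-ad7
for the 250GB disks, ad8-ad10 for the 200GB disks):

    # Pair each 250GB disk with a 200GB disk into a ~450GB stripe;
    # the new devices appear under /dev/stripe/.
    gstripe label -v pair0 ad5 ad8
    gstripe label -v pair1 ad6 ad9
    gstripe label -v pair2 ad7 ad10

    # Build a 4-device raidz from the 400GB disk plus the three
    # pairs, for roughly 3x400GB of usable space.
    zpool create tank raidz ad4 stripe/pair0 stripe/pair1 stripe/pair2

    # When a pair (or the 400GB disk) dies, swap in one larger
    # drive; the vdev only grows once its smallest member has grown.
    zpool replace tank stripe/pair1 ad11
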