Date: Sat, 14 Feb 1998 17:48:44 +1030
From: Greg Lehey <grog@lemis.com>
To: mvanloon@mindbender.serv.net, michaelh@cet.co.jp
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: ccd and ftd

On Fri, 13 February 1998 at 21:40:19 -0800, mvanloon@mindbender.serv.net wrote:
>> -----Original Message-----
>> From: Greg Lehey [SMTP:grog@lemis.com]
>> Sent: Friday, February 13, 1998 8:58 PM
>> To: Michael Hancock
>> Cc: freebsd-fs@FreeBSD.ORG
>> Subject: Re: ccd and ftd
>>
>> On Sat, 14 February 1998 at 13:38:47 +0900, Michael Hancock wrote:
>>> On Sat, 14 Feb 1998, Greg Lehey wrote:
>>>
>>>> On Sat, 14 February 1998 at 10:19:49 +0900, Michael Hancock wrote:
>>>>> I'm looking for a fault-tolerant solution.  I know Greg is working
>>>>> on an ftd driver.
>>>>>
>>>>> In the meantime I noticed that ccd supports mirroring.  I'd like
>>>>> to mirror two arrays of 3 to 4 disks each.  How well does this
>>>>> work?
>>>>
>>>> Do you mean more than one mirror?  I don't believe that works.
>>>> Certainly I can't see how it could from reading the code.
>>>
>>> The man page says you can mirror any even number of disks.  What I'd
>>> like to do is this:
>>>
>>> 2940 - disk1 - disk2 - disk3 - disk4
>>>
>>> and mirror/duplex the whole array with another identical array:
>>>
>>> 2940 - disk5 - disk6 - disk7 - disk8
>>>
>>> Interestingly, man ccdconfig talks about CCDF_MIRROR and an
>>> unimplemented flag, CCDF_PARITY.
>>
>> Maybe we're talking at cross purposes here.  The current version of
>> ccd supports two mappings:
>>
>> 1. Concatenation (in other words, making one large space out of
>>    several smaller ones), which gives it its name.  To concatenate,
>>    you don't use any flags.
>>
>> 2. Mirroring (redundantly storing the same data on two different
>>    spaces).  To mirror, you specify the flag CCDF_MIRROR.
>>
>> I haven't looked at ccdconfig(8), but I suspect that if you specify
>> more than one pair of disks on a CCDF_MIRROR line, it will
>> concatenate pairs of mirrors.
>>
>> In addition, you can have striping, where the mapping is non-linear:
>> after every fixed number of bytes (the stripe size), the mapping
>> proceeds to the next component of the array, thus spreading the load
>> more evenly.
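>>
>> For example (an untested sketch: the device names are invented, the
>> interleave is arbitrary, and I haven't verified how ccd pairs the
>> components), a line for two mirrored pairs might look like this:
>>
>>     # ccd   ileave  flags        component devices
>>     ccd0    128     CCDF_MIRROR  /dev/sd1e /dev/sd5e /dev/sd2e /dev/sd6e
>>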
>> I have some PostScript documentation of the comparisons under
>> development if anybody's interested.
>>
>> The CCDF_PARITY flag was for RAID 5, but as you say, it hasn't been
>> implemented.
>>
>>> Although it looks like they have hooks for more fault-tolerant
>>> features in ccd, I agree with your effort to make a separate
>>> fault-tolerant driver and keep ccd a light-weight, primarily
>>> striping-only driver.  Are you going to be able to share some code
>>> with it?
>>
>> I'm a bit disappointed with ccd.  I had originally planned just to
>> extend it, but I find it too primitive.  It's not just "light
>> weight"--it's deficient.  It handles faults very badly, and it
>> offers multiple possibilities for shooting yourself in the foot.  In
>> addition, I don't think its performance is going to be better than
>> vinum's (mine), though that remains to be seen.  About the only
>> advantage is that ccd is significantly smaller, but I don't expect
>> the typical user of either driver to be concerned about another
>> 20 kB of memory.
>>
>> As a result of these problems, vinum is effectively a rewrite,
>> though I've been looking very carefully at ccd for implementation
>> ideas.
>
> You're writing a better "software-raid" driver?

Something like that.

> Here's what I want to do -- let me know if you have this in mind.
>
> I like fault-tolerance, but I don't like to take performance hits
> for it.  I'd like to take a set of drives and mirror pairs of them.
> Then, I'd like to stripe on top of those mirror sets, without
> parity.
>
> This would give you the ultimate in performance, especially for
> reads, and redundant fault-tolerance, at the same time.

Yes, it will be able to do that.  Specifically (those of you who know
Veritas will recognize this):

1. Each partition is split up into an arbitrary number of subdisks,
   which are contiguous sections of disk space.  For best performance,
   you would take one subdisk per volume.

2. The subdisks are combined into plexes.  Within a plex, each subdisk
   represents a unique part of the address space (i.e. there is no
   redundancy), and there is no requirement for the address space of a
   plex to be contiguous.  Plexes can be organized in three ways:

   - sequentially: each subdisk represents a contiguous part of the
     plex address space;

   - striped: the address space is patched together out of
     fixed-length pieces allocated from each subdisk in turn;

   - RAID 5: like striping, except that in each group of stripes one
     subdisk is reserved for a parity stripe.

   I realise that this description isn't the utmost in clarity.  If
   somebody can help me phrase it better, I'd be grateful.

3. Up to 8 plexes can be combined to form a volume, which corresponds
   to a normal disk device.  Each plex replicates the same address
   space (i.e. the plexes mirror each other).  The requirement is
   that the logical sum of the plexes provide storage for the
   complete address space of the volume.

Mapping this to your requirements, you'd get the striping by taking
striped plexes, and the redundancy from multiple plexes per volume;
there's a sketch of such a configuration below, after my signature.

There are a couple of other advantages to this method:

1. You can move data from one disk to another on line, without loss
   of fault tolerance, by dynamically adding a new plex and then
   removing the old one.

2. You can make consistent backups of a volume by adding a plex and
   then detaching it at a specific time.

Greg
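
PS: To make the mapping to your requirements concrete, here's the
sort of configuration file I have in mind for your eight-disk case.
The syntax isn't final, and the drive and volume names are invented,
so treat this as a sketch rather than something you can feed to vinum
today:

    drive d1 device /dev/sd1e
    drive d2 device /dev/sd2e
    drive d3 device /dev/sd3e
    drive d4 device /dev/sd4e
    drive d5 device /dev/sd5e
    drive d6 device /dev/sd6e
    drive d7 device /dev/sd7e
    drive d8 device /dev/sd8e

    # One volume containing two striped plexes.  Each plex stripes
    # across four drives; the two plexes mirror each other, so the
    # volume survives the failure of any single drive.
    volume fast
      plex org striped 256k
        sd length 0 drive d1    # length 0: use all available space
        sd length 0 drive d2
        sd length 0 drive d3
        sd length 0 drive d4
      plex org striped 256k
        sd length 0 drive d5
        sd length 0 drive d6
        sd length 0 drive d7
        sd length 0 drive d8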