Date: Sat, 14 Feb 1998 17:48:44 +1030
From: Greg Lehey <grog@lemis.com>
To: mvanloon@mindbender.serv.net, michaelh@cet.co.jp
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: ccd and ftd

On Fri, 13 February 1998 at 21:40:19 -0800, mvanloon@mindbender.serv.net wrote:
>> -----Original Message-----
>> From: Greg Lehey [SMTP:grog@lemis.com]
>> Sent: Friday, February 13, 1998 8:58 PM
>> To: Michael Hancock
>> Cc: freebsd-fs@FreeBSD.ORG
>> Subject: Re: ccd and ftd
>>
>> On Sat, 14 February 1998 at 13:38:47 +0900, Michael Hancock wrote:
>>> On Sat, 14 Feb 1998, Greg Lehey wrote:
>>>
>>>> On Sat, 14 February 1998 at 10:19:49 +0900, Michael Hancock wrote:
>>>>> I'm looking for a fault-tolerant solution.  I know Greg is working
>>>>> on an ftd driver.
>>>>>
>>>>> In the meantime I noticed that ccd supports mirroring.  I'd like
>>>>> to mirror two arrays of 3 to 4 disks each.  How well does this
>>>>> work?
>>>>
>>>> Do you mean more than one mirror?  I don't believe that works.
>>>> Certainly I can't see how it could from reading the code.
>>>
>>> The man page says you can mirror any even number of disks.  What I'd
>>> like to do is this:
>>>
>>> 2940 - disk1 - disk2 - disk3 - disk4
>>>
>>> and mirror/duplex the whole array with another identical array:
>>>
>>> 2940 - disk5 - disk6 - disk7 - disk8
>>>
>>> Interestingly, man ccdconfig talks about CCDF_MIRROR and an
>>> unimplemented flag, CCDF_PARITY.
>>
>> Maybe we're talking at cross purposes here.  The current version of
>> ccd supports two mappings:
>>
>> 1. Concatenation (in other words, making one large space out of
>>    several smaller ones), which gives it its name.  To concatenate,
>>    you don't use any flags.
>>
>> 2. Mirroring (redundantly storing the same data on two different
>>    spaces).  To mirror, you specify the flag CCDF_MIRROR.
>>
>> I haven't looked at ccdconfig(8), but I suspect that if you specify
>> more than one pair of disks on a CCDF_MIRROR line, it will
>> concatenate pairs of mirrors.
>>
>> In addition, you can have striping, where the mapping is non-linear:
>> after every fixed number of bytes (the stripe size), the mapping
>> proceeds to the next component of the array, thus spreading the load
>> more evenly.
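>>
>> For example (an untested sketch: the device names are invented, the
>> interleave is arbitrary, and I haven't verified how ccd pairs the
>> components), a line for two mirrored pairs might look like this:
>>
>>     # ccd   ileave  flags        component devices
>>     ccd0    128     CCDF_MIRROR  /dev/sd1e /dev/sd5e /dev/sd2e /dev/sd6e
>>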
>> I have some PostScript documentation of the comparisons under
>> development if anybody's interested.
>>
>> The CCDF_PARITY flag was for RAID 5, but as you say, it hasn't been
>> implemented.
>>
>>> Although it looks like they have hooks for more fault-tolerant
>>> features in ccd, I agree with your effort to make a separate
>>> fault-tolerant driver and keep ccd a light-weight, primarily
>>> striping-only driver.  Are you going to be able to share some code
>>> with it?
>>
>> I'm a bit disappointed with ccd.  I had originally planned just to
>> extend it, but I find it too primitive.  It's not just "light
>> weight"--it's deficient.  It handles faults very badly, and it
>> offers multiple possibilities for shooting yourself in the foot.  In
>> addition, I don't think its performance is going to be better than
>> vinum's (mine), though that remains to be seen.  About the only
>> advantage is that ccd is significantly smaller, but I don't expect
>> the typical user of either driver to be concerned about another
>> 20 kB of memory.
>>
>> As a result of these problems, vinum is effectively a rewrite,
>> though I've been looking very carefully at ccd for implementation
>> ideas.
>
> You're writing a better "software-raid" driver?

Something like that.

> Here's what I want to do -- let me know if you have this in mind.
>
> I like fault-tolerance, but I don't like to take performance hits
> for it.  I'd like to take a set of drives and mirror pairs of them.
> Then, I'd like to stripe on top of those mirror sets, without
> parity.
>
> This would give you the ultimate in performance, especially for
> reads, and redundant fault-tolerance, at the same time.

Yes, it will be able to do that.  Specifically (those of you who know
Veritas will recognize this):

1. Each partition is split up into an arbitrary number of subdisks,
   which are contiguous sections of disk space.  For best performance,
   you would take one subdisk per volume.

2. The subdisks are combined into plexes.  Within a plex, each subdisk
   represents a unique part of the address space (i.e. there is no
   redundancy), and there is no requirement for the address space of a
   plex to be contiguous.  Plexes can be organized in three ways:

   - sequentially: each subdisk represents a contiguous part of the
     plex address space;

   - striped: the address space is patched together out of
     fixed-length pieces allocated from each subdisk in turn;

   - RAID 5: like striping, except that in each group of stripes one
     subdisk is reserved for a parity stripe.

   I realise that this description isn't the utmost in clarity.  If
   somebody can help me phrase it better, I'd be grateful.

3. Up to 8 plexes can be combined to form a volume, which corresponds
   to a normal disk device.  Each plex replicates the same address
   space (i.e. the plexes mirror each other).  The requirement is
   that the logical sum of the plexes provide storage for the
   complete address space of the volume.

Mapping this to your requirements, you'd get the striping by taking
striped plexes, and the redundancy from multiple plexes per volume;
there's a sketch of such a configuration below, after my signature.

There are a couple of other advantages to this method:

1. You can move data from one disk to another on line, without loss
   of fault tolerance, by dynamically adding a new plex and then
   removing the old one.

2. You can make consistent backups of a volume by adding a plex and
   then detaching it at a specific time.

Greg
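
PS: To make the mapping to your requirements concrete, here's the
sort of configuration file I have in mind for your eight-disk case.
The syntax isn't final, and the drive and volume names are invented,
so treat this as a sketch rather than something you can feed to vinum
today:

    drive d1 device /dev/sd1e
    drive d2 device /dev/sd2e
    drive d3 device /dev/sd3e
    drive d4 device /dev/sd4e
    drive d5 device /dev/sd5e
    drive d6 device /dev/sd6e
    drive d7 device /dev/sd7e
    drive d8 device /dev/sd8e

    # One volume containing two striped plexes.  Each plex stripes
    # across four drives; the two plexes mirror each other, so the
    # volume survives the failure of any single drive.
    volume fast
      plex org striped 256k
        sd length 0 drive d1    # length 0: use all available space
        sd length 0 drive d2
        sd length 0 drive d3
        sd length 0 drive d4
      plex org striped 256k
        sd length 0 drive d5
        sd length 0 drive d6
        sd length 0 drive d7
        sd length 0 drive d8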