Date:      Tue, 3 Mar 1998 19:17:55 +1030
From:      Greg Lehey <grog@lemis.com>
To:        Wilko Bulte <wilko@yedi.iaf.nl>
Cc:        sbabkin@dcn.att.com, tlambert@primenet.com, shimon@simon-shapiro.org, jdn@acp.qiv.com, blkirk@float.eli.net, hackers@FreeBSD.ORG
Subject:   Re: SCSI Bus redundancy...
Message-ID:  <19980303191755.14264@freebie.lemis.com>
In-Reply-To: <199803022257.XAA06604@yedi.iaf.nl>; from Wilko Bulte on Mon, Mar 02, 1998 at 11:57:44PM +0100
References:  <19980303084608.56831@freebie.lemis.com> <199803022257.XAA06604@yedi.iaf.nl>

On Mon,  2 March 1998 at 23:57:44 +0100, Wilko Bulte wrote:
> As Greg Lehey wrote...
>> On Mon,  2 March 1998 at 14:23:50 -0500, sbabkin@dcn.att.com wrote:
>>>> ----------
>>>> From: 	Terry Lambert[SMTP:tlambert@primenet.com]
>>>>
>>>>>>> I think Julian's SLICE code has something in that direction.  DPT
>>>>>>> supports INCREASING the size of a RAID-5 array by adding drives.
>>>>>>
>>>>>> How can that work?
>>>>>
>>>>> Something like
>>>>> 	- read N RAID blocks from K disks
>>>>> 	- compute new checksum for K+1 disks and write as a smaller number
>>>>> 	  of RAID blocks but each one of bigger size ((K+1)/K times)
>>>>> 	- add empty blocks at the end of RAID in the added space
>>>>
>>>> You would have to remember to grab the blocks to be relocated with
>>>> the same O(n) randomness as their allocation.  8-).
>>>>
>>> Huh ? Probably I've missed something about RAIDs. I've thought
>>> that, for example, RAID block 0 consists of blocks 0 of all
>>> the physical disks. And so on. And I've thought that RAID itself
>>> does not allocate any blocks, the upper level like filesystem or
>>> volume manager does it, RAID just does checksumming. Am I wrong again ?
>>
>> That's not the point.  OK, we were talking about RAID 5 here, which
>> also has parity blocks, but the point is that if you add another disk,
>> you're effectively adding another block every n blocks in the file
>> system address space.  It requires some non-trivial data movement to
>> rearrange all the data (more specifically, except for the first n (n =
>> old number of drives) blocks, you must move *everything*, and you must
>> recalculate parity for every stripe).
>>
>> My question ("How can that work?") was based on the misassumption that
>> this would be too much work to be justifiable.
>
> And apart from the work involved to get it implemented: how long would it
> take a RAIDset to get re-organised/enlarged. Reason #1 for doing things like
> this is because you don't want downtime. And I don't want to think about
> some hardware failure (say a disk) halfway during this process. That would
> really result in a dis[k]array ;-)

Obviously there are a number of problems.  But in fact it's not as
difficult as it sounds.  There's a problem with RAID 5 anyway if
there's, say, a power failure during a write.  After bringing it back
up again, you can recognize that there's a parity error, but where?
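
To illustrate the point, here is a minimal Python sketch, assuming plain
XOR parity over one stripe.  The block contents and the names
(xor_blocks, torn, syndrome) are made up for the example and come from
no real RAID implementation.  It shows that an interrupted write leaves
a detectable mismatch, but the mismatch alone does not say which block
is stale.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus their parity block (made-up contents).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = xor_blocks(data)

# Simulated power failure: the new data block 1 reached the disk,
# but the matching parity update did not.
torn = [data[0], b"\xff\xff", data[2]]

consistent_before = xor_blocks(data) == parity
consistent_after = xor_blocks(torn) == parity

# XOR of all blocks including parity.  A non-zero result proves the stripe
# is inconsistent, but it looks the same whether a data block or the parity
# block is the stale one.
syndrome = xor_blocks(torn + [parity])
```

The same syndrome would result if the parity had been written and the
data block had not, which is why the location of the error cannot be
recovered from the parity check alone.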

The question of reorganizing isn't as critical: run an asynchronous
process which updates the array a stripe at a time.  In addition to
the data, let it write a magic number in the entire first sector
following the updated slice.  If the array does go down during the
update, a recovery run can find this magic number and know where
to restart the reorganization.  Not ideal, but better than nothing.
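
A rough sketch of that idea in Python, under heavy simplification: the
new layout is modelled as a list of stripes, the marker as a list
element placed after the last rewritten stripe, and the crash as an
early return.  MAGIC, reorganize, find_restart and crash_after are
hypothetical names for the illustration, and the in-place overlap
between the old and new layouts is ignored.

```python
from functools import reduce

MAGIC = b"REORG-MARK"  # hypothetical marker value

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def find_restart(out):
    """Scan for the marker; its position is the next stripe to rewrite."""
    return out.index(MAGIC) if MAGIC in out else 0

def reorganize(logical, width, out, crash_after=None):
    """Rewrite `logical` as stripes of `width` data blocks plus parity.

    `out` holds the new layout.  After each stripe the marker is written
    directly behind it.  Returns False on a simulated crash, True when done.
    """
    stripe_no = find_restart(out)
    done = 0
    while stripe_no * width < len(logical):
        if crash_after is not None and done == crash_after:
            return False  # simulated failure between two stripes
        chunk = logical[stripe_no * width:(stripe_no + 1) * width]
        # Write the new stripe, then the marker in the slot that follows it.
        out[stripe_no:] = [(chunk, xor_blocks(chunk)), MAGIC]
        stripe_no += 1
        done += 1
    if out and out[-1] == MAGIC:
        out.pop()  # reorganization complete, marker no longer needed
    return True

logical = [bytes([i]) * 4 for i in range(10)]
out = []
first_run = reorganize(logical, 3, out, crash_after=2)  # goes down after 2 stripes
restart_at = find_restart(out)                          # recovery run finds the marker
second_run = reorganize(logical, 3, out)                # resumes from there
```

The sketch only covers a failure between stripes.  A failure in the
middle of writing a stripe is the harder case, and the marker by itself
does not help with that.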

Vinum offers another alternative: attach a second plex with the same
data, maybe only a few megabytes at a time.  During the time this area
of the volume is being updated, the plex supplies a backup in case of
failure.  When the region is left, the plex is detached and reattached
at the next point in the array.  If anything goes down, the correct
data will be in the auxiliary plex.
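
As a loose analogy only, and not Vinum code or Vinum commands, the
sliding backup could be modelled like this in Python.  The volume is a
list, the auxiliary plex is a copy of the current window, and transform
stands in for whatever rewriting the reorganization does.  All names
here are hypothetical.

```python
def reorganize_with_backup(volume, transform, window, first=0, crash_at=None):
    """Rewrite `volume` in place, one `window`-sized region at a time.

    Before a region is touched, its old contents are copied aside (the
    stand-in for attaching the auxiliary plex).  Returns (start, backup)
    on a simulated crash, or None when the whole volume has been updated.
    """
    for start in range(first, len(volume), window):
        backup = volume[start:start + window]   # "attach" the auxiliary copy
        for i in range(start, min(start + window, len(volume))):
            if crash_at == i:
                return start, backup            # went down inside this region
            volume[i] = transform(volume[i])
        # Region finished: the copy is dropped and the window moves on.
    return None

def recover(volume, start, backup):
    """Restore the half-updated region from the auxiliary copy."""
    volume[start:start + len(backup)] = backup

volume = list(range(10))
crash = reorganize_with_backup(volume, lambda x: x * 10, 4, crash_at=6)
start, backup = crash
recover(volume, start, backup)                  # region 4..7 is back to old data
finished = reorganize_with_backup(volume, lambda x: x * 10, 4, first=start)
```

Only the region under the window is ever at risk, so the backup stays a
few blocks in size no matter how large the volume is.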

Does that make sense?  I'll try to formulate it more clearly if
anybody has difficulty with the concepts.

Greg


