From owner-freebsd-current Wed Nov 29 17:11:05 1995
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.6.12/8.6.6) id RAA09668 for current-outgoing; Wed, 29 Nov 1995 17:11:05 -0800
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id RAA09607; Wed, 29 Nov 1995 17:10:47 -0800
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id SAA29251; Wed, 29 Nov 1995 18:04:53 -0700
From: Terry Lambert
Message-Id: <199511300104.SAA29251@phaeton.artisoft.com>
Subject: Re: Concatenated Drives ...
To: scrappy@hub.org (Marc G. Fournier)
Date: Wed, 29 Nov 1995 18:04:52 -0700 (MST)
Cc: current@FreeBSD.org, hackers@FreeBSD.org
In-Reply-To: from "Marc G. Fournier" at Nov 29, 95 06:04:44 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 4300
Sender: owner-current@FreeBSD.org
Precedence: bulk

> I've seen some discussions go by dealing with the
> ability to make multiple drives look like one big drive, and I'm
> curious as to what is involved in doing so?
>
> Mainly, isn't that what the swap devices are doing already?
> Or is there something extra that needs to be added when combining
> several drives into one virtual drive?

Ideally (not-so-ideally follows), using devfs:

1)	Allow a block device driver to specify offset + size relative
	to another driver.

2)	Allow drivers to be stacked so that one logical drive can be
	a fragment of another.

3)	Support creation of logical devices exported to the user via
	the devfs /dev name space.

4)	Create a driver that exports a device given a tag mechanism
	on a drive for recognition.

So you have:

PHY:	wdc0 target 0		start = 0
 |				size  = <disk size>
 |				pdev  = /dev/dsk/wdc0
 |				name  = /dev/dsk/wdc0/d0
 v
LOG:	OnTrack 6.x		start = 64
 |				size  = <disk size> - 64
 |				pdev  = /dev/dsk/wdc0/d0
 |				name  = /dev/dsk/wdc0/d0
 v
LOG:	DOS partitioning	start = 64 + <p1 offset>
 |				size  = <p1 size>
 |				pdev  = /dev/dsk/wdc0/d0
 |				name  = /dev/dsk/wdc0/d0/p1
 |
 |				start = 64 + <p2 offset>
 |				size  = <p2 size>
 |				pdev  = /dev/dsk/wdc0/d0
 |				name  = /dev/dsk/wdc0/d0/p2
 |
 |				start = 64 + <p3 offset>
 |				size  = <p3 size>
 |				pdev  = /dev/dsk/wdc0/d0
 |				name  = /dev/dsk/wdc0/d0/p3
 |
 |				start = 64 + <p4 offset>
 |				size  = <p4 size>
 |				pdev  = /dev/dsk/wdc0/d0
 |				name  = /dev/dsk/wdc0/d0/p4
 V
LOG:	Extended partitioning, if any
 |
 V
LOG:	Concatenation driver	start = 0
				size  = <p1 size> + <p2 size>
				pdev  = /dev/dsk/ccd
				name  = /dev/dsk/ccd/d0

Concatenation driver, operation:

	<local block>  = <request block> - <ccd start>
	<residual>     = <local block> - <p1 size>
	<total size>   = <p1 size> + <p2 size>
	if <local block> > <total size>
		[fail operation]
	if <local block> < <p1 size>
		<physical block> = <local block> + 64 + <p1 offset>
	else
		<physical block> = <residual> + 64 + <p2 offset>

	<request block> :== <block requested of /dev/dsk/ccd/d0>
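To make the arithmetic concrete, here is a minimal user-space sketch of
a stacked device descriptor carrying the start/size/pdev/name fields
from the picture above, plus the two-member remap just described.  This
is not the devfs or ccd code; "struct ldev" and "ccd_bmap" are
hypothetical names used only for illustration:

/*
 * Sketch only: a logical device descriptor stacked on a parent
 * device, as in the diagram above.
 */
#include <stdio.h>

struct ldev {
	long		start;	/* block offset relative to parent */
	long		size;	/* size in blocks */
	struct ldev	*pdev;	/* underlying device; NULL if physical */
	const char	*name;	/* devfs name space entry */
};

/*
 * Remap a block requested of the concatenation onto one of its two
 * member partitions: blocks below <p1 size> go to the first member,
 * the residual goes to the second, out-of-range requests fail.
 */
static int
ccd_bmap(struct ldev *ccd, struct ldev *p1, struct ldev *p2,
    long reqblk, struct ldev **memberp, long *physblkp)
{
	long local = reqblk - ccd->start;

	if (local < 0 || local >= p1->size + p2->size)
		return (-1);			/* [fail operation] */
	if (local < p1->size) {
		*memberp = p1;
		*physblkp = local + p1->start;	/* start is 64 + <p1 offset> */
	} else {
		*memberp = p2;
		*physblkp = (local - p1->size) + p2->start;
	}
	return (0);
}

int
main(void)
{
	/* Layout mirroring the diagram; OnTrack shifts everything up 64. */
	struct ldev d0  = { 0,         2048, NULL, "/dev/dsk/wdc0/d0" };
	struct ldev p1  = { 64 + 0,    1000, &d0,  "/dev/dsk/wdc0/d0/p1" };
	struct ldev p2  = { 64 + 1000,  900, &d0,  "/dev/dsk/wdc0/d0/p2" };
	struct ldev ccd = { 0, 1000 + 900,  NULL, "/dev/dsk/ccd/d0" };
	struct ldev *member;
	long phys;

	if (ccd_bmap(&ccd, &p1, &p2, 1500, &member, &phys) == 0)
		printf("ccd block 1500 -> %s block %ld\n", member->name, phys);
	return (0);
}

Note that the collapsing described below would amount to folding each
member's chain of start offsets into a single precomputed absolute
base at claim time, so no per-I/O chain walk is needed.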
Obviously, you can stick in media perfection at any layer, just like
you put in OnTrack management.

Also obviously, DOS partitioning, OpenFirmware partitioning, DOS
extended partitioning, BSD disk slices, OSF disk slices, and Sun disk
slices, etc., are all members of the same class of driver and can be
layered freely.

A device that overlays a device that does not change sector ordering
(i.e., does not do bad sector replacement, etc.) can be collapsed to a
single start/offset reference, making the code no more overhead than
the current partition handling, and less overhead than the
disk-slice-on-DOS-partition handling.  Graphs containing removable
media are not collapsed down, only up.

The full theory of operation is:

You register all physical devices.

For each unclaimed logical or physical device in the system, you ask
each logical driver to claim the device.  A CCD (really, a "volume
spanning driver") will only claim logical devices when the full set is
present.

You repeat this until no more logical devices are added, and no more
are claimed (this last is because you might have a volume spanning
driver layered on another, such as two or more stripe sets, which
would delay the claiming of the final device by N+1 for each logical
device that consisted of multiple logical or physical devices).

Finally, each file system is asked to claim each logical device.
Claimed devices are considered "mounted".

Not-so-ideally:

For the current code set, which can't support this, you can write a
CCD driver (or port NetBSD's) with little effort.  It operates by
making a block major/minor dev that refers to multiple physical volume
vnodes, and referencing the device vnodes (which it vn_open's
internally).  It does not protect against access to the underlying
file systems.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.