Date:      Wed, 7 Jan 2004 10:47:48 -0500 (EST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        "Greg 'groggy' Lehey" <grog@FreeBSD.org>
Cc:        FreeBSD Architecture Mailing List <arch@FreeBSD.org>
Subject:   Re: Vinum and GEOM: the future
Message-ID:  <Pine.NEB.3.96L.1040107103242.6394C-100000@fledge.watson.org>
In-Reply-To: <20040107062252.GQ7617@wantadilla.lemis.com>

On Wed, 7 Jan 2004, Greg 'groggy' Lehey wrote:

> 1.  Ditch it.  It's served its purpose, and there are better
>     alternatives.

Right now, Vinum remains the most production-ready implementation of RAID5
on FreeBSD.  Not only that, but even if we do eventually decide to kick
out Vinum, we need to provide a sensible migration path to whatever
replaces it.

> 2.  Keep it alongside GEOM, and maintain code such as the swapon()
>     code to handle both.

One of the nice things about the move to GEOM is that we now have a
consistent and reliable abstraction for storage devices, with well-defined
APIs for querying storage properties, rather than attempting to futz
around with "Is it a character device that kind of implements the things
we're kind of looking for".  The disk(9) API provides a pretty reasonable
API for non-GEOM devices to export "I am a storage thing", and experience
seems to demonstrate that writing GEOM transforms and services is far
easier than digging around to do it from scratch. 

> 3.  Modify it to understand GEOM.

Vinum seems to consist of two components: things to make up for a lack of
GEOM/devfs, and things that implement volume/RAID services.  Gradually
trimming the overlap would let the body of Vinum concentrate on what it
actually exists to do, namely volume and RAID services, and that seems
like a natural direction.

> - Online configuration via the vinum utility program.
> - Automatic error detection and recovery where possible.
> - State information for each object.  This enables Vinum to function
>   correctly even if some objects are not accessible.
> - Persistent configuration.  Each Vinum drive stores two copies of the
>   configuration, so the system can start up automatically.  The
>   configuration includes state information, so any degraded objects
>   will remain so over a reboot, or even when moved to a new system.
> - Support for Vinum root file systems.
> - Online rebuild of objects.
> 
> Interestingly, none of these touch GEOM as far as I can see.  Am I
> missing something? 

An important goal of GEOM is to spare storage transform authors much of
the paperwork by providing reasonable abstractions.  If half of the
paperwork evaporates from Vinum, it will be a lot easier to do these
things.  For example, you get decent notification of disk arrival/removal
so that you can configure automatically, a framework that allows
interlocking pieces to cooperate, and a better-defined mechanism to pass
requests up and down the stack.  Another benefit is that you get Vinum's
hands out of the internals of device management, which should improve
maintainability and reduce complexity.

> Based on this understanding, my intentions for Vinum currently don't go
> beyond replacing the following: 
> 
> - Replace the objects volume, plex and subdisk with a corresponding
>   geom.  I expect this to enable a more arbitrary means of joining
>   together the objects, but that's about all.
> - Replace the ioctls with gctl_s.  This seems to be more cosmetic than
>   functional, though also a good idea.
> 
> This will certainly be worthwhile, but somehow I was expecting more. 
> Can anybody suggest other things that could be changed with benefit? 

I think there's a spectrum of possibilities you can explore, ranging from
a minimal disk(9) export to a full decomposition into GEOM classes.

The most obvious first step is to have Vinum export its storage units
using the disk(9) API, which will permit GEOM consumers to attach to those
devices as "disks".  This will get swap up and running again with what I
hope will be little difficulty, and basically put Vinum in the same
situation that disk devices are in today.  disk(9) allows you to say "Hi, I'm
a disk, and I implement the following methods, and have the following
properties".  The one caution is to be careful about generating cycles: 
i.e., only export volumes, not the bits that make up volumes.  You would
continue to use character devices for things like the Vinum control node.
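
To make the cycle caution concrete, here is a rough userland sketch in
Python.  It is a toy model, not kernel code: the real change would go
through disk(9) in C, and every name below (VinumObject,
exportable_as_disks, walk, the "data" volume) is made up for illustration.
It only shows the filtering rule: advertise volumes, and keep plexes and
subdisks private.

```python
# Toy model only: real Vinum would use the kernel's disk(9) interface in C.
# Every name below is hypothetical and exists just for this illustration.
from dataclasses import dataclass, field


@dataclass
class VinumObject:
    name: str
    kind: str                 # "volume", "plex", or "subdisk"
    size_bytes: int = 0
    children: list = field(default_factory=list)


def walk(obj):
    """Yield obj and everything beneath it in the configuration tree."""
    yield obj
    for child in obj.children:
        yield from walk(child)


def exportable_as_disks(objects):
    """Return only the objects that are safe to advertise as disks.

    Plexes and subdisks are the pieces a volume is built from.  Advertising
    them too would let a consumer attach to a component of a volume that is
    itself exported, which is the cycle to avoid.
    """
    return [obj for obj in objects if obj.kind == "volume"]


sd0 = VinumObject("data.p0.s0", "subdisk", 1 << 30)
sd1 = VinumObject("data.p0.s1", "subdisk", 1 << 30)
plex = VinumObject("data.p0", "plex", 2 << 30, [sd0, sd1])
vol = VinumObject("data", "volume", 2 << 30, [plex])

exported = exportable_as_disks(walk(vol))
print([obj.name for obj in exported])
```

Of the four objects in this made-up configuration, only the volume ends
up in the exported list.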

A second phase involves an actual "GEOMification", in which you modify
Vinum to consume and produce GEOM instances.  I.e., you turn Vinum into
one large GEOM class, using GEOM to discover and access storage objects,
using GEOM to expose new storage objects, and using GEOM's stage engine
and bio management.  As I mentioned, this will strip a lot of the
"paperwork" from Vinum, and result in Vinum no longer directly producing
or consuming character devices for storage I/O.  Note that in this stage,
one of the things you can do is move to using GEOM ctl operations to
manage Vinum, but that's not obligatory: you could still maintain the use
of a character device for control ioctls.
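
As a rough picture of what "consume and produce" means here, the Python
sketch below models a single class that is offered providers, claims the
ones carrying its label, and exposes a new provider built from them.  It
is a toy model and not the kernel GEOM API: the names Provider,
ToyVinumClass, taste and produce, the MAGIC marker, and the device names
are all assumptions made for the example.

```python
# Toy model of the consume/produce pattern; this is not the kernel GEOM API.
# All names and the MAGIC marker are hypothetical.
from dataclasses import dataclass, field

MAGIC = b"TOYVINUM"  # stand-in for an on-disk label marker


@dataclass
class Provider:
    name: str
    data: bytes


@dataclass
class ToyVinumClass:
    drives: list = field(default_factory=list)  # providers we consume

    def taste(self, provider):
        """Offered a provider: claim it only if it carries our label."""
        if not provider.data.startswith(MAGIC):
            return False
        self.drives.append(provider)
        return True

    def produce(self, volume_name):
        """Expose one new provider concatenating the claimed drives."""
        payload = b"".join(p.data[len(MAGIC):] for p in self.drives)
        return Provider(volume_name, payload)


offered = [
    Provider("da0", MAGIC + b"AAAA"),
    Provider("da1", b"UFS-----"),      # not ours, left alone
    Provider("da2", MAGIC + b"BBBB"),
]

vinum = ToyVinumClass()
claimed = [p.name for p in offered if vinum.taste(p)]
volume = vinum.produce("vinum/data")
print(claimed, volume.data)
```

In this model the produced volume carries no label of its own, so when it
is offered back to the class it is not claimed.  That is the same
cycle-avoidance concern as in the disk(9) step.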

A third, optional stage would be to decompose Vinum into its logical
components, creating GEOM classes for each of those components.  This
will be a lot more work, but I think it would be well worth it.  However,
it will take a fair amount of time, so I think that this makes sense only
after performing one of the above steps as an interim stage.

My recommendation would be to begin by simply attacking the disk(9) issue. 
Chances are, the changes will be small -- avoiding cycles might fall out
naturally, or it might require a little tweaking.  Once it exports
disk(9), you're at a point where you can pause for breath and take on the
larger tasks.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research


