Date: Wed, 7 Jan 2004 10:47:48 -0500 (EST)
From: Robert Watson <rwatson@FreeBSD.org>
To: "Greg 'groggy' Lehey" <grog@FreeBSD.org>
Cc: FreeBSD Architecture Mailing List <arch@FreeBSD.org>
Subject: Re: Vinum and GEOM: the future
Message-ID: <Pine.NEB.3.96L.1040107103242.6394C-100000@fledge.watson.org>
In-Reply-To: <20040107062252.GQ7617@wantadilla.lemis.com>

On Wed, 7 Jan 2004, Greg 'groggy' Lehey wrote:

> 1. Ditch it. It's served its purpose, and there are better
>    alternatives.

Right now, Vinum remains the most productionable implementation of RAID5 on FreeBSD. Not only that, but even if we do eventually decide to kick out Vinum, we need to provide a sensible migration path to whatever replaces it.

> 2. Keep it alongside GEOM, and maintain code such as the swapon()
>    code to handle both.

One of the nice things about the move to GEOM is that we now have a consistent and reliable abstraction for storage devices, with well-defined APIs for querying storage properties, rather than attempting to futz around with "Is it a character device that kind of implements the things we're kind of looking for". The disk(9) API provides a pretty reasonable API for non-GEOM devices to export "I am a storage thing", and experience seems to demonstrate that writing GEOM transforms and services is far easier than digging around to do it from scratch.

> 3. Modify it to understand GEOM.

Vinum seems to consist of two components: things to make up for a lack of GEOM/devfs, and things that implement volume/RAID services. Gradually trimming the overlap will allow the body of Vinum to implement that which it actually exists to do: volume and RAID services, and seems like a natural direction.

> - Online configuration via the vinum utility program.
> - Automatic error detection and recovery where possible.
> - State information for each object. This enables Vinum to function
>   correctly even if some objects are not accessible.
> - Persistent configuration. Each Vinum drive stores two copies of the
>   configuration, so the system can start up automatically. The
>   configuration includes state information, so any degraded objects
>   will remain so over a reboot, or even when moved to a new system.
> - Support for Vinum root file systems.
> - Online rebuild of objects.
>
> Interestingly, none of these touch GEOM as far as I can see.
> Am I missing something?

An important goal of GEOM is to allow storage transform authors to have to deal with less paperwork by providing reasonable abstractions. If half of the paperwork evaporates from Vinum, it will be a lot easier to do these things -- for example, you get decent notification of disk arrival/removal so that you can automatically configure, it provides a framework to allow interlocking pieces to cooperate, and a more well defined mechanism to pass requests up and down the stack. Another benefit is that you get Vinum's hands out of the internals of device management, which should improve maintainability and reduce complexity.

> Based on this understanding, my intentions for Vinum currently don't go
> beyond replacing the following:
>
> - Replace the objects volume, plex and subdisk with a corresponding
>   geom. I expect this to enable a more arbitrary means of joining
>   together the objects, but that's about all.
> - Replace the ioctls with gctl_s. This seems to be more cosmetic than
>   functional, though also a good idea.
>
> This will certainly be worthwhile, but somehow I was expecting more.
> Can anybody suggest other things that could be changed with benefit?

I think there's a spectrum of possibilities you can explore, and that it offers a lot of choices. The most obvious first step is to have Vinum export its storage units using the disk(9) API, which will permit GEOM consumers to attach to those devices as "disks". This will get swap up and running again with what I hope will be little difficulty, and basically put Vinum in the same situation disk devices currently sit. disk(9) allows you to say "Hi, I'm a disk, and I implement the following methods, and have the following properties". The one caution is to be careful about generating cycles: i.e., only export volumes, not the bits that make up volumes. You would continue to use character devices for things like the Vinum control node.
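
The "export only volumes" rule can be modelled in a few lines of userland C. This is a rough sketch of the idea only: it is not kernel code and does not use the real disk(9) interface. The types `vobj` and `disk_desc` and the functions `export_as_disk()` and `export_all()` are hypothetical names invented for the illustration. The descriptor carries a couple of properties and one method, and the export step refuses drives, subdisks and plexes so that the components of a volume are never offered back to consumers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object kinds, named after the Vinum objects in this thread. */
enum vobj_kind { VOBJ_DRIVE, VOBJ_SUBDISK, VOBJ_PLEX, VOBJ_VOLUME };

/* Hypothetical stand-in for a Vinum object; not a real Vinum structure. */
struct vobj {
	enum vobj_kind kind;
	const char *name;
	unsigned long long size;	/* bytes */
	unsigned int sectorsize;	/* bytes */
};

/*
 * Hypothetical "I'm a disk" descriptor: a few properties plus one method.
 * This is NOT the kernel's struct disk; it only models the idea.
 */
struct disk_desc {
	const char *name;
	unsigned long long mediasize;
	unsigned int sectorsize;
	const struct vobj *backing;
	int (*strategy)(const struct disk_desc *dp,
	    unsigned long long offset, unsigned long long length);
};

/* Toy I/O method: accept a request only if it lies inside the media. */
static int
vol_strategy(const struct disk_desc *dp, unsigned long long offset,
    unsigned long long length)
{
	if (offset > dp->mediasize || length > dp->mediasize - offset)
		return (-1);
	return (0);
}

/*
 * Export one object as a disk. Only volumes qualify; exporting the pieces
 * that make up a volume is what could create a cycle, so they are refused.
 * Returns 0 on success, -1 if the object is not exported.
 */
int
export_as_disk(const struct vobj *obj, struct disk_desc *out)
{
	assert(obj != NULL && out != NULL);
	if (obj->kind != VOBJ_VOLUME)
		return (-1);
	out->name = obj->name;
	out->mediasize = obj->size;
	out->sectorsize = obj->sectorsize;
	out->backing = obj;
	out->strategy = vol_strategy;
	return (0);
}

/* Export every eligible object; returns the number of descriptors filled. */
size_t
export_all(const struct vobj *objs, size_t n, struct disk_desc *out,
    size_t cap)
{
	size_t used = 0;

	for (size_t i = 0; i < n && used < cap; i++)
		if (export_as_disk(&objs[i], &out[used]) == 0)
			used++;
	return (used);
}
```

Given a configuration containing one drive, one subdisk, one plex and one volume, `export_all()` in this model fills in a single descriptor, the one for the volume. The control node would stay outside this path entirely, as a character device.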

A second phase involves an actual "GEOMification", in which you modify Vinum to consume and produce GEOM instances. I.e., you turn Vinum into one large GEOM class, using GEOM to discover and access storage objects, using GEOM to expose new storage objects, and using GEOM's stage engine and bio management. As I mentioned, this will strip a lot of the "paperwork" from Vinum, and result in Vinum no longer directly producing or consuming character devices for storage I/O. Note that in this stage, one of the things you can do is move to using GEOM ctl operations to manage Vinum, but that's not obligatory: you could still maintain the use of a character device for control ioctls.

A third, optional stage would be to then decompose Vinum into its logical components, creating GEOM classes for each of those components. This will be a lot more work, but I think it would be well worth it. However, it will take a fair amount of time, so I think that this makes sense only after performing one of the above steps as an interim stage.

My recommendation would be to begin by simply attacking the disk(9) issue. Chances are, the changes will be small -- avoiding cycles might fall out naturally, or it might require a little tweaking. Once it exports disk(9), you're at a point where you can pause for breath and take on the larger tasks.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research