Date:      Thu, 07 Apr 2005 23:13:00 +0200
From:      Poul-Henning Kamp <phk@phk.freebsd.dk>
To:        current@freebsd.org
Subject:   GEOM architecture and the (lack of) need for foot-shooting
Message-ID:  <20331.1112908380@critter.freebsd.dk>


I can see that almost everybody is missing a couple of crucial pieces
of the puzzle, so let me try to straighten out some of the many
points which have been fired back and forth on the subject of GEOM
and foot-shooting.

First of all, there are tools which do not do the right thing.
Amongst these are bsdlabel, fdisk, boot0cfg and sysinstall.  Most
of them do something right but none of them gets everything right.

Yet.

Let me try to explain what should happen using the deletion of a
MBR partition as example:

If the disk has an MBR which defines three partitions and one of
them is open, then the MBR cannot be written to without informing
the GEOM_MBR instance which implements the contents of that MBR.

The correct way to do that is to use the g_ctl() API, because what
is needed is an out-of-band mechanism to say that we want to lose
one of the partitions.

g_ctl() has not been fully implemented in all classes yet, and
therefore what we currently do is open one of the partitions and
issue an ioctl which hits the GEOM_MBR instance.

This worked fine until recently, when it was discovered that one
could issue ioctls which did "write like" things on a file descriptor
opened only for reading.  This is pretty contrary to what people
expect, and we fixed it.

The problem with that fix is that there may not be any partition we
can open for writing; they may all be opened by something else
(mounted), and therefore our attempt to open one will fail.

That is where things stand today.

(I'll speak at length about the subject of ioctl and in-band/out-of-band
communications at BSDCan.  Be there!)

"Now, why haven't you finished GEOM?", I hear you ask.

Well, many reasons.

For one thing, I wanted to see how it panned out in all sorts of
ways before I went any further; it is important to stop every so
often and check that the direction is still sound.

Second, there was a lot of talk about sysinstallNG at the time, and
I thought that would be a great opportunity to revise the entire
userland-edit-disk-layout thing.

Third, I needed a break from it.

Finally, I wanted to give others a chance to join in and help out.
In a project like this, people tend not to disturb developers who
seem to be on a roll, no matter how much the developer begs for
assistance.
The only effective way to get others to join in is to step away and
make space for them.


Now, here is a list of what needs to be done in this general area:


1. Decide which partition format we migrate to instead of BSDlabel,
   which runs out of steam around 2TB.  GPT has been proposed but
   seems to be rather a dead end with Itanic sinking fast.

   MBR(ext) is not really a solution either; as far as I know, it
   suffers from the same limit.

   Somebody needs to make a decision and it is not really a technical
   thing.

2. Implement g_ctl in the various slicers and in their userland tools.
   (geom_bsd, geom_mbr, geom_mbrext, geom_apple, geom_..., bsdlabel,
   fdisk, boot0cfg, sysinstall etc).

   Somebody smart would implement a sensible generic library and a
   semi-standardized set of requests for all slicers.  This, of
   course, would be hard work.

3. Implement orphan methods in our filesystems and teach them about
   media which disappear.

4. Stop thinking in quick hacks and start to think in long term
   architecture.

   Just because you could do it in System III doesn't mean that it was
   the right way to do it, and just because you think a quick fix
   is all that is needed doesn't make it so either.

   GEOM is a bold step in disk-I/O architecture, but Rome wasn't built
   in one day and GEOM won't be built in one release-cycle either.

Anybody who expects me to do all of this single-handedly can take a
peek here http://people.freebsd.org/%7Epeter/srcsys.window.txt and go
stick their head in a bucket of cold water before telling me I have
to work harder.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


