Date:      Sun, 27 Sep 1998 20:59:02 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        phk@critter.freebsd.dk (Poul-Henning Kamp)
Cc:        gibbs@narnia.plutotech.com, current@FreeBSD.ORG
Subject:   Re: Current is Really Broken(tm)
Message-ID:  <199809272059.NAA29095@usr05.primenet.com>
In-Reply-To: <8655.906831168@critter.freebsd.dk> from "Poul-Henning Kamp" at Sep 26, 98 07:32:48 pm

> There are two basic ways to skin that cat:
> 
> A) "The kernel knows"
> 
> This is what Julian sort of implemented in SLICE.  It is the "quick"
> way but not the easy way.  The trouble is that you need to read
> disk blocks from some kind of thread or event handler, examine their
> contents, configure the right drivers and so on.  That sounds easy,
> but is hairy.

Right.  The trivial way to fix this is to queue device arrivals by
descriptor for non-interrupt context processing by a kernel thread.
This introduces a descriptor allocation issue.  My personal take
would be for the driver to own the descriptors, either as a result
of a preallocation on the driver's behalf as part of startup before
the device probes, or, more ugly, as simple static declarations of
storage (e.g., you will never have more than two devices "arriving"
from an IDE controller driver, since there is a limit of two
devices total inherent in IDE).
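A minimal sketch of the driver-owned descriptor scheme, in C. All names here (ide_device_arrived, arrival_thread_drain, IDE_MAX_DEVICES) are hypothetical and not part of any FreeBSD API. The point is that the probe/interrupt path only links a statically allocated descriptor onto a queue, and the kernel thread does the real work later. Locking between the two paths is deliberately omitted; a real kernel would need it.

```c
#include <stddef.h>

/* Hypothetical names throughout; not an existing FreeBSD interface. */
#define IDE_MAX_DEVICES 2   /* IDE allows at most two devices per channel */

struct arrival_desc {
    int unit;                     /* which device arrived */
    int in_use;                   /* currently queued? */
    struct arrival_desc *next;    /* FIFO link */
};

/* Storage owned by the driver: nothing is allocated in interrupt context. */
static struct arrival_desc ide_arrivals[IDE_MAX_DEVICES];
static struct arrival_desc *arrival_head, *arrival_tail;

/* Probe/interrupt path: only links a preallocated descriptor.
 * NOTE: no locking here; a real kernel must protect the queue. */
int ide_device_arrived(int unit)
{
    struct arrival_desc *d;

    if (unit < 0 || unit >= IDE_MAX_DEVICES)
        return -1;
    d = &ide_arrivals[unit];
    if (d->in_use)
        return -1;                /* already queued */
    d->unit = unit;
    d->in_use = 1;
    d->next = NULL;
    if (arrival_tail != NULL)
        arrival_tail->next = d;
    else
        arrival_head = d;
    arrival_tail = d;
    return 0;
}

/* Kernel-thread body: runs outside interrupt context, so configure() may
 * sleep and read disk blocks. Returns the number of arrivals processed. */
int arrival_thread_drain(void (*configure)(int unit))
{
    int n = 0;

    while (arrival_head != NULL) {
        struct arrival_desc *d = arrival_head;
        arrival_head = d->next;
        if (arrival_head == NULL)
            arrival_tail = NULL;
        d->in_use = 0;            /* descriptor goes back to the driver */
        if (configure != NULL)
            configure(d->unit);
        n++;
    }
    return n;
}
```

Because the descriptor array is sized by the hardware limit, a second arrival for the same unit before the thread runs is simply refused rather than requiring more storage.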


> The major problem with this approach is that you hardcode a
> lot of gunk into the kernel: what a BSD disklabel looks like, what
> an MBR looks like, and so on.

Which the kernel has to know anyway, in order to mount things.
The structures are there, and they're not going to go away, no
matter how ugly their being there is deemed.  Well, until we
support discardable ELF section tags for unused kernel components.
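For concreteness, this is roughly the MBR knowledge in question: the standard PC partition table is four 16-byte entries at byte offset 446 of sector 0, followed by the 0x55AA signature at bytes 510 and 511. The struct and function names below are hypothetical; only the on-disk layout is standard.

```c
#include <stddef.h>
#include <stdint.h>

#define MBR_SIZE      512
#define MBR_TABLE_OFF 446   /* four 16-byte entries start here */
#define MBR_NPARTS    4

/* Hypothetical in-memory form of one partition entry. */
struct mbr_part {
    uint8_t  type;        /* partition type, e.g. 0xA5 for FreeBSD */
    uint32_t lba_start;   /* first sector (entry bytes 8..11, little-endian) */
    uint32_t nsectors;    /* length in sectors (entry bytes 12..15) */
};

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the number of non-empty entries, or -1 if the 0x55AA
 * signature is missing. CHS fields (bytes 1..3 and 5..7) are ignored. */
int mbr_parse(const uint8_t sec[MBR_SIZE], struct mbr_part out[MBR_NPARTS])
{
    int i, n = 0;

    if (sec[510] != 0x55 || sec[511] != 0xAA)
        return -1;
    for (i = 0; i < MBR_NPARTS; i++) {
        const uint8_t *e = sec + MBR_TABLE_OFF + 16 * i;
        out[i].type = e[4];
        out[i].lba_start = le32(e + 8);
        out[i].nsectors = le32(e + 12);
        if (out[i].type != 0)
            n++;
    }
    return n;
}
```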


> Next you need to figure out how the kernel will discover that you
> muck about with a disklabel, an MBR, or something else.  And things go
> rapidly down-hill from there.


Actually, I disagreed with the way Julian did this, but his
approach *did* work, using notifications.

The way I argued to him it should be done is via ioctl()s
that abstract the ideas of:

1)	A list of N extents of physically contiguous blocks

		DOS Partitioning
		DOS Extended Partitioning
		FreeBSD Disklabel
		NetBSD Extended Disklabel
		DEC UNIX Disklabel
		Solaris Disklabel
		SVR4 VTOC
		etc.

2)	A media perfection mechanism of M virtually contiguous
	blocks layered on N physically contiguous blocks, M < N.

		Bad144
		etc.

3)	A list of #1, to be aggregated into a single virtually
	contiguous block range

		CCD
		Vinum
		VXFS volume manager
		etc.
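One way to picture the three abstractions is as block-translation functions. This is a simplified sketch under stated assumptions: every name is hypothetical, the remapping is a toy stand-in for something like bad144 (not its actual on-disk format), and aggregation is plain concatenation with no interleave of the kind CCD supports.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kinds; "the next integer after 3 is 4" if a new one appears. */
enum slice_kind {
    SLICE_EXTENTS = 1,   /* 1) physically contiguous extents (MBR, disklabel, VTOC) */
    SLICE_REMAP   = 2,   /* 2) M good blocks layered on N physical ones */
    SLICE_CONCAT  = 3    /* 3) a list of #1 aggregated into one virtual range */
};

struct extent {
    uint64_t start;      /* first physical block */
    uint64_t len;        /* number of blocks */
};

/* 1) One partition is one extent: virtual block v maps to start + v. */
int extent_map(const struct extent *e, uint64_t v, uint64_t *phys)
{
    if (v >= e->len)
        return -1;
    *phys = e->start + v;
    return 0;
}

/* 2) Toy remapping: blocks 0..m-1 are usable, blocks m.. are spares, and
 * bad[i] is redirected to spare m + i. Assumes nbad <= N - m. */
struct remap {
    uint64_t m;            /* usable (virtual) blocks */
    const uint64_t *bad;   /* bad virtual block numbers */
    size_t nbad;
};

int remap_map(const struct remap *r, uint64_t v, uint64_t *phys)
{
    size_t i;

    if (v >= r->m)
        return -1;
    for (i = 0; i < r->nbad; i++) {
        if (r->bad[i] == v) {
            *phys = r->m + i;
            return 0;
        }
    }
    *phys = v;
    return 0;
}

/* 3) Aggregation: concatenate extents into one virtually contiguous range. */
int concat_map(const struct extent *ext, size_t n, uint64_t v, uint64_t *phys)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (v < ext[i].len) {
            *phys = ext[i].start + v;
            return 0;
        }
        v -= ext[i].len;
    }
    return -1;               /* past the end of the aggregate */
}
```

Layering falls out naturally: a remapped device can sit under an extent list, and extents from several disks can feed a concatenation, each layer only translating block numbers.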

Since the kernel code must know about these structures for both
mounting and for device node creation, it may as well know how
to write a correct default instance of the structures.
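As an illustration of writing a "correct default instance", here is a hypothetical helper that fills in a single active FreeBSD slice (type 0xA5) running from a given first sector to the end of the disk. The CHS fields are left zero, so this is an LBA-only sketch; the boot code in bytes 0..445 is left untouched.

```c
#include <stdint.h>
#include <string.h>

/* Store v little-endian, as MBR fields are laid out on disk. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Hypothetical: write a default partition table into sec, with one active
 * FreeBSD slice covering [first_lba, disk_sectors). Returns -1 on bad input. */
int mbr_write_default(uint8_t sec[512], uint32_t disk_sectors, uint32_t first_lba)
{
    uint8_t *e = sec + 446;        /* first partition entry */

    if (first_lba == 0 || first_lba >= disk_sectors)
        return -1;                 /* sector 0 holds the MBR itself */
    memset(sec + 446, 0, 64);      /* clear all four entries */
    e[0] = 0x80;                   /* active (bootable) */
    e[4] = 0xA5;                   /* FreeBSD */
    put_le32(e + 8, first_lba);
    put_le32(e + 12, disk_sectors - first_lba);
    sec[510] = 0x55;               /* boot signature */
    sec[511] = 0xAA;
    return 0;
}
```

Calling it with first_lba = 63 mirrors the traditional track-aligned start, but that value is the caller's choice here, not something this helper assumes.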

This is a pretty trivial abstraction to make.  Julian's argument
against this is "what about future things that don't fit this
model?".  I can't think of any, but if he ever comes up with a
concrete example, well, the next integer after "3" is "4"...


The fdisk program moves to user space, and you can add new and
"improved" SLICE types to your heart's content, and the single
user space "fdisk" program will embrace them all, without needing
to be recompiled.

No more "run fdisk, now run fdisk, now run disklabel, now...".

One simple program that doesn't take a CS degree to run.



> B) "This is magic, we need a daemon"
> 
> If you do it from userland in a daemon, then the interface in the
> kernel becomes much cleaner, there are no "hidden users" which do
> odd things to your disk.
> 
> You can make one generic method that slices a device into several
> devices, and depending on what your daemon finds, it will be
> configured with the data from a disklabel, an MBR or a Mac VTOC for
> that matter.
> 
> On the other hand you get a bootstrap problem, to find / you need
> to run a program (unless you cheat of course, see: "Veritas")
> 
> The issue of changing a layout is now moved from the kernel to a
> daemon in user-space, which needs to sanity-check and implement
> the changes people want to do.  This is a lot less hairy than
> doing it in the kernel.

?

There's really no difference between a kernel thread and a process.

The bootstrap problem is the primary reason why you would put
this stuff in the kernel.

One of the most common Linux "gloats" is that they can be booted
off a DOS Extended Partition.
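In the daemon model quoted above, the kernel side could shrink to a single generic method that accepts a list of extents and sanity-checks it, regardless of which on-disk format the daemon parsed them from. A sketch with a hypothetical slice_configure(); it checks only bounds and overlap and stops short of creating device nodes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct extent {
    uint64_t start;   /* first block on the parent device */
    uint64_t len;     /* number of blocks */
};

static int cmp_start(const void *a, const void *b)
{
    const struct extent *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Hypothetical generic "slice this device" method. Sorts ext in place by
 * start block. Returns 0 if the layout is sane, -1 otherwise. */
int slice_configure(uint64_t media_blocks, struct extent *ext, size_t n)
{
    size_t i;

    qsort(ext, n, sizeof ext[0], cmp_start);
    for (i = 0; i < n; i++) {
        /* empty, or running past the end of the media (overflow-safe) */
        if (ext[i].len == 0 || ext[i].start > media_blocks ||
            ext[i].len > media_blocks - ext[i].start)
            return -1;
        /* overlaps the previous slice */
        if (i > 0 && ext[i - 1].start + ext[i - 1].len > ext[i].start)
            return -1;
    }
    /* A real implementation would now create one child device per extent. */
    return 0;
}
```

Whether this check lives in a daemon or a kernel thread, the same validation has to happen somewhere before any child devices appear.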


> The second trickiest problem is open/read/write locking: can I,
> considering what else is open, open this device for read/write ?

I think this problem boils down to "am I going to let user space
programs directly write the on-disk structures I recognize as
cutting up the disk?".

I think the answer to this question which best simplifies the
issue is "No.".  The longer answer is "No.  Use an ioctl() instead.".

This lets people put experimental crap in user space (or even in
kernel space, using an LKM to load a SLICE type management module),
and keeps people from blowing away their DOS partition table and
thus losing the FreeBSD disklabel in partition 2, etc.
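A sketch of the "No.  Use an ioctl() instead." policy: before a raw write through the disk device, the kernel checks it against the blocks where it recognized partitioning metadata. The struct and function names are hypothetical, and returning EPERM is an arbitrary choice of error.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: block ranges holding partitioning metadata the kernel
 * recognized (for example the MBR sector or a disklabel sector). */
struct protected_range {
    uint64_t start;
    uint64_t len;
};

/* Check run before a raw write to [blk, blk + nblks). Returns 0 to allow
 * it, or EPERM if it would touch recognized metadata; such changes are
 * expected to go through the slice-management ioctl instead. */
int raw_write_check(const struct protected_range *pr, size_t npr,
                    uint64_t blk, uint64_t nblks)
{
    size_t i;

    if (nblks == 0)
        return 0;                      /* empty write touches nothing */
    for (i = 0; i < npr; i++) {
        /* half-open ranges overlap iff each starts before the other ends */
        if (blk < pr[i].start + pr[i].len && pr[i].start < blk + nblks)
            return EPERM;
    }
    return 0;
}
```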

You could add an ioctl() that says to a physical device (when
no FS is mounted on any inferior SLICE) "Elvis has left the
building".  The top level device node would remain, but all
inferior device nodes would deregister.  I would suggest naming
the manifest constant to do this:

	SLICE_MY_NOSE_OFF_TO_SPITE_MY_FACE

With the inverse call (making the device "arrive" again) being:

	GLUE_MY_NOSE_BACK_ON_I_WAS_AIMING_AT_MY_FOOT

8-).
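A sketch of those two calls as a hypothetical ioctl handler over a tree of slices. The command names are the ones proposed above, with arbitrary numeric values; the struct layout and the EBUSY/EINVAL returns are assumptions. Detach refuses if any inferior slice has a filesystem mounted, and the top-level node stays registered either way.

```c
#include <errno.h>
#include <stddef.h>

/* Command names as proposed above; the values are arbitrary. */
enum slice_cmd {
    SLICE_MY_NOSE_OFF_TO_SPITE_MY_FACE = 1,            /* inferiors depart */
    GLUE_MY_NOSE_BACK_ON_I_WAS_AIMING_AT_MY_FOOT = 2   /* inferiors arrive again */
};

/* Hypothetical slice tree node. */
struct slice {
    int registered;          /* device node currently exists */
    int mounted;             /* a filesystem is mounted on it */
    struct slice *child;     /* first inferior slice */
    struct slice *sibling;   /* next slice at the same level */
};

/* Is a filesystem mounted on s, its siblings, or anything below them? */
static int any_mounted(const struct slice *s)
{
    for (; s != NULL; s = s->sibling)
        if (s->mounted || any_mounted(s->child))
            return 1;
    return 0;
}

static void set_registered(struct slice *s, int on)
{
    for (; s != NULL; s = s->sibling) {
        s->registered = on;
        set_registered(s->child, on);
    }
}

/* Handler on the top-level (physical) device node. */
int slice_ioctl(struct slice *disk, enum slice_cmd cmd)
{
    switch (cmd) {
    case SLICE_MY_NOSE_OFF_TO_SPITE_MY_FACE:
        if (any_mounted(disk->child))
            return EBUSY;                 /* FS mounted on an inferior slice */
        set_registered(disk->child, 0);   /* inferior nodes deregister */
        return 0;
    case GLUE_MY_NOSE_BACK_ON_I_WAS_AIMING_AT_MY_FOOT:
        /* real code would re-read the partitioning data first */
        set_registered(disk->child, 1);
        return 0;
    }
    return EINVAL;
}
```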


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
