Date:      Sun, 27 Sep 1998 17:56:55 +0930
From:      Greg Lehey <grog@lemis.com>
To:        Poul-Henning Kamp <phk@critter.freebsd.dk>, "Justin T. Gibbs" <gibbs@narnia.plutotech.com>
Cc:        current@FreeBSD.ORG
Subject:   Slice implementation (was: Current is Really Broken(tm))
Message-ID:  <19980927175655.P20205@freebie.lemis.com>
In-Reply-To: <8655.906831168@critter.freebsd.dk>; from Poul-Henning Kamp on Sat, Sep 26, 1998 at 07:32:48PM +0200
References:  <199809261652.KAA05259@narnia.plutotech.com> <8655.906831168@critter.freebsd.dk>

On Saturday, 26 September 1998 at 19:32:48 +0200, Poul-Henning Kamp wrote:
> In message <199809261652.KAA05259@narnia.plutotech.com>, "Justin T. Gibbs" writes:
>> In article <7171.906791511@critter.freebsd.dk> you wrote:
>>>>> This is just the idea that is realized by SLICE.
>>>>
>>>> This is exactly the opposite of what is realized by SLICE.  SLICE
>>>> does the "mount" at a deep level of the drivers (in an interrupt
>>>> handler).
>>>
>>> ...Which was one of the most wrong things about it.
>>
>> Whatever replaces it must be able to be notified of insertion and
>> removal events from an interrupt context.  It can simply queue the
>> notification up to be processed by a process or thread, but subsystems
>> like CAM which process command completions from an SWI need to be
>> able to perform notifications from low level contexts.
>
> <ARCHITECTURE>
>
> For any SLICE/GEOMETRY implementation, the discovery and instantiation
> of the network of handlers and devices is the most tricky part,
> no doubt about that.
>
> There are two basic ways to skin that cat:
>
> A) "The kernel knows"
>
> This is what Julian sort of implemented in SLICE.  It is the "quick"
> way but not the easy way.

Well, OK, but it's done now.  

> The major problem with this approach is that you hardcode a
> lot of gunk into the kernel: how does a BSD disklabel look, how
> does an MBR disklabel look, and so on.

Well, yes.  We also hard code information on lots of other objects
(binary file formats, file system access methods, hardware
characteristics, etc.).  Sure, it would be nice not to have to depend
on them, but I don't see any basic flaw in having a disk driver which
understands the disk layout.

> B) "This is magic, we need a daemon"
>
> If you do it from userland in a daemon, then the interface in the
> kernel becomes much cleaner, there are no "hidden users" which do
> odd things to your disk.

As you mentioned in the snipped text, Veritas uses (too many) dæmons.
That's not necessarily a bad idea in moderation, but it is relatively
inefficient, since it requires additional context switches.  I'm
probably prejudiced because I've seen too many people kill Veritas
dæmons and hang their system, but I'd take the attitude that you
should only have (another) dæmon if you can't avoid it.  That means "I
must have process context".  I can't see anything in this discussion
which requires process context.

> The issue of changing a layout is now moved from the kernel to a
> daemon in user-space, which needs to sanity-check and implement
> the changes people want to do.  This is a lot less hairy than
> doing it in the kernel.

I considered these options when writing Vinum, which is hairier than I
want it to be.  I decided against both of these alternatives, and for
a third one: put it in an LKM.  It's not going to work completely,
because I need process context for I/O error recovery, but that's
about the only thing.  

> The second most tricky problem is open/read/write locking: can I,
> considering what else is open, open this device for read/write ?
>
> Clearly most current filesystems would kindly but firmly insist
> that nobody else writes to their partition while they have it
> mounted.  There are on the other hand filesystems which legitimately
> do allow this.
>
> Consequently opens can be made in one of several ways:
>
> 	"read only, don't care about other users"
> 	"read/write, don't care about other users"
> 	"read only, nobody else can write"
> 	"read/write, nobody else can write"
>
> This needs to be propagated all the way down (and possibly up) through
> the network of instances/devices, for approval before it can succeed.

Tandem solved this problem decades ago: when you open a file, you
specify an exclusion parameter, which can be "shared" (no exclusion),
"protected" (anybody can read, only I can write), or "exclusive"
(nobody else can access).  There's also a parameter which specifies to
wait until the file is available for access in the desired mode.  This
works very well.  Applying the solution to UNIX is less a question of
building the mechanism than of introducing it in a
backwards-compatible manner.
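
To make the exclusion rule concrete, here is a minimal sketch in C of
how such a check could look.  It is not Tandem's interface and not
anything in the FreeBSD tree: the names excl_mode, open_req,
pair_compatible and open_allowed are invented for illustration, and
the "wait until available" parameter is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical exclusion modes, modelled on the three described above. */
enum excl_mode {
	EXCL_SHARED,	/* no exclusion */
	EXCL_PROTECTED,	/* anybody can read, only the holder can write */
	EXCL_EXCLUSIVE	/* nobody else can access */
};

/* Hypothetical description of one open, held or requested. */
struct open_req {
	bool wants_write;	/* false: read only, true: read/write */
	enum excl_mode excl;
};

/* Two opens can coexist only if neither violates the other's mode. */
static bool
pair_compatible(const struct open_req *a, const struct open_req *b)
{
	/* An exclusive open tolerates no other open at all. */
	if (a->excl == EXCL_EXCLUSIVE || b->excl == EXCL_EXCLUSIVE)
		return false;
	/* A protected open tolerates readers, but no other writer. */
	if (a->excl == EXCL_PROTECTED && b->wants_write)
		return false;
	if (b->excl == EXCL_PROTECTED && a->wants_write)
		return false;
	return true;
}

/* Check a requested open against every open currently held. */
bool
open_allowed(const struct open_req *held, size_t nheld,
    const struct open_req *req)
{
	for (size_t i = 0; i < nheld; i++)
		if (!pair_compatible(&held[i], req))
			return false;
	return true;
}
```

In a layered scheme, each layer would presumably run a check of this
kind against its own list of opens and then pass the request down to
the layer below, succeeding only if every layer approves.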

Greg
--
See complete headers for address, home page and phone numbers
finger grog@lemis.com for PGP public key

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
