Date: Sun, 27 Sep 1998 20:59:02 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: phk@critter.freebsd.dk (Poul-Henning Kamp)
Cc: gibbs@narnia.plutotech.com, current@FreeBSD.ORG
Subject: Re: Current is Really Broken(tm)
Message-ID: <199809272059.NAA29095@usr05.primenet.com>
In-Reply-To: <8655.906831168@critter.freebsd.dk> from "Poul-Henning Kamp" at Sep 26, 98 07:32:48 pm

> There are two basic ways to skin that cat:
>
> A) "The kernel knows"
>
> This is what Julian sort of implemented in SLICE. It is the "quick"
> way but not the easy way. The trouble is that you need to read
> diskblocks from some kind of thread or event handler, examine their
> contents, configure the right drivers and so on. That sounds easy,
> but is hairy.

Right. The trivial way to fix this is to queue device arrivals by
descriptor for non-interrupt context processing by a kernel thread.

This introduces a descriptor allocation issue. My personal take
would be for the driver to own the descriptors, either as a result
of a preallocation on the driver's behalf as part of startup before
the device probes, or, more ugly, as simple static declarations of
storage (e.g., you will never have more than two devices "arriving"
from an IDE controller driver, since there is a limit of two devices
total inherent in IDE).

> The major problem with this approach is that you hardcode a
> lot of gunk into the kernel, how does a BSD disklabel look, how
> does a MBR disklabel look, and so on.

Which the kernel has to know anyway, in order to mount things. The
structures are there, and they're not going to go away, no matter
how ugly their being there is deemed.

Well, until we support discardable ELF section tags for unused
kernel components.

> Next you need to figure out how the kernel will discover that you
> muck about with a disklabel/MBR something else. And things go
> rapidly down-hill from there.

Actually, I disagreed with the way Julian did this, but his approach
*did* work, using notifications.

The way I argued with him that it should be done is via ioctl()'s
that abstract the ideas of:

1)  A list of N extents of physically contiguous blocks

        DOS Partitioning
        DOS Extended Partitioning
        FreeBSD Disklabel
        NetBSD Extended Disklabel
        DEC UNIX Disklabel
        Solaris Disklabel
        SVR4 VTOC
        etc.

2)  A media perfection mechanism of M virtually contiguous blocks
    layered on N physically contiguous blocks, M < N.

        Bad144
        etc.

3)  A list of #1, to be aggregated into a single run of virtually
    contiguous blocks

        CCD
        Vinum
        VXFS volume manager
        etc.

Since the kernel code must know about these structures for both
mounting and for device node creation, it may as well know how to
write a correct default instance of the structures.

This is a pretty trivial abstraction to make.

Julian's argument against this is "what about future things that
don't fit this model?". I can't think of any, but if he ever comes
up with a concrete example, well, the next integer after "3" is
"4"...

The fdisk program moves to user space, and you can add new and
"improved" SLICE types to your heart's content, and the single user
space "fdisk" program will embrace them all, without needing to be
recompiled.

No more "run fdisk, now run fdisk, now run disklabel, now...". One
simple program that doesn't take a CS degree to run.

> B) "This is magic, we need a daemon"
>
> If you do it from userland in a daemon, then the interface in the
> kernel becomes much cleaner, there are no "hidden users" which do
> odd things to your disk.
>
> You can make one generic method that slices a device into several
> devices, and depending on what your daemon finds, it will be
> configured with the data from a disklabel, a MBR or a Mac VTOC for
> that matter.
>
> On the other hand you get a bootstrap problem, to find / you need
> to run a program (unless you cheat of course, see: "Veritas")
>
> The issue of changing a layout is now moved from the kernel to a
> daemon in user-space, which needs to sanity-check and implement
> the changes people want to do. This is a lot less hairy than
> doing it in the kernel.

?

There's really no difference between a kernel thread and a process.

The bootstrap problem is the primary reason why you would put this
stuff in the kernel.
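
To make the three abstractions listed above (extent lists, media
perfection, aggregation) a little more concrete, here is a minimal
sketch in C. Everything in it is an assumption made for illustration:
the names struct extent, struct remap, struct volume and
volume_translate() are hypothetical, and none of this is taken from
the SLICE code, bad144, CCD, or any existing FreeBSD ioctl()
interface. It only shows that a virtual block number can be resolved
by walking a list of extents and then consulting a bad-block remap
table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (1) One extent: a run of physically contiguous blocks (hypothetical). */
struct extent {
    uint64_t start;   /* first physical block of the run */
    uint64_t count;   /* number of blocks in the run */
};

/* (2) Media perfection entry: a bad physical block and its spare. */
struct remap {
    uint64_t bad;     /* physical block known to be unusable */
    uint64_t spare;   /* physical block used in its place */
};

/* (3) Aggregation: extents presented in order as one virtual device. */
struct volume {
    const struct extent *ext;   /* extents, in virtual order */
    size_t               n_ext;
    const struct remap  *map;   /* bad-block table, may be empty */
    size_t               n_map;
};

/*
 * Translate a virtual block number into a physical block number.
 * Returns 0 and stores the result in *pblk on success, or -1 if
 * vblk lies beyond the end of the aggregated extents.
 */
int
volume_translate(const struct volume *v, uint64_t vblk, uint64_t *pblk)
{
    assert(v != NULL && pblk != NULL);

    for (size_t i = 0; i < v->n_ext; i++) {
        if (vblk < v->ext[i].count) {
            uint64_t p = v->ext[i].start + vblk;

            /* Apply the remap table to hide a known-bad block. */
            for (size_t j = 0; j < v->n_map; j++) {
                if (v->map[j].bad == p) {
                    p = v->map[j].spare;
                    break;
                }
            }
            *pblk = p;
            return 0;
        }
        vblk -= v->ext[i].count;   /* skip past this extent */
    }
    return -1;
}
```

In this sketch a new on-disk format would only need to produce the
extent and remap tables; the translation step itself would not care
which format they came from.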

One of the most common Linux "gloats" is that they can be booted off
a DOS Extended Partition.

> The second most tricky problem is open/read/write locking: can I,
> considering what else is open, open this device for read/write ?

I think this problem boils down to "am I going to let user space
programs write things I recognize as cutting the disk up using user
space programs?".

I think the answer to this question which best simplifies the issue
is "No.".

The longer answer is "No. Use an ioctl() instead.".

This lets people put experimental crap in user space (or even in
kernel space, using an LKM to load a SLICE type management module),
and keeps people from blowing away their DOS partition table and
thus losing the FreeBSD disklabel in partition 2, etc.

You could add an ioctl() that says to a physical device (where no FS
was mounted on an inferior SLICE) "Elvis has left the building". The
top level device node would remain, but all inferior device nodes
would deregister.

I would suggest naming the manifest constant to do this:

        SLICE_MY_NOSE_OFF_TO_SPITE_MY_FACE

With the inverse call (making the device "arrive" again) being:

        GLUE_MY_NOSE_BACK_ON_I_WAS_AIMING_AT_MY_FOOT

8-).


                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message