Date: Wed, 29 Mar 95 11:25:07 MST
From: terry@cs.weber.edu (Terry Lambert)
To: dufault@hda.com (Peter Dufault)
Cc: darrenr@vitruvius.arbld.unimelb.edu.au, freebsd-hackers@freefall.cdrom.com
Subject: Re: Configuring driver added via LKM
Message-ID: <9503291825.AA20071@cs.weber.edu>
In-Reply-To: <199503291017.FAA03749@hda.com> from "Peter Dufault" at Mar 29, 95 05:17:46 am

> No, I disagree.  That logic shouldn't be LKM code that isn't present
> in a config'd driver, it should be a standard driver entry point
> similar to probe and attach.

In the case of Win95, this entry point is called by what they call a
"volume tracking driver"; basically, something that knows about stuff
"going away".

The problem with LKM unload is non-trivial.  There are several types of
unload mechanisms it's desirable to implement.

The unload for file systems either needs to imply a forcible unmount or
deny the unload.  I took Sun's approach when first implementing this,
and the unload was allowed to return "EBUSY".  This is grossly
underpowered for the types of applications that are now cropping up.

In reality, you want a rundown mechanism... basically a "schedule for
unload" rather than an unload.  This could allow a file system (for
instance) to run down one of two ways:

1)  commit buffers and forcibly unmount, invalidating any descriptors
    open on the volume in the same way NFS does -- "stale handle", OR

2)  prevent further operations on the drive but do not interfere with
    those in progress; this could take a significant amount of time
    to complete.

The forcible run down (approach #1 above) has a lot of merit for
hardware that you can press a button on that means "request eject"
instead of "eject by purely mechanical means" (a purely mechanical eject
is doomed to fail for file systems anyway, since there is no mechanism
to notify the file system that a commit should take place; the only real
alternative for ensuring media integrity is to use only synchronous
writes on the media).

The forcible rundown also has a lot of merit for developers wanting to
replace a module with an updated version NOW instead of some time in the
future.

The problem with this is that there *is* a requirement that the driver
know that it is or can be loaded as a module.

If we take the system call case, we have to ensure that the system call
is not entered when the unload takes place.
To do this, we have to track entrance and exit to the call, and only
unload the call when the entrancy count is 0.  For this to be effective,
we have to ensure that the call will not continue to be entered, but
since the entry is not isosynchronous, this means that short of
rewriting trap.c to be aware of the ability to load calls and taking a
hit on non-loaded calls, we have to have a shunt based on a call-global
flag that allows the call itself to return ENOSYS to callers while it is
still loaded.  The run down entry sets this flag to prevent subsequent
access.

Similarly, since the unload is now based on an event internal to the
system call, to wit, the decrementing of the entrancy count to 0, the
decrement itself must make two additional compares; one for the 1->0
transition, and one for the rundown flag to allow it to cause the unload
to be retriggered.  This is safest if done by causing a wakeup of a
different context so that the unload is handled not in the code being
unloaded prior to the call return (we could only safely do that if the
kernel was non-preemptible and non-reentrant, since it would mean
running code in an area designated as reallocable for other uses -- at
odds with our long term goals).

The alternative to this approach is to enter in the system call table
the address of a function in the LKM system call load component that
calls the loaded system call itself.  This includes not only the
compares and entrancy tracking that would otherwise be exposed in the
driver code, but also causes the addition of function call overhead to
each call into the loaded call -- although if we could designate at load
time that this module was load-only, the pointers could be fixed up so
that the overhead could be avoided.

Finally, there is a problem in the system call with signal handling; the
current signal implementation in the trap code is to declare a global
jump buffer in the process context to be used in case of a system call
interrupted by signal.
The problem with this is, of course, that an exit via this path violates
the single entry/exit criterion that allows us to do the entrancy
tracking in the first place.  The function wrapper approach would solve
this at some high cost, but so would making the jump buffer a stack
variable and passing it into the system call itself as an argument; this
is actually a superior approach on both counts, since it avoids the jump
buffer diddling that would otherwise have to occur in the wrapper
function to get the correct call back into the trap level code on signal
interrupt.

Of course, that's just for hiding the LKM internals from the system
calls code itself, as has been proposed by the statement that the driver
should not have to have special code for it to be an LKM.  8-).

> I would probably (incorrectly but expeditiously in that it isn't
> really an isa problem) implement this by changing the definition
> of isa_driver to include "goaway(struct isa_device *isdp)", but I
> think that the "goaway" entry point in "kern_devconf" is supposed
> to do this.
>
> In isa.c I would add something like:
>
> isa_install_driver(struct isa_device *isdp, u_int *mp);
> isa_remove_driver(struct isa_device *isdp, u_int *mp);
>
> "isa_install_driver" will pretty much just call config_isa_dev_c.
>
> "isa_remove_driver" will call the driver goaway entry point, and
> if it returns 0, removes the isr if it was specified.  The goaway
> entry point will stop all activity if it can, deregister itself
> from kern_devconf, and so on.  At that point you can safely unload
> the LKM.

It's a little more complicated than that, unless you want to block the
process requesting the unload until the unload can be safely completed
(as has been shown for system calls).
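
To make the system call case concrete, here is a rough sketch of the
entrancy count and rundown flag, written as plain user-space C.  It is
only an illustration under stated assumptions: every name in it
(lkm_call, lkm_call_enter, and so on) is hypothetical and does not come
from the LKM code; there is no locking, so it assumes a single thread of
control; and the "unload_ready" flag merely stands in for the wakeup of
a separate context that would do the real unload.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-call state; not taken from the real LKM sources. */
struct lkm_call {
    int  entrancy;      /* callers currently inside the loaded call */
    bool rundown;       /* set once an unload has been scheduled */
    bool unload_ready;  /* stand-in for waking a separate unload context */
    int  (*body)(int);  /* the loaded system call itself */
};

/* "Schedule for unload": shut the door, unload at once only if idle. */
void lkm_call_rundown(struct lkm_call *c)
{
    c->rundown = true;
    if (c->entrancy == 0)
        c->unload_ready = true;
}

/* Entry shunt: once rundown has begun, new callers get ENOSYS. */
int lkm_call_enter(struct lkm_call *c)
{
    if (c->rundown)
        return ENOSYS;
    c->entrancy++;
    return 0;
}

/* Exit path: the two extra compares (1->0 transition, rundown flag). */
void lkm_call_exit(struct lkm_call *c)
{
    assert(c->entrancy > 0);
    if (--c->entrancy == 0 && c->rundown)
        c->unload_ready = true;  /* a kernel would wake another context */
}

/* Wrapper of the kind that could be entered in the system call table. */
int lkm_call_invoke(struct lkm_call *c, int arg)
{
    int error = lkm_call_enter(c);

    if (error != 0)
        return error;
    error = c->body(arg);
    lkm_call_exit(c);
    return error;
}

/* Trivial hypothetical call body, used only to exercise the wrapper. */
int demo_body(int arg)
{
    (void)arg;
    return 0;
}
```

Note that the wrapper only works if the call body always returns through
lkm_call_exit(); an exit through a global jump buffer on signal would
skip the decrement, which is exactly the signal handling problem
described above.
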
In the case of a device that hooks an interrupt, it is not safe to
unhook the interrupt until such time as a detach procedure can be run to
guarantee that an interrupt on the hooked interrupt will not fire on the
device and have no driver to handle it.

What this basically means is that you must have an "action that may
result in an interrupt ending event" count -- similar to the entrancy
count for system calls.  That means that all outstanding requests to
(say) a PCMCIA AIC7770 based SCSI controller must have completed.

These are, in effect, precisely the same rundown conditions as for the
system call entrancy.  The added complication here is that the device
may go away because it was taken away, or the device must act as if it
had been taken away by not causing additional activity, by having had
its pending operations fully run down.

Bottom line in this case is that additional entry points won't
necessarily cut it; what is needed is a callback notification mechanism
to indicate run down complete.  Since we already know we are going to be
dealing with things like PCMCIA and other "hot plug" devices (like SCSI
tape drives that are externally attached and may be powered on and off
separately from the machine) this should be a more general mechanism
than one that would apply only to LKMs.

> If you had these facilities you could pretty quickly come up with
> a utility that would install a driver's .o file directly without
> any LKM glue.  That would be nice for testing drivers.

Again, I think this would be predicated on wrapping the function entry
points with hidden LKM glue at a significant overhead otherwise, unless
you make it clear that the driver was a load only driver and would not
be unloaded.
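
As a rough illustration of the "run down complete" callback notification
for a device, here is a small user-space sketch in C.  All of the names
(dev_rundown, dev_op_start, and so on) are made up for the example and
are not part of isa.c, kern_devconf, or any driver interface; it also
assumes a single thread of control, where real driver code would need
interrupt protection around the counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: invoked once when rundown has completed. */
typedef void (*rundown_done_fn)(void *arg);

/* Hypothetical per-device rundown state. */
struct dev_rundown {
    int             pending;   /* operations that may still interrupt */
    bool            draining;  /* rundown requested; refuse new work */
    rundown_done_fn done;      /* who to notify when pending hits 0 */
    void           *done_arg;
};

/* Deliver the notification at most once. */
void dev_finish(struct dev_rundown *d)
{
    rundown_done_fn done = d->done;

    d->done = NULL;
    if (done != NULL)
        done(d->done_arg);
}

/* Start an operation; refused once the device is being run down. */
bool dev_op_start(struct dev_rundown *d)
{
    if (d->draining)
        return false;
    d->pending++;
    return true;
}

/* Called when an operation's final interrupt has been handled. */
void dev_op_complete(struct dev_rundown *d)
{
    assert(d->pending > 0);
    if (--d->pending == 0 && d->draining)
        dev_finish(d);
}

/* Request rundown; the callback fires at once if the device is idle. */
void dev_request_rundown(struct dev_rundown *d, rundown_done_fn done,
                         void *arg)
{
    d->draining = true;
    d->done = done;
    d->done_arg = arg;
    if (d->pending == 0)
        dev_finish(d);
}

/* Example callback: counts how many notifications were delivered. */
void count_done(void *arg)
{
    ++*(int *)arg;
}
```

The requester never blocks here; it is told when it has become safe to
unhook the interrupt, whether the request came from an LKM unload or
from a device being removed.
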
I think in general that the features that make a driver a good
"plug-n-play" citizen will do the same for making it a good LKM citizen;
the ability to do both types of rundown is a required feature in either
case, and the question of whether you unload an inactive driver or leave
it in the kernel is rather moot.

The final missing piece is the ability to non-destructively attach the
driver while it is loaded; I don't think the attach is clean enough for
this, and there is no "resource manager" that will notice the new
"plug-n-play" device and do the appropriate callbacks to "plug-n-play"
aware drivers to try and get them to notice the new hardware.

> What I haven't figured out is how this is supposed to play with
> kern_devconf or with the reconfig code already in isa.c supporting
> removable devices.  After all, this isn't isa specific and I think
> that kern_devconf is trying to address these issues.

Unfortunately, a lot of this is legacy code; in reality, the whole
device issue really wants to be glossed over entirely by allowing the
drivers to dynamically create and delete device nodes.  In other words,
a devfs with an implied mount that you can't get rid of.

This little promiscuous wiring into the file system still needs to be
done; at the same time, you probably want to murder specfs, but you may
want to keep it around anyway to allow static /dev's to be on /
partitions for diskless clients running older OSs.

The final hook is for volume management so that when a device wants to
go away, the file system is forcibly shut down and buffers flushed to
allow the device to go away without blowing up the stateful mount on the
device.

What is really needed is for a group to go through a real architectural
overview and planning of what should be done and to get a clean API
layering designed.  The current approach of incrementally handling what
is wanted at the moment, followed by the liberal pouring of glue to fill
in the cracks, is probably a mistake.

                                        Terry Lambert
                                        terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.