Date: Tue, 26 Oct 1999 00:56:19 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: nate@mt.sri.com
Cc: imp@village.org, arch@freebsd.org
Subject: Re: Racing interrupts
Message-ID: <199910260056.RAA20467@usr06.primenet.com>
In-Reply-To: <199910251646.KAA13773@mt.sri.com> from "Nate Williams" at Oct 25, 99 10:46:53 am

> > Consider the following situation. There is a driver talking to a
> > device. An unmasked interrupt happens and the device is now gone.
> > How does the driver know to stop what it is doing and respond to this
> > disappearance in a sane way?
>
> You can do what all the other OS's attempt to do, and not worry about
> it. On Win98/Linux, they allow the system to hang if the user ejects
> the card w/out properly shutting it down.
>
> So, if you do a reject on the card and something bad happens, it hangs.
> Yes, this isn't the best solution, but I don't believe there is a 'best'
> solution, or a good solution. That it happens to work some of the time
> is good luck, not necessarily good design.

A lot of these OSs, Windows especially, provide a utility for
managing this. In Windows, there's a "PCCARD" control panel that
you can use to "stop" the cards before removing them. In addition,
most card vendors who supply drivers dock a little application in
the task bar (e.g. the Linksys driver docks a tiny icon, whose only
menu entry is "stop the linksys ethernet card").

Both of these are kludges for bad up-front design, and I think they
should be tossed out without further consideration, even as a means
of whining at the user to put the card back so it can be "shutdown
properly".

FreeBSD has no concept of a session manager, which could be used
in order to get the whine from the kernel to user space (and perhaps
get a response from the user).

> In other words, I don't think a hardware solution exists. PHK and I
> have talked a 'bit' about this problem, and he's convinced me (in time)
> that it's not a problem that can be solved completely. IMO, you can't
> sufficiently abstract it from the device, and each and every device
> driver requires a lot of specific code that overly complicates/obfuscates
> the drivers in the tree.

I disagree. Windows will actually pop up a dialog to complain about
a card being ejected while it is active ("started").
The system will not crash as a result of this. Effectively, the
driver is "frozen", and the usual recommended palliative is to
reinsert the card, and click "Shutdown" (or "cancel" to leave the
card running, or "OK" to terminate the card unceremoniously).

I think this is really a driver API issue, more than anything else.

To quote one of Poul's favorite hobby horses, which I definitely
agree with, the driver needs notifications for each close, not just
the final one, and all drivers need to be fixed to play by this rule.

In addition, there is some need for interrupt masking, and a handler
abort for an active interrupt currently in the driver. In effect,
this means the ability to force the moral equivalent of a longjmp
to the interrupt dispatch code. So long as resources are not
allocated (a no-no in an interrupt handler), and are properly
committed (this is a coding style issue, more than anything else),
it should be workable.

One could easily imagine that the "longjmp" would result in a
special "device departure" entry point being entered.

The only remaining issue would be decommitting partial resource
commits. As a matter of coding style, this could be implemented
either by preventing states other than a base state in the driver
(this would be painful for driver writers, but only once), or by
using a bitmap to track state over resource commits, so that the
"device departure" handler can back them out (e.g. things like
"this mbuf is in use", etc.).

I think that, in the long run, our drivers could only benefit from
this kind of attention, since the same work needs to be done at
some point anyway to allow CPU reentrancy into the interrupt handling
code, with the interrupt lock changing into an I/O address space
lock, and a CPU-commit lock for committing an interrupt for handling
by a given CPU in the SMP case (plus whatever masking is agreed
upon for shared interrupts, etc.).

> > This sort of situation comes up in the pccard code.
> > When someone ejects the card, an interrupt fires. If I were to
> > remove the device from the tree in the interrupt handler, I can
> > invalidate the softc that the driver is still using by freeing it.
>
> Actually, not always. You'll not always know that the device is gone
> when the driver is removed from the tree. It's possible (very probable)
> that the device may be in the process of servicing the interrupt *just*
> as the driver is removed, so what happens then? The code as written is
> trying to read from non-existing ports/memory locations, and can hang
> waiting for a result, or loop forever if it gets bad information.
>
> This is one of many 'races' that can not be solved w/out hardware
> support, since there's no way of controlling the removal of the hardware
> so that the software can 'clean up' safely.

I think that you could handle this via the "longjmp" and resource
commit state tracking, described above.

It would be nice for PCI hot swap and other reasons to be able to
deal with the departure and arrival of (replacement?) hardware
automatically, so long as it can be done nicely, and you don't end
up introducing warts in the process -- even if you do end up adding
some restrictions on driver coding style, and the use of APIs.

The overhead of the "longjmp" is practically that of a stack probe:
that is, about one stack pointer store operation and a register
checkpoint.

Is FreeBSD going to have UDI support at some point? It has a
"method" of dealing with this issue for PCI hot swap.

> > It is my understanding that the pccard hardware still exists for
> > a period of time after the card is ejected (since it takes some period
> > of time to move from the pins that caused the interrupt to the
> > power/control/data pins being disabled). I don't know if this is
> > true, or if true what the period of time is.
>
> Not true at all.
> Otherwise, the 'hardware' would have to emulate the
> functionality of every card that was once in the slot, and respond in
> the same fashion. :(

Depending on being able to talk to an unplugged device is pretty
much a bad idea, even if you have several milliseconds that you
might be able to use to do the job.

I don't think that allowing a driver servicing an interrupt at the
time the card is unplugged to run to completion based on this theory
is a good idea, in any case. You need to abort the interrupt in
progress, if any, and back out any partially committed (and
incomplete) state transitions on things like buffers owned by the
kernel, not the device, etc.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message
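
For anyone who wants to see the "longjmp to the interrupt dispatch
code" and the commit bitmap in concrete form, here is a minimal
user-space sketch of the idea. It is not FreeBSD kernel code and not
an existing driver API: every name in it (softc fields, COMMIT_MBUF,
COMMIT_DMA, read_reg, device_departure, dispatch, card_present) is
hypothetical, and the card_present flag merely stands in for whatever
mechanism would really detect the eject. The dispatcher takes a
checkpoint with setjmp(); a register access that finds the card gone
jumps back to it; the "device departure" entry point then uses the
bitmap to back out whatever the handler had committed so far.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical commit bits: one per resource a handler can hold. */
enum {
    COMMIT_MBUF = 1u << 0,   /* "this mbuf is in use" */
    COMMIT_DMA  = 1u << 1    /* a DMA descriptor is in use */
};

/* Hypothetical per-device state, loosely modeled on a driver softc. */
struct softc {
    unsigned commits;        /* bitmap of resources currently committed */
    int      mbufs_in_use;
    int      dma_in_use;
    bool     departed;       /* set once the departure entry point ran */
};

static jmp_buf dispatch_env;                /* checkpoint owned by the dispatcher */
static volatile bool card_present = true;   /* stand-in for eject detection */

/* Every device access goes through here, so a vanished card aborts
 * the handler instead of letting it spin on bogus register contents. */
static unsigned read_reg(void)
{
    if (!card_present)
        longjmp(dispatch_env, 1);           /* never returns to the handler */
    return 0x42u;                           /* arbitrary "status" value */
}

/* The handler records each commit in the bitmap as it makes it. */
static void handler(struct softc *sc)
{
    sc->mbufs_in_use++;
    sc->commits |= COMMIT_MBUF;

    (void)read_reg();                       /* may abort right here */

    sc->dma_in_use++;
    sc->commits |= COMMIT_DMA;

    /* Normal completion: release everything in reverse order. */
    sc->dma_in_use--;
    sc->commits &= ~(unsigned)COMMIT_DMA;
    sc->mbufs_in_use--;
    sc->commits &= ~(unsigned)COMMIT_MBUF;
}

/* "Device departure" entry point: back out partial commits. */
static void device_departure(struct softc *sc)
{
    if (sc->commits & COMMIT_DMA)
        sc->dma_in_use--;
    if (sc->commits & COMMIT_MBUF)
        sc->mbufs_in_use--;
    sc->commits = 0;
    sc->departed = true;
}

/* Returns true if the handler ran to completion, false if it was
 * aborted and the departure entry point was entered instead. */
static bool dispatch(struct softc *sc)
{
    if (setjmp(dispatch_env) != 0) {
        device_departure(sc);
        return false;
    }
    handler(sc);
    assert(sc->commits == 0);               /* a finished handler holds nothing */
    return true;
}
```

Calling dispatch() on a zeroed softc with card_present set to false
returns false and leaves both counters at zero, because the only
commit made before the abort (the mbuf) is backed out from the
bitmap. Note that the sketch keeps all mutable state in the softc
rather than in locals of dispatch(), since locals modified between
setjmp() and longjmp() are not reliable unless declared volatile.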