Date:      Tue, 26 Oct 1999 00:56:19 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        nate@mt.sri.com
Cc:        imp@village.org, arch@freebsd.org
Subject:   Re: Racing interrupts
Message-ID:  <199910260056.RAA20467@usr06.primenet.com>
In-Reply-To: <199910251646.KAA13773@mt.sri.com> from "Nate Williams" at Oct 25, 99 10:46:53 am

> > Consider the following situation.  There is a driver talking to a
> > device.  An unmasked interrupt happens and the device is now gone.
> > How does the driver know to stop what it is doing and respond to this
> > disappearance in a sane way?
> 
> You can do what all the other OS's attempt to do, and not worry about
> it.  On Win98/Linux, they allow the system to hang if the user ejects
> the card w/out properly shutting it down.
> 
> So, if you do an eject on the card and something bad happens, it hangs.
> Yes, this isn't the best solution, but I don't believe there is a 'best'
> solution, or a good solution.  That it happens to work some of the time is
> good luck, not necessarily good design.

A lot of these OSs, Windows especially, provide a utility for
managing this.  In Windows, there's a "PCCARD" control panel
that you can use to "stop" the cards before removing them.

In addition, most card vendors who supply drivers dock a little
application in the task bar (e.g. the Linksys driver docks a tiny
icon, whose only menu entry is "stop the linksys ethernet card").

Both of these are kludges for bad up-front design, and I think
they should be tossed out without further consideration, even as
a means of whining at the user to put the card back so it can be
"shut down properly".  FreeBSD has no concept of a session manager,
which could be used in order to get the whine from the kernel to
user space (and perhaps get a response from the user).


> In other words, I don't think a hardware solution exists.  PHK and I
> have talked a 'bit' about this problem, and he's convinced me (in time)
> that it is not a problem that can be solved completely.  IMO, you can't
> sufficiently abstract it from the device, and each and every device
> driver requires a lot of specific code that overly complicates/obfuscates
> the drivers in the tree.

I disagree.  Windows will actually pop up a dialog to complain
about a card being ejected while it is active ("started").  The
system will not crash as a result of this.  Effectively, the driver
is "frozen", and the usual recommended palliative is to reinsert
the card, and click "Shutdown" (or "cancel" to leave the card
running, or "OK" to terminate the card unceremoniously).

I think this is really a driver API issue, more than anything
else.  To quote one of Poul's favorite hobby horses, which I
definitely agree with, the driver needs notifications for each
close, not just the final one, and all drivers need to be fixed
to play by this rule.  In addition, there is some need for
interrupt masking, and a handler abort for an active interrupt
currently in the driver.  In effect, this means the ability to
force the moral equivalent of a longjmp to the interrupt
dispatch code.  So long as resources are not allocated (a no-no
in an interrupt handler), and are properly committed (this is a
coding style issue, more than anything else), it should be
workable.
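
To make the "moral equivalent of a longjmp" concrete, here is a
rough user-space model of the idea, not kernel code.  Every name in
it (dispatch_interrupt, drv_intr, drv_departure, read_reg,
card_present) is made up for illustration, and the "card" is just a
flag.  The dispatcher checkpoints its context before calling the
handler; any register access that finds the card gone unwinds
straight back to the dispatcher, which then calls a departure entry
point.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical names throughout; a user-space model, not a kernel API. */
static jmp_buf dispatch_ctx;           /* checkpoint taken by the dispatcher */
static volatile int card_present = 1;  /* cleared when the card is ejected */
static int handler_steps_run;          /* how far the handler got */
static int departures_seen;            /* times the departure entry ran */

/* Model of a device register read: abort the handler if the card is gone. */
static unsigned read_reg(unsigned value_if_present)
{
    if (!card_present)
        longjmp(dispatch_ctx, 1);      /* unwind to dispatch_interrupt() */
    return value_if_present;
}

/* Interrupt handler: allocates nothing, so unwinding leaks nothing. */
static void drv_intr(void)
{
    handler_steps_run++;
    (void)read_reg(0x01);

    card_present = 0;                  /* simulate ejection mid-handler */

    handler_steps_run++;
    (void)read_reg(0x02);              /* does not return: card is gone */

    handler_steps_run++;               /* never reached in this model */
}

/* "Device departure" entry point, entered instead of finishing drv_intr(). */
static void drv_departure(void)
{
    assert(!card_present);
    departures_seen++;
}

/* Returns 0 if the handler ran to completion, 1 if it was aborted. */
static int dispatch_interrupt(void)
{
    if (setjmp(dispatch_ctx) != 0) {
        drv_departure();
        return 1;
    }
    drv_intr();
    return 0;
}
```

Calling dispatch_interrupt() once in this model aborts the handler
at its second register read and enters drv_departure() instead.
The model deliberately ignores locking, nested interrupts, and how
the ejection would actually be detected.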

One could easily imagine that the "longjmp" would result in a
special "device departure" entry point being entered.  The only
remaining issue would be decommitting partial resource commits.
As a matter of coding style, this could be implemented either
by preventing states other than a base state in the driver (this
would be painful for driver writers, but only once), or by using
a bitmap to track state over resource commits, so that the
"device departure" handler can back them out (e.g. things like
"this mbuf is in use", etc.).
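
The bitmap variant might look something like the sketch below.
This is only an illustration under assumed names: struct drv_softc,
the DRV_HAS_* bits, and the commit/departure functions are all
hypothetical, and malloc/free stand in for whatever the kernel
would really use for buffers.  The point is that a bit is set only
after its resource is actually committed, so the departure handler
can back out exactly what is held and nothing more.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical commit bits, one per resource the driver may hold. */
#define DRV_HAS_RXBUF (1u << 0)
#define DRV_HAS_TXBUF (1u << 1)
#define DRV_HAS_TIMER (1u << 2)

struct drv_softc {
    unsigned committed;    /* bitmap of resources currently held */
    void    *rxbuf;
    void    *txbuf;
    int      timer_armed;
};

/* Commit a receive buffer; the bit is set only once the commit succeeded. */
static int drv_commit_rxbuf(struct drv_softc *sc, size_t len)
{
    sc->rxbuf = malloc(len);
    if (sc->rxbuf == NULL)
        return -1;
    sc->committed |= DRV_HAS_RXBUF;
    return 0;
}

/* Commit a transmit buffer, same rule as above. */
static int drv_commit_txbuf(struct drv_softc *sc, size_t len)
{
    sc->txbuf = malloc(len);
    if (sc->txbuf == NULL)
        return -1;
    sc->committed |= DRV_HAS_TXBUF;
    return 0;
}

static void drv_arm_timer(struct drv_softc *sc)
{
    sc->timer_armed = 1;
    sc->committed |= DRV_HAS_TIMER;
}

/* "Device departure" handler: back out whatever the bitmap says is held. */
static void drv_departure(struct drv_softc *sc)
{
    if (sc->committed & DRV_HAS_TIMER) {
        sc->timer_armed = 0;
        sc->committed &= ~DRV_HAS_TIMER;
    }
    if (sc->committed & DRV_HAS_TXBUF) {
        free(sc->txbuf);
        sc->txbuf = NULL;
        sc->committed &= ~DRV_HAS_TXBUF;
    }
    if (sc->committed & DRV_HAS_RXBUF) {
        free(sc->rxbuf);
        sc->rxbuf = NULL;
        sc->committed &= ~DRV_HAS_RXBUF;
    }
    assert(sc->committed == 0);        /* nothing left half-committed */
}
```

Because the departure handler consults only the bitmap, it can be
entered from any point in the driver, including after an aborted
interrupt, without knowing how far the interrupted code had got.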


I think that, in the long run, our drivers could only benefit
from this kind of attention, since the same work needs to be
done at some point anyway to allow CPU reentrancy into the
interrupt handling code, with the interrupt lock changing into
an I/O address space lock, and a CPU-commit lock for committing
an interrupt for handling by a given CPU in the SMP case (plus
whatever masking is agreed upon for shared interrupts, etc.).


> > This sort of situation comes up in the pccard code.  When someone
> > ejects the card, an interrupt fires.  If I were to remove the device
> > from the tree in the interrupt handler, I can invalidate the softc
> > that the driver is still using by freeing it.
> 
> Actually, not always.  You'll not always know that the device is gone
> when the driver is removed from the tree.  It's possible (very probable)
> that the device may be in the process of servicing the interrupt *just*
> as the driver is removed, so what happens then?  The code as written is
> trying to read from non-existing ports/memory locations, and can hang
> waiting for a result, or loop forever if it gets bad information.
> 
> This is one of many 'races' that can not be solved w/out hardware
> support, since there's no way of controlling the removal of the hardware
> so that the software can 'clean up' safely.

I think that you could handle this via the "longjmp" and resource
commit state tracking, described above.

It would be nice for PCI hot swap and other reasons to be able to
deal with the departure and arrival of (replacement?) hardware
automatically, so long as it can be done nicely, and you don't
end up introducing warts in the process -- even if you do end up
adding some restrictions on driver coding style, and the use of
APIs.  The overhead of the "longjmp" is practically that of a
stack probe: that is, about one stack pointer store operation and
a register checkpoint.

Is FreeBSD going to have UDI support at some point?  It has a
"method" of dealing with this issue for PCI hot swap.


> > It is my understanding that the pccard hardware still exists for
> > a period of time after the card is ejected (since it takes some period
> > of time to move from the pins that caused the interrupt to the
> > power/control/data pins being disabled).  I don't know if this is
> > true, or if true what the period of time is.
> 
> Not true at all.  Otherwise, the 'hardware' would have to emulate the
> functionality of every card that was once in the slot, and respond in
> the same fashion. :(

Depending on being able to talk to an unplugged device is pretty
much a bad idea, even if you have several milliseconds that you
might be able to use to do the job.  I don't think that allowing
a driver servicing an interrupt at the time the card is unplugged
to run to completion based on this theory is a good idea, in any
case.  You need to abort the interrupt in progress, if any, and
back out any partially committed (and incomplete) state transitions
on things like buffers owned by the kernel, not the device, etc.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.







