Date:      Mon, 25 Oct 1999 12:57:24 -0600
From:      Nate Williams <nate@mt.sri.com>
To:        Warner Losh <imp@village.org>
Cc:        nate@mt.sri.com (Nate Williams), arch@freebsd.org
Subject:   Re: Racing interrupts 
Message-ID:  <199910251857.MAA14453@mt.sri.com>
In-Reply-To: <199910251850.MAA42106@harmony.village.org>
References:  <199910251827.MAA14189@mt.sri.com> <199910251646.KAA13773@mt.sri.com> <199910240608.AAA34462@harmony.village.org> <199910251822.MAA41899@harmony.village.org> <199910251850.MAA42106@harmony.village.org>

> : Possibly, but the races still exist, and you can still get in a position
> : where the hardware is gone.  (I've verified this, having done a lot of
> : the work in the old pccard on suspend/resume.)
> 
> OK.  So there is a small window there, but nothing that can be counted
> upon.  One must therefore assume that the hardware is gone when the
> interrupt comes in...

Agreed.  Sean also brought up the fact that Stratus had to do this to
cope with 'failed' hardware; not hanging your kernel is a big deal when
you're advertising yourself as fault-tolerant. :)

> : It's certainly not impossible, but it does make the drivers that much
> : more complex.  And, (not to disagree with Sean), I don't see how you
> : fix all the problems, simply because at some point you must assume the
> : hardware exists, and if it disappears in the middle of an operation
> : without any way of knowing that it's gone, how can you recover from it?
> 
> Yes.  W/o explicit checks for 'am I gone' it is very hard, and where
> do you make them, and there is still a tiny race between the checking
> for am I gone and the touching of hardware.  These races can be made
> so small as to be hard to lose.

The 'am I gone' race is a big one (IMO), and instead of trying to
minimize that race (which I don't think we can minimize much at all), I
think the solution (which is much more complex) is to re-write the
device drivers to never do busy-wait loops, never-ending timeouts,
etc...
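
A minimal sketch of what "no never-ending busy-wait" could look like,
assuming (hypothetically) that a status read from a missing card comes
back as all ones.  Every name here (wait_ready, STATUS_GONE, MAX_POLLS,
the fake_card stand-in) is invented for illustration and is not taken
from any existing driver:

```c
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
#define STATUS_READY 0x01u
#define STATUS_GONE  0xFFu  /* assumption: an absent card reads as all ones */
#define MAX_POLLS    1000   /* arbitrary upper bound on the busy-wait */

enum poll_result { POLL_READY, POLL_TIMEOUT, POLL_GONE };

/* Stand-in for a bus read, so the sketch compiles outside a kernel. */
typedef uint8_t (*read_status_fn)(void *ctx);

/*
 * Poll for READY, but never spin forever: give up after MAX_POLLS
 * reads, and bail out at once if the register looks like the card
 * has been pulled.
 */
enum poll_result
wait_ready(read_status_fn read_status, void *ctx)
{
    for (int i = 0; i < MAX_POLLS; i++) {
        uint8_t st = read_status(ctx);

        /* Test for "gone" first: 0xFF also has the READY bit set. */
        if (st == STATUS_GONE)
            return POLL_GONE;
        if (st & STATUS_READY)
            return POLL_READY;
    }
    return POLL_TIMEOUT;
}

/* Simulated card used to exercise wait_ready() in userland. */
struct fake_card {
    int reads;       /* status reads performed so far */
    int ready_at;    /* read index at which READY appears; -1 = never */
    int removed_at;  /* read index at which the card vanishes; -1 = never */
};

uint8_t
fake_read_status(void *ctx)
{
    struct fake_card *c = ctx;
    int n = c->reads++;

    if (c->removed_at >= 0 && n >= c->removed_at)
        return STATUS_GONE;
    if (c->ready_at >= 0 && n >= c->ready_at)
        return STATUS_READY;
    return 0;
}
```

The caller has to handle three outcomes instead of one, which is the
extra driver complexity being discussed.  The bound does not close the
race; it only guarantees the loop ends.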

Unfortunately, this may require changes to some basic FreeBSD
assumptions (timeouts in particular).  Do tsleep/wakeup provide for a
'default' timeout and notification?

> That's one reason I think that having
> some way to terminate the current thread of execution at any
> instruction with a simple callback saying, "I killed your driver
> thread, cope with the loss of hardware" is about as good as we're
> going to get.

This requires changes to all drivers to not expect that a piece of
hardware exists.  And, if the thread is never given the indication that
the hardware is gone (think fast interrupts), it still must deal with
the fact that the hardware *may* be gone.
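
A rough sketch of the two halves of that: a "hardware is gone" callback,
plus an interrupt handler that does not rely on the callback having run
yet.  All names (struct softc fields, driver_hw_gone, driver_intr, the
fake_reg_* readers) are hypothetical, and the all-ones check is an
assumption about how reads from a missing device behave:

```c
#include <stdbool.h>
#include <stdint.h>

#define REG_ALL_ONES 0xFFFFFFFFu  /* assumption: absent device reads as all ones */
#define INTR_PENDING 0x1u         /* hypothetical "interrupt pending" bit */

/* Hypothetical per-device state, for illustration only. */
struct softc {
    /* A real kernel would need proper locking here, not just volatile. */
    volatile bool gone;
    uint32_t    (*read_reg)(struct softc *);  /* stand-in for a bus read */
    unsigned      handled;                    /* interrupts serviced */
};

/* Callback the bus code might invoke once it notices the removal. */
void
driver_hw_gone(struct softc *sc)
{
    sc->gone = true;
}

/*
 * Interrupt handler.  Returns true if an interrupt was serviced.
 * The gone flag is only a hint: the card can vanish before anyone
 * sets it, so the value read back is sanity-checked as well.
 */
bool
driver_intr(struct softc *sc)
{
    uint32_t st;

    if (sc->gone)
        return false;

    st = sc->read_reg(sc);
    if (st == REG_ALL_ONES) {
        /* Found out the hard way; record it for everyone else. */
        sc->gone = true;
        return false;
    }
    if ((st & INTR_PENDING) == 0)
        return false;

    sc->handled++;
    return true;
}

/* Fake register readers so the sketch can be exercised in userland. */
uint32_t
fake_reg_pending(struct softc *sc)
{
    (void)sc;
    return INTR_PENDING;
}

uint32_t
fake_reg_absent(struct softc *sc)
{
    (void)sc;
    return REG_ALL_ONES;
}
```

Even here a window remains between reading the status and acting on it,
so everything after the check has to tolerate garbage reads too.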

It would also have a nice side-effect of making FreeBSD much more
tolerant of failing hardware, although I'm not sure we would need to go
to the lengths that a company like Stratus does.  They don't have to
support the wide variety of hardware that FreeBSD does.

> : When someone removes the bridge away from you while you're walking
> : across the chasm, how can you be expected to 'recover' from it? ;)
> 
> By hanging onto the bridge :-)  Or registering a SIGBRIDGE handler and
> hoping that the 'chute deploys in time :-).

That implies that you are informed that the bridge is gone, instead of
finding out about it from striking the ground w/out a net. :(




Nate







