Date: Mon, 18 Oct 1999 07:26:35 -0700 (PDT)
From: Matthew Jacob <mjacob@feral.com>
To: Andrew Gallatin <gallatin@cs.duke.edu>, sos@freebsd.org
Cc: alpha@freebsd.org
Subject: Re: workaround for ata driver woes on alpha
Message-ID: <Pine.BSF.4.05.9910180702290.14549-100000@semuta.feral.com>
In-Reply-To: <14347.6330.820928.627692@grits.cs.duke.edu>
(I have not yet looked at the actual source...)

> I think there is a serious problem with the ad_timeout() function in
> the case where the request has actually completed & the timeout was
> too short. ad_timeout() has no way to know if the request it has been
> passed is still valid, or has been deallocated. Wrapping the function
> in splbio() will only narrow the race, not close it because we're
> still going to be at splsoftclock when the function is called. I
> think setting the timeout to a reasonable value is a good workaround,
> but I'm still concerned about very slow hardware..

No, lengthening the timeout, while possibly correct for trying to achieve
the same length of timeout on alpha as on i386, will *never* solve window
problems- it just makes them more infrequent, which is, in fact, far more
dangerous to an OS than the outright panic (why? Think about it- if you make
a problem just *rare* instead of really making it go away, you curse the
platform it occurs on with an aura of unreliability, so that people are just
too uneasy to depend on it...).

(goes off and looks at source....)

Yep. This is broken. The timeout can still run when a request has been
deallocated. This whole area of the code needs to be rewritten/rethought.
I wouldn't run it, even with the timeout extended, without that.

I would recommend hanging requests off the softc in a list (if it's more
than one per ata instance) or just as a pointer *which gets nulled if
untimeout is called*, so that splbio protection can offer mutual exclusion
between the callout and the IDE interrupt thread looking through the
currently active list. If ad_interrupt runs and calls untimeout on an
already active callout, the callout thread will then not find anything to
whine about (but only if the callout thread knows where to look). You
should note that this would still be problematic if there ever were
identical request block pointers.
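A rough userland sketch of the single-pointer variant, just to make the idea concrete. Every name here (struct softc, block_bio(), complete_request() and so on) is hypothetical and not taken from the ata driver; block_bio()/unblock_bio() are empty placeholders standing in for splbio()/splx(), and the timeout handler is called by hand rather than from a real callout.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the ata driver. */
struct request {
    int id;
};

struct softc {
    struct request *active;     /* NULL when nothing is outstanding */
    int timeouts_reported;      /* how often the handler found live work */
};

/* Placeholders for raising and restoring the block-I/O priority level. */
static int  block_bio(void)    { return 0; }
static void unblock_bio(int s) { (void)s; }

/* Issue a request and hang it off the softc. Returns 0 on success. */
static int start_request(struct softc *sc, int id)
{
    struct request *rq = malloc(sizeof(*rq));
    int s;

    if (rq == NULL)
        return -1;
    rq->id = id;
    s = block_bio();
    sc->active = rq;
    unblock_bio(s);
    return 0;
}

/* Completion path: unhook the request from the softc, then free it. */
static void complete_request(struct softc *sc)
{
    int s = block_bio();
    struct request *rq = sc->active;

    sc->active = NULL;          /* the point where untimeout would be called */
    unblock_bio(s);
    free(rq);
}

/*
 * Timeout handler: it is handed the softc, never a raw request pointer,
 * so a request that has already been freed is simply not found.
 * Returns 1 if a request was still outstanding, 0 otherwise.
 */
static int timeout_handler(struct softc *sc)
{
    int fired = 0;
    int s = block_bio();

    if (sc->active != NULL) {
        sc->timeouts_reported++;
        fired = 1;
    }
    unblock_bio(s);
    return fired;
}
```

The design point is that the handler only ever dereferences what it finds through the softc while protected, so a late callout after completion has nothing stale to touch.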
Also, IMO, using a timeout per I/O request is a heavy load for a system
unless you need the precise accuracy. I prefer a general periodic timer per
device instance and timeout counts for all active commands for that device.

Soren- this is your stuff isn't it- what have we misunderstood?

-matt

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message
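For the periodic-timer alternative mentioned above, a minimal userland sketch, assuming a fixed-size table of command slots per device. All names (struct dev_softc, cmd_start(), dev_tick(), MAX_CMDS) are made up for illustration; in a driver, dev_tick() would be driven by one recurring timer per device instance and would run under the same protection as the interrupt path.

```c
#define MAX_CMDS 4              /* hypothetical per-device command limit */

struct cmd_slot {
    int in_use;                 /* nonzero while the command is outstanding */
    int ticks_left;             /* periodic ticks remaining before expiry */
};

struct dev_softc {
    struct cmd_slot cmds[MAX_CMDS];
};

/* Mark every slot free. */
static void dev_init(struct dev_softc *sc)
{
    int i;

    for (i = 0; i < MAX_CMDS; i++) {
        sc->cmds[i].in_use = 0;
        sc->cmds[i].ticks_left = 0;
    }
}

/* Claim a free slot with a timeout in ticks; returns its index or -1. */
static int cmd_start(struct dev_softc *sc, int ticks)
{
    int i;

    for (i = 0; i < MAX_CMDS; i++) {
        if (!sc->cmds[i].in_use) {
            sc->cmds[i].in_use = 1;
            sc->cmds[i].ticks_left = ticks;
            return i;
        }
    }
    return -1;
}

/* Completion path: release the slot so later ticks ignore it. */
static void cmd_done(struct dev_softc *sc, int slot)
{
    sc->cmds[slot].in_use = 0;
}

/*
 * One periodic tick for the whole device: age every active command and
 * retire those whose count has run out. Returns how many expired.
 */
static int dev_tick(struct dev_softc *sc)
{
    int i, expired = 0;

    for (i = 0; i < MAX_CMDS; i++) {
        if (sc->cmds[i].in_use && --sc->cmds[i].ticks_left <= 0) {
            sc->cmds[i].in_use = 0;     /* hand off to error recovery here */
            expired++;
        }
    }
    return expired;
}
```

The trade-off is resolution: a command can overrun its limit by up to one tick period, in exchange for a single timer per device instead of one callout per request.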