Date:      Thu, 1 Feb 1996 10:54:42 -0600 (CST)
From:      Joe Greco <jgreco@brasil.moneng.mei.com>
To:        gibbs@freefall.freebsd.org (Justin T. Gibbs)
Cc:        hackers@FreeBSD.org
Subject:   Re: No SCSI recovery - yet another gripe
Message-ID:  <199602011654.KAA09004@brasil.moneng.mei.com>
In-Reply-To: <199602010525.VAA23989@freefall.freebsd.org> from "Justin T. Gibbs" at Jan 31, 96 09:25:17 pm

> >This is the second time this week my news box has frozen with a SCSI error
> >of some sort on the screen.  This time:
> >
> >ahc1: target 3, lun0 (sd23) timed out
> >sd23(aha1:3:0): BUS DEVICE RESET message queued.
> >ahc1:A:3: no active SCB for reconnecting target - issuing ABORT
> >SAVED_TCL = 0x30
> >ahc1: target 3, lun0 (sd23) timed out
> 
> Yup.  The error recovery code in the aic7xxx driver is especially
> bad because it has not been updated to match the recent stability
> fixes in the driver.

Bluhhhck.  :-/  :-(

> >On the other hand, consider this a
> >plea for the SCSI gods to improve the error handling somehow!  I hear great
> >games talked on -hackers and all, layered device independent error handling,
> >etc...  a free beer to the person(s) who implement(s) it.  ;-)
> 
> This will happen before 2.2 ships.  PowerPoint is nearing code complete,
> so my time is limited for another 7 days or so, but after that,
> my nights will be devoted to these problems.  The entire generic SCSI
> layer is in for a revamp with extra detail going toward error recovery
> and performance.

Cool!  :-)

> >For kicks, I have been known to take a SCSI disk and unplug it from a
> >Solaris based system while the system is running.  The grace with which it
> >attempts to deal with the crisis is admirable.  Sometimes the system even
> >continues to work if I plug the drive back in...  :-)  I don't expect that
> >anybody has the time or effort to spare to implement error recovery to this
> >sort of level,
> 
> We need this level of robustness in order to be taken seriously IMHO. 

Yes, I think so too, but then again, I realize this is a volunteer operation
and I would not be unhappy with a less than preferable action such as a
panic.  I do think that this is something that needs to be addressed before
FreeBSD is likely to take over the world.  ;-)

> As they say, "shit happens" on SCSI busses as well as in real life.
> Luckily we can anticipate what kinds of things will hit the fan with SCSI
> and hopefully do everything possible to recover.  My main concern is
> sufficient driver level documentation to make the error recovery reliable.
> I have all I need for the Adaptec aic7xxx cards since I control the
> firmware (Stephan I'm sure is in the same boat with the NCR), but for cards
> like the Buslogic and Ultrastore, I just don't know how well we can do.

Again, one can only ask so much out of volunteers.  We can try to support
LOTS of hardware and that is GOOD.  On the other hand, there is also nothing
wrong with certifying certain hardware as "FreeBSD Blessed" and therefore
saying it is preferable to use well documented hardware that is fully
supported over poorly documented hardware that got reverse-engineered.

Anyways, thanks and good luck.

... Joe

-------------------------------------------------------------------------------
Joe Greco - Systems Administrator			      jgreco@ns.sol.net
Solaria Public Access UNIX - Milwaukee, WI			   414/546-7968


