Date:      Wed, 31 Jan 1996 21:25:17 -0800
From:      "Justin T. Gibbs" <gibbs@freefall.freebsd.org>
To:        Joe Greco <jgreco@solaria.sol.net>
Cc:        hackers@freebsd.org
Subject:   Re: No SCSI recovery - yet another gripe 
Message-ID:  <199602010525.VAA23989@freefall.freebsd.org>
In-Reply-To: Your message of "Wed, 31 Jan 1996 23:04:08 CST." <199602010504.XAA28690@solaria.sol.net> 

>This is the second time this week my news box has frozen with a SCSI error
>of some sort on the screen.  This time:
>
>ahc1: target 3, lun0 (sd23) timed out
>sd23(aha1:3:0): BUS DEVICE RESET message queued.
>ahc1:A:3: no active SCB for reconnecting target - issuing ABORT
>SAVED_TCL = 0x30
>ahc1: target 3, lun0 (sd23) timed out

Yup.  The error recovery code in the aic7xxx driver is especially
bad because it has not been updated to match the recent stability
fixes in the driver.

>The SCSI system works GREAT when all is fine and dandy.  However, this sort
>of error "recovery" sucks - a panic and reboot is preferable to a dead
>freeze.

Agreed.

>On the other hand, consider this a
>plea for the SCSI gods to improve the error handling somehow!  I hear great
>games talked on -hackers and all, layered device independent error handling,
>etc...  a free beer to the person(s) who implement(s) it.  ;-)

This will happen before 2.2 ships.  PowerPoint is nearing code complete,
so my time is limited for another 7 days or so, but after that,
my nights will be devoted to these problems.  The entire generic SCSI
layer is in for a revamp with extra detail going toward error recovery
and performance.

>For kicks, I have been known to take a SCSI disk and unplug it from a
>Solaris based system while the system is running.  The grace with which it
>attempts to deal with the crisis is admirable.  Sometimes the system even
>continues to work if I plug the drive back in...  :-)  I don't expect that
>anybody has the time or effort to spare to implement error recovery to this
>sort of level,

We need this level of robustness in order to be taken seriously IMHO.  As
they say, "shit happens" on SCSI busses as well as in real life.
Luckily we can anticipate what kinds of things will hit the fan with SCSI
and hopefully do everything possible to recover.  My main concern is having
sufficient driver-level documentation to make the error recovery reliable.
I have all I need for the Adaptec aic7xxx cards since I control the
firmware (Stephan, I'm sure, is in the same boat with the NCR), but for cards
like the BusLogic and UltraStor, I just don't know how well we can do.

>but the current "lock'n'hang" is a little too far to the
>opposite extreme...

No disagreement here.

>
>Thanks and good evening,
>
>... Joe
>
>------------------------------------------------------------------------------
>Joe Greco - Systems Administrator			      jgreco@ns.sol.net
>Solaria Public Access UNIX - Milwaukee, WI			   414/342-4847

--
Justin T. Gibbs
===========================================
  FreeBSD: Turning PCs into workstations
===========================================


