Date: Wed, 31 Jan 1996 21:25:17 -0800
From: "Justin T. Gibbs" <gibbs@freefall.freebsd.org>
To: Joe Greco <jgreco@solaria.sol.net>
Cc: hackers@freebsd.org
Subject: Re: No SCSI recovery - yet another gripe
Message-ID: <199602010525.VAA23989@freefall.freebsd.org>
In-Reply-To: Your message of "Wed, 31 Jan 1996 23:04:08 CST." <199602010504.XAA28690@solaria.sol.net>

>This is the second time this week my news box has frozen with a SCSI error
>of some sort on the screen.  This time:
>
>ahc1: target 3, lun0 (sd23) timed out
>sd23(aha1:3:0): BUS DEVICE RESET message queued.
>ahc1:A:3: no active SCB for reconnecting target - issuing ABORT
>SAVED_TCL = 0x30
>ahc1: target 3, lun0 (sd23) timed out

Yup.  The error recovery code in the aic7xxx driver is especially bad
because it has not been updated to match the recent stability fixes in
the driver.

>The SCSI system works GREAT when all is fine and dandy.  However, this sort
>of error "recovery" sucks - a panic and reboot is preferable to a dead
>freeze.

Agreed.

>On the other hand, consider this a
>plea for the SCSI gods to improve the error handling somehow!  I hear great
>games talked on -hackers and all, layered device independent error handling,
>etc...  a free beer to the person(s) who implement(s) it.  ;-)

This will happen before 2.2 ships.  PowerPoint is nearing code complete,
so my time is limited for another 7 days or so, but after that, my
nights will be devoted to these problems.  The entire generic SCSI layer
is in for a revamp with extra detail going toward error recovery and
performance.

>For kicks, I have been known to take a SCSI disk and unplug it from a
>Solaris based system while the system is running.  The grace with which it
>attempts to deal with the crisis is admirable.  Sometimes the system even
>continues to work if I plug the drive back in...  :-)  I don't expect that
>anybody has the time or effort to spare to implement error recovery to this
>sort of level,

We need this level of robustness in order to be taken seriously IMHO.
As they say, "shit happens" on SCSI busses as well as in real life.
Luckily we can anticipate what kinds of things will hit the fan with SCSI
and hopefully do everything possible to recover.  My main concern is
sufficient driver level documentation to make the error recovery reliable.
I have all I need for the Adaptec aic7xxx cards since I control the
firmware (Stephan I'm sure is in the same boat with the NCR), but for
cards like the Buslogic and Ultrastore, I just don't know how well we
can do.

>but the current "lock'n'hang" is a little too far to the
>opposite extreme...

No disagreement here.

>
>Thanks and good evening,
>
>... Joe
>
>------------------------------------------------------------------------------
>-
>Joe Greco - Systems Administrator                          jgreco@ns.sol.net
>Solaria Public Access UNIX - Milwaukee, WI                       414/342-4847

--
Justin T. Gibbs
===========================================
  FreeBSD: Turning PCs into workstations
===========================================