From owner-freebsd-hackers Wed Jan 31 21:47:23 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id VAA25091 for hackers-outgoing; Wed, 31 Jan 1996 21:47:23 -0800 (PST)
Received: from schizo.cdsnet.net (schizo.cdsnet.net [204.118.244.32]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id VAA25085 for ; Wed, 31 Jan 1996 21:47:17 -0800 (PST)
Received: (from mrcpu@localhost) by schizo.cdsnet.net (8.6.12/8.6.12) id VAA19914; Wed, 31 Jan 1996 21:46:56 -0800
Date: Wed, 31 Jan 1996 21:46:55 -0800 (PST)
From: Jaye Mathisen
To: Joe Greco
cc: hackers@freebsd.org
Subject: Re: No SCSI recovery - yet another gripe
In-Reply-To: <199602010504.XAA28690@solaria.sol.net>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-hackers@freebsd.org
Precedence: bulk

Justin mentioned one time that he was working on the recovery code. It's
doing the same thing for me as well.

On Wed, 31 Jan 1996, Joe Greco wrote:

> This is the second time this week my news box has frozen with a SCSI error
> of some sort on the screen. This time:
>
> ahc1: target 3, lun0 (sd23) timed out
> sd23(aha1:3:0): BUS DEVICE RESET message queued.
> ahc1:A:3: no active SCB for reconnecting target - issuing ABORT
> SAVED_TCL = 0x30
> ahc1: target 3, lun0 (sd23) timed out
> _
>
> The SCSI system works GREAT when all is fine and dandy. However, this sort
> of error "recovery" sucks - a panic and reboot is preferable to a dead
> freeze.
>
> In all reality I believe it has something to do with the relative
> reliability of drive power connectors and the likelihood that all 14 of them
> that are on news.sol.net work perfectly is less than 100%... so I will
> tackle the problem from a hardware standpoint, as I believe that the source
> is a loose power connection somewhere. On the other hand, consider this a
> plea for the SCSI gods to improve the error handling somehow! I hear great
> games talked on -hackers and all, layered device independent error handling,
> etc... a free beer to the person(s) who implement(s) it. ;-)
>
> For kicks, I have been known to take a SCSI disk and unplug it from a
> Solaris based system while the system is running. The grace with which it
> attempts to deal with the crisis is admirable. Sometimes the system even
> continues to work if I plug the drive back in... :-) I don't expect that
> anybody has the time or effort to spare to implement error recovery to this
> sort of level, but the current "lock'n'hang" is a little too far to the
> opposite extreme...
>
> Thanks and good evening,
>
> ... Joe
>
> -------------------------------------------------------------------------------
> Joe Greco - Systems Administrator                          jgreco@ns.sol.net
> Solaria Public Access UNIX - Milwaukee, WI                      414/342-4847