From owner-freebsd-hackers Wed Jan 31 21:03:34 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3)
    id VAA22557 for hackers-outgoing; Wed, 31 Jan 1996 21:03:34 -0800 (PST)
Received: from anacreon.sol.net (anacreon.sol.net [206.55.64.116])
    by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id VAA22552
    for ; Wed, 31 Jan 1996 21:03:31 -0800 (PST)
Received: from solaria.sol.net (solaria.sol.net [206.55.65.75])
    by anacreon.sol.net (8.6.12/8.6.12) with ESMTP id XAA07493
    for ; Wed, 31 Jan 1996 23:02:56 -0600
Received: from localhost by solaria.sol.net (8.5/8.5)
    id XAA28690; Wed, 31 Jan 1996 23:04:11 -0600
From: Joe Greco
Message-Id: <199602010504.XAA28690@solaria.sol.net>
Subject: No SCSI recovery - yet another gripe
To: hackers@freebsd.org
Date: Wed, 31 Jan 96 23:04:08 CST
X-Mailer: ELM [version 2.4dev PL65]
MIME-Version: 1.0
Content-Type: text
Sender: owner-hackers@freebsd.org
Precedence: bulk

This is the second time this week my news box has frozen with a SCSI error
of some sort on the screen. This time:

    ahc1: target 3, lun0 (sd23) timed out
    sd23(aha1:3:0): BUS DEVICE RESET message queued.
    ahc1:A:3: no active SCB for reconnecting target - issuing ABORT
    SAVED_TCL = 0x30
    ahc1: target 3, lun0 (sd23) timed out
    _

The SCSI system works GREAT when all is fine and dandy. However, this sort
of error "recovery" sucks - a panic and reboot is preferable to a dead
freeze.

In all reality I believe it has something to do with the relative
reliability of drive power connectors and the likelihood that all 14 of
them that are on news.sol.net work perfectly is less than 100%... so I will
tackle the problem from a hardware standpoint, as I believe that the source
is a loose power connection somewhere.

On the other hand, consider this a plea for the SCSI gods to improve the
error handling somehow! I hear great games talked on -hackers and all,
layered device independent error handling, etc... a free beer to the
person(s) who implement(s) it.
;-)

For kicks, I have been known to take a SCSI disk and unplug it from a
Solaris based system while the system is running. The grace with which it
attempts to deal with the crisis is admirable. Sometimes the system even
continues to work if I plug the drive back in... :-)

I don't expect that anybody has the time or effort to spare to implement
error recovery to this sort of level, but the current "lock'n'hang" is a
little too far to the opposite extreme...

Thanks and good evening,

... Joe

-------------------------------------------------------------------------------
Joe Greco - Systems Administrator                        jgreco@ns.sol.net
Solaria Public Access UNIX - Milwaukee, WI                    414/342-4847