Date: Sun, 17 Aug 1997 10:59:43 +0930 (CST)
From: Greg Lehey <grog@lemis.com>
To: freebsd-scsi@freebsd.org (FreeBSD SCSI Mailing List)
Subject: Bus resets. Grrrr.
Message-ID: <199708170129.KAA03776@freebie.lemis.com>
This is the third time in a row that I haven't been able to complete a backup because of "recoverable" SCSI errors. Here's a pretty typical scenario:

Aug 17 10:27:19 freebie /kernel: sd0: SCB 0x4 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0

What does this mean? What can time out when nothing's happening? Or is this a timeout accepting a new command when it shouldn't have to? Is this a device or a driver logic error?

Aug 17 10:27:31 freebie /kernel: SEQADDR = 0x9 SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0xa
Aug 17 10:27:31 freebie /kernel: sd0: Queueing an Abort SCB
Aug 17 10:27:31 freebie /kernel: sd0: Abort Message Sent
Aug 17 10:27:31 freebie /kernel: sd0: SCB 0x4 - timed out in message out phase, SCSISIGI == 0xa4
Aug 17 10:27:31 freebie /kernel: SEQADDR = 0x9a SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0x2

If I understand this correctly, this means that the abort SCB wasn't received either, so the driver does a bus reset:

Aug 17 10:27:31 freebie /kernel: ahc0: Issued Channel A Bus Reset. 3 SCBs aborted
Aug 17 10:27:32 freebie /kernel: Clearing bus reset
Aug 17 10:27:32 freebie /kernel: Clearing 'in-reset' flag
Aug 17 10:27:32 freebie /kernel: sd0: no longer in timeout

... which works.

Aug 17 10:27:32 freebie /kernel: sd0: SCB 0x4 - timed out in command phase, SCSISIGI == 0x84

So why do we get another timeout? Or is this overlapping?

Aug 17 10:27:32 freebie /kernel: SEQADDR = 0x42 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x2
Aug 17 10:27:32 freebie /kernel: sd0: abort message in message buffer
Aug 17 10:27:32 freebie /kernel: sd1: SCB 0x3 timedout while recovery in progress
Aug 17 10:27:32 freebie /kernel: sd0: SCB 1 - Abort Completed.
Aug 17 10:27:32 freebie /kernel: sd0: no longer in timeout
Aug 17 10:27:32 freebie /kernel: sd1: UNIT ATTENTION asc:29,0
Aug 17 10:27:32 freebie /kernel: sd1: Power on, reset, or bus device reset occurred
Aug 17 10:27:32 freebie /kernel: , retries:3

So sd1 complains, but carries on with no harm done,

Aug 17 10:27:32 freebie /kernel: st0: UNIT ATTENTION asc:29,0
Aug 17 10:27:32 freebie /kernel: st0: Power on, reset, or bus device reset occurred
Aug 17 10:27:32 freebie /kernel: st0: Target Busy

but the tape dies. Is there a good reason for this? I would have thought that it would make sense for a power on or reset, but not for a bus reset. Does a tape unit lose its position or data when it receives a bus reset? Is anybody doing anything about this?

Greg
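
The SCSISIGI values in the log above can be decoded by hand. What follows is a minimal sketch in Python, assuming the register's bit layout as recalled from the aic7xxx driver's register definitions (C/D = 0x80, I/O = 0x40, MSG = 0x20, ATN = 0x10, SEL = 0x08, BSY = 0x04, REQ = 0x02, ACK = 0x01). That layout should be checked against aic7xxx_reg.h for the kernel in question. The function name decode_scsisigi is hypothetical and is not part of any driver.

```python
# Bit names for the SCSISIGI register. ASSUMPTION: this layout is recalled
# from the aic7xxx register definitions; verify against aic7xxx_reg.h.
SIGNALS = {
    0x80: "C/D",
    0x40: "I/O",
    0x20: "MSG",
    0x10: "ATN",
    0x08: "SEL",
    0x04: "BSY",
    0x02: "REQ",
    0x01: "ACK",
}

# The top three bits (C/D, I/O, MSG) together identify the SCSI bus phase.
PHASE_MASK = 0xE0
PHASES = {
    0x00: "data out",
    0x40: "data in",
    0x80: "command",
    0xA0: "message out",
    0xC0: "status",
    0xE0: "message in",
}


def decode_scsisigi(value):
    """Return (phase, asserted_signals) for a raw SCSISIGI byte.

    Hypothetical helper for reading log output, not a driver function.
    """
    asserted = [name for bit, name in SIGNALS.items() if value & bit]
    if not value & 0x0C:
        # Neither BSY nor SEL asserted: the bus is free, phase bits mean nothing.
        phase = "bus free"
    else:
        phase = PHASES.get(value & PHASE_MASK, "reserved")
    return phase, asserted


# The three SCSISIGI values that appear in the log excerpt.
for raw in (0x00, 0xA4, 0x84):
    phase, asserted = decode_scsisigi(raw)
    print(f"SCSISIGI == {raw:#x}: {phase}, asserted: {', '.join(asserted) or 'none'}")
```

Under that assumption, 0xa4 and 0x84 decode to message out and command phase with BSY asserted, which agrees with the phase names the driver prints, and 0x0 decodes to a free bus, which matches "timed out while idle".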