Date:      Sun, 17 Aug 1997 10:59:43 +0930 (CST)
From:      Greg Lehey <grog@lemis.com>
To:        freebsd-scsi@freebsd.org (FreeBSD SCSI Mailing List)
Subject:   Bus resets.  Grrrr.
Message-ID:  <199708170129.KAA03776@freebie.lemis.com>

This is the third time in a row that I haven't been able to complete a backup
because of "recoverable" SCSI errors.  Here's a pretty typical scenario:

Aug 17 10:27:19 freebie /kernel: sd0: SCB 0x4 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0

What does this mean?  What can time out when nothing's happening?  Or is this a
timeout accepting a new command when it shouldn't have to?  Is this a device or
a driver logic error?

Aug 17 10:27:31 freebie /kernel: SEQADDR = 0x9 SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0xa
Aug 17 10:27:31 freebie /kernel: sd0: Queueing an Abort SCB
Aug 17 10:27:31 freebie /kernel: sd0: Abort Message Sent
Aug 17 10:27:31 freebie /kernel: sd0: SCB 0x4 - timed out in message out phase, SCSISIGI == 0xa4
Aug 17 10:27:31 freebie /kernel: SEQADDR = 0x9a SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0x2

If I understand this correctly, this means that the abort SCB wasn't received
either, so the driver does a bus reset:

Aug 17 10:27:31 freebie /kernel: ahc0: Issued Channel A Bus Reset. 3 SCBs aborted
Aug 17 10:27:32 freebie /kernel: Clearing bus reset
Aug 17 10:27:32 freebie /kernel: Clearing 'in-reset' flag
Aug 17 10:27:32 freebie /kernel: sd0: no longer in timeout

... which works.

Aug 17 10:27:32 freebie /kernel: sd0: SCB 0x4 - timed out in command phase, SCSISIGI == 0x84

So why do we get another timeout?  Or is this overlapping?
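
For reference, the SCSISIGI values in these messages can be decoded bit by bit.
The following is a minimal sketch, assuming the bit layout of the aic7xxx
SCSISIGI register (CD, IO, MSG, ATN, SEL, BSY, REQ, ACK from the top bit down);
the table and the helper name decode_scsisigi are illustrative assumptions and
should be checked against the driver's register definitions.

```python
# Assumed SCSISIGI bit layout (from the aic7xxx register definitions as
# recalled; verify against the driver source before relying on it).
SCSISIGI_BITS = [
    (0x80, "CD"),
    (0x40, "IO"),
    (0x20, "MSG"),
    (0x10, "ATN"),
    (0x08, "SEL"),
    (0x04, "BSY"),
    (0x02, "REQ"),
    (0x01, "ACK"),
]

BSY = 0x04
PHASE_MASK = 0xE0  # CD | IO | MSG together encode the bus phase

# Standard SCSI phase encoding on the CD/IO/MSG lines.
PHASES = {
    0x00: "data out",
    0x40: "data in",
    0x80: "command",
    0xA0: "message out",
    0xC0: "status",
    0xE0: "message in",
}


def decode_scsisigi(value):
    """Return (phase, asserted signal names) for a raw SCSISIGI value."""
    signals = [name for bit, name in SCSISIGI_BITS if value & bit]
    if not value & BSY:
        # The phase lines only mean something while a target holds BSY.
        phase = "bus free"
    else:
        phase = PHASES.get(value & PHASE_MASK, "unknown")
    return phase, signals


# The three values that appear in the log excerpts in this message.
for raw in (0x0, 0xA4, 0x84):
    phase, signals = decode_scsisigi(raw)
    print(f"SCSISIGI == {raw:#x}: {phase}, asserted: {signals}")
```

Under these assumptions, 0xa4 and 0x84 both have BSY asserted, and their phase
bits agree with the "message out phase" and "command phase" wording of the
kernel messages, while 0x0 corresponds to an idle bus.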

Aug 17 10:27:32 freebie /kernel: SEQADDR = 0x42 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x2
Aug 17 10:27:32 freebie /kernel: sd0: abort message in message buffer
Aug 17 10:27:32 freebie /kernel: sd1: SCB 0x3 timedout while recovery in progress
Aug 17 10:27:32 freebie /kernel: sd0: SCB 1 - Abort Completed.
Aug 17 10:27:32 freebie /kernel: sd0: no longer in timeout
Aug 17 10:27:32 freebie /kernel: sd1: UNIT ATTENTION asc:29,0
Aug 17 10:27:32 freebie /kernel: sd1:  Power on, reset, or bus device reset occurred
Aug 17 10:27:32 freebie /kernel: , retries:3

So sd1 complains, but carries on with no harm done,

Aug 17 10:27:32 freebie /kernel: st0: UNIT ATTENTION asc:29,0
Aug 17 10:27:32 freebie /kernel: st0:  Power on, reset, or bus device reset occurred
Aug 17 10:27:32 freebie /kernel: st0: Target Busy

but the tape dies.  Is there a good reason for this?  I would have thought that
it would make sense for a power on or reset, but not for a bus reset.  Does a
tape unit lose its position or data when it receives a bus reset?

Is anybody doing anything about this?

Greg


