From owner-freebsd-scsi  Sat Aug 16 23:01:55 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id XAA08979
          for freebsd-scsi-outgoing; Sat, 16 Aug 1997 23:01:55 -0700 (PDT)
Received: from nico.telstra.net (nico.telstra.net [139.130.204.16])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id XAA08971
          for <freebsd-scsi@FreeBSD.ORG>; Sat, 16 Aug 1997 23:01:49 -0700 (PDT)
Received: from freebie.lemis.com (gregl1.lnk.telstra.net [139.130.136.133]) by nico.telstra.net (8.6.10/8.6.10) with ESMTP id QAA06501; Sun, 17 Aug 1997 16:01:16 +1000
Received: (grog@localhost) by freebie.lemis.com (8.8.7/8.6.12) 
       id PAA06503; Sun, 17 Aug 1997 15:31:15 +0930 (CST)
Message-ID: <19970817153114.20533@lemis.com>
Date: Sun, 17 Aug 1997 15:31:14 +0930
From: Greg Lehey <grog@lemis.com>
To: Joerg Wunsch <joerg_wunsch@uriah.heep.sax.de>
Cc: FreeBSD SCSI Mailing List <freebsd-scsi@FreeBSD.ORG>
Subject: Re: Bus resets.  Grrrr.
References: <199708170129.KAA03776@freebie.lemis.com> <19970817075001.XE28042@uriah.heep.sax.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.81e
In-Reply-To: <19970817075001.XE28042@uriah.heep.sax.de>; from J Wunsch on Sun, Aug 17, 1997 at 07:50:01AM +0200
Organisation: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8250
Fax: +61-8-8388-8250
Mobile: +61-41-739-7062
WWW-Home-Page: http://www.lemis.com/~grog
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

On Sun, Aug 17, 1997 at 07:50:01AM +0200, J Wunsch wrote:
> As Greg Lehey wrote:
>
>> This is the third time in a row that I haven't been able to complete
>> a backup because of "recoverable" SCSI errors.
>
> What makes you think these are `recoverable'?

The disks recover.

> Be reminded that this is the typical failure picture one can see from
> a bad SCSI chain.

Yup.  But that's not the only thing.

> I'm also seeing it occasionally on our new Seagate/Conner DAT drive at
> work, where even the older ahc driver used to work with the previous
> HP DAT (that is dead now).  I'm not fully sure yet, but I tend to
> blame the Conner drive there.

Interesting.  The tape in question is a
Conner^H^H^H^H^H^HArchive^H^H^H^H^H^H^HSeagate changer--see the dmesg
output below for more info.  But that doesn't seem to be the problem:
it's always the Micropolis disk which has the timeout.

>> If I understand this correctly, this means that the abort SCB wasn't
>> received either, so the driver does a bus reset:
>
> Which is typical for a SCSI chain where ``Nichts geht mehr''
> (nothing works any more).

But which can happen as well at other times.

>> Aug 17 10:27:32 freebie /kernel: sd1: UNIT ATTENTION asc:29,0
>> Aug 17 10:27:32 freebie /kernel: sd1:  Power on, reset, or bus device reset occurred
>
> That's the consequence of the bus reset.  As you wrote, no harm done
> for the disks.  The unit attention is typically caught by the first
> (out of 4) retries.

My question (which you omitted): does this have to be fatal for the
tape?  Is there indeterminate data loss, i.e. can we not be sure
whether a block has been written or not?
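
To make the question concrete, here is a rough sketch of the retry
behaviour described in the quoted text: one initial attempt followed by
up to 4 retries, with a UNIT ATTENTION after a bus reset normally
cleared by the first retry.  This is not the actual driver code; the
names Status, run_with_retries and issue are hypothetical and exist
only for illustration.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical command outcomes, for illustration only."""
    GOOD = 0
    UNIT_ATTENTION = 1
    TIMEOUT = 2


MAX_RETRIES = 4  # "the first (out of 4) retries" in the quoted text


def run_with_retries(issue, max_retries=MAX_RETRIES):
    """Call issue() once, then retry up to max_retries times.

    issue is a hypothetical callback that sends one command and
    returns a Status.  Returns (final_status, attempts_made).
    """
    status = None
    attempts = 0
    for attempts in range(1, max_retries + 2):
        status = issue()
        if status is Status.GOOD:
            break
    return status, attempts


# A bus reset leaves a UNIT ATTENTION pending; the first retry succeeds.
replies = iter([Status.UNIT_ATTENTION, Status.GOOD])
print(run_with_retries(lambda: next(replies)))
```

Re-issuing a read or write of a given block number is harmless on a
disk, which is why the sketch is enough there.  On a tape the command
carries no block address, so the same loop is only safe if we know
whether the interrupted write reached the medium, and that is exactly
what I am asking about.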

>> Is anybody doing anything about this?
>
> You, checking your termination and term power first?

No, been there, done that.  Do you think I'd ask a question like that
without doing my homework first?  Also, this config has been running
smoothly for weeks.

In that connection, however, I suspect problems with the IBM
DORS-32160 drives I have connected to that host adapter.  They just
plain Would Not Work on any host adapter together with my Conner
CFP4207S.  The BIOS wouldn't even get through the scan.  Here are some
relevant parts of the config:

ahc0: <Adaptec 2940 SCSI host adapter> rev 0x03 int a irq 12 on pci0.18.0
ahc0: aic7870 Single Channel, SCSI Id=7, 16 SCBs
ahc0: waiting for scsi devices to settle
scbus0 at ahc0 bus 0
scbus0 target 0 lun 0: <MICROP 2112-15MQ1094802 HQ48> type 0 fixed SCSI 2
sd0 at scbus0 target 0 lun 0
sd0: Direct-Access 1001MB (2051615 512 byte sectors)
sd0: with 1760 cyls, 15 heads, and an average 77 sectors/track
scbus0 target 3 lun 0: <IBM DORS-32160 WA0A> type 0 fixed SCSI 2
sd1 at scbus0 target 3 lun 0
sd1: Direct-Access 2063MB (4226725 512 byte sectors)
sd1: with 6703 cyls, 5 heads, and an average 126 sectors/track
scbus0 target 4 lun 0: <ARCHIVE Python 28849-XXX 4.CM> type 1 removable SCSI 2
st0 at scbus0 target 4 lun 0
st0: Sequential-Access density code 0x24, 512-byte blocks, write-enabled
scbus0 target 4 lun 1: <ARCHIVE Python 28849-XXX 4.CM> type 8 removable SCSI 2
uk0 at scbus0 target 4 lun 1
uk0: Unknown 
scbus0 target 5 lun 0: <TANDBERG  TDC 3800 -03:> type 1 removable SCSI 1
st1 at scbus0 target 5 lun 0
st1: Sequential-Access density code 0x0,  drive empty

scbus1 at aha0 bus 0
scbus1 target 2 lun 0: <CONNER CFP4207S  4.28GB 2847> type 0 fixed SCSI 2
sd2 at scbus1 target 2 lun 0
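
In case anybody wants to compare chains, the probe lines above can be
tabulated with a few lines of Python.  This is only a sketch: the
regular expression is my assumption about the line layout, based purely
on the lines shown here, and the type names are the standard SCSI-2
peripheral device types for codes 0, 1 and 8.

```python
import re

# Assumed layout, taken from the probe lines in this message:
#   scbus0 target 0 lun 0: <VENDOR PRODUCT REV> type 0 fixed SCSI 2
PROBE_RE = re.compile(
    r"^scbus(?P<bus>\d+) target (?P<target>\d+) lun (?P<lun>\d+): "
    r"<(?P<inquiry>[^>]*)> type (?P<type>\d+) "
    r"(?P<media>fixed|removable) SCSI (?P<rev>\d+)$"
)

# SCSI-2 peripheral device types that occur in the listing.
TYPE_NAMES = {0: "direct-access", 1: "sequential-access", 8: "medium changer"}


def parse_probe(lines):
    """Return one dict per probe line; other lines are ignored."""
    devices = []
    for line in lines:
        m = PROBE_RE.match(line.strip())
        if m is None:
            continue
        dev_type = int(m.group("type"))
        devices.append({
            "bus": int(m.group("bus")),
            "target": int(m.group("target")),
            "lun": int(m.group("lun")),
            "inquiry": m.group("inquiry"),
            "type": TYPE_NAMES.get(dev_type, "type %d" % dev_type),
            "removable": m.group("media") == "removable",
        })
    return devices


SAMPLE = """\
scbus0 target 0 lun 0: <MICROP 2112-15MQ1094802 HQ48> type 0 fixed SCSI 2
sd0 at scbus0 target 0 lun 0
scbus0 target 4 lun 0: <ARCHIVE Python 28849-XXX 4.CM> type 1 removable SCSI 2
scbus0 target 4 lun 1: <ARCHIVE Python 28849-XXX 4.CM> type 8 removable SCSI 2
scbus1 target 2 lun 0: <CONNER CFP4207S  4.28GB 2847> type 0 fixed SCSI 2
"""

devices = parse_probe(SAMPLE.splitlines())
for d in devices:
    print(d["bus"], d["target"], d["lun"], d["type"], d["inquiry"])
```

Note that target 4 lun 1 reports type 8, i.e. the changer, which the
kernel attaches as uk0.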

Any ideas?  I was thinking of moving the Micropolis drive to the aha,
but that suffers from other problems, over and above the performance
loss.

Greg