From owner-freebsd-scsi  Tue May 12 09:31:45 1998
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id JAA01334
          for freebsd-scsi-outgoing; Tue, 12 May 1998 09:31:45 -0700 (PDT)
          (envelope-from owner-freebsd-scsi@FreeBSD.ORG)
Received: from interlock2.lexmark.com (interlock2.lexmark.com [192.146.101.10])
          by hub.freebsd.org (8.8.8/8.8.8) with SMTP id JAA01328
          for <freebsd-scsi@freebsd.org>; Tue, 12 May 1998 09:31:42 -0700 (PDT)
          (envelope-from fireston@lexmark.com)
Received: by interlock2.lexmark.com id AA24888
  (InterLock SMTP Gateway 3.0 for freebsd-scsi@freebsd.org);
  Tue, 12 May 1998 12:31:30 -0400
Message-Id: <199805121631.AA24888@interlock2.lexmark.com>
Received: by interlock2.lexmark.com (Protected-side Proxy Mail Agent-1);
  Tue, 12 May 1998 12:31:30 -0400
From: Mik Firestone <fireston@lexmark.com>
Subject: SCSI timeouts in dataout mode
To: freebsd-scsi@FreeBSD.ORG
Date: Tue, 12 May 1998 12:31:31 -0400 (EDT)
Reply-To: fireston@lexmark.com
X-Mailer: ELM [version 2.4ME+ PL38 (25)]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

We are attempting to get an HP AutoRAID device running on a FreeBSD
3.0-current box, 31198 snapshot.

The HP unit is attached to the machine via an Adaptec 2944 ( that is no typo -
it is differential ).  There are currently 6 drives in the HP.  The HP is
further set up such that there are 8 LUNs defined ( the actual sizes do not
matter - we have tried many configurations ).  The HP unit has two
controllers, but the problem appears to be independent of that.

On boot, all the LUNs are probed and return the correct information - i.e.,
size.  fsck works ( trust us, we have done that *many* times ).  We
can mount the drives, delete files, and move small files around
without problem.

As soon as we try writing large ( > 25 MB ) sequential files ( via bonnie ),
we get this:
May 12 10:01:28 rocky /kernel: (da3:ahc1:0:0:1): SCB 0x1a - timed out in dataout phase, SCSISIGI == 0xe6
May 12 10:01:28 rocky /kernel: SEQADDR == 0x115
May 12 10:01:28 rocky /kernel: SSTAT1 == 0x13
May 12 10:01:28 rocky /kernel: (da3:ahc1:0:0:1): BDR message in message buffer
May 12 10:01:30 rocky /kernel: (da3:ahc1:0:0:1): SCB 0x38 - timed out in dataout phase, SCSISIGI == 0xf6
May 12 10:01:30 rocky /kernel: SEQADDR == 0x115
May 12 10:01:30 rocky /kernel: SSTAT1 == 0x13
May 12 10:01:30 rocky /kernel: (da3:ahc1:0:0:1): no longer in timeout
May 12 10:01:30 rocky /kernel: ahc1: Issued Channel A Bus Reset. 67 SCBs aborted
May 12 10:01:44 rocky /kernel: (da2:ahc1:0:0:0): SCB 0x1a - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
May 12 10:01:44 rocky /kernel: SEQADDR == 0x18b
May 12 10:01:44 rocky /kernel: SSTAT1 == 0x0
May 12 10:01:44 rocky /kernel: (da2:ahc1:0:0:0): Queuing a BDR SCB
May 12 10:01:46 rocky /kernel: (da2:ahc1:0:0:0): SCB 0x1a - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
May 12 10:01:46 rocky /kernel: SEQADDR == 0x18b
May 12 10:01:46 rocky /kernel: SSTAT1 == 0x0
May 12 10:01:46 rocky /kernel: (da2:ahc1:0:0:0): no longer in timeout
May 12 10:01:46 rocky /kernel: ahc1: Issued Channel A Bus Reset. 65 SCBs aborted
May 12 10:01:50 rocky /kernel: (da2:ahc1:0:0:0): Sending SDTR!!
May 12 10:01:50 rocky /kernel: ahc1:A:0: no active SCB for reconnecting target - issuing BUS DEVICE RESET
May 12 10:01:50 rocky /kernel: SAVED_TCL == 0x1, ARG_1 == 0xff, SEQ_FLAGS == 0x20
May 12 10:01:50 rocky /kernel: Bus Device Reset Completed.
May 12 10:01:50 rocky /kernel: ahc1: Bus Device Reset delivered. 67 SCBs aborted
May 12 10:01:54 rocky /kernel: (da2:ahc1:0:0:0): Sending SDTR!!

and so on until either we reboot ( if using the CAM code ) or bonnie dies
( using the old SCSI code ).  We have played with the file size and know that
a 10 MB file has never caused a problem, while a 25 MB file always has.
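
For anyone wanting to narrow down the size threshold without bonnie, a
minimal sketch of a plain sequential writer follows.  It is not a substitute
for bonnie and only approximates its sequential-write pass.  The function
name, the 64 KB chunk size, and the final fsync are assumptions made for
illustration; none of them come from the setup described above.

```python
import os


def write_sequential(path, size_mb, chunk_kb=64):
    """Write size_mb megabytes of zeros to path, front to back.

    Hypothetical helper: chunk_kb and the fsync at the end are
    illustrative choices, not settings taken from bonnie.
    """
    chunk = b"\0" * (chunk_kb * 1024)
    remaining = size_mb * 1024 * 1024
    written = 0
    with open(path, "wb") as f:
        while remaining > 0:
            # Trim the last chunk so exactly size_mb megabytes are written.
            n = f.write(chunk[:remaining])
            written += n
            remaining -= n
        # Push the data out of the buffer cache so it actually hits the bus.
        f.flush()
        os.fsync(f.fileno())
    return written
```

Calling it with sizes stepping from 10 to 25 on a file system that lives on
one of the LUNs, and watching the kernel log between runs, should show
roughly where the timeouts begin.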

While a valid answer would be to stop running bonnie, the main intended use of
this RAID widget will be to write/read large, sequential files.

Under the old code, we have not enabled any of the AHC_* options - no tag
queuing, no mem io, etc.

Under CAM, we have tried adding the device as a rogue with tag queuing
disabled and that didn't fix it.

The Adaptec card probes as rev 1, so we slowed the data transfer rate down
to 10 MB/sec, but that did not help.


Does anybody have an idea as to what we missed?  We would really like to get
this functional before our boss tells us to use a Sun.  

Of course, the full dmesg output and our config files are available if
anybody is interested.

-- 
Mik Firestone fireston@lexmark.com
If ever I become an Evil Overlord:
I will not have a daughter. She would be as beautiful as she was evil, but one
look at the hero's rugged countenance and she'd betray her own father.
