From: Mik Firestone <fireston@lexmark.com>
To: freebsd-scsi@FreeBSD.ORG
Subject: SCSI timeouts in dataout mode
Date: Tue, 12 May 1998 12:31:31 -0400 (EDT)

We are attempting to get an HP Autoraid device running on a FreeBSD
3.0-current box, 31198 snapshot. The HP unit is attached to the machine
via an Adaptec 2944 (that is no typo - it is differential). There are
currently 6 drives in the HP, and it is set up with 8 LUNs defined (the
actual sizes do not matter - we have tried many configurations). The HP
unit has two controllers, but the problem appears to be independent of
that.

On boot, all the LUNs are probed and return the correct information,
i.e. size. fsck works (trust us, we have done that *many* times). We can
mount the drives, delete files, and move small files around without
problem.
As soon as we try writing large (> 25 Mb) sequential files (via bonnie),
we get this:

May 12 10:01:28 rocky /kernel: (da3:ahc1:0:0:1): SCB 0x1a - timed out in dataout phase, SCSISIGI == 0xe6
May 12 10:01:28 rocky /kernel: SEQADDR == 0x115
May 12 10:01:28 rocky /kernel: SSTAT1 == 0x13
May 12 10:01:28 rocky /kernel: (da3:ahc1:0:0:1): BDR message in message buffer
May 12 10:01:30 rocky /kernel: (da3:ahc1:0:0:1): SCB 0x38 - timed out in dataout phase, SCSISIGI == 0xf6
May 12 10:01:30 rocky /kernel: SEQADDR == 0x115
May 12 10:01:30 rocky /kernel: SSTAT1 == 0x13
May 12 10:01:30 rocky /kernel: (da3:ahc1:0:0:1): no longer in timeout
May 12 10:01:30 rocky /kernel: ahc1: Issued Channel A Bus Reset. 67 SCBs aborted
May 12 10:01:44 rocky /kernel: (da2:ahc1:0:0:0): SCB 0x1a - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
May 12 10:01:44 rocky /kernel: SEQADDR == 0x18b
May 12 10:01:44 rocky /kernel: SSTAT1 == 0x0
May 12 10:01:44 rocky /kernel: (da2:ahc1:0:0:0): Queuing a BDR SCB
May 12 10:01:46 rocky /kernel: (da2:ahc1:0:0:0): SCB 0x1a - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
May 12 10:01:46 rocky /kernel: SEQADDR == 0x18b
May 12 10:01:46 rocky /kernel: SSTAT1 == 0x0
May 12 10:01:46 rocky /kernel: (da2:ahc1:0:0:0): no longer in timeout
May 12 10:01:46 rocky /kernel: ahc1: Issued Channel A Bus Reset. 65 SCBs aborted
May 12 10:01:50 rocky /kernel: (da2:ahc1:0:0:0): Sending SDTR!!
May 12 10:01:50 rocky /kernel: ahc1:A:0: no active SCB for reconnecting target - issuing BUS DEVICE RESET
May 12 10:01:50 rocky /kernel: SAVED_TCL == 0x1, ARG_1 == 0xff, SEQ_FLAGS == 0x20
May 12 10:01:50 rocky /kernel: Bus Device Reset Completed.
May 12 10:01:50 rocky /kernel: ahc1: Bus Device Reset delivered. 67 SCBs aborted
May 12 10:01:54 rocky /kernel: (da2:ahc1:0:0:0): Sending SDTR!!

and so on until either we reboot (if using CAM code) or bonnie dies
(using the old SCSI).
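For anybody who wants to boil a longer log like the one above down to
something readable, here is a minimal sketch of a tally script. The two
patterns are assumptions based only on the message shapes quoted above
(they are not taken from the driver source), and the names summarize,
TIMEOUT_RE, RESET_RE and SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Assumed message shapes, copied from the log excerpt in this post:
#   (da3:ahc1:0:0:1): SCB 0x1a - timed out in dataout phase, SCSISIGI == 0xe6
#   (da2:ahc1:0:0:0): SCB 0x1a - timed out while idle, LASTPHASE == 0x1, ...
TIMEOUT_RE = re.compile(
    r"\((?P<dev>da\d+):ahc\d+:\d+:\d+:\d+\): SCB 0x[0-9a-f]+ - timed out "
    r"(?P<where>in \w+ phase|while idle)"
)
#   ahc1: Issued Channel A Bus Reset. 67 SCBs aborted
RESET_RE = re.compile(r"Issued Channel \w+ Bus Reset\. (?P<n>\d+) SCBs aborted")


def summarize(lines):
    """Count timeouts per (device, state) and sum SCBs aborted by bus resets."""
    timeouts = Counter()
    aborted = 0
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[(m.group("dev"), m.group("where"))] += 1
            continue
        m = RESET_RE.search(line)
        if m:
            aborted += int(m.group("n"))
    return timeouts, aborted


# A few lines taken from the excerpt above, used as sample input.
SAMPLE = """\
May 12 10:01:28 rocky /kernel: (da3:ahc1:0:0:1): SCB 0x1a - timed out in dataout phase, SCSISIGI == 0xe6
May 12 10:01:30 rocky /kernel: ahc1: Issued Channel A Bus Reset. 67 SCBs aborted
May 12 10:01:44 rocky /kernel: (da2:ahc1:0:0:0): SCB 0x1a - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
May 12 10:01:46 rocky /kernel: ahc1: Issued Channel A Bus Reset. 65 SCBs aborted
"""

timeouts, aborted = summarize(SAMPLE.splitlines())
for (dev, where), count in sorted(timeouts.items()):
    print(f"{dev}: {count} timeout(s) {where}")
print(f"SCBs aborted by bus resets: {aborted}")
```

summarize() accepts any iterable of lines, so an open log file can be
passed in directly. The "Bus Device Reset delivered" lines are
deliberately left out of the aborted count, since only the channel-wide
bus resets are matched.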
We have played with the file size and know that a 10Mb file hasn't
caused a problem while a 25Mb file always has. While a valid answer
would be to stop running bonnie, the main intended use of this RAID
widget will be to write/read large, sequential files.

Under the old code, we have not enabled any of the AHC_* options - no
tag queuing, no mem io, etc. Under CAM, we have tried adding the device
as a rogue with tag queuing disabled, and that didn't fix it. The
Adaptec card probes as rev 1, so we slowed the data transfer rate down
to 10 Mb/sec, but that did not help.

Does anybody have an idea as to what we missed? We would really like to
get this functional before our boss tells us to use a Sun. Of course,
the full dmesg output is available if anybody is that interested, as
well as our config files.

-- 
Mik Firestone  fireston@lexmark.com
If ever I become an Evil Overlord: I will not have a daughter. She
would be as beautiful as she was evil, but one look at the hero's
rugged countenance and she'd betray her own father.