From owner-freebsd-isp Fri Dec 5 17:03:51 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.7/8.8.7) id RAA00549 for isp-outgoing; Fri, 5 Dec 1997 17:03:51 -0800 (PST) (envelope-from owner-freebsd-isp)
Received: from obiwan.TerraNova.net (root@obiwan.TerraNova.net [209.4.59.2]) by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id RAA00544 for ; Fri, 5 Dec 1997 17:03:48 -0800 (PST) (envelope-from bofh@terranova.net)
Received: from P1mpBSD.TerraNova.net (tog@guenhwyvar.TerraNova.net [209.4.59.4]) by obiwan.TerraNova.net (8.8.8/TerraNovaNet) with SMTP id UAA08224; Fri, 5 Dec 1997 20:07:13 -0500 (EST)
Message-ID: <3488A507.65CB@terranova.net>
Date: Fri, 05 Dec 1997 20:06:15 -0500
From: Travis Mikalson
Organization: TerraNovaNet
X-Mailer: Mozilla 3.03 (Win95; I)
MIME-Version: 1.0
To: "Bryn Wm. Moslow"
CC: freebsd-isp@freebsd.org, jayk@nwlink.com
Subject: Re: Adaptec 2940/Seagate Failures
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-isp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Bryn Wm. Moslow wrote:
> - I've used every combination of AHC_TAGENABLE, AHC_SCBPAGING_ENABLE, and
> AHC_ALLOW_MEMIO in the kernel possible and each one in cooperation with
> the others or on its own eventually brings down the system.

Ok, for the record, I'm using all of these options together as well.
AHC_SCBPAGING_ENABLE
AHC_TAGENABLE
AHC_ALLOW_MEMIO

> sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xe6
> SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
> sd2(ahc1:2:0): abort message in message buffer
> sd2(ahc1:2:0): SCB 0x0 - timed out in message in phase, SCSISIGI == 0xf6
> SEQADDR = 0xd1 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x3
> sd2(ahc1:2:0): no longer in timeout
> ahc1: Issued Channel A Bus Reset. 4 SCBs aborted

Wow! I'm quite familiar with this unfortunately.
I've been getting this occasionally on my lightly-loaded server with an
AHA-2940UW and two Micropolis drives:

ahc0 rev 0 int a irq 11 on pci0:14
ahc0: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
ahc0 waiting for scsi devices to settle
ahc0: target 0 Tagged Queuing Device
(ahc0:0:0): "MICROP 3243-19 1128RFAV RFAV" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 4095MB (8388315 512 byte sectors)
ahc0: target 2 Tagged Queuing Device
(ahc0:2:0): "MICROP 3243-19 1128RFAV RFAV" type 0 fixed SCSI 2
sd1(ahc0:2:0): Direct-Access 4095MB (8388315 512 byte sectors)

Your hell looks just like mine. Mine has only been fatal once. I've
always attributed it to the HD (and I back up very frequently) since it's
always the same one.. but then again, that is the drive that does the most
I/O. The last failing HD I dealt with always required a cold boot to get
running again..

I just had it happen a couple of days ago, as a matter of fact.. it wasn't
fatal at all, just a hiccup; I was working on the system at the time.

sd1(ahc0:2:0): SCB 0x6 - timed out in command phase, SCSISIGI == 0x84
SEQADDR = 0x4e SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x2
Ordered Tag queued
sd1(ahc0:2:0): SCB 0x7 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 timedout while recovery in progress
sd1(ahc0:2:0): SCB 0x2 timedout while recovery in progress
Ordered Tag sent
sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd1(ahc0:2:0): SCB 0x6 - timed out in command phase, SCSISIGI == 0x84
SEQADDR = 0x4e SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x2
sd1(ahc0:2:0): abort message in message buffer
sd1(ahc0:2:0): SCB 6 - Abort Tag Completed.
sd1(ahc0:2:0): no longer in timeout
sd1(ahc0:2:0): SCB 0x7 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x6 SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0xa
Ordered Tag queued
sd1(ahc0:2:0): SCB 0x7 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x7 SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0xa
sd1(ahc0:2:0): Queueing an Abort SCB
sd1(ahc0:2:0): Abort Message Sent
sd1(ahc0:2:0): SCB 7 - Abort Tag Completed.
sd1(ahc0:2:0): no longer in timeout
Ordered Tag sent

There did happen to be a higher than normal load on the system at the
time. I've changed the cable thrice and the heat seems well within
acceptable parameters.

So anyway, I guess this is a "me too". I think I started having this
problem around 2.2.1. I'm currently running:

FreeBSD obiwan.TerraNova.net 2.2.5-STABLE FreeBSD 2.2.5-STABLE #0: Mon Nov 24 05:16:33 EST 1997 mikaltra@obiwan.TerraNova.net:/usr/src/sys/compile/obiwan i386

I've been taught to question the hardware first and the OS later, so I've
never mentioned it: I don't have replacement drives to try, so that
variable still remains (oh, I tried another 2940UW, too).

Later,
-T

--
TerraNovaNet Internet Services - Key Largo, FL
Voice: (305)453-4011 Fax: (305)451-5991
http://www.terranova.net/
----------------------------------------------
"Acceptance without proof is the fundamental characteristic of Western
religion, Rejection without proof is the fundamental characteristic of
Western science." -- Gary Zukav, "The Dancing Wu Li Masters"
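
One rough way to check whether the timeouts really cluster on a single drive is to tally them per device from the console output. The Python below is only a sketch: the function name `count_timeouts` and the regular expression are assumptions, based solely on the `timed out` / `timedout` lines quoted in this thread, and other driver versions may word these messages differently.

```python
import re
from collections import Counter

# Assumed message shapes, taken only from the log lines quoted in this thread:
#   sd1(ahc0:2:0): SCB 0x6 - timed out in command phase, ...
#   sd1(ahc0:2:0): SCB 0x7 timedout while recovery in progress
TIMEOUT_RE = re.compile(r"^(sd\d+)\(ahc\d+:\d+:\d+\): SCB \S+ (?:- )?timed ?out\b")


def count_timeouts(lines):
    """Return a Counter mapping device name (e.g. 'sd1') to timeout count."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "sd1(ahc0:2:0): SCB 0x6 - timed out in command phase, SCSISIGI == 0x84",
        "sd1(ahc0:2:0): SCB 0x7 timedout while recovery in progress",
        "sd0(ahc0:0:0): SCB 0x0 timedout while recovery in progress",
        "sd1(ahc0:2:0): no longer in timeout",  # recovery notice, not counted
    ]
    # Print devices from most to fewest timeouts.
    for device, n in count_timeouts(sample).most_common():
        print(device, n)
```

Feeding it saved console output (for example, the lines of a log file) gives a per-drive count. Note that timeouts logged "while recovery in progress" can hit a healthy drive sharing the bus, so a lopsided count is only a hint about which drive to suspect.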