Date: Wed, 24 Nov 1999 13:57:32 +1000
From: Phil Homewood
To: questions@freebsd.org
Subject: AHC parity errors, timeouts - -STABLE
Message-ID: <19991124135732.E23235@mincom.com>

Anyone know of any changes to the SCSI/CAM/AHC code in -STABLE in the last two weeks that may cause devices or the bus to go out to lunch?

I'm in the middle of deploying a swag of near-identical boxes, and the latest one died last night with console displaying

ahc0: Data Parity Error Detected during address or write data phase
(da0:ahc0:0:0:0): SCB 0x3 - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0x8
(da0:ahc0:0:0:0): SCB 3: Immediate reset. Flags = 0x4040
(da0:ahc0:0:0:0): no longer in timeout, status = 34b
ahc0: Issued Channel A Bus Reset. 64 SCBs aborted

repeatable under heavy I/O (find / -print, make buildworld). After this error appears, the machine locks solid - needs a poke in the eye to recover.

This machine was only installed (3.3-RELEASE from CD) 5 days ago, but has already been through the cvsup-buildworld-portsinstall phase without a glitch once; problems started the day after the -STABLE buildworld. I don't see any obvious changes (at least pci/ahc_pci.c hasn't been touched) but my understanding of what bits of code come into the picture is a little limited.

My bet is on a flaky disk, but I'd like to hear if anyone else is suffering this or has any ideas before I send the disk back. :-)

dmesg relevant bits:

ahc0: rev 0x00 int a irq 15 on pci0.19.0
ahc0: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
...
da0: Fixed Direct Access SCSI-3 device
da0: 80.000MB/s transfers (40.000MHz, offset 31, 16bit), Tagged Queueing Enabled
da0: 8761MB (17942584 512 byte sectors: 255H 63S/T 1116C)
da1 at ahc0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-3 device
da1: 80.000MB/s transfers (40.000MHz, offset 31, 16bit), Tagged Queueing Enabled
da1: 8761MB (17942584 512 byte sectors: 255H 63S/T 1116C)

"auto-terminate" is enabled on the card. Drives are connected to the U2/LVD/SE connector; The other connectors on the card are not connected. There is a terminator on the cable end. The cable is not too long (around 1200mm by my reckoning, proper twisted LVD cabling.)

Ideas, anyone?

-- 
Phil Homewood DNRC            email: philh@mincom.com
Postmaster and BOFH
Mincom Ltd                    phone: +61-7-3303-3524
Brisbane, QLD Australia       fax: +61-7-3303-3269