From: perryh@pluto.rain.com
To: freebsd-drivers@freebsd.org
Date: Sat, 25 Jun 2011 15:58:33 -0700
Message-Id: <4e066819.DRprHaL0TvBGL6Jl%perryh@pluto.rain.com>
List-Id: Writing device drivers for FreeBSD
Subject: fatal ata WRITE_DMA48 UDMA ICRC errors

Once in a while, on a recently-installed 8.1-RELEASE, I get a
sequence like this (reformatted):

Jun 25 15:55:30 fbsd81 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error
  (retrying request) LBA=615769530
Jun 25 15:55:30 fbsd81 kernel: ad8: FAILURE - WRITE_DMA48 status=51 error=4
  LBA=615769530
Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Request failed (error=5).
  ad8s2a[WRITE(offset=315265765888, length=78336)]
Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Device gm0: provider ad8s2a
  disconnected.

The sequence is consistent: a retried WRITE_DMA48 UDMA ICRC error
on ad8, a WRITE_DMA48 "FAILURE" on the same LBA with status=51 and
error=4, a gmirror "Request failed (error=5)", and a disconnect.
The LBA, offset, and length vary from one instance to another.

The retry seems to succeed most of the time -- the "WARNING -
WRITE_DMA48 UDMA ICRC error" message most often is not closely
followed by anything else -- but it is immediately followed by a
failure with status=51 and error=4 frequently enough to be a
significant problem (since it breaks the mirror).

The cable between the controller and the drive has been a factor --
the errors became much more frequent the first time I replaced it --
but I'm still getting occasional errors even with a brand-new cable.
I doubt there is anything wrong with the (nearly new) drive, because
I am not having any trouble at all with an identical drive connected
to the onboard ata controller as ad0, but I wonder if there may be
known issues with the VIA-based PCI card that provides two SATA ports
along with the ad8 ATA port. (Nothing is connected as ad9, and I
haven't yet tried to use either of the SATA devices.)

I've asked on geom@ about the possibility of making gmirror more
robust to this sort of event, but the better solution would be to
improve the handling at the hardware or ata driver level. What
would cause the ad8 driver to sometimes return a FAILURE indication
after a single retryable error? Would it make sense to treat this
indication (with status=51 and error=4) as retryable?

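For reference, the two register values in the FAILURE message can be
decoded with a short script. This is only a sketch: it assumes the
values are printed in hexadecimal, and it uses the bit names of the
conventional ATA status and error register layout, which should be
checked against sys/ata.h. The names STATUS_BITS, ERROR_BITS, decode
and decode_failure are made up for illustration.

```python
# Assumed bit layout of the ATA status register (verify against sys/ata.h).
STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data
    0x02: "IDX",   # index
    0x01: "ERR",   # error, details in the error register
}

# Assumed bit layout of the ATA error register (verify against sys/ata.h).
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error during UDMA transfer
    0x40: "UNC",   # uncorrectable data
    0x20: "MC",    # media changed
    0x10: "IDNF",  # ID not found
    0x08: "MCR",   # media change request
    0x04: "ABRT",  # command aborted
    0x02: "NM",    # no media
    0x01: "ILI",   # illegal length indication
}


def decode(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in names.items() if value & bit]


def decode_failure(status_hex, error_hex):
    """Decode the status= and error= fields, given as hex strings."""
    status = int(status_hex, 16)
    error = int(error_hex, 16)
    return decode(status, STATUS_BITS), decode(error, ERROR_BITS)


if __name__ == "__main__":
    # Values taken from the FAILURE message quoted above.
    print(decode_failure("51", "4"))
```

If those assumptions hold, status=51 decodes to DRDY, DSC and ERR, and
error=4 decodes to ABRT rather than ICRC. That would suggest the
retried command was aborted instead of hitting a second CRC error.
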
Relevant parts of dmesg:

pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pcib2: at device 30.0 on pci0
pci2: on pcib2
atapci0: port 0xdc70-0xdc7f,0xdc50-0xdc5f,0xdc30-0xdc3f,
  0xdc10-0xdc1f,0xd8e0-0xd8ff,0xd400-0xd4ff irq 19 at device 11.0 on pci2
atapci0: [ITHREAD]
ata2: on atapci0
ata2: [ITHREAD]
ata3: on atapci0
ata3: [ITHREAD]
ata4: on atapci0
ata4: [ITHREAD]
pcib3: at device 14.0 on pci2
pci3: on pcib3
atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf
  at device 31.1 on pci0
ata0: on atapci1
ata0: [ITHREAD]
ata1: on atapci1
ata1: [ITHREAD]
ad0: 305245MB at ata0-master UDMA66
ad1: 32253MB at ata0-slave UDMA66
acd0: CDROM drive at ata1 as slave
ad4: 61136MB at ata2-master UDMA100 SATA 1.5Gb/s
acd1: DVDR drive at ata3 as master
ad8: 305245MB at ata4-master UDMA133
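
To put a number on "most of the time", the system log could be scanned
for ICRC warnings and for FAILURE messages that follow them on the same
device, command and LBA. The script below is a rough sketch, not
something I have run against a real log: it assumes each kernel message
sits on a single line in the log file (unlike the wrapped excerpt
above), and the patterns are based only on the messages quoted in this
post. The names WARN, FAIL and count_retry_outcomes are hypothetical.

```python
import re

# Patterns modeled on the two ad8 messages quoted in this post.
WARN = re.compile(
    r"(ad\d+): WARNING - (\S+) UDMA ICRC error \(retrying request\) LBA=(\d+)"
)
FAIL = re.compile(
    r"(ad\d+): FAILURE - (\S+) status=\S+ error=\S+ LBA=(\d+)"
)


def count_retry_outcomes(lines):
    """Return (warnings, failed).

    warnings counts ICRC "retrying request" messages; failed counts
    FAILURE messages that match an earlier, still unresolved warning
    for the same (device, command, LBA).
    """
    warnings = 0
    failed = 0
    pending = set()  # (device, command, lba) tuples awaiting an outcome
    for line in lines:
        match = WARN.search(line)
        if match:
            warnings += 1
            pending.add(match.groups())
            continue
        match = FAIL.search(line)
        if match and match.groups() in pending:
            failed += 1
            pending.discard(match.groups())
    return warnings, failed


if __name__ == "__main__":
    # The two ad8 messages from this post, one message per line.
    sample = [
        "Jun 25 15:55:30 fbsd81 kernel: ad8: WARNING - WRITE_DMA48 UDMA "
        "ICRC error (retrying request) LBA=615769530",
        "Jun 25 15:55:30 fbsd81 kernel: ad8: FAILURE - WRITE_DMA48 "
        "status=51 error=4 LBA=615769530",
    ]
    print(count_retry_outcomes(sample))
```

Passing an open log file instead of the sample list should work the
same way, since the function only iterates over lines. The ratio of the
two counts would show how often a single ICRC retry ends in a FAILURE.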