From owner-freebsd-amd64@FreeBSD.ORG Sat Jun 4 17:50:10 2011
Message-Id: <201106041744.p54Hixjn062181@red.freebsd.org>
Date: Sat, 4 Jun 2011 17:44:59 GMT
From: Petteri Valkonen
To: freebsd-gnats-submit@FreeBSD.org
Subject: amd64/157615: AHCI device timeouts with ATI IXP700 SATA controller on high IO load
X-Send-Pr-Version: www-3.1
List-Id: Porting FreeBSD to the AMD64 platform

>Number:         157615
>Category:       amd64
>Synopsis:       AHCI device timeouts with ATI IXP700 SATA controller on high IO load
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-amd64
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jun 04 17:50:09 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Petteri Valkonen
>Release:        8.2-RELEASE
>Organization:
>Environment:
FreeBSD microserver 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
>Description:
I'm running 8.2-RELEASE with the ahci(4) driver loaded at boot time on an HP ProLiant N36L. Four Samsung HD204UI drives, which make up a simple (striped) ZFS pool, are attached via an ATI IXP700 SATA controller:

ahci0: port 0xd000-0xd007,0xc000-0xc003,0xb000-0xb007,0xa000-0xa003,0x9000-0x900f mem 0xfe6ffc00-0xfe6fffff irq 19 at device 17.0 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.20 with 4 3Gbps ports, Port Multiplier supported
ahcich0: at channel 0 on ahci0
ahcich0: [ITHREAD]
ahcich1: at channel 1 on ahci0
ahcich1: [ITHREAD]
ahcich2: at channel 2 on ahci0
ahcich2: [ITHREAD]
ahcich3: at channel 3 on ahci0
ahcich3: [ITHREAD]
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: ATA-8 SATA 2.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)

If I attempt to scrub the pool, at some point one of the disks will time out:

Jun 1 22:48:59 microserver kernel: ahcich1: Timeout on slot 1
Jun 1 22:48:59 microserver kernel: ahcich1: is 00000000 cs 000007f8 ss 000007fe rs 000007fe tfd 40 serr 00000000
Jun 1 22:48:59 microserver kernel: ahcich1: device is not ready (timeout 15000ms) tfd = 00000080
Jun 1 22:49:45 microserver kernel: ahcich1: Timeout on slot 10
Jun 1 22:49:45 microserver kernel: ahcich1: is 00000000 cs 00000400 ss 00000000 rs 00000400 tfd 80 serr 00000000
Jun 1 22:49:45 microserver kernel: ahcich1: device is not ready (timeout 15000ms) tfd = 00000080
Jun 1 22:50:31 microserver kernel: ahcich1: Timeout on slot 10
Jun 1 22:50:31 microserver kernel: ahcich1: is 00000000 cs 00000400 ss 00000000 rs 00000400 tfd 80 serr 00000000
Jun 1 22:50:31 microserver kernel: ahcich1: device is not ready (timeout 15000ms) tfd = 00000080
Jun 1 22:50:31 microserver kernel: (ada1:ahcich1:0:0:0): lost device
Jun 1 22:51:34 microserver kernel: ahcich1: Timeout on slot 10
Jun 1 22:51:34 microserver kernel: ahcich1: is 00000000 cs 000ffc00 ss 000ffc00 rs 000ffc00 tfd 80 serr 00000000
Jun 1 22:51:34 microserver kernel: ahcich1: device is not ready (timeout 15000ms) tfd = 00000080
Jun 1 22:51:34 microserver kernel: ahcich1: Poll timeout on slot 19
Jun 1 22:51:34 microserver kernel: ahcich1: is 00000000 cs 00080000 ss 00000000 rs 00080000 tfd 80 serr 00000000
Jun 1 22:52:36 microserver kernel: ahcich1: Timeout on slot 19
Jun 1 22:52:36 microserver kernel: ahcich1: is 00000000 cs 1ff80000 ss 1ff80000 rs 1ff80000 tfd 80 serr 00000000
Jun 1 22:52:36 microserver kernel: ahcich1: device is not ready (timeout 15000ms) tfd = 00000080
Jun 1 22:52:36 microserver kernel: ahcich1: Poll timeout on slot 28
Jun 1 22:52:36 microserver kernel: ahcich1: is 00000000 cs 10000000 ss 00000000 rs 10000000 tfd 80 serr 00000000
Jun 1 22:53:38 microserver kernel: ahcich1: Timeout on slot 28
Jun 1 22:53:38 microserver kernel: ahcich1: is 00000000 cs f000003f ss f000003f rs f000003f tfd 80 serr 00000000
Jun 1 22:53:38 microserver kernel: ahcich1: device is not ready (timeout 15000ms) tfd = 00000080
Jun 1 22:53:38 microserver kernel: ahcich1: Poll timeout on slot 5
Jun 1 22:53:38 microserver kernel: ahcich1: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd 80 serr 00000000
Jun 1 22:54:41 microserver kernel: ahcich1: Timeout on slot 5
Jun 1 22:54:41 microserver kernel: ahcich1: is 00000000 cs 00007fe0 ss 00007fe0 rs 00007fe0 tfd 80 serr 00000000
Jun 1 22:54:41 microserver kernel: ahcich1: device is not ready (timeout 15000ms) tfd = 00000080
Jun 1 22:54:41 microserver kernel: ahcich1: Poll timeout on slot 14
Jun 1 22:54:41 microserver kernel: ahcich1: is 00000000 cs 00004000 ss 00000000 rs 00004000 tfd 80 serr 00000000
Jun 1 22:54:41 microserver root: ZFS: vdev I/O failure, zpool=backup path=/dev/label/disk2 offset=270336 size=8192 error=6
Jun 1 22:54:41 microserver root: ZFS: vdev I/O failure, zpool=backup path=/dev/label/disk2 offset=2000398327808 size=8192 error=6
Jun 1 22:54:41 microserver root: ZFS: vdev I/O failure, zpool=backup path=/dev/label/disk2 offset=2000398589952 size=8192 error=6

The offending disk (ada1) is then dropped from the system and no longer appears in the device list:

# camcontrol devlist
at scbus0 target 0 lun 0 (ada0,pass0)
at scbus2 target 0 lun 0 (ada2,pass2)
at scbus3 target 0 lun 0 (ada3,pass3)

I have upgraded the server's BIOS to the latest available version (2011.04.02 (A)), but the problem persists. Furthermore, extended offline SMART self-tests (smartctl -t long) run on all four disks report no errors.

If I switch to the old ata(4) driver, the scrub completes without any errors. Others have reported the same symptoms on similar hardware (an N36L with Samsung disks), and switching drivers has remedied the problem for them as well:

http://freebsd.1045724.n5.nabble.com/ahci-ko-and-IXP700-800-gt-no-disk-found-tt3948669.html#a3948673
>How-To-Repeat:
Load the ahci(4) module and start a disk-I/O-intensive process (e.g. a ZFS scrub).
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
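
For anyone reading the timeout lines above, here is a small sketch for decoding the cs, ss and rs values. It assumes these are 32-bit per-slot bitmasks in which bit n stands for command slot n; that reading is consistent with the log (for example, "Timeout on slot 10" is followed by cs 00000400, and "Poll timeout on slot 19" by cs 00080000), but it is an interpretation, not something stated in the report. The function name slots_from_mask is made up for this example.

```python
def slots_from_mask(mask_hex):
    """Return the slot numbers whose bits are set in a hex mask.

    Assumption: bit n of the mask corresponds to command slot n.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


if __name__ == "__main__":
    # Values copied from the first timeout entry (Jun 1 22:48:59).
    for label, value in (("cs", "000007f8"), ("ss", "000007fe"), ("rs", "000007fe")):
        print(label, value, slots_from_mask(value))
```

Under that assumption, the first timeout shows slots 3 through 10 in cs and slots 1 through 10 in ss and rs, which suggests several queued commands were outstanding on ahcich1 when the device stopped responding.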