From owner-freebsd-stable@FreeBSD.ORG Wed May 19 03:18:59 2010 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8AC6D1065670 for ; Wed, 19 May 2010 03:18:59 +0000 (UTC) (envelope-from dgerow@afflictions.org) Received: from ironport2-out.pppoe.ca (ironport2-out.teksavvy.com [206.248.154.183]) by mx1.freebsd.org (Postfix) with ESMTP id 292238FC08 for ; Wed, 19 May 2010 03:18:58 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAMDu8kvO+L5V/2dsb2JhbACeB3K9NoJcgjQEjC8 X-IronPort-AV: E=Sophos;i="4.53,259,1272859200"; d="scan'208";a="64537117" Received: from 206-248-190-85.dsl.teksavvy.com (HELO shell.afflictions.org) ([206.248.190.85]) by ironport2-out.pppoe.ca with ESMTP/TLS/DHE-RSA-AES256-SHA; 18 May 2010 22:49:59 -0400 Received: from shell.afflictions.org (shell.afflictions.org [172.20.143.66]) by shell.afflictions.org (8.14.4/8.14.4) with ESMTP id o4J2E3Tu044939 for ; Tue, 18 May 2010 22:14:03 -0400 (EDT) (envelope-from dgerow@afflictions.org) Received: (from dwg@localhost) by shell.afflictions.org (8.14.4/8.14.4/Submit) id o4J2E3hL044880 for stable@freebsd.org; Tue, 18 May 2010 22:14:03 -0400 (EDT) (envelope-from dgerow@afflictions.org) X-Authentication-Warning: shell.afflictions.org: dwg set sender to dgerow@afflictions.org using -f Date: Tue, 18 May 2010 22:14:03 -0400 From: Damian Gerow To: stable@freebsd.org Message-ID: <20100519021402.GI92949@plebeian.afflictions.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.20 (2009-06-14) Cc: Subject: AHCI timeouts on S3 resume X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 May 2010 03:18:59 -0000 A few months back, I swapped out my dying hard drive for a WD Scorpio Blue. Cheap, seemed reliable, and it was the only drive the local shop had in stock. However, it seems that AHCI doesn't like this device, and is having troubles during an S3 resume. It appears as though I'm experiencing two types of timeouts when resuming: recoverable, and non-recoverable. My question is: do I have a bad HDD, or is AHCI just not playing nicely? First off, if I try to resume when in graphics mode, it pretty much always fails (screen backlight comes on, but nothing else, system is non-responsive to all keypresses save ctrl+alt+del and a hard power down). I'm currently convinced this is because it's doing something that's triggering a non-recoverable timeout. However, when doing a suspend/resume from a text console, I can usually hit a recoverable timeout. Which looks like this (full suspend-resume cycle): ----- drm0: on vgapci0 info: [drm] MSI enabled 1 message(s) vgapci0: child drm0 requested pci_enable_busmaster info: [drm] AGP at 0xd0000000 256MB info: [drm] Initialized i915 1.6.0 20080730 drm0: [ITHREAD] wlan0: link state changed to DOWN ugen1.2: at usbus1 (disconnected) ugen2.2: at usbus2 (disconnected) ugen1.2: at usbus1 ugen1.3: at usbus1 (disconnected) ugen2.2: at usbus2 ugen1.3: at usbus1 wlan0: link state changed to UP ahcich0: Timeout on slot 24 ahcich0: is 00400040 cs 0f000000 ss 0f000000 rs 0ff00000 tfd 80 serr 04050002 ----- Recovery from these timeouts seems to take 30s. However, periodically when resuming into text mode (and I assume what is almost always when resuming into graphics mode), I trigger a non-recoverable timeout: ----- ahcich0: Timeout on slot 13 ahcich0: is 00400000 cs 0000e000 ss 0000e000 rs 0000e00 tfd 80 serr 04050000 ahcich0: device is not ready (timeout 15000ms) tfd = 00000080 ahcich0: Timeout on slot 15 ahcich0: is 00400000 cs 0000e000 ss 0000e000 rs 0000800 tfd 00 serr 00000000 ahcich0: device is not ready (timeout 15000ms) tfd = 00000000 (ada0:ahcich0:0:0:0): lost device ahcich0: Poll timeout on slot 15 ahcich0: is 00000000 cs 00008000 ss 00000000 rs 0000800 tfd 00 serr 00000000 GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=13067157504, length=65536)] GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=262144, length=8192)] GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=482927181824, length=8192)] GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=482927443968, length=8192)] ahcich0: Timeout on slot 15 ahcich0: is 00000000 cs 00038000 ss 00038000 rs 0003800 tfd 00 serr 00000000 ahcich0: devfice is not ready (timeout 15000ms) tfd = 00000080 ahcich0: Poll timeout on slot 17 ahcich0: is 00000000 cs 00020000 ss 00000000 rs 0002000 tfd 00 serr 00000000 ahcich0: Timeout on slot 17 ahcich0: is 00000000 cs 000e0000 ss 000e0000 rs 000e000 tfd 00 serr 00000000 ahcich0: devfice is not ready (timeout 15000ms) tfd = 00000080 ----- This continues, with the slot increasing by two every time. Each 'Timeout' and 'Poll timeout' takes ~15s to trigger, and I'd let this run to slot 23 before I powered down. The HDD is a WD Scorpio blue, model WD5000BEVT-22A0RT0, and isn't exactly the fastest drive on the planet. SMART seems to be relatively clean, with some mild questions surrounding attributes 191, 9/193, and 194: ----- ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0 3 Spin_Up_Time 0x0027 186 185 021 Pre-fail Always - 1675 4 Start_Stop_Count 0x0032 055 055 000 Old_age Always - 45174 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 723 10 Spin_Retry_Count 0x0032 100 100 051 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 57 191 G-Sense_Error_Rate 0x0032 072 072 000 Old_age Always - 28 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 48 193 Load_Cycle_Count 0x0032 162 162 000 Old_age Always - 115712 194 Temperature_Celsius 0x0022 112 106 000 Old_age Always - 35 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 100 253 051 Old_age Offline - 0 ----- So, am I looking at a bum HDD, or is AHCI just not playing nicely? Hardware and traditional troubleshooting info: Laptop is a Lenovo X200. Disk controller: ----- ahci0@pci0:0:31:2: class=0x010601 card=0x20f817aa chip=0x29298086 rev=0x03 hdr=0x00 vendor = 'Intel Corporation' device = '82801IB/IR/IH (ICH9 Family) Mobile SATA AHCI Controller' class = mass storage subclass = SATA ----- loader.conf: ----- ahci_load="YES" geom_eli_load="YES" zfs_load="YES" vfs.root.mountfrom="zfs:zroot" vfs.zfs.txg.timeout="5" iwnfw_load="YES" if_iwn_load="YES" snd_hda_load="YES" sem_load="YES" smb_load="YES" coretemp_load="YES" usb_load="YES" uhid_load="YES" uhci_load="YES" ohci_load="YES" umass_load="YES" linux_enable="YES" kern.maxfiles="35000" cpufreq_load="YES" hint.p4tcc.0.disable="1" hint.acpi_throttle.0.disable="1" hw.pci.do_power_nodriver="2" ----- dmesg: ----- Copyright (c) 1992-2010 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 8.1-PRERELEASE #7 r208254: Tue May 18 08:44:31 EDT 2010 root@plebeian.afflictions.org:/usr/obj/usr/src/sys/PLEBEIAN amd64 module_register: module g_eli already exists! Module g_eli failed to register: 17 Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz (2394.02-MHz K8-class CPU) Origin = "GenuineIntel" Id = 0x10676 Family = 6 Model = 17 Stepping = 6 Features=0xbfebfbff Features2=0x8e3fd AMD Features=0x20100800 AMD Features2=0x1 TSC: P-state invariant real memory = 4294967296 (4096 MB) avail memory = 3991834624 (3806 MB) ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs FreeBSD/SMP: 1 package(s) x 2 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0: Changing APIC ID to 1 ioapic0 irqs 0-23 on motherboard kbd1 at kbdmux0 cryptosoft0: on motherboard acpi0: on motherboard CPU0: local APIC error 0x40 acpi0: [ITHREAD] acpi_ec0: port 0x62,0x66 on acpi0 acpi0: Power Button (fixed) acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, bff00000 (3) failed Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 cpu0: on acpi0 cpu1: on acpi0 acpi_hpet0: iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 900 acpi_lid0: on acpi0 acpi_button0: on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 vgapci0: port 0x1800-0x1807 mem 0xf2000000-0xf23fffff,0xd0000000-0xdfffffff irq 16 at device 2.0 on pci0 agp0: on vgapci0 agp0: detected 32764k stolen memory agp0: aperture size is 256M vgapci1: mem 0xf2400000-0xf24fffff at device 2.1 on pci0 pci0: at device 3.0 (no driver attached) pci0: at device 25.0 (no driver attached) uhci0: port 0x1860-0x187f irq 20 at device 26.0 on pci0 uhci0: [ITHREAD] usbus0: on uhci0 uhci1: port 0x1880-0x189f irq 21 at device 26.1 on pci0 uhci1: [ITHREAD] usbus1: on uhci1 uhci2: port 0x18a0-0x18bf irq 22 at device 26.2 on pci0 uhci2: [ITHREAD] usbus2: on uhci2 pci0: at device 26.7 (no driver attached) hdac0: mem 0xf2620000-0xf2623fff irq 17 at device 27.0 on pci0 hdac0: HDA Driver Revision: 20100226_0142 hdac0: [ITHREAD] pcib1: irq 20 at device 28.0 on pci0 pci2: on pcib1 pcib2: irq 21 at device 28.1 on pci0 pci3: on pcib2 iwn0: mem 0xf2500000-0xf2501fff irq 17 at device 0.0 on pci3 iwn0: MIMO 2T3R, MoW1, address 00:21:5c:5f:7e:e3 iwn0: [ITHREAD] iwn0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps iwn0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps iwn0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps pcib3: irq 23 at device 28.3 on pci0 pci5: on pcib3 uhci3: port 0x18c0-0x18df irq 16 at device 29.0 on pci0 uhci3: [ITHREAD] usbus3: on uhci3 uhci4: port 0x18e0-0x18ff irq 17 at device 29.1 on pci0 uhci4: [ITHREAD] usbus4: on uhci4 uhci5: port 0x1c00-0x1c1f irq 18 at device 29.2 on pci0 uhci5: [ITHREAD] usbus5: on uhci5 pci0: at device 29.7 (no driver attached) pcib4: at device 30.0 on pci0 pci13: on pcib4 isab0: at device 31.0 on pci0 isa0: on isab0 ahci0: port 0x1c48-0x1c4f,0x183c-0x183f,0x1c40-0x1c47,0x1838-0x183b,0x1c20-0x1c3f mem 0xf2826000-0xf28267ff irq 16 at device 31.2 on pci0 ahci0: [ITHREAD] ahci0: AHCI v1.20 with 4 3Gbps ports, Port Multiplier not supported ahcich0: at channel 0 on ahci0 ahcich0: [ITHREAD] ahcich1: at channel 1 on ahci0 ahcich1: [ITHREAD] pci0: at device 31.3 (no driver attached) acpi_tz0: on acpi0 acpi_tz1: on acpi0 atrtc0: port 0x70-0x71 irq 8 on acpi0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] psm0: irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: [ITHREAD] psm0: model Generic PS/2 mouse, device ID 0 battery0: on acpi0 acpi_acad0: on acpi0 orm0: at iomem 0xc0000-0xcffff,0xd0000-0xd0fff,0xd1000-0xd1fff,0xd2000-0xd2fff,0xde000-0xdf7ff,0xe0000-0xeffff on isa0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 coretemp0: on cpu0 est0: on cpu0 p4tcc0: on cpu0 coretemp1: on cpu1 est1: on cpu1 p4tcc1: on cpu1 ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present; to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf. ZFS filesystem version 3 ZFS storage pool version 14 Timecounters tick every 1.000 msec hdac0: HDA Codec #0: Conexant CX20561 (Hermosa) pcm0: at cad 0 nid 1 on hdac0 pcm1: at cad 0 nid 1 on hdac0 The GEOM class ELI is already loaded. usbus0: 12Mbps Full Speed USB v1.0 usbus1: 12Mbps Full Speed USB v1.0 usbus2: 12Mbps Full Speed USB v1.0 usbus3: 12Mbps Full Speed USB v1.0 usbus4: 12Mbps Full Speed USB v1.0 usbus5: 12Mbps Full Speed USB v1.0 ugen0.1: at usbus0 uhub0: on usbus0 ugen1.1: at usbus1 uhub1: on usbus1 ugen2.1: at usbus2 uhub2: on usbus2 ugen3.1: at usbus3 uhub3: on usbus3 ugen4.1: at usbus4 uhub4: on usbus4 ugen5.1: at usbus5 uhub5: on usbus5 ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C) SMP: AP CPU #1 Launched! hwpmc: TSC/1/64/0x20 IAP/2/40/0x3ff IAF/3/40/0x61 Enter passphrase for ada0p3: uhub0: 2 ports with 2 removable, self powered uhub1: 2 ports with 2 removable, self powered uhub2: 2 ports with 2 removable, self powered uhub3: 2 ports with 2 removable, self powered uhub4: 2 ports with 2 removable, self powered uhub5: 2 ports with 2 removable, self powered ugen1.2: at usbus1 ugen2.2: at usbus2 ugen5.2: at usbus5 umass0: on usbus5 ugen1.3: at usbus1 (probe0:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 (probe0:umass-sim0:0:0:0): CAM status: SCSI Status Error (probe0:umass-sim0:0:0:0): SCSI status: Check Condition (probe0:umass-sim0:0:0:0): SCSI sense: UNIT ATTENTION asc:28,0 (Not ready to ready change, medium may have changed) da0 at umass-sim0 bus 0 scbus2 target 0 lun 0 da0: Removable Direct Access SCSI-2 device da0: 1.000MB/s transfers da0: 15326MB (31388672 512 byte sectors: 255H 63S/T 1953C) GEOM_ELI: Device ada0p3.eli created. GEOM_ELI: Encryption: AES-CBC 256 GEOM_ELI: Crypto: software Trying to mount root from zfs:zroot wlan0: Ethernet address: 00:21:5c:5f:7e:e3 drm0: on vgapci0 info: [drm] MSI enabled 1 message(s) vgapci0: child drm0 requested pci_enable_busmaster info: [drm] AGP at 0xd0000000 256MB info: [drm] Initialized i915 1.6.0 20080730 drm0: [ITHREAD] -----