Date: Tue, 18 May 2010 22:14:03 -0400 From: Damian Gerow <dgerow@afflictions.org> To: stable@freebsd.org Subject: AHCI timeouts on S3 resume Message-ID: <20100519021402.GI92949@plebeian.afflictions.org>
next in thread | raw e-mail | index | archive | help
A few months back, I swapped out my dying hard drive for a WD Scorpio Blue. Cheap, seemed reliable, and it was the only drive the local shop had in stock. However, it seems that AHCI doesn't like this device, and is having troubles during an S3 resume. It appears as though I'm experiencing two types of timeouts when resuming: recoverable, and non-recoverable. My question is: do I have a bad HDD, or is AHCI just not playing nicely? First off, if I try to resume when in graphics mode, it pretty much always fails (screen backlight comes on, but nothing else, system is non-responsive to all keypresses save ctrl+alt+del and a hard power down). I'm currently convinced this is because it's doing something that's triggering a non-recoverable timeout. However, when doing a suspend/resume from a text console, I can usually hit a recoverable timeout. Which looks like this (full suspend-resume cycle): ----- drm0: <Mobile Intel\M-B\M-. GM45 Express Chipset> on vgapci0 info: [drm] MSI enabled 1 message(s) vgapci0: child drm0 requested pci_enable_busmaster info: [drm] AGP at 0xd0000000 256MB info: [drm] Initialized i915 1.6.0 20080730 drm0: [ITHREAD] <switch from graphics to text; enter S3 sleep> wlan0: link state changed to DOWN <resume from S3 sleep> ugen1.2: <vendor 0x08ff> at usbus1 (disconnected) ugen2.2: <Chicony Electronics Co., Ltd.> at usbus2 (disconnected) ugen1.2: <vendor 0x08ff> at usbus1 ugen1.3: <Lenovo Computer Corp> at usbus1 (disconnected) ugen2.2: <Chicony Electronics Co., Ltd.> at usbus2 ugen1.3: <Lenovo Computer Corp> at usbus1 wlan0: link state changed to UP ahcich0: Timeout on slot 24 ahcich0: is 00400040 cs 0f000000 ss 0f000000 rs 0ff00000 tfd 80 serr 04050002 ----- Recovery from these timeouts seems to take 30s. However, periodically when resuming into text mode (and I assume what is almost always when resuming into graphics mode), I trigger a non-recoverable timeout: ----- ahcich0: Timeout on slot 13 ahcich0: is 00400000 cs 0000e000 ss 0000e000 rs 0000e00 tfd 80 serr 04050000 ahcich0: device is not ready (timeout 15000ms) tfd = 00000080 ahcich0: Timeout on slot 15 ahcich0: is 00400000 cs 0000e000 ss 0000e000 rs 0000800 tfd 00 serr 00000000 ahcich0: device is not ready (timeout 15000ms) tfd = 00000000 (ada0:ahcich0:0:0:0): lost device ahcich0: Poll timeout on slot 15 ahcich0: is 00000000 cs 00008000 ss 00000000 rs 0000800 tfd 00 serr 00000000 GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=13067157504, length=65536)] GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=262144, length=8192)] GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=482927181824, length=8192)] GEOM_ELI: g_eli_read_done() failed ada0p3.eli[READ(offset=482927443968, length=8192)] ahcich0: Timeout on slot 15 ahcich0: is 00000000 cs 00038000 ss 00038000 rs 0003800 tfd 00 serr 00000000 ahcich0: devfice is not ready (timeout 15000ms) tfd = 00000080 ahcich0: Poll timeout on slot 17 ahcich0: is 00000000 cs 00020000 ss 00000000 rs 0002000 tfd 00 serr 00000000 ahcich0: Timeout on slot 17 ahcich0: is 00000000 cs 000e0000 ss 000e0000 rs 000e000 tfd 00 serr 00000000 ahcich0: devfice is not ready (timeout 15000ms) tfd = 00000080 ----- This continues, with the slot increasing by two every time. Each 'Timeout' and 'Poll timeout' takes ~15s to trigger, and I'd let this run to slot 23 before I powered down. The HDD is a WD Scorpio blue, model WD5000BEVT-22A0RT0, and isn't exactly the fastest drive on the planet. SMART seems to be relatively clean, with some mild questions surrounding attributes 191, 9/193, and 194: ----- ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0 3 Spin_Up_Time 0x0027 186 185 021 Pre-fail Always - 1675 4 Start_Stop_Count 0x0032 055 055 000 Old_age Always - 45174 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 723 10 Spin_Retry_Count 0x0032 100 100 051 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 57 191 G-Sense_Error_Rate 0x0032 072 072 000 Old_age Always - 28 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 48 193 Load_Cycle_Count 0x0032 162 162 000 Old_age Always - 115712 194 Temperature_Celsius 0x0022 112 106 000 Old_age Always - 35 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 100 253 051 Old_age Offline - 0 ----- So, am I looking at a bum HDD, or is AHCI just not playing nicely? Hardware and traditional troubleshooting info: Laptop is a Lenovo X200. Disk controller: ----- ahci0@pci0:0:31:2: class=0x010601 card=0x20f817aa chip=0x29298086 rev=0x03 hdr=0x00 vendor = 'Intel Corporation' device = '82801IB/IR/IH (ICH9 Family) Mobile SATA AHCI Controller' class = mass storage subclass = SATA ----- loader.conf: ----- ahci_load="YES" geom_eli_load="YES" zfs_load="YES" vfs.root.mountfrom="zfs:zroot" vfs.zfs.txg.timeout="5" iwnfw_load="YES" if_iwn_load="YES" snd_hda_load="YES" sem_load="YES" smb_load="YES" coretemp_load="YES" usb_load="YES" uhid_load="YES" uhci_load="YES" ohci_load="YES" umass_load="YES" linux_enable="YES" kern.maxfiles="35000" cpufreq_load="YES" hint.p4tcc.0.disable="1" hint.acpi_throttle.0.disable="1" hw.pci.do_power_nodriver="2" ----- dmesg: ----- Copyright (c) 1992-2010 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 8.1-PRERELEASE #7 r208254: Tue May 18 08:44:31 EDT 2010 root@plebeian.afflictions.org:/usr/obj/usr/src/sys/PLEBEIAN amd64 module_register: module g_eli already exists! Module g_eli failed to register: 17 Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz (2394.02-MHz K8-class CPU) Origin = "GenuineIntel" Id = 0x10676 Family = 6 Model = 17 Stepping = 6 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0x8e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1> AMD Features=0x20100800<SYSCALL,NX,LM> AMD Features2=0x1<LAHF> TSC: P-state invariant real memory = 4294967296 (4096 MB) avail memory = 3991834624 (3806 MB) ACPI APIC Table: <LENOVO TP-6D > FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs FreeBSD/SMP: 1 package(s) x 2 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0: Changing APIC ID to 1 ioapic0 <Version 2.0> irqs 0-23 on motherboard kbd1 at kbdmux0 cryptosoft0: <software crypto> on motherboard acpi0: <LENOVO TP-6D> on motherboard CPU0: local APIC error 0x40 acpi0: [ITHREAD] acpi_ec0: <Embedded Controller: GPE 0x11, ECDT> port 0x62,0x66 on acpi0 acpi0: Power Button (fixed) acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, bff00000 (3) failed Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 cpu0: <ACPI CPU> on acpi0 cpu1: <ACPI CPU> on acpi0 acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 900 acpi_lid0: <Control Method Lid Switch> on acpi0 acpi_button0: <Sleep Button> on acpi0 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 pci0: <ACPI PCI bus> on pcib0 vgapci0: <VGA-compatible display> port 0x1800-0x1807 mem 0xf2000000-0xf23fffff,0xd0000000-0xdfffffff irq 16 at device 2.0 on pci0 agp0: <Intel GM45 SVGA controller> on vgapci0 agp0: detected 32764k stolen memory agp0: aperture size is 256M vgapci1: <VGA-compatible display> mem 0xf2400000-0xf24fffff at device 2.1 on pci0 pci0: <simple comms> at device 3.0 (no driver attached) pci0: <network, ethernet> at device 25.0 (no driver attached) uhci0: <Intel 82801I (ICH9) USB controller> port 0x1860-0x187f irq 20 at device 26.0 on pci0 uhci0: [ITHREAD] usbus0: <Intel 82801I (ICH9) USB controller> on uhci0 uhci1: <Intel 82801I (ICH9) USB controller> port 0x1880-0x189f irq 21 at device 26.1 on pci0 uhci1: [ITHREAD] usbus1: <Intel 82801I (ICH9) USB controller> on uhci1 uhci2: <Intel 82801I (ICH9) USB controller> port 0x18a0-0x18bf irq 22 at device 26.2 on pci0 uhci2: [ITHREAD] usbus2: <Intel 82801I (ICH9) USB controller> on uhci2 pci0: <serial bus, USB> at device 26.7 (no driver attached) hdac0: <Intel 82801I High Definition Audio Controller> mem 0xf2620000-0xf2623fff irq 17 at device 27.0 on pci0 hdac0: HDA Driver Revision: 20100226_0142 hdac0: [ITHREAD] pcib1: <ACPI PCI-PCI bridge> irq 20 at device 28.0 on pci0 pci2: <ACPI PCI bus> on pcib1 pcib2: <ACPI PCI-PCI bridge> irq 21 at device 28.1 on pci0 pci3: <ACPI PCI bus> on pcib2 iwn0: <Intel(R) PRO/Wireless 4965BGN> mem 0xf2500000-0xf2501fff irq 17 at device 0.0 on pci3 iwn0: MIMO 2T3R, MoW1, address 00:21:5c:5f:7e:e3 iwn0: [ITHREAD] iwn0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps iwn0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps iwn0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps pcib3: <ACPI PCI-PCI bridge> irq 23 at device 28.3 on pci0 pci5: <ACPI PCI bus> on pcib3 uhci3: <Intel 82801I (ICH9) USB controller> port 0x18c0-0x18df irq 16 at device 29.0 on pci0 uhci3: [ITHREAD] usbus3: <Intel 82801I (ICH9) USB controller> on uhci3 uhci4: <Intel 82801I (ICH9) USB controller> port 0x18e0-0x18ff irq 17 at device 29.1 on pci0 uhci4: [ITHREAD] usbus4: <Intel 82801I (ICH9) USB controller> on uhci4 uhci5: <Intel 82801I (ICH9) USB controller> port 0x1c00-0x1c1f irq 18 at device 29.2 on pci0 uhci5: [ITHREAD] usbus5: <Intel 82801I (ICH9) USB controller> on uhci5 pci0: <serial bus, USB> at device 29.7 (no driver attached) pcib4: <PCI-PCI bridge> at device 30.0 on pci0 pci13: <PCI bus> on pcib4 isab0: <PCI-ISA bridge> at device 31.0 on pci0 isa0: <ISA bus> on isab0 ahci0: <Intel ICH9M AHCI SATA controller> port 0x1c48-0x1c4f,0x183c-0x183f,0x1c40-0x1c47,0x1838-0x183b,0x1c20-0x1c3f mem 0xf2826000-0xf28267ff irq 16 at device 31.2 on pci0 ahci0: [ITHREAD] ahci0: AHCI v1.20 with 4 3Gbps ports, Port Multiplier not supported ahcich0: <AHCI channel> at channel 0 on ahci0 ahcich0: [ITHREAD] ahcich1: <AHCI channel> at channel 1 on ahci0 ahcich1: [ITHREAD] pci0: <serial bus, SMBus> at device 31.3 (no driver attached) acpi_tz0: <Thermal Zone> on acpi0 acpi_tz1: <Thermal Zone> on acpi0 atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0 atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0 atkbd0: <AT Keyboard> irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] psm0: <PS/2 Mouse> irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: [ITHREAD] psm0: model Generic PS/2 mouse, device ID 0 battery0: <ACPI Control Method Battery> on acpi0 acpi_acad0: <AC Adapter> on acpi0 orm0: <ISA Option ROMs> at iomem 0xc0000-0xcffff,0xd0000-0xd0fff,0xd1000-0xd1fff,0xd2000-0xd2fff,0xde000-0xdf7ff,0xe0000-0xeffff on isa0 sc0: <System console> at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 coretemp0: <CPU On-Die Thermal Sensors> on cpu0 est0: <Enhanced SpeedStep Frequency Control> on cpu0 p4tcc0: <CPU Frequency Thermal Control> on cpu0 coretemp1: <CPU On-Die Thermal Sensors> on cpu1 est1: <Enhanced SpeedStep Frequency Control> on cpu1 p4tcc1: <CPU Frequency Thermal Control> on cpu1 ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present; to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf. ZFS filesystem version 3 ZFS storage pool version 14 Timecounters tick every 1.000 msec hdac0: HDA Codec #0: Conexant CX20561 (Hermosa) pcm0: <HDA Conexant CX20561 (Hermosa) PCM #0 Analog> at cad 0 nid 1 on hdac0 pcm1: <HDA Conexant CX20561 (Hermosa) PCM #1 Analog> at cad 0 nid 1 on hdac0 The GEOM class ELI is already loaded. usbus0: 12Mbps Full Speed USB v1.0 usbus1: 12Mbps Full Speed USB v1.0 usbus2: 12Mbps Full Speed USB v1.0 usbus3: 12Mbps Full Speed USB v1.0 usbus4: 12Mbps Full Speed USB v1.0 usbus5: 12Mbps Full Speed USB v1.0 ugen0.1: <Intel> at usbus0 uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0 ugen1.1: <Intel> at usbus1 uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1 ugen2.1: <Intel> at usbus2 uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2 ugen3.1: <Intel> at usbus3 uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus3 ugen4.1: <Intel> at usbus4 uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4 ugen5.1: <Intel> at usbus5 uhub5: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5 ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: <WDC WD5000BEVT-22A0RT0 01.01A01> ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C) SMP: AP CPU #1 Launched! hwpmc: TSC/1/64/0x20<REA> IAP/2/40/0x3ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA,PRC> IAF/3/40/0x61<INT,REA,WRI> Enter passphrase for ada0p3: uhub0: 2 ports with 2 removable, self powered uhub1: 2 ports with 2 removable, self powered uhub2: 2 ports with 2 removable, self powered uhub3: 2 ports with 2 removable, self powered uhub4: 2 ports with 2 removable, self powered uhub5: 2 ports with 2 removable, self powered ugen1.2: <vendor 0x08ff> at usbus1 ugen2.2: <Chicony Electronics Co., Ltd.> at usbus2 ugen5.2: <RICOH> at usbus5 umass0: <RICOH USB2.0-FLASH Media, class 0/0, rev 2.00/0.01, addr 2> on usbus5 ugen1.3: <Lenovo Computer Corp> at usbus1 (probe0:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 (probe0:umass-sim0:0:0:0): CAM status: SCSI Status Error (probe0:umass-sim0:0:0:0): SCSI status: Check Condition (probe0:umass-sim0:0:0:0): SCSI sense: UNIT ATTENTION asc:28,0 (Not ready to ready change, medium may have changed) da0 at umass-sim0 bus 0 scbus2 target 0 lun 0 da0: <RICOH R5U880FlashMedia 0000> Removable Direct Access SCSI-2 device da0: 1.000MB/s transfers da0: 15326MB (31388672 512 byte sectors: 255H 63S/T 1953C) GEOM_ELI: Device ada0p3.eli created. GEOM_ELI: Encryption: AES-CBC 256 GEOM_ELI: Crypto: software Trying to mount root from zfs:zroot wlan0: Ethernet address: 00:21:5c:5f:7e:e3 drm0: <Mobile Intel\M-B\M-. GM45 Express Chipset> on vgapci0 info: [drm] MSI enabled 1 message(s) vgapci0: child drm0 requested pci_enable_busmaster info: [drm] AGP at 0xd0000000 256MB info: [drm] Initialized i915 1.6.0 20080730 drm0: [ITHREAD] -----
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100519021402.GI92949>