From owner-freebsd-stable@FreeBSD.ORG Wed May 19 13:48:54 2010
Date: Wed, 19 May 2010 09:48:49 -0400
From: Damian Gerow
To: Jeremy Chadwick
Cc: stable@freebsd.org
Subject: Re: AHCI timeouts on S3 resume
Message-ID: <20100519134847.GE97592@plebeian.afflictions.org>
References: <20100519021402.GI92949@plebeian.afflictions.org> <20100519091103.GA72058@icarus.home.lan>
In-Reply-To: <20100519091103.GA72058@icarus.home.lan>
User-Agent: Mutt/1.5.20 (2009-06-14)
List-Id: Production branch of FreeBSD source code

Jeremy Chadwick wrote:
: On Tue, May 18, 2010 at 10:14:03PM -0400, Damian Gerow wrote:
: > A few months back, I swapped out my dying hard drive for a WD Scorpio Blue.
: > Cheap, seemed reliable, and it was the only drive the local shop had in
: > stock. However, it seems that AHCI doesn't like this device, and is having
: > troubles during an S3 resume. It appears as though I'm experiencing two
: > types of timeouts when resuming: recoverable, and non-recoverable.
: >
: > My question is: do I have a bad HDD, or is AHCI just not playing nicely?
:
: Your hard disk looks generally OK; it isn't going bad. The one thing I
: can't tell or not is whether the disk is actually spinning back up on
: resume; you'd have to literally listen for it, or look at SMART
: Attribute #4 before and after a suspend/resume. I'll discuss analysis
: of SMART statistics further down.

The disk spins back up immediately on resume. I have no recollection of it
/not/ doing so (it's definitely noticeable), and I just confirmed it with a
few S3 cycles. I also checked the WD spec sheet, and the average drive
ready time is 4s.

: I will point out, however, that you've set this value in loader.conf:
:
: > hw.pci.do_power_nodriver="2"
:
: I've read the sysctl -d description for it, but I am not familiar with
: sleep/power states so I don't know the implications. I worry that this
: value may be causing problems with your ICH9 controller. If you could
: comment this out and re-try suspend/resume to see if AHCI times out, you
: might determine if it's responsible for the problem.

That *should* just remove power from devices without a driver. But I
removed it, rebooted, went through two S3 cycles, and I'm still seeing the
timeouts. (Recoverable; of the two cycles I did, I didn't see a
non-recoverable timeout.)
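
For anyone who would rather script the Attribute #4 check suggested above
than listen for the spin-up, here is a minimal Python sketch. It assumes the
text comes from `smartctl -A` and that attribute rows have the ten-column
layout quoted in this thread. The names parse_raw_values and
start_stop_delta are made up for illustration; they are not part of
smartmontools.

```python
def parse_raw_values(smartctl_output: str) -> dict[int, int]:
    """Map SMART attribute ID -> RAW_VALUE from `smartctl -A`-style rows."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Assumption: attribute rows start with a numeric ID and carry the
        # raw value in the tenth column, as in the output quoted here.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[int(fields[0])] = int(fields[9])
    return values


def start_stop_delta(before: str, after: str) -> int:
    """Change in attribute 4 (Start_Stop_Count) between two snapshots."""
    return parse_raw_values(after)[4] - parse_raw_values(before)[4]


# Row copied from the SMART output quoted in this thread.
BEFORE = "  4 Start_Stop_Count 0x0032 055 055 000 Old_age Always - 45174"
print(parse_raw_values(BEFORE))  # {4: 45174}
```

Take one snapshot before suspending and one after resuming, then pass both
to start_stop_delta. A positive result would suggest the drive counted a new
start/stop cycle across the resume, though how a given firmware updates this
counter is not something the sketch can verify.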

: > The HDD is a WD Scorpio blue, model WD5000BEVT-22A0RT0, and isn't exactly
: > the fastest drive on the planet. SMART seems to be relatively clean, with
: > some mild questions surrounding attributes 191, 9/193, and 194:
: >
: > -----
: > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
: >   3 Spin_Up_Time            0x0027   186   185   021    Pre-fail  Always       -       1675
: >   4 Start_Stop_Count        0x0032   055   055   000    Old_age   Always       -       45174
: >   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       723
: > 191 G-Sense_Error_Rate      0x0032   072   072   000    Old_age   Always       -       28
: > 193 Load_Cycle_Count        0x0032   162   162   000    Old_age   Always       -       115712
: > 194 Temperature_Celsius     0x0022   112   106   000    Old_age   Always       -       35
: > -----
:
: Attribute #9 indicates the total amount of time the hard disk has been
: powered on (read: not asleep) during its lifetime. I can't tell you
: whether or not this value is correct; only you would be able to
: determine that, given your usage patterns. I *have* seen desktop drives
: which have reported this value incorrectly (meaning, servers I know have
: been on for thousands of hours that show "4" for this RAW_VALUE;
: probably a firmware bug).

I combined attributes 9 and 193 together because it seems like a load cycle
count of ~116k with 723 power-on hours is a bit high. I believe laptop HDDs
are designed to handle a higher rate of load cycle counts, but I've never
really paid attention to them -- save on my previously dying drive, which
had broken 1M, and started screeching when doing some seeks.

But yes, that 723 power-on hours seems accurate.

: Attribute #193 indicates the number of times the actuator arm (thus
: heads) has been parked or come out of being parked. There is a known
: problem with some models of WD "Green Power" (GP) drives where the drive
: spends an excessive amount of time parking, and this counter increases
: rapidly.
: One FreeBSD user who reported this problem to Western Digital
: received a replacement firmware which addressed the problem. The WD
: Scorpio Blue drives (or some of them) may have this same problem --
: HOWEVER, this model of hard disk (2.5" FF) is *specifically* intended
: for laptops and low-power environments, so the behaviour seen in this
: case could be 100% normal. WD would hopefully know.

I'm fairly certain that WD only includes that IntelliPark feature on the GP
drives. At least, WD doesn't indicate that there's any of their fancy new
GP-related tricks on the Scorpio Blue line.

I'd actually recently dropped my vfs.zfs.txg.timeout to 5, as I was
experiencing some pretty horrible stalls when it was left at default (30, I
believe). I was curious to see if this decreased the rate of my
Load_Cycle_Count, but I'm already at ~122k. Given that this drive is rated
to handle 600k, it makes me wonder if there isn't something like IntelliPark
on this drive.

: Hope this helps.

Aye. It confirms that SMART clears my drive -- thanks!
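
To put the "a bit high" remark about attributes 9 and 193 into numbers, the
arithmetic can be sketched in Python. The inputs are the RAW_VALUEs from the
SMART output quoted above and the 600k rating mentioned in this message. The
function names are illustrative, and the projection assumes the load-cycle
rate stays constant, which may well not hold.

```python
POWER_ON_HOURS = 723         # attribute 9 RAW_VALUE from the quoted output
LOAD_CYCLES = 115_712        # attribute 193 RAW_VALUE from the quoted output
RATED_LOAD_CYCLES = 600_000  # rating mentioned in this message


def cycles_per_hour(load_cycles: int, power_on_hours: int) -> float:
    """Average load cycles per power-on hour over the drive's life so far."""
    return load_cycles / power_on_hours


def hours_until_rating(load_cycles: int, power_on_hours: int, rated: int) -> float:
    """Power-on hours left before reaching the rated count.

    Assumption: the average rate so far continues unchanged.
    """
    rate = cycles_per_hour(load_cycles, power_on_hours)
    return (rated - load_cycles) / rate


rate = cycles_per_hour(LOAD_CYCLES, POWER_ON_HOURS)
remaining = hours_until_rating(LOAD_CYCLES, POWER_ON_HOURS, RATED_LOAD_CYCLES)
print(f"{rate:.0f} load cycles per power-on hour")
print(f"about {remaining:.0f} more power-on hours to reach the rated count")
```

That works out to roughly 160 load cycles per power-on hour, or one every
22 seconds or so of powered-on time on average.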