From: Jeremy Chadwick
Date: Thu, 6 Nov 2008 23:17:52 -0800
To: freebsd-hardware@freebsd.org
Cc: votdev@gmx.de, freebsd-stable@freebsd.org, sos@freebsd.org
Subject: Western Digital hard disks and ATA timeouts
Message-ID: <20081107071752.GA5842@icarus.home.lan>

A user and I on a broadband forum were discussing the possibility of
diminishing quality of hard disks (particularly
1TB models) in recent days (specifically October).

The user continually referenced something called "deep recovery cycle",
backed by claims from Newegg reviewers (who often know very little or
nothing at all -- grain of salt concept applies), which supposedly makes
Western Digital's desktop hard disks unfit for RAID or server usage.

I claimed shenanigans until the user pointed me to the following
document on Western Digital's site:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1397

The feature described apparently causes the hard disk to enter some form
of aggressive sector scan/sector remapping loop, which can take up to 2
minutes to complete, during which time the hard disk is basically
unusable. (I imagine ATA commands sent to the disk will simply time out
or stall indefinitely, which would result in all sorts of timeout
errors.)

Note that Western Digital's "RAID edition" drives claim to take up to 7
seconds to reallocate sectors, using something they call TLER, which
force-limits the amount of time the drive can spend reallocating. TLER
cannot be disabled:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1478

What baffles me is why Western Digital thinks that 2 minutes of the
drive being unusable is acceptable "but only for desktops". Any FreeBSD
desktop will start reporting ATA timeouts if the drive wedges for more
than 5 seconds -- two minutes would just spew errors and hard-lock the
system.

What also baffles me is why Western Digital thinks the term "RAID"
always means a hardware RAID controller is involved as a buffer between
the OS and the disks. Bzzzt, bad assumption on their part.

So why do we care?

As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and
is not adjustable without editing the ATA code yourself and increasing
the value. The FreeNAS folks have made patches available to turn the
timeout value into a sysctl.

Soren and/or others, please increase this timeout value.
Five seconds has now been deemed too aggressive a default. And please
consider migrating the timeout value into a sysctl.

P.S. -- I do not consider any of this reason to avoid Western Digital
drives. But I would warn users to be a little more cautious before
reporting ATA timeouts when newer (circa 2007 and later) WD drives are
in use.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |