From: Joe Kelsey
Date: Sun, 09 Nov 2008 15:29:08 -0800
To: Søren Schmidt
Cc: Jeremy Chadwick, freebsd-stable@FreeBSD.ORG, votdev@gmx.de, Peter Wemm, freebsd-hardware@FreeBSD.ORG
Subject: Re: Western Digital hard disks and ATA timeouts

Søren Schmidt wrote:
> On 7Nov, 2008, at 20:12 , Peter Wemm wrote:
>
>> On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick
>> wrote:
>> [..]
>>> As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and
>>> is not adjustable without editing the ATA code yourself and increasing
>>> the value. The FreeNAS folks have made patches available to turn the
>>> timeout value into a sysctl.
>>>
>>> Soren and/or others, please increase this timeout value. Five seconds
>>> has now been deemed too aggressive a default. And please consider
>>> migrating the timeout value into a sysctl.
>>
>> The 5 second timeout has been a problem for quite a while actually.
>> I've had a number of instances where I've had to increase it to 20 or
>> 30 seconds when recovering from marginal drives. The longest
>> "successful" recovery attempt I've seen was 26 seconds, I believe on a
>> Maxtor drive a few years ago. ("successful" == the drive spent 26
>> seconds but eventually successfully read the sector). Even the IBM
>> death star drives could take much longer than 5 seconds to do a
>> recovery 5 years ago. 5 seconds has never been a good default.
>>
>> I think the timeout should be increased to at least 30 seconds. My
>> windows box has a timeout that goes for several minutes.
>>
>> If there is concern about FreeBSD appearing to hang, I could imagine
>> that a console warning message could be printed after 5 seconds. But
>> just say "drive has not yet responded". But give it more time.
>>
>> In this day and age we're generally not playing games with udma33 vs
>> 66, notched cables, poor CRC support etc. SATA seems to have
>> eliminated all that. Hmm, it might make sense to increase the timeout
>> on SATA connections to 2 or 3 minutes by default.
>
> Actually I do have a patch around that logs the timeout on the console
> after the normal timeout (5secs), then just goes on to wait for double
> the timeout and log again etc etc, final timeout was IIRC 60 secs but
> could be anything.

I have a disk which I am finally getting rid of that produces READ_DMA and
WRITE_DMA errors at a pretty high rate.
I did enable the extra ATA error reporting, and it doesn't seem to indicate
any sort of actual errors, just extra-long timeouts. At one time, I did
change the system to extend the timeout, but I did not see any real
improvement at 30 seconds. I suspect that an even longer timeout would be
necessary to solve the problem.

I am removing the disk this week. Does anyone want a disk that produces DMA
timeouts at a regular rate? Would it actually help solve this problem?
Please let me know if you want such a beast and I will ship it to you.

/Joe
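
For anyone following the thread, the escalating timeout Søren describes in the quoted text (warn at the normal 5 second timeout, keep doubling the wait and warning again, give up at a final limit of about 60 seconds) can be sketched roughly as below. This is not his patch and not FreeBSD ATA code. The function names are hypothetical, the reading that the doubling applies to total elapsed time is an assumption, and the warning text is borrowed from Peter's suggestion.

```python
def warning_schedule(base_timeout=5.0, final_timeout=60.0):
    """Return the elapsed times (in seconds) at which the request is checked.

    Assumption: checkpoints double from the base timeout (5, 10, 20, ...)
    and the last one is always the final timeout.
    """
    if base_timeout <= 0 or final_timeout < base_timeout:
        raise ValueError("need 0 < base_timeout <= final_timeout")
    checkpoints = []
    elapsed = base_timeout
    while elapsed < final_timeout:
        checkpoints.append(elapsed)
        elapsed *= 2
    checkpoints.append(final_timeout)
    return checkpoints


def simulate_request(completion_time, base_timeout=5.0, final_timeout=60.0):
    """Simulate one request that the drive finishes after completion_time seconds.

    Returns (succeeded, messages), where messages are the console lines
    that would have been logged while waiting.
    """
    messages = []
    for deadline in warning_schedule(base_timeout, final_timeout):
        if completion_time <= deadline:
            # The drive answered before this checkpoint: no further logging.
            return True, messages
        if deadline < final_timeout:
            # Intermediate checkpoint: warn, then keep waiting.
            messages.append(f"drive has not yet responded after {deadline:g}s")
    # Still no answer at the final timeout: fail the request.
    messages.append(f"request timed out after {final_timeout:g}s")
    return False, messages


if __name__ == "__main__":
    # The 26 second recovery mentioned in the quoted text would succeed
    # here, after warnings at the 5, 10 and 20 second checkpoints.
    succeeded, log = simulate_request(26.0)
    print(succeeded)
    for line in log:
        print(line)
```

Under these assumptions a drive that needs more than 60 seconds still fails, only later and with more console noise, which would match seeing no improvement at 30 seconds on a disk that stalls for longer than that.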