From: Andrew Snow
Date: Wed, 16 Jul 2008 11:08:58 +1000
To: freebsd-stable@freebsd.org
Subject: Re: taskqueue timeout
Message-ID: <487D4A2A.9010508@modulus.org>
In-Reply-To: <200807151711.m6FHBgVO007481@apollo.backplane.com>

Matthew Dillon wrote:
> Try that first.  If it helps then it is a known issue.
> Basically a combination of the on-disk write cache and possible ECC
> corrections, remappings, or excessive remapped sectors can cause the
> drive to take much longer than normal to complete a request.  The
> default 5-second timeout is insufficient.

From Western Digital's line of "enterprise" drives:

"RAID-specific time-limited error recovery (TLER) - Pioneered by WD,
this feature prevents drive fallout caused by the extended hard drive
error-recovery processes common to desktop drives."

Western Digital's information sheet on TLER states that they found most
RAID controllers will wait 8 seconds for a disk to respond before
dropping it from the RAID set.  Consequently they changed their
"enterprise" drives to try reading a bad sector for only 7 seconds
before returning an error.

Therefore I think the FreeBSD timeout should also be set to 8 seconds
instead of 5 seconds.  Desktop-targeted drives can stay unresponsive for
more than 10 seconds, and sometimes for minutes, so it's not worth
setting the FreeBSD timeout any higher.

More info:
http://www.wdc.com/en/library/sata/2579-001098.pdf
http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

- Andrew
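
A rough sketch of the timing argument above, in Python. It is only an
illustration: the function name and constants are hypothetical and
simply encode the figures quoted in this thread (5-second default,
7-second TLER limit, proposed 8 seconds, 10 seconds or more for desktop
drives). It is not FreeBSD or Western Digital code.

```python
# Illustrative only: names and constants are assumptions based on the
# figures quoted in this thread, not on any driver source.

def host_times_out_first(drive_recovery_limit_s, host_timeout_s):
    """Return True if the host gives up before the drive reports its error."""
    return host_timeout_s <= drive_recovery_limit_s

TLER_LIMIT_S = 7         # "enterprise" drive stops retrying a bad sector after 7 s
DESKTOP_RECOVERY_S = 10  # desktop drives: 10 s at the very least, up to minutes
DEFAULT_TIMEOUT_S = 5    # default timeout quoted above
PROPOSED_TIMEOUT_S = 8   # timeout proposed in this message

print(host_times_out_first(TLER_LIMIT_S, DEFAULT_TIMEOUT_S))         # True
print(host_times_out_first(TLER_LIMIT_S, PROPOSED_TIMEOUT_S))        # False
print(host_times_out_first(DESKTOP_RECOVERY_S, PROPOSED_TIMEOUT_S))  # True
```

With an 8-second timeout a TLER drive gets to report its own error
before the host gives up; a desktop drive still trips the timeout, and
raising the value by a few seconds would not change that.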