From: Karl Pielorz
To: freebsd-hackers@freebsd.org
Date: Sun, 28 Jun 2009 16:30:24 +0100
Subject: ata 'Flush Cache' errors, on non-failing disk?
Message-ID: <20E145B15D43DBD9A741F1DB@Octa64>

Hi,

I've recently updated my amd64 system from 6.4 to 7.2-STABLE. This works fine, but I've started picking up errors on the console:

    ad36: TIMEOUT - FLUSHCACHE retrying (1 retry left)

The drive (a WD5000AAKS) appears healthy: SMART reports no errors or problems, and the timeouts only appear when that drive is 'being hammered' by write requests (e.g. during ZFS resilvering to it).

The Western Digital drive doctor CD/ISO runs a full test and reports no problems (in that machine, with that drive).

I did find a number of posts, such as:

Which point to the default timeout for the ATA flushcache command being 5 seconds, when perhaps it should be 30. But the code in 7.2-STABLE bears no resemblance to the code that the patch is for, so I'm guessing things have moved on since then.

Is there anywhere I might apply a similar patch to up the timeout, to see if that cures the problem?

The only mentions of ATA_FLUSHCACHE appear to be calls to "ata_controlcmd( xxxx, ATA_FLUSHCACHE, 0, 0, 0);". "ata_controlcmd" in turn seems to set a request timeout of '1', but I can't tell whether that's 1 second, 1 tick, or something else, or whether it's a timeout for adding the command to the queue or for actually executing it.

Is upping that request timeout conditionally for cache flushes likely to have the effect I'm looking for?

-Kp
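
P.S. To make the question concrete, the kind of conditional change I have in mind looks roughly like the sketch below. This is a standalone illustration, not the actual driver code: the struct, the helper names and the two timeout constants are all made up for the example, and I'm assuming (without knowing) that the timeout field is a plain integer in whatever unit the driver uses. The opcode values are the ones from the ATA command set; the 48-bit flush opcode is included only for completeness.

```c
/* Opcodes from the ATA command set. */
#define ATA_FLUSHCACHE    0xe7  /* FLUSH CACHE */
#define ATA_FLUSHCACHE48  0xea  /* FLUSH CACHE EXT (48-bit variant) */

/* Hypothetical values: '1' is what the control path seems to set today,
 * '30' is the figure suggested in the older posts. Units unknown. */
#define SKETCH_DEFAULT_TIMEOUT  1
#define SKETCH_FLUSH_TIMEOUT    30

/* Hypothetical stand-in for the driver's request structure. */
struct sketch_request {
    unsigned char command;
    int timeout;
    int retries;
};

/* Pick a longer timeout for cache flushes only; everything else
 * keeps the existing default. */
static int
sketch_timeout_for(unsigned char command)
{
    switch (command) {
    case ATA_FLUSHCACHE:
    case ATA_FLUSHCACHE48:
        return SKETCH_FLUSH_TIMEOUT;
    default:
        return SKETCH_DEFAULT_TIMEOUT;
    }
}

/* Fill in a control request the way I imagine the control-command
 * path would, but with the conditional timeout applied. */
static void
sketch_setup_control(struct sketch_request *req, unsigned char command)
{
    req->command = command;
    req->timeout = sketch_timeout_for(command);
    req->retries = 0;
}
```

Whether a change of this shape helps depends on the answers to the questions above: what unit the timeout is in, and whether it covers queueing or execution.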