Date: Tue, 30 Jun 2009 08:11:12 GMT
From: Karl Pielorz
To: freebsd-gnats-submit@FreeBSD.org
Subject: misc/136182: Heavy disk writes (e.g. ZFS resilver to a drive) can cause "adX: TIMEOUT - FLUSHCACHE retrying (1 retry left)" on console.

>Number:         136182
>Category:       misc
>Synopsis:       Heavy disk writes (e.g. ZFS resilver to a drive) can cause "adX: TIMEOUT - FLUSHCACHE retrying (1 retry left)" on console.
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jun 30 08:20:01 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Karl Pielorz
>Release:        7.2-STABLE
>Organization:
>Environment:
FreeBSD caladan.tdx.co.uk 7.2-STABLE FreeBSD 7.2-STABLE #54: Mon Jun 29 09:25:13 BST 2009 root@caladan.tdx.co.uk:/usr/src/sys/amd64/compile/CALADAN64-SMP amd64
>Description:
While doing a ZFS resilver to a new drive (a Western Digital WD5000AAKS), a number of FLUSHCACHE timeouts are logged to the console. The drive reports no SMART errors and, in this case, passes Western Digital's own drive check with no errors. The only error logged is for the FLUSHCACHE operation.

The mailing-list archives contain earlier suggestions to raise the timeout on the ATA FLUSHCACHE command from the default of 5 seconds to 30 seconds, since the ATA specification apparently allows a cache flush to take up to 30 seconds, e.g.:

http://lists.freebsd.org/pipermail/freebsd-current/2009-April/005939.html

That patch does not apply to 7.2-STABLE and does not appear to have been committed. The patch included with this PR fixes the problem for me: I no longer get FLUSHCACHE warnings while resilvering on this system.
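To make the proposed policy concrete, here is a minimal C sketch of choosing a per-command timeout: 30 seconds for FLUSH CACHE and the 5-second default for everything else. The helper `request_timeout_for()` and the `*_TIMEOUT_SECS` constants are hypothetical, chosen only for illustration; this is not the ata(4) driver's code, which (per the attached patch) simply sets `request->timeout` inline in ata-disk.c. The opcode values are local definitions of the ATA command-set codes for FLUSH CACHE (E7h) and READ DMA (C8h).

```c
#include <assert.h>

/* Local definitions of ATA command opcodes: FLUSH CACHE = E7h, READ DMA = C8h. */
#define ATA_FLUSHCACHE 0xe7
#define ATA_READ_DMA   0xc8

/* Timeouts discussed in this PR, in seconds (hypothetical names). */
#define DEFAULT_TIMEOUT_SECS 5   /* driver default quoted in the PR */
#define FLUSH_TIMEOUT_SECS   30  /* reported ATA upper bound for a flush */

/*
 * Hypothetical helper (not part of FreeBSD): return the per-request
 * timeout for an ATA command, giving cache flushes the longer allowance
 * and leaving every other command on the default.
 */
static int request_timeout_for(unsigned char command)
{
    switch (command) {
    case ATA_FLUSHCACHE:
        return FLUSH_TIMEOUT_SECS;
    default:
        return DEFAULT_TIMEOUT_SECS;
    }
}
```

Keying the timeout on the command presumably keeps ordinary reads and writes on the short default, so a genuinely hung drive is still noticed quickly, while only flushes, which may legitimately stall behind a full write cache, get the longer allowance.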
>How-To-Repeat:
Saturate a drive with write I/O. In my case: take two 500 GB Western Digital WD5000AAKS SATA drives in a ZFS mirror, fill the resulting zpool, then remove one side of the mirror and replace it with another blank drive. Tell ZFS to replace the drive, which starts a resilver that copies the data from the good drive to the new one. At various points during the resilver, sometimes only once in a while and other times quite often, the console logs:

ad34: TIMEOUT - FLUSHCACHE retrying (1 retry left)

>Fix:
The attached patch "fixes it for me": it sets a timeout of 30 seconds for flush commands instead of the default 5. It has been running for the past couple of days, and I have not seen a single flush timeout since.

Patch attached with submission follows:

--- ata-disk.c	2009-06-30 08:55:56.000000000 +0100
+++ ata-disk.c.kp	2009-06-30 08:54:47.000000000 +0100
@@ -339,6 +339,7 @@
 	request->transfersize = 0;
 	request->flags = ATA_R_CONTROL;
 	request->u.ata.command = ATA_FLUSHCACHE;
+	request->timeout = 30;
 	break;
     default:
 	device_printf(dev, "FAILURE - unknown BIO operation\n");
>Release-Note:
>Audit-Trail:
>Unformatted:
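The following is a toy model, not the driver's actual retry logic, of why a 5-second timeout can produce the "TIMEOUT - FLUSHCACHE retrying" messages while 30 seconds does not. It assumes, purely for illustration, that the drive keeps draining its cache while the host waits, so each re-issued flush only has the remaining work left to do. The function name `count_timeouts()` and the example durations are hypothetical.

```c
#include <assert.h>

/*
 * Toy model, NOT the ata(4) retry path. Hypothetical assumptions:
 *  - a flush needs `flush_secs` seconds of drive-side work in total;
 *  - the drive keeps flushing after the host times out, so each
 *    re-issued FLUSHCACHE only has the remaining work left;
 *  - the host allows `timeout_secs` per attempt and re-issues the
 *    command up to `retries` times after the first attempt.
 * Each timed-out attempt stands for one console line such as
 * "ad34: TIMEOUT - FLUSHCACHE retrying (1 retry left)".
 * Returns the number of timed-out attempts; the flush eventually
 * succeeded if and only if the result is <= retries.
 */
static int count_timeouts(int flush_secs, int timeout_secs, int retries)
{
    assert(flush_secs >= 0 && timeout_secs > 0 && retries >= 0);

    int remaining = flush_secs;
    int timeouts = 0;

    for (int attempt = 0; attempt <= retries; attempt++) {
        if (remaining <= timeout_secs)
            return timeouts;        /* this attempt completed in time */
        remaining -= timeout_secs;  /* drive made progress meanwhile */
        timeouts++;                 /* this attempt timed out */
    }
    return timeouts;                /* every attempt timed out */
}
```

With a hypothetical 12-second flush and two retries, `count_timeouts(12, 5, 2)` returns 2 (two timeout messages, then success on the third attempt), while `count_timeouts(12, 30, 2)` returns 0, matching the behaviour reported after applying the patch.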