From: Alexander Motin
Date: Tue, 15 Sep 2009 19:21:02 +0300
To: Pawel Jakub Dawidek
Cc: Kris Kennaway, FreeBSD Current
Subject: Re: ata timeouts under load
Message-ID: <4AAFBEEE.8090101@FreeBSD.org>
In-Reply-To: <20090915155436.GB2199@garage.freebsd.pl>

Pawel Jakub Dawidek wrote:
> On Sun, Sep 13, 2009 at 11:17:41PM +0300, Alexander Motin wrote:
>> Kris Kennaway wrote:
>>> I am getting timeouts on 8.0b4/HEAD when I do a lot of ZFS I/O to a pool
>>> on ad4:
>>>
>>> atapci0: port
>>> 0xc800-0xc807,0xc400-0xc403,0xc000-0xc007,0xb800-0xb803,0xb400-0xb40f,0xb000-0xb0ff
>>> irq 20 at device 15.0 on pci0
>>> ata2: on atapci0
>>> ata3: on atapci0
>>> ata0: on atapci1
>>> ata1: on atapci1
>>>
>>> ad4: 476940MB at ata2-master SATA150
>>> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
>>> completing request directly
>>> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
>>> completing request directly
>>> ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing
>>> request directly
>>> ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing
>>> request directly
>>> ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
>>> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=344052040
>>> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
>>> completing request directly
>>> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
>>> completing request directly
>>>
>>> It becomes stuck in a loop displaying the above and is unable to
>>> complete further I/O operations.
>>> I wonder if it is just batching up a
>>> lot of I/O and then timing out because it is busy, and then not
>>> recovering from this state?
>>>
>>> Any ideas what could be wrong?
>> There are two different kinds of timeouts we can see:
>> - first one, "ad4: WARNING - ..." is just a queue waiting timeout. It
>> is not the cause, but a consequence of the problem. And I have doubts
>> that it is reasonable to do it.
>> - second one, "TIMEOUT - WRITE_DMA48 ..." is a real command execution
>> timeout. I don't know whether this is the result of some improper error
>> recovery, or your drive indeed lost required servo information near
>> LBA=344052040 and is taking too long to find it. You can try to read
>> that sector and nearby ones with dd.
>
> Could this be related to BIO_FLUSH requests?

BIO_FLUSH is implemented via the FLUSHCACHE48 command, not WRITE_DMA48,
so the message doesn't fit. But theoretically, if the drive has write
caching enabled, FLUSHCACHE48 could take longer to execute than other
commands, especially given that ATA(4) has very strict timeouts.

-- 
Alexander Motin