From owner-freebsd-current@FreeBSD.ORG Sun Sep 13 21:02:09 2009
Message-ID: <4AAD5DD2.4030104@FreeBSD.org>
Date: Sun, 13 Sep 2009 22:02:10 +0100
From: Kris Kennaway
To: Alexander Motin
Cc: FreeBSD Current
Subject: Re: ata timeouts under load
References: <4AAD4E51.5060908@FreeBSD.org> <4AAD5365.5000902@FreeBSD.org>
In-Reply-To: <4AAD5365.5000902@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

Alexander Motin wrote:
> Kris Kennaway wrote:
>> I am getting timeouts on 8.0b4/HEAD when I do a lot of ZFS I/O to a pool
>> on ad4:
>>
>> atapci0: port
>> 0xc800-0xc807,0xc400-0xc403,0xc000-0xc007,0xb800-0xb803,0xb400-0xb40f,0xb000-0xb0ff
>> irq 20 at device 15.0 on pci0
>> ata2: on atapci0
>> ata3: on atapci0
>> ata0: on atapci1
>> ata1: on atapci1
>>
>> ad4: 476940MB at ata2-master SATA150
>> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
>> completing request directly
>> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
>> completing request directly
>> ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing
>> request directly
>> ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing
>> request directly
>> ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
>> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=344052040
>> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
>> completing request directly
>> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
>> completing request directly
>>
>> It becomes stuck in a loop displaying the above and is unable to
>> complete further I/O operations. I wonder if it is just batching up a
>> lot of I/O and then timing out because it is busy, and then not
>> recovering from this state?
>>
>> Any ideas what could be wrong?
>
> There are two different kinds of timeouts we can see:
> - The first one, "ad4: WARNING - ...", is just a queue waiting timeout.
> It is not the cause but a consequence of the problem. And I have doubts
> that it is reasonable to do it.
> - The second one, "TIMEOUT - WRITE_DMA48 ...", is a real command
> execution timeout. I don't know whether this is the result of some
> improper error recovery, or whether your drive has indeed lost required
> servo information near LBA=344052040 and takes too long trying to find
> it. You can try to read that sector and nearby ones with dd.
>

It's always that sequence (SETFEATURES timing out first, then the DMA
command later), and the block number varies widely, as does whether it's
a read or a write. The disk itself and the data it contains appear to be
OK, as far as I have been able to determine so far.

Kris
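
For reference, the dd check suggested in the quoted reply could be sketched roughly as below. This is only an illustration, not something posted in the thread: the helper name dd_read_command, the 128-sector window, and the 512-byte sector size are assumptions, while /dev/ad4 and the LBA are taken from the log above. It only builds the command line; it does not run dd.

```python
def dd_read_command(device, lba, radius=128, sector_size=512):
    """Build a dd argument list that reads the sectors around `lba`.

    Assumptions (not from the thread): a window of `radius` sectors on
    each side of the LBA, and 512-byte logical sectors.
    """
    # Clamp the start so we never ask dd to skip to a negative sector.
    start = max(lba - radius, 0)
    # Number of sectors from `start` through `lba + radius`, inclusive.
    count = lba + radius + 1 - start
    return [
        "dd",
        f"if={device}",
        "of=/dev/null",        # discard the data; only read errors matter
        f"bs={sector_size}",   # one dd block per sector
        f"skip={start}",       # skip is counted in input blocks of size bs
        f"count={count}",
    ]


if __name__ == "__main__":
    # Device name and LBA as they appear in the kernel messages above.
    print(" ".join(dd_read_command("/dev/ad4", 344052040)))
```

Setting bs to the sector size keeps skip and count in units of sectors, so the LBA from the kernel message can be used directly. A read error or a long stall on this range would point at the drive rather than at the driver's error recovery.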