From owner-freebsd-current@FreeBSD.ORG Sun Sep 13 20:17:50 2009
Message-ID: <4AAD5365.5000902@FreeBSD.org>
Date: Sun, 13 Sep 2009 23:17:41 +0300
From: Alexander Motin
To: Kris Kennaway
Cc: FreeBSD Current
Subject: Re: ata timeouts under load
References: <4AAD4E51.5060908@FreeBSD.org>
In-Reply-To: <4AAD4E51.5060908@FreeBSD.org>

Kris Kennaway wrote:
> I am getting timeouts on 8.0b4/HEAD when I do a lot of ZFS I/O to a pool
> on ad4:
>
> atapci0: port
> 0xc800-0xc807,0xc400-0xc403,0xc000-0xc007,0xb800-0xb803,0xb400-0xb40f,0xb000-0xb0ff
> irq 20 at device 15.0 on pci0
> ata2: on atapci0
> ata3: on atapci0
> ata0: on atapci1
> ata1: on atapci1
>
> ad4: 476940MB at ata2-master SATA150
> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
> completing request directly
> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
> completing request directly
> ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing
> request directly
> ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing
> request directly
> ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
> ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=344052040
> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
> completing request directly
> ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout -
> completing request directly
>
> It becomes stuck in a loop displaying the above and is unable to
> complete further I/O operations. I wonder if it is just batching up a
> lot of I/O and then timing out because it is busy, and then not
> recovering from this state?
>
> Any ideas what could be wrong?

There are two different kinds of timeouts here:
 - The first one, "ad4: WARNING - ...", is just a queue waiting timeout.
   It is not the cause but a consequence of the problem. I also have
   doubts that it is reasonable to do it at all.
 - The second one, "TIMEOUT - WRITE_DMA48 ...", is a real command
   execution timeout. I don't know whether this is the result of some
   improper error recovery, or whether your drive has indeed lost the
   required servo information near LBA=344052040 and is taking too long
   trying to find it.

You can try to read that sector and the nearby ones with dd.

-- 
Alexander Motin
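
As a rough illustration of the dd suggestion above, the sketch below only builds a dd command line for a window of sectors around the reported LBA; it does not run it. The device path /dev/ad4, the 512-byte sector size, and the 1024-sector window on each side are assumptions, and dd_read_command is a hypothetical helper name, not something from the original thread.

```python
import shlex

SECTOR_SIZE = 512  # assumption: the drive uses 512-byte logical sectors


def dd_read_command(device, lba, window=1024, sector_size=SECTOR_SIZE):
    """Build a dd command that reads `window` sectors on each side of `lba`.

    The data is discarded to /dev/null; the point is only to see whether
    the reads complete or time out.
    """
    start = max(lba - window, 0)          # do not seek before sector 0
    count = (lba + window + 1) - start    # inclusive range around the LBA
    args = [
        "dd",
        f"if={device}",
        "of=/dev/null",
        f"bs={sector_size}",
        f"skip={start}",   # skip is counted in input blocks of size bs
        f"count={count}",
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # LBA taken from the "TIMEOUT - WRITE_DMA48" line in the quoted log.
    print(dd_read_command("/dev/ad4", 344052040))
```

If the reads around that LBA stall or fail while reads elsewhere on the disk succeed, that points toward the drive rather than error recovery in the driver.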