From owner-freebsd-stable@FreeBSD.ORG  Sat Jan 26 00:38:46 2008
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3457716A418
	for <freebsd-stable@freebsd.org>; Sat, 26 Jan 2008 00:38:46 +0000 (UTC)
	(envelope-from jdc@parodius.com)
Received: from mx01.sc1.parodius.com (mx01.sc1.parodius.com [72.20.106.3])
	by mx1.freebsd.org (Postfix) with ESMTP id 1B64213C448
	for <freebsd-stable@freebsd.org>; Sat, 26 Jan 2008 00:38:46 +0000 (UTC)
	(envelope-from jdc@parodius.com)
Received: by mx01.sc1.parodius.com (Postfix, from userid 1000)
	id 0A4091CC038; Fri, 25 Jan 2008 16:38:46 -0800 (PST)
Date: Fri, 25 Jan 2008 16:38:46 -0800
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: Joe Peterson <joe@skyrush.com>
Message-ID: <20080126003845.GA52183@eos.sc1.parodius.com>
References: <479A0731.6020405@skyrush.com>
	<20080125162940.GA38494@eos.sc1.parodius.com>
	<479A3764.6050800@skyrush.com>
	<3803988D-8D18-4E89-92EA-19BF62FD2395@mac.com>
	<479A4CB0.5080206@skyrush.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <479A4CB0.5080206@skyrush.com>
User-Agent: Mutt/1.5.16 (2007-06-09)
Cc: freebsd-stable@freebsd.org
Subject: Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 26 Jan 2008 00:38:46 -0000

Joe, I wanted to send you a note about something that I'm still in the
process of dealing with.  The timing couldn't be more ironic.

I decided it would be worthwhile to migrate from my two-disk ZFS stripe
with a non-ZFS disk for nightly backups, to a RAIDZ pool of all 3
disks combined (since they're all the same size).  I had another
terminal with gstat -I500ms running in it, so I could see overall I/O.

All was going well until about the 81GB mark of the copy.  gstat started
showing 0KB in/out on all the drives, and the rsync was stalled.  ^Z did
nothing, which is usually a bad sign.  :-)  I ssh'd in and did a dmesg
(summarised):

ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
ad6: FAILURE - WRITE_DMA timed out LBA=13951071
ad6: FAILURE - WRITE_DMA timed out LBA=13951327
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
ad6: FAILURE - WRITE_DMA timed out LBA=13951583
ad6: FAILURE - WRITE_DMA timed out LBA=13951839
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5

It appears my /dev/ad6 (a Seagate -- more irony) must have some bad
blocks.  Actually, after letting things go for a while, I realised the
box had locked up entirely, probably a kernel panic caused by the I/O
problem.  I'll have to poke at SMART stats later to see what showed up.
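
As a sanity check on the dmesg output above, here is a rough Python
sketch that lines up the ATA-level LBAs with the g_vfs_done() byte
offsets.  It assumes 512-byte sectors, which the log does not state.
The 63-sector figure it derives is an inference from the numbers alone
(it would match a traditional first slice starting at sector 63 with
ad6s1d at the start of that slice), not something I have confirmed
against the disk's actual layout.  The names LOG, SECTOR, lbas, writes
and start are made up for illustration.

```python
import re

SECTOR = 512  # bytes per sector; an assumption, not stated in the log

# Subset of the dmesg lines quoted in this message.
LOG = """\
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
ad6: FAILURE - WRITE_DMA timed out LBA=13951071
ad6: FAILURE - WRITE_DMA timed out LBA=13951327
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
ad6: FAILURE - WRITE_DMA timed out LBA=13951583
ad6: FAILURE - WRITE_DMA timed out LBA=13951839
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5
"""

# Distinct whole-disk LBAs mentioned in any WRITE_DMA line, ascending.
lbas = sorted({int(n) for n in re.findall(r"WRITE_DMA .*LBA=(\d+)", LOG)})

# (byte offset, byte length) pairs, relative to the ad6s1d partition.
writes = [(int(o), int(n))
          for o, n in re.findall(r"WRITE\(offset=(\d+), length=(\d+)\)", LOG)]

# Gap between the first LBA and the first partition-relative sector.
# Read here as the partition's starting sector on the disk (inferred).
start = lbas[0] - writes[0][0] // SECTOR

# Translate every failed filesystem write into a whole-disk LBA.
mapped = [start + off // SECTOR for off, _ in writes]

print("inferred partition start sector:", start)
print("sectors per write:", writes[0][1] // SECTOR)
print("all writes map onto logged LBAs:", set(mapped) <= set(lbas))
```

If the assumptions hold, each 128KB filesystem write corresponds to one
timed-out WRITE_DMA request, which would mean a single contiguous run
of requests was affected rather than errors scattered across the disk.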

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |