From owner-freebsd-stable@FreeBSD.ORG  Mon Feb 11 12:01:00 2008
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7CBBE16A41A
	for <freebsd-stable@freebsd.org>; Mon, 11 Feb 2008 12:01:00 +0000 (UTC)
	(envelope-from remco@spacemarines.us)
Received: from green.qinip.net (green.qinip.net [62.100.30.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 358D713C51A
	for <freebsd-stable@freebsd.org>; Mon, 11 Feb 2008 12:00:59 +0000 (UTC)
	(envelope-from remco@spacemarines.us)
Received: from marshal.spacemarines.us (h89220144089.dsl.speedlinq.nl
	[89.220.144.89]) by green.qinip.net (Postfix) with ESMTP id 9751CC875
	for <freebsd-stable@freebsd.org>; Mon, 11 Feb 2008 13:01:01 +0100 (CET)
Received: by marshal.spacemarines.us (Postfix, from userid 1000)
	id 6FA901CDAB; Mon, 11 Feb 2008 13:00:57 +0100 (CET)
Date: Mon, 11 Feb 2008 13:00:57 +0100
To: freebsd-stable@freebsd.org
Message-ID: <20080211120057.GA5821@marshal.spacemarines.us>
References: <479A0731.6020405@skyrush.com>
	<20080125162940.GA38494@eos.sc1.parodius.com>
	<479A3764.6050800@skyrush.com>
	<3803988D-8D18-4E89-92EA-19BF62FD2395@mac.com>
	<479A4CB0.5080206@skyrush.com>
	<20080126003845.GA52183@eos.sc1.parodius.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20080126003845.GA52183@eos.sc1.parodius.com>
User-Agent: Mutt/1.5.13 (2006-08-11)
From: remco@spacemarines.us (Remco van Bekkum)
Subject: Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 11 Feb 2008 12:01:00 -0000

On Fri, Jan 25, 2008 at 04:38:46PM -0800, Jeremy Chadwick wrote:
> Joe, I wanted to send you a note about something that I'm still in the
> process of dealing with.  The timing couldn't be more ironic.
> 
> I decided it would be worthwhile to migrate from my two-disk ZFS stripe
> with a non-ZFS disk for nightly backups, to a RAIDZ pool of all 3
> disks combined (since they're all the same size).  I had another
> terminal with gstat -I500ms running in it, so I could see overall I/O.
> 
> All was going well until about the 81GB mark of the copy.  gstat started
> showing 0KB in/out on all the drives, and the rsync was stalled.  ^Z did
> nothing, which is usually a bad sign.  :-)  I ssh'd in and did a dmesg
> (summarised):
> 
> ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
> ad6: FAILURE - WRITE_DMA timed out LBA=13951071
> ad6: FAILURE - WRITE_DMA timed out LBA=13951327
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
> ad6: FAILURE - WRITE_DMA timed out LBA=13951583
> ad6: FAILURE - WRITE_DMA timed out LBA=13951839
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
> g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5
> 
> It appears my /dev/ad6 (a Seagate -- more irony) must have some bad
> blocks.  Actually, after letting things go for a while, I realised the
> box just locked up.  Probably kernel panic'd due to the I/O problem.
> I'll have to poke at SMART stats later to see what showed up.
> 
> -- 
> | Jeremy Chadwick                                    jdc at parodius.com |
> | Parodius Networking                           http://www.parodius.com/ |
> | UNIX Systems Administrator                      Mountain View, CA, USA |
> | Making life hard for others since 1977.                  PGP: 4BD6C0CB |

Hi all,

After replacing my first SATA disk with one of the same type and still
getting the same errors, I replaced this 1TB drive with 4x500GB
Hitachi P7K500 in raidz. It worked fine for a week, but yesterday I
cvsupped and rebuilt world. This afternoon everything is breaking down
again with the same errors:

Feb 11 12:34:09 xaero kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Feb 11 12:34:13 xaero kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Feb 11 12:34:17 xaero kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Feb 11 12:34:21 xaero kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Feb 11 12:34:25 xaero kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
Feb 11 12:34:25 xaero kernel: ad6: FAILURE - WRITE_DMA48 timed out LBA=298014274

Feb 11 12:34:29 xaero kernel: ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Feb 11 12:34:33 xaero kernel: ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Feb 11 12:34:37 xaero kernel: ad8: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Feb 11 12:34:41 xaero kernel: ad8: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Feb 11 12:34:45 xaero kernel: ad8: WARNING - SET_MULTI taskqueue timeout - completing request directly
Feb 11 12:34:45 xaero kernel: ad8: FAILURE - WRITE_DMA48 timed out LBA=298013590

So of 6 new disks, 4 show the same errors. It seems quite safe, then,
not to blame the disks, imho. I've tested the second drive in another
machine, but still got these timeout errors. What's wrong here?
It's on an amd64 system: Asus m2a-vm with ati xp600, AMD BE-2350 CPU,
2GB 800MHz RAM.
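
For anyone comparing notes, here is a rough Python sketch for tallying
these messages per device, to see whether the errors follow one disk or
are spread across a controller. The regular expression is only my guess
at the message layout, based on the lines above; the names PATTERN,
tally and SAMPLE are made up for illustration, and it assumes the
device name and severity appear on the same line.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines in this thread, e.g.
# "ad6: FAILURE - WRITE_DMA48 timed out LBA=298014274".
PATTERN = re.compile(r"\b(ad\d+): (WARNING|TIMEOUT|FAILURE) - ")

def tally(lines):
    """Count (device, severity) pairs across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.groups()] += 1
    return counts

# A few lines copied from the log above.
SAMPLE = [
    "Feb 11 12:34:21 xaero kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly",
    "Feb 11 12:34:25 xaero kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly",
    "Feb 11 12:34:25 xaero kernel: ad6: FAILURE - WRITE_DMA48 timed out LBA=298014274",
    "Feb 11 12:34:45 xaero kernel: ad8: FAILURE - WRITE_DMA48 timed out LBA=298013590",
]

for (device, severity), n in sorted(tally(SAMPLE).items()):
    print(device, severity, n)
```

Feeding it the lines of /var/log/messages instead of SAMPLE should give
a per-disk count; if several disks on the same controller show similar
counts, that would point away from the individual drives.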

Regards,

Remco