From: remco@spacemarines.us (Remco van Bekkum)
Date: Mon, 11 Feb 2008 13:00:57 +0100
To: freebsd-stable@freebsd.org
Subject: Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
Message-ID: <20080211120057.GA5821@marshal.spacemarines.us>
In-Reply-To: <20080126003845.GA52183@eos.sc1.parodius.com>

On Fri, Jan 25, 2008 at 04:38:46PM -0800, Jeremy Chadwick wrote:
> Joe, I wanted to send you a note about something that I'm still in the
> process of dealing with. The timing couldn't be more ironic.
>
> I decided it would be worthwhile to migrate from my two-disk ZFS stripe
> with a non-ZFS disk for nightly backups, to a RAIDZ pool of all 3
> disks combined (since they're all the same size). I had another
> terminal with gstat -I500ms running in it, so I could see overall I/O.
>
> All was going well until about the 81GB mark of the copy. gstat started
> showing 0KB in/out on all the drives, and the rsync was stalled. ^Z did
> nothing, which is usually a bad sign. :-) I ssh'd in and did a dmesg
> (summarised):
>
> ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
> ad6: FAILURE - WRITE_DMA timed out LBA=13951071
> ad6: FAILURE - WRITE_DMA timed out LBA=13951327
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
> ad6: FAILURE - WRITE_DMA timed out LBA=13951583
> ad6: FAILURE - WRITE_DMA timed out LBA=13951839
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
> ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
> g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
> g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5
>
> It appears my /dev/ad6 (a Seagate -- more irony) must have some bad
> blocks. Actually, after letting things go for a while, I realised the
> box just locked up. Probably kernel panic'd due to the I/O problem.
> I'll have to poke at SMART stats later to see what showed up.
>
> --
> | Jeremy Chadwick                                jdc at parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |

Hi all,

After replacing my first SATA disk with one of the same type and still
getting the same errors, I replaced that 1TB drive with 4x500GB Hitachi
P7K500 disks in raidz. It worked fine for a week, but yesterday I cvsupped
and rebuilt world. This afternoon everything is breaking down again with
the same errors:

Feb 11 12:34:09 xaero kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Feb 11 12:34:13 xaero kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Feb 11 12:34:17 xaero kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Feb 11 12:34:21 xaero kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Feb 11 12:34:25 xaero kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
Feb 11 12:34:25 xaero kernel: ad6: FAILURE - WRITE_DMA48 timed out LBA=298014274
Feb 11 12:34:29 xaero kernel: ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Feb 11 12:34:33 xaero kernel: ad8: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Feb 11 12:34:37 xaero kernel: ad8: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Feb 11 12:34:41 xaero kernel: ad8: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Feb 11 12:34:45 xaero kernel: ad8: WARNING - SET_MULTI taskqueue timeout - completing request directly
Feb 11 12:34:45 xaero kernel: ad8: FAILURE - WRITE_DMA48 timed out LBA=298013590

So of 6 new disks, 4 show the same errors. I think it is therefore fairly
safe not to blame the disks. I've tested the second drive in another
machine, but still got these timeout errors.

What's wrong here? The machine is an amd64 system: Asus M2A-VM with ati
xp600, AMD BE-2350 CPU, 2GB 800MHz RAM.

Regards,
Remco
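
For anyone wanting to check whether the timeouts are confined to one device or spread across several, here is a minimal Python sketch that tallies the kernel messages per disk. It assumes the messages keep the "adN: LEVEL - ..." shape seen in the log above; the names LINE_RE, tally and SAMPLE are made up for this illustration and are not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Assumed message shape, taken from the log excerpts in this thread, e.g.
# "ad6: FAILURE - WRITE_DMA48 timed out LBA=298014274".
LINE_RE = re.compile(
    r"(?P<dev>ad\d+): (?P<level>WARNING|TIMEOUT|FAILURE) - (?P<msg>.+)"
)


def tally(lines):
    """Count messages per (device, level) pair; non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("level"))] += 1
    return counts


# Four lines copied from the log above, used only as sample input.
SAMPLE = """\
Feb 11 12:34:25 xaero kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
Feb 11 12:34:25 xaero kernel: ad6: FAILURE - WRITE_DMA48 timed out LBA=298014274
Feb 11 12:34:45 xaero kernel: ad8: WARNING - SET_MULTI taskqueue timeout - completing request directly
Feb 11 12:34:45 xaero kernel: ad8: FAILURE - WRITE_DMA48 timed out LBA=298013590
""".splitlines()

if __name__ == "__main__":
    for (dev, level), count in sorted(tally(SAMPLE).items()):
        print(f"{dev} {level:8} {count}")
```

Feeding it the full /var/log/messages instead of SAMPLE would show at a glance whether ad6 and ad8 fail in the same pattern, which is what points away from a single bad disk.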