From owner-freebsd-stable@FreeBSD.ORG Sat Jan 26 00:38:46 2008
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3457716A418
	for ; Sat, 26 Jan 2008 00:38:46 +0000 (UTC)
	(envelope-from jdc@parodius.com)
Received: from mx01.sc1.parodius.com (mx01.sc1.parodius.com [72.20.106.3])
	by mx1.freebsd.org (Postfix) with ESMTP id 1B64213C448
	for ; Sat, 26 Jan 2008 00:38:46 +0000 (UTC)
	(envelope-from jdc@parodius.com)
Received: by mx01.sc1.parodius.com (Postfix, from userid 1000)
	id 0A4091CC038; Fri, 25 Jan 2008 16:38:46 -0800 (PST)
Date: Fri, 25 Jan 2008 16:38:46 -0800
From: Jeremy Chadwick
To: Joe Peterson
Message-ID: <20080126003845.GA52183@eos.sc1.parodius.com>
References: <479A0731.6020405@skyrush.com>
	<20080125162940.GA38494@eos.sc1.parodius.com>
	<479A3764.6050800@skyrush.com>
	<3803988D-8D18-4E89-92EA-19BF62FD2395@mac.com>
	<479A4CB0.5080206@skyrush.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <479A4CB0.5080206@skyrush.com>
User-Agent: Mutt/1.5.16 (2007-06-09)
Cc: freebsd-stable@freebsd.org
Subject: Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Sat, 26 Jan 2008 00:38:46 -0000

Joe,

I wanted to send you a note about something that I'm still in the
process of dealing with. The timing couldn't be more ironic.

I decided it would be worthwhile to migrate from my two-disk ZFS stripe
with a non-ZFS disk for nightly backups, to a RAIDZ pool of all 3 disks
combined (since they're all the same size). I had another terminal with
gstat -I500ms running in it, so I could see overall I/O. All was going
well until about the 81GB mark of the copy.
gstat started showing 0KB in/out on all the drives, and the rsync was
stalled. ^Z did nothing, which is usually a bad sign. :-) I ssh'd in
and did a dmesg (summarised):

ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
ad6: FAILURE - WRITE_DMA timed out LBA=13951071
ad6: FAILURE - WRITE_DMA timed out LBA=13951327
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
ad6: FAILURE - WRITE_DMA timed out LBA=13951583
ad6: FAILURE - WRITE_DMA timed out LBA=13951839
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5

It appears my /dev/ad6 (a Seagate -- more irony) must have some bad
blocks.

Actually, after letting things go for a while, I realised the box just
locked up. Probably kernel panic'd due to the I/O problem. I'll have
to poke at SMART stats later to see what showed up.

--
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP: 4BD6C0CB |
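
As a rough cross-check of the dmesg excerpt above, the sketch below parses
the LBA= and offset= values and compares their spacing. It is only an
illustration, not a diagnostic tool: the 512-byte sector size is an
assumption (the message does not state it), and the helper names
(failed_lbas, vfs_writes, steps) are made up for this example.

```python
import re

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors (not stated in the message)

# Excerpt of the dmesg lines quoted in the message (TIMEOUT and g_vfs_done only).
DMESG = """\
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951071
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951327
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951583
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13951839
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952095
ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=13952351
g_vfs_done():ad6s1d[WRITE(offset=7142916096, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143047168, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143178240, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143309312, length=131072)]error = 5
g_vfs_done():ad6s1d[WRITE(offset=7143440384, length=131072)]error = 5
"""


def failed_lbas(log):
    """Return the sorted, de-duplicated LBAs named in the ATA error lines."""
    return sorted({int(n) for n in re.findall(r"LBA=(\d+)", log)})


def vfs_writes(log):
    """Return (offset, length) pairs from the g_vfs_done() WRITE errors."""
    pairs = re.findall(r"WRITE\(offset=(\d+), length=(\d+)\)", log)
    return [(int(off), int(length)) for off, length in pairs]


def steps(values):
    """Return the set of differences between consecutive values."""
    return {b - a for a, b in zip(values, values[1:])}


lbas = failed_lbas(DMESG)
writes = vfs_writes(DMESG)

# Spacing between failed LBAs, converted to bytes under the sector-size assumption.
lba_step_bytes = {s * SECTOR_BYTES for s in steps(lbas)}
# Spacing between the filesystem-level write offsets, and their lengths.
offset_steps = steps([off for off, _ in writes])
lengths = {length for _, length in writes}
# Distance (in sectors) between the first failed LBA and the first
# filesystem offset expressed in sectors.
first_gap_sectors = lbas[0] - writes[0][0] // SECTOR_BYTES

print("LBA step (bytes):   ", lba_step_bytes)
print("offset step (bytes):", offset_steps)
print("write lengths:      ", lengths)
print("LBA - offset/512:   ", first_gap_sectors)
```

Under that assumption the failed LBAs are 256 sectors (128 KiB) apart,
which matches both the spacing and the length of the g_vfs_done() writes,
so the log looks like one run of consecutive 128 KiB writes timing out
rather than scattered failures. The first LBA also sits 63 sectors past
the first offset, which would be consistent with a slice starting at
sector 63, though the message itself does not confirm the disk layout.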