Date: Mon, 31 Aug 2009 12:22:25 -0600
From: Tim Judd <tajudd@gmail.com>
To: Mark Stapper <stark@mapper.nl>
Cc: freebsd-questions@freebsd.org
Subject: Re: ZFS and DMA read error
Message-ID: <ade45ae90908311122s685a6aa7o6dcc49a48c08000e@mail.gmail.com>
In-Reply-To: <4A9B731E.9050400@mapper.nl>
References: <4A9B731E.9050400@mapper.nl>
On 8/31/09, Mark Stapper <stark@mapper.nl> wrote:
> Good day to you,
>
> I'm having a bit of trouble with one of the disks in my zfs raidz1 pool.
> It's giving me DMA read errors, and zpool is reporting READ failures.
> However, data integrity is OK :-)
> Unfortunately I was in the middle of rearranging my backup media, so I'm
> backing up everything as we speak.
> I will be testing the failing drive in another computer soon; however,
> before I return it I'd like to know if this could be caused by something
> other than failing hardware.
> Below is the output of "zpool status" and a snippet of /var/log/messages
> showing the DMA errors.
> Thanks for the input.
> Greetz,
> Mark
>
>
>   pool: data
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         data        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad4     ONLINE       0     0     0
>             ad6     ONLINE      21     0     0
>             ad8     ONLINE       0     0     0
>             ad10    ONLINE       0     0     0
>
> errors: No known data errors
>
> Aug 31 03:04:35 yoshi kernel: ad6: FAILURE - READ_DMA48
> status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=932040832
> Aug 31 03:04:35 yoshi root: ZFS: vdev I/O failure, zpool=data
> path=/dev/ad6 offset=477204905984 size=65536 error=5
> Aug 31 03:04:35 yoshi root: ZFS: vdev I/O failure, zpool=data
> path=/dev/ad6 offset=477204925440 size=2560 error=5
<snip 9 identical messages, based on the uncorrectable LBA error>

Since it's all throwing errors at the same LBA, I'd run SMART diagnostics on the drive (I think the port is sysutils/smartmontools) and see whether it shows errors too. It looks like a failing/failed drive, and I would recommend replacing it.
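As a rough check on the "same LBA" reading, the numbers in the quoted log can be related with a little arithmetic. This is a minimal sketch, assuming ad6 uses 512-byte sectors and that the ZFS `offset=` values are byte offsets from the start of /dev/ad6. Neither assumption is stated in the log. Under those assumptions, the first failed 64 KiB read starts exactly at the uncorrectable LBA, and the second, smaller read falls inside it. For reference, `error=5` is EIO.

```python
# Minimal sketch: relate the kernel's uncorrectable LBA to the ZFS byte offsets.
# ASSUMPTION: ad6 uses 512-byte logical sectors.
# ASSUMPTION: the ZFS "offset=" values are byte offsets from the start of /dev/ad6.
SECTOR_SIZE = 512

# Values copied from the quoted /var/log/messages excerpt.
KERNEL_BAD_LBA = 932040832
ZFS_FAILURES = [  # (offset in bytes, size in bytes)
    (477204905984, 65536),
    (477204925440, 2560),
]


def lba_span(offset: int, size: int, sector_size: int = SECTOR_SIZE) -> range:
    """Return the LBAs (sector numbers) touched by a read of `size` bytes at `offset`."""
    first = offset // sector_size
    last = (offset + size - 1) // sector_size
    return range(first, last + 1)


spans = [lba_span(off, size) for off, size in ZFS_FAILURES]
outer = spans[0]  # the 64 KiB read that starts at the failing LBA

for (off, size), span in zip(ZFS_FAILURES, spans):
    # True when this read's sectors lie entirely within the first failed read
    inside = span.start >= outer.start and span.stop <= outer.stop
    print(f"offset={off} size={size}: LBA {span.start}-{span.stop - 1}, "
          f"inside first failed read: {inside}")

print("first ZFS offset == kernel LBA * 512:",
      ZFS_FAILURES[0][0] == KERNEL_BAD_LBA * SECTOR_SIZE)
```

If the assumptions hold, both ZFS failures point at one small bad region of the disk rather than scattered damage. That fits the advice to look at the drive's SMART data.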
I doubt SpinRite will help you at this point (but you can try). SpinRite's website is at grc.com.

Hope you have backups or redundancy. Replacing data is no fun.

--TJ