Date: Sat, 26 Jan 2008 11:30:56 -0700 From: Joe Peterson <joe@skyrush.com> To: freebsd-stable@freebsd.org Subject: Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1 Message-ID: <479B7C60.7000800@skyrush.com> In-Reply-To: <20080126012124.GA53400@eos.sc1.parodius.com> References: <479A0731.6020405@skyrush.com> <20080125162940.GA38494@eos.sc1.parodius.com> <479A3764.6050800@skyrush.com> <3803988D-8D18-4E89-92EA-19BF62FD2395@mac.com> <479A4CB0.5080206@skyrush.com> <20080126003845.GA52183@eos.sc1.parodius.com> <479A86E5.5060806@skyrush.com> <20080126012124.GA53400@eos.sc1.parodius.com>
next in thread | previous in thread | raw e-mail | index | archive | help
I performed a ZFS scrub, which finished yesterday, and no new /var/log/messages errors were reported during that time. However, the scrub found something interesting: crater# zpool status -v pool: tank state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: scrub completed with 1 errors on Fri Jan 25 12:52:32 2008 config: NAME STATE READ WRITE CKSUM tank ONLINE 1 3 2 ad0s1d ONLINE 1 3 2 errors: Permanent errors have been detected in the following files: /home/joe/music/jukebox/christmas/Esquivel/Merry_XMas_from_the_SpaceAge_ Bachelor_Pad/07-Snowfall.mp3 Note that I have not touched this file since copying it to this drive. So, it seems one file failed a checksum check during the scrub. I now (expectedly) get errors trying to read this file - probably ZFS indicating the condition. When I just logged in tonight, I got two more /var/log/messages disk messages about WRITE_DMA48 TIMEOUT/FAILURE - might be a coincidence (just as I was typing my password). Also, smartctl still shows PASSED, however, this is interesting: 195 Hardware_ECC_Recovered 0x001a 061 046 000 Old_age Always - 9070 The number is much *smaller* now! It was "6" a few minutes before this... wrap around? Hmm, I'm really not sure, at this point, what is going on. So I have started a "SeaTools" (disk scanner from Seagate) "long test" of the drive. The short test passed already. The results should be interesting. If it finds nothing wrong, I am going to start to wonder if I am experiencing ZFS bugs that just happen to look like drive problems. I already did a long read, under linux, of disk contents, and got no messages about anything wrong. If I can turn on any debugging info to help determine if this is software-related, let me know the magic keywords to use. :) -Joe
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?479B7C60.7000800>