Date: Sat, 26 Jan 2008 11:30:56 -0700
From: Joe Peterson <joe@skyrush.com>
To: freebsd-stable@freebsd.org
Subject: Re: "ad0: TIMEOUT - WRITE_DMA" type errors with 7.0-RC1
Message-ID: <479B7C60.7000800@skyrush.com>
In-Reply-To: <20080126012124.GA53400@eos.sc1.parodius.com>
References: <479A0731.6020405@skyrush.com> <20080125162940.GA38494@eos.sc1.parodius.com> <479A3764.6050800@skyrush.com> <3803988D-8D18-4E89-92EA-19BF62FD2395@mac.com> <479A4CB0.5080206@skyrush.com> <20080126003845.GA52183@eos.sc1.parodius.com> <479A86E5.5060806@skyrush.com> <20080126012124.GA53400@eos.sc1.parodius.com>

I performed a ZFS scrub, which finished yesterday, and no new
/var/log/messages errors were reported during that time. However, the scrub
found something interesting:

crater# zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 1 errors on Fri Jan 25 12:52:32 2008
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       1     3     2
          ad0s1d    ONLINE       1     3     2

errors: Permanent errors have been detected in the following files:

        /home/joe/music/jukebox/christmas/Esquivel/Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3

Note that I have not touched this file since copying it to this drive.
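
For anyone who wants to watch these counters over time, here is a minimal sketch that pulls the READ/WRITE/CKSUM columns out of the config rows shown above. The function names are made up for illustration, and it assumes the counters are printed as plain integers, as they are here.

```python
import re

# Config rows copied from the status output above.
SAMPLE = """\
NAME STATE READ WRITE CKSUM
tank ONLINE 1 3 2
ad0s1d ONLINE 1 3 2
"""

# name, state, then three integer counters. The header row does not match
# because "READ" is not a number. Assumption: counters are plain integers.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_config_rows(text):
    """Return {device: {"state", "read", "write", "cksum"}} for each row."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            name, state, rd, wr, ck = m.groups()
            rows[name] = {
                "state": state,
                "read": int(rd),
                "write": int(wr),
                "cksum": int(ck),
            }
    return rows


def devices_with_errors(rows):
    """List devices whose read, write, or checksum counter is non-zero."""
    return [
        name
        for name, r in rows.items()
        if r["read"] or r["write"] or r["cksum"]
    ]


if __name__ == "__main__":
    for dev in devices_with_errors(parse_config_rows(SAMPLE)):
        print(dev)
```

Feeding it saved copies of the status output taken at different times would show whether the counters keep climbing.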
So it seems one file failed a checksum check during the scrub. As expected, I
now get errors when trying to read this file, probably ZFS reporting the
condition. When I logged in tonight, I got two more disk messages in
/var/log/messages about WRITE_DMA48 TIMEOUT/FAILURE, just as I was typing my
password. That might be a coincidence.
Also, smartctl still shows PASSED. However, this is interesting:

195 Hardware_ECC_Recovered  0x001a   061   046   000    Old_age   Always       -       9070

The number is much *smaller* now! It was "6" a few minutes before this...
wrap around? Hmm, I'm really not sure, at this point, what is going on.
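
To make it easier to compare this attribute between runs, here is a rough sketch that splits a smartctl attribute row like the one above into named fields. The field names follow the usual smartctl -A column headings; the helper names are made up. The raw value is kept as text because its meaning is vendor-specific, so this does not answer the wrap-around question. The failure check assumes the usual convention that an attribute is failing when its normalized value is at or below the threshold.

```python
from collections import namedtuple

# Column order of a smartctl -A attribute row.
SmartAttr = namedtuple(
    "SmartAttr",
    "id name flag value worst thresh type updated when_failed raw",
)

# The row quoted above, joined back onto one line.
LINE = "195 Hardware_ECC_Recovered 0x001a 061 046 000 Old_age Always - 9070"


def parse_attr(line):
    """Split one attribute row into a SmartAttr tuple."""
    # Split into at most 10 fields so a raw value containing spaces
    # stays together in the last field.
    parts = line.split(None, 9)
    if len(parts) != 10:
        raise ValueError("not a smartctl attribute row: %r" % line)
    return SmartAttr(
        id=int(parts[0]),
        name=parts[1],
        flag=parts[2],
        value=int(parts[3]),   # normalized value, e.g. "061" -> 61
        worst=int(parts[4]),
        thresh=int(parts[5]),
        type=parts[6],
        updated=parts[7],
        when_failed=parts[8],
        raw=parts[9].strip(),  # vendor-specific, left as text
    )


def is_failing(attr):
    """Assumed convention: failing when normalized value <= threshold."""
    return attr.value <= attr.thresh


if __name__ == "__main__":
    attr = parse_attr(LINE)
    print(attr.name, attr.value, attr.worst, attr.thresh, attr.raw)
    print("failing:", is_failing(attr))
```

Logging the normalized value and the raw value side by side on each run would show whether the raw number really jumps around or only looks that way.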
So I have started a "SeaTools" (disk scanner from Seagate) "long test" of the
drive. The short test passed already. The results should be interesting. If
it finds nothing wrong, I am going to start to wonder if I am experiencing ZFS
bugs that just happen to look like drive problems. I already did a long read
of the disk contents under Linux and got no messages about anything wrong.
If I can turn on any debugging info to help determine if this is
software-related, let me know the magic keywords to use. :)
-Joe
