Date:      Tue, 05 Feb 2008 11:09:45 -0800
From:      Bakul Shah <bakul@bitblocks.com>
To:        Joe Peterson <joe@skyrush.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Forcing full file read in ZFS even when checksum error encountered 
Message-ID:  <20080205190946.3D69C5B59@mail.bitblocks.com>
In-Reply-To: Your message of "Tue, 05 Feb 2008 10:38:23 MST." <47A89F0F.1030505@skyrush.com> 

> I've checked SMART - no [unrecoverable] errors and no additional sector
> reallocations, and I've done a SeaTools long test - no problems found.
> 
> But I do not understand: in zpool status, there are stats on read errors in
> addition to checksum errors.  If I understand correctly, a read error would be
> the system/HW reporting an error on read, whereas the whole idea of the
> checksums in ZFS is to catch errors that are *not* reported as read errors
> (i.e. silent bit changes that normal filesystems would never catch).  What I
> seem to be seeing is a case in which ZFS says the checksum is wrong.  There
> are only counts in the CKSUM col, not the other cols in the status, so I do
> not think this is a "read error" - it is ZFS's last line of defense (the
> checksum) reporting a mismatch.
> 
> In other words, I assume the read would complete if ZFS did not catch the
> checksum mismatch, and what I'd like to do is let it complete so I can see for
> myself where these bit errors are by comparing the read file to a known good
> copy (that I have).  If there are no mismatches, it would mean there is a
> metadata error or ZFS bug.

It could also be a memory error of some sort.  Does your
system have ECC memory?  Also note that standalone tests do
not seem to catch all sorts of errors that heavy use of Unix
can sometimes trigger on a marginal system.

But I agree with you that it would be useful to have a debug
mode where you can get at the data even if it is bad (and a
test mode where you can write bad data on purpose:-). [A
long rant on writing testable code deleted]

You have access to the zfs sources! At the very least you can
add code to report the bad checksum & offset and see if it
matches the checksum of the same block(s) in your known good
copy.
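
For the known-good side of that comparison, something like the
following might do.  It is only a sketch under assumptions that are
mine, not from the ZFS sources: a 128 KiB record size, no compression,
and a Fletcher-4 style sum over little-endian 32-bit words.  The names
RECORD_SIZE, fletcher4 and block_checksums are made up for
illustration.  Substitute whatever checksum and record size the
dataset really uses, and note that the offset ZFS reports may be a
device offset rather than a file offset.

```python
import struct

RECORD_SIZE = 128 * 1024  # assumed record size; verify for your dataset


def fletcher4(data: bytes):
    """Fletcher-4 style checksum over little-endian 32-bit words.

    Returns the four running sums (a, b, c, d), each kept to 64 bits.
    """
    mask = (1 << 64) - 1
    a = b = c = d = 0
    # Zero-pad to a multiple of 4 bytes so a short final block can be summed.
    # (Assumption for this sketch; on-disk blocks are already word-aligned.)
    if len(data) % 4:
        data += b"\0" * (4 - len(data) % 4)
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & mask
        b = (b + a) & mask
        c = (c + b) & mask
        d = (d + c) & mask
    return (a, b, c, d)


def block_checksums(path, block_size=RECORD_SIZE):
    """Yield (file_offset, checksum) for each block of the file at path."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield offset, fletcher4(block)
            offset += len(block)
```

Running block_checksums() over the known good copy and printing each
offset with its four sums in hex gives a table to hold against
whatever the instrumented ZFS code reports for the failing block.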


