Date:      Sat, 22 Nov 2014 18:56:29 -0500
From:      Zaphod Beeblebrox <zbeeble@gmail.com>
To:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   When a ZFS error is not an error.
Message-ID:  <CACpH0MdMWKP0yYG3wedjgWX5_Nt_6-FcbARiwT7EidnmY2Q5gQ@mail.gmail.com>

I have a file that ZFS claims is damaged, but when I go through all the
effort of retrieving it by hand, it is not damaged.  ZFS says 405 files on
this array are in error, and since some are rather large and retrieving one
block seems to take 30 seconds (i.e. hundreds of hours to recover some
files), I'd like to ask if there's some way to finesse this... or to fix
ZFS.

To start, my array has errors like:

        NAME               STATE     READ WRITE CKSUM
        vr2                ONLINE       0     0   989
          raidz1-0         ONLINE       0     0 1.93K
            label/vr2-d0   ONLINE       0     0     0

(I've omitted the other lines... they are all 0.)  I asked what this meant,
and the best answer I got was that the errors were not attributed to any
particular device.  So I learned how to use zdb, and I now have a patch for
it: apparently the deadlist can contain a null entry that crashes zdb.

No matter.  We have this file in the output of zpool status -v:

vr2/Audio@20080305-1450:/cds/service/02-Lord_Have_Mercy_Kyrie.mp3

... now, even though this entry names the snapshot (not all of the -v
entries do), reading the same file from the live filesystem fails:

[1:170:470]root@virtual:/vr1/tmp/diag> cp /vr2/Audio/cds/service/02-Lord_Have_Mercy_Kyrie.mp3 .
cp: foo.mp3: Bad address
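An entry like the one from zpool status -v packs the dataset, the snapshot and the file's path within the dataset into one string. The sketch below splits it into the live path (what the cp above reads) and the matching path under the dataset's .zfs/snapshot directory. It assumes the dataset is mounted at its default mountpoint /<dataset> (check with zfs get mountpoint) and that the path part starts with '/'; the function name zfs_entry_paths is hypothetical.

```shell
#!/bin/sh
# Sketch: map a "dataset@snapshot:/path" entry from zpool status -v to
# filesystem paths. Assumes the default mountpoint /<dataset> and that
# the path part begins with '/'. zfs_entry_paths is a hypothetical helper.
zfs_entry_paths() {
    entry=$1
    ds=${entry%%:/*}          # dataset, plus @snapshot if present
    rel=/${entry#*:/}         # path relative to the dataset root
    fs=${ds%@*}               # dataset without the snapshot part
    echo "live: /$fs$rel"
    # Only snapshot entries get a .zfs/snapshot path.
    case $ds in
        *@*) echo "snapshot: /$fs/.zfs/snapshot/${ds#*@}$rel" ;;
    esac
}
```

For the entry above, zfs_entry_paths 'vr2/Audio@20080305-1450:/cds/service/02-Lord_Have_Mercy_Kyrie.mp3' prints /vr2/Audio/cds/service/02-Lord_Have_Mercy_Kyrie.mp3 as the live path and /vr2/Audio/.zfs/snapshot/20080305-1450/cds/service/02-Lord_Have_Mercy_Kyrie.mp3 as the snapshot path.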

So I did this:

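count=0
for i in $(grep L0 4351-dddddddd.txt | grep -v vr2/Audio | head -50 | cut -c22-34); do
    cc=$(printf %05d "$count"); echo "getting $i 4035/b$cc"
    time zdb -R vr2 "$i:20000:r" > "4035/b$cc" &   # data part only, run in background
    count=$((count+1))
done; wait   # let every background zdb finish before using the pieces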

Basically, 4351-dddddddd.txt is the zdb output for that file (see
http://pastebin.com/tdqEJKJB), and the little script calls zdb to read the
first 0x20000 bytes of each block, because the remaining 0x4000 bytes are
parity (this is a 9-disk raidz1).
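To make the 0x20000/0x4000 split concrete: raidz1 keeps one parity column per row, so on 9 disks a 128 KiB (0x20000) block spreads over 8 data columns of 0x4000 bytes each plus one 0x4000 parity column, for 0x24000 allocated. The sketch below is a simplified model, assuming the block is uncompressed and divides evenly across the data columns; it ignores sector rounding and raidz padding, and the function name raidz1_layout is hypothetical.

```shell
#!/bin/sh
# Simplified raidz1 size model: assumes an uncompressed block that divides
# evenly across the data columns; ignores sector rounding and padding.
# raidz1_layout is a hypothetical helper, not a ZFS tool.
raidz1_layout() {
    disks=$1; psize=$2
    data_cols=$((disks - 1))      # raidz1: one column per row is parity
    col=$((psize / data_cols))    # bytes held by each data column
    parity=$col                   # the parity column has the same width
    printf 'data=0x%x parity=0x%x asize=0x%x\n' \
        "$psize" "$parity" $((psize + parity))
}

raidz1_layout 9 $((0x20000))
```

With 9 disks and a 0x20000 block this prints data=0x20000 parity=0x4000 asize=0x24000, which matches reading only the first 0x20000 bytes of each block.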

Then I cat the pieces into one file and truncate it to the length given in
the zdb output....

and lo and behold: The file is sound.
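The concatenate-and-truncate step can be sketched in sh as follows. It assumes the pieces are named b00000, b00001, ... in one directory, as the loop writes them, and that the final length is taken from the zdb output; the function name reassemble is hypothetical. Zero-padding the piece numbers is what lets a plain glob return them in block order.

```shell
#!/bin/sh
# Sketch: glue per-block dumps back together and trim the slack.
# Assumes pieces named bNNNNN (five zero-padded digits) in one directory.
# reassemble is a hypothetical helper, not part of any ZFS tool.
set -e

reassemble() {
    dir=$1 out=$2 len=$3
    # Pathname expansion is sorted, and the zero padding keeps that
    # order numeric, so b00010 follows b00009.
    cat "$dir"/b[0-9][0-9][0-9][0-9][0-9] > "$out"
    # The last block is only partly used; cut the file to its real size.
    truncate -s "$len" "$out"
}
```

For example, reassemble 4035 recovered.mp3 LENGTH, with LENGTH replaced by the file size from zdb. truncate(1) with -s is available on FreeBSD and in GNU coreutils.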

So what's ZFS on about not wanting to read this file?  Help?


