Date: Sun, 23 Nov 2014 20:00:50 -0500
From: Zaphod Beeblebrox <zbeeble@gmail.com>
To: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: When a ZFS error is not an error.
Message-ID: <CACpH0Me7xM2t7KWxVkPJtdAty5o_f7ztK3zuEq77rNM-PhnJGw@mail.gmail.com>
In-Reply-To: <CACpH0MdMWKP0yYG3wedjgWX5_Nt_6-FcbARiwT7EidnmY2Q5gQ@mail.gmail.com>
References: <CACpH0MdMWKP0yYG3wedjgWX5_Nt_6-FcbARiwT7EidnmY2Q5gQ@mail.gmail.com>
So... I recovered a 2nd file... this time a rar file.  It fails the
checksum, but I'm unsure of how I'm extracting the file.

The array consists of two vdevs: 1 of 9 disks and the other of 8 disks,
both raidz1.  I'm pretty sure that the 9 drive vdev is '0' in this output
... as the 9 drive vdev lists first in the zpool status output (and was
created first).

Anyways, here's two lines from the verbose zdb output:

100000 L0 1:720cfa33400:24c00 20000L/20000P F=1 B=11756828567/11756828567
120000 L0 0:94048dc9000:24000 20000L/20000P F=1 B=11756828567/11756828567

... What I'm not clear on is why the 8 drive vdev is writing 24c00 bytes
and the 9 drive vdev is writing 24000 bytes.

And ... in either case, am I to fetch the first 20000 bytes?

ie: zdb -R 0:94048dc9000:20000 and zdb -R 1:720cfa33400:20000 ?

If, when I read these blocks with zdb, the filesystem is reporting to
checksum errors, am I getting the right data?  Do I need to process the
parity?

On Sat, Nov 22, 2014 at 6:56 PM, Zaphod Beeblebrox <zbeeble@gmail.com> wrote:

> I have a file that ZFS claims is in error that when I go through all the
> effort to retrieve it, is not in error.  I have 405 files, then, that zfs
> says are in error on this array and since some are rather large and since
> retrieving one block seems to take 30 seconds (ie: hundreds of hours of
> time to recover some files), I'd like to ask if there's some way to finesse
> this... or to fix zfs.
>
> To start, my array has errors like:
>
>         NAME                STATE     READ WRITE CKSUM
>         vr2                 ONLINE       0     0   989
>           raidz1-0          ONLINE       0     0 1.93K
>             label/vr2-d0    ONLINE       0     0     0
>
> (I've omitted the other lines ... they all '0').  I asked what this meant
> ... and the best I got was that the errors were not assigned to any
> particular device.  So I learned how to use ZDB and I have a patch for
> ZDB.  Apparently the deadlist can have a null in it that crashes ZDB.
>
> No matter.  We have this file in the output of zpool status -v:
>
>         vr2/Audio@20080305-1450:/cds/service/02-Lord_Have_Mercy_Kyrie.mp3
>
> ... now even though it picks on the snapshot (not all of the -v reports
> do), the following fails:
>
> [1:170:470]root@virtual:/vr1/tmp/diag> cp
> /vr2/Audio/cds/service/02-Lord_Have_Mercy_Kyrie.mp3 .
> cp: foo.mp3: Bad address
>
> So I did this:
>
> for i in `grep L0 4351-dddddddd.txt | grep -v vr2/Audio | head -50 | cut
> -c22-34`; do cc=`printf %05d $count`; echo getting $i 4035/b$cc; time zdb
> -R vr2 $i:20000:r >4035/b$cc & count=$[count+1]; done
>
> --- basically, 4351-dddddddd.txt is the output of zdb for that file (see
> http://pastebin.com/tdqEJKJB) and the little script calls zdb to get the
> first 20000 (hex) of each block because the remaining 4000 is the parity (9
> disk array).
>
> Then I cat it into one file, then I truncate it to the specified length
> ....
>
> and lo and behold: The file is sound.
>
> So what's ZFS on about not wanting to read this file?  Help?
>
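
On the 24c00 versus 24000 question, the two allocated sizes are at least consistent with the usual raidz allocation arithmetic: round the physical size up to whole sectors, add one parity sector per row of data sectors, then round the total up to a multiple of (parity + 1). The sketch below works that out in shell. It assumes 512-byte sectors (ashift=9), which nothing in this thread states, and `raidz_asize` is a made-up helper name, not a zdb or ZFS command.

```shell
#!/bin/sh
# Hypothetical helper: raidz_asize PSIZE_HEX NDISKS NPARITY ASHIFT
# Prints the allocated size (hex) of one block on a raidz vdev.
# ASSUMPTION: ashift=9 in the calls below; the thread does not state it.
raidz_asize() {
    psize=$((0x$1)); cols=$2; nparity=$3; ashift=$4

    # Data sectors needed for psize bytes, rounded up.
    sectors=$(( ((psize - 1) >> ashift) + 1 ))

    # One parity sector per row of (cols - nparity) data sectors.
    sectors=$(( sectors + nparity * ((sectors + cols - nparity - 1) / (cols - nparity)) ))

    # Pad the total up to a multiple of (nparity + 1) sectors.
    mult=$(( nparity + 1 ))
    sectors=$(( (sectors + mult - 1) / mult * mult ))

    printf '%x\n' $(( sectors << ashift ))
}

raidz_asize 20000 9 1 9   # 9-disk raidz1
raidz_asize 20000 8 1 9   # 8-disk raidz1
```

Under that assumption the 9-disk case comes to 256 data + 32 parity sectors (24000), and the 8-disk case to 256 data + 37 parity + 1 padding sector (24c00), which matches the two DVAs above. This only accounts for the sizes; it says nothing about whether the bytes zdb -R returns are the right ones.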