Date: Sun, 23 Nov 2014 20:00:50 -0500
Subject: Re: When a ZFS error is not an error.
From: Zaphod Beeblebrox
To: freebsd-fs

So... I recovered a second file, this time a rar file. It fails its
checksum, but I'm not sure I'm extracting the file correctly.

The array consists of two vdevs: one of 9 disks and the other of 8 disks,
both raidz1. I'm pretty sure that the 9-drive vdev is '0' in this output,
as the 9-drive vdev is listed first in the zpool status output (and was
created first).

Anyway, here are two lines from the verbose zdb output:

  100000 L0 1:720cfa33400:24c00 20000L/20000P F=1 B=11756828567/11756828567
  120000 L0 0:94048dc9000:24000 20000L/20000P F=1 B=11756828567/11756828567

What I'm not clear on is why the 8-drive vdev is writing 24c00 bytes and
the 9-drive vdev is writing 24000 bytes. And in either case, am I to fetch
the first 20000 bytes? That is:

  zdb -R 0:94048dc9000:20000
  zdb -R 1:720cfa33400:20000

If, when I read these blocks with zdb, the filesystem is reporting to
checksum errors, am I getting the right data? Do I need to process the
parity?

On Sat, Nov 22, 2014 at 6:56 PM, Zaphod Beeblebrox wrote:

> I have a file that ZFS claims is in error that, when I go through all the
> effort to retrieve it, is not in error. I have 405 files, then, that zfs
> says are in error on this array and since some are rather large and since
> retrieving one block seems to take 30 seconds (ie: hundreds of hours of
> time to recover some files), I'd like to ask if there's some way to finesse
> this... or to fix zfs.
>
> To start, my array has errors like:
>
>         NAME              STATE     READ WRITE CKSUM
>         vr2               ONLINE       0     0   989
>           raidz1-0        ONLINE       0     0 1.93K
>             label/vr2-d0  ONLINE       0     0     0
>
> (I've omitted the other lines ... they're all '0').
> I asked what this meant ... and the best I got was that the errors were
> not assigned to any particular device. So I learned how to use ZDB and I
> have a patch for ZDB. Apparently the deadlist can have a null in it that
> crashes ZDB.
>
> No matter. We have this file in the output of zpool status -v:
>
> vr2/Audio@20080305-1450:/cds/service/02-Lord_Have_Mercy_Kyrie.mp3
>
> ... now even though it picks on the snapshot (not all of the -v reports
> do), the following fails:
>
> [1:170:470]root@virtual:/vr1/tmp/diag> cp /vr2/Audio/cds/service/02-Lord_Have_Mercy_Kyrie.mp3 .
> cp: foo.mp3: Bad address
>
> So I did this:
>
> for i in `grep L0 4351-dddddddd.txt | grep -v vr2/Audio | head -50 | cut -c22-34`; do
>   cc=`printf %05d $count`
>   echo getting $i 4035/b$cc
>   time zdb -R vr2 $i:20000:r >4035/b$cc &
>   count=$[count+1]
> done
>
> --- basically, 4351-dddddddd.txt is the output of zdb for that file (see
> http://pastebin.com/tdqEJKJB) and the little script calls zdb to get the
> first 20000 (hex) of each block because the remaining 4000 is the parity (9
> disk array).
>
> Then I cat it into one file, then I truncate it to the specified length
> ....
>
> and lo and behold: The file is sound.
>
> So what's ZFS on about not wanting to read this file? Help?
>
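
On the 24c00 versus 24000 question above, one possible explanation is
raidz allocation rounding. The following is only a sketch in Python of
that arithmetic, modelled on the vdev_raidz_asize calculation as commonly
described; nothing in this thread confirms it. It assumes 512-byte sectors
(ashift=9), which the post does not state, and the function name
raidz_asize and its parameters are hypothetical.

```python
def raidz_asize(psize, ndisks, nparity=1, ashift=9):
    """Bytes allocated on a raidz vdev for a block of psize bytes.

    Assumptions: the arithmetic mirrors vdev_raidz_asize, and ashift=9
    (512-byte sectors) is a guess that the post does not confirm.
    """
    # Data sectors needed, rounding the block up to whole sectors.
    sectors = ((psize - 1) >> ashift) + 1
    # Each (possibly partial) stripe row adds nparity parity sectors.
    data_cols = ndisks - nparity
    sectors += nparity * ((sectors + data_cols - 1) // data_cols)
    # The allocation is padded to a multiple of (nparity + 1) sectors.
    pad_to = nparity + 1
    sectors = ((sectors + pad_to - 1) // pad_to) * pad_to
    return sectors << ashift


if __name__ == "__main__":
    # 0x20000 is the physical block size (20000P) from the zdb lines above.
    for ndisks in (9, 8):
        print(ndisks, "disks:", hex(raidz_asize(0x20000, ndisks)))
```

Under these assumptions a 0x20000-byte block allocates 0x24000 bytes on
the 9-disk vdev (256 data + 32 parity sectors) and 0x24c00 bytes on the
8-disk vdev (256 data + 37 parity + 1 padding sector). That matches the
two DVAs quoted above and is consistent with vdev 0 being the 9-disk one.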