Date: Sat, 22 Nov 2014 18:56:29 -0500
Subject: When a ZFS error is not an error.
From: Zaphod Beeblebrox
To: freebsd-fs

I have a file that ZFS claims is in error, but when I go through all the effort to retrieve it, it is not in error. I have 405 files, then, that zfs says are in error on this array. Since some are rather large and retrieving one block seems to take 30 seconds (ie: hundreds of hours to recover some files), I'd like to ask if there's some way to finesse this... or to fix zfs.

To start, my array has errors like:

    NAME              STATE   READ WRITE CKSUM
    vr2               ONLINE     0     0   989
      raidz1-0        ONLINE     0     0 1.93K
        label/vr2-d0  ONLINE     0     0     0

(I've omitted the other lines ... they are all '0'.) I asked what this meant ... and the best answer I got was that the errors were not assigned to any particular device.

So I learned how to use ZDB, and I have a patch for ZDB: apparently the deadlist can have a null in it that crashes ZDB. No matter.

We have this file in the output of zpool status -v:

    vr2/Audio@20080305-1450:/cds/service/02-Lord_Have_Mercy_Kyrie.mp3

... now even though it picks on the snapshot (not all of the -v reports do), copying the live file fails:

    [1:170:470]root@virtual:/vr1/tmp/diag> cp /vr2/Audio/cds/service/02-Lord_Have_Mercy_Kyrie.mp3 .
    cp: foo.mp3: Bad address

So I did this:

    count=0
    for i in `grep L0 4351-dddddddd.txt | grep -v vr2/Audio | head -50 | cut -c22-34`; do
        cc=`printf %05d $count`
        echo getting $i 4035/b$cc
        time zdb -R vr2 $i:20000:r >4035/b$cc &
        count=$((count+1))
    done

Basically, 4351-dddddddd.txt is the output of zdb for that file (see http://pastebin.com/tdqEJKJB), and the little script calls zdb to get the first 20000 (hex) bytes of each block, because the remaining 4000 is the parity (9 disk array).
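For anyone checking the 20000/4000 figures, here is a rough sketch of the arithmetic. It assumes a 9-disk raidz1 (8 data columns plus 1 parity column) and a full 0x20000-byte block, both taken at face value from the numbers above. The variable names are illustrative only, not anything zdb prints.

```shell
#!/bin/sh
# Assumed figures from the post: 9-disk raidz1, 0x20000-byte logical block.
disks=9
parity_disks=1
block=$((0x20000))

# raidz1 spreads the data over all disks but one.
data_disks=$((disks - parity_disks))

# Bytes of this block that land on each data disk.
per_disk=$((block / data_disks))

# One parity column the same size as one data column.
parity=$((per_disk * parity_disks))

# Data plus parity for the block.
total=$((block + parity))

printf 'data per disk: 0x%x\n' "$per_disk"
printf 'parity:        0x%x\n' "$parity"
printf 'data + parity: 0x%x\n' "$total"
```

Under those assumptions each data disk holds 0x4000 bytes of the block, the parity column is another 0x4000, and the block occupies 0x24000 in all, which matches the sizes used in the zdb call.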
Then I cat the pieces into one file and truncate it to the specified length .... and lo and behold: the file is sound. So what's ZFS on about, not wanting to read this file? Help?
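The cat-and-truncate step can be sketched roughly as below. It assumes the pieces are named b00000, b00001, ... as in the loop above, that this name order matches the block order within the file, and that the file has no holes. The function name reassemble and its arguments are hypothetical, and truncate(1) is assumed to be available.

```shell
#!/bin/sh
# reassemble DIR SIZE OUT  (hypothetical helper, not part of zdb or ZFS)
# Concatenate DIR/b00000, DIR/b00001, ... in name order, then cut the
# result down to SIZE bytes, the length recorded for the file.
reassemble() {
    dir=$1
    size=$2
    out=$3
    # Zero-padded names make the shell's glob order match the fetch order.
    cat "$dir"/b[0-9]* > "$out" || return 1
    # Drop the padding at the end of the last block.
    truncate -s "$size" "$out"
}
```

It would be called as reassemble 4035 SIZE recovered.mp3, with SIZE replaced by the file length from the zdb output. Note that truncate would also silently extend the file with zeros if SIZE is larger than the concatenated data, so it is worth comparing the two sizes first.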