From: Zaphod Beeblebrox
Date: Sun, 14 Aug 2016 01:53:33 -0400
Subject: ZFS corrupt DVA panic: can it be fixed?
To: freebsd-fs, FreeBSD Hackers

Before this problem, I had a few crashes... which may have been hardware related. The hardware is (I think) fixed, but this problem remains. My searches seem to indicate that this has happened to other people.

...

I've pasted here only the first two lines of the last 3 panics I've had.

panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144
#3 0xffffffff822b8b01 at dva_get_dsize_sync+0xb1

panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144
#3 0xffffffff822b8b01 at dva_get_dsize_sync+0xb1

panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144
#3 0xffffffff822b8b01 at dva_get_dsize_sync+0xb1

I gather that the machine runs until something causes the kernel to encounter the corrupt DVA. I gather from reading stuff that this is part of the structure that holds free space on the drive. Since the numbers are the same in each panic, I'm assuming that each panic is encountering the same one. This is also the panic that is not dumping properly to either USB or spinning disk.

I have zdb -uuumcD running right now. It seems to estimate that it's going to take an awfully long time, but the estimate might be broken because it's on 159 of 171 of whatever it's reading.

Now...
question: is this fixable? Can I just mark off the space as unusable, maybe? Since this has happened to more than one person, I gather it's a significant hole in the claim that ZFS is crashproof (or that it doesn't need repair after crashing). Maybe this check can be added to scrub (or scrub + an option)? Or maybe when we run across it, we fix it? Does fixing it (in the theoretical sense) require knowing all the free space on the drive? Doesn't scrub do that?
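
As a quick check on the assumption that all three panics hit the same DVA, the pasted lines can be compared mechanically. This is only a sketch: the regular expression is a guess at the message layout based on the three lines quoted in this message, and `parse_bad_dva` and `distinct_bad_dvas` are hypothetical helper names, not part of any ZFS tool. The two numbers are treated as opaque fields; the sketch makes no claim about what each one means.

```python
import re

# Pattern inferred from the panic lines quoted in this message (an assumption,
# not taken from the ZFS source): two decimal fields separated by ':'.
BAD_DVA_RE = re.compile(r"bad DVA (\d+):(\d+)")


def parse_bad_dva(line):
    """Return the two numeric fields of a 'bad DVA' panic line, or None."""
    match = BAD_DVA_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def distinct_bad_dvas(lines):
    """Collect the distinct (first, second) pairs seen across the given lines."""
    return {dva for dva in map(parse_bad_dva, lines) if dva is not None}


# The first line of each of the three panics, plus one backtrace line
# to show that non-matching lines are simply skipped.
panics = [
    "panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144",
    "#3 0xffffffff822b8b01 at dva_get_dsize_sync+0xb1",
    "panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144",
    "panic: dva_get_dsize_sync(): bad DVA 1573890:1587590144",
]

seen = distinct_bad_dvas(panics)
if len(seen) == 1:
    print("every panic reports the same pair:", seen)
else:
    print("panics report different pairs:", seen)
```

If more than one pair ever shows up in later panics, that would suggest the damage is not confined to a single DVA.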