From: Paul Mather
To: Olivier Smedts
Cc: freebsd-current@freebsd.org
Date: Tue, 11 Oct 2011 08:51:40 -0400
Subject: Re: Strange ZFS filesystem corruption

On Oct 4, 2011, at 12:09 PM, Olivier Smedts wrote:

> 2011/10/3 Paul Mather:
>> I know ZFS does not have a fsck utility ("because it doesn't need one":), but does anyone know of any way of fixing this corruption short of destroying the pool, creating a new one, and restoring from backup? Is there some way of exporting and re-importing the pool that has the side-effect of doing some kind of fsck-like repairing of subtle corruption like this?
>
> But there is the ZFS debugger, "zdb" !
>
> I can't really help you with that because I never had a corrupted
> zpool, but if you search on the lists for up to a year or so, you'll
> find some useful commands to inspect and destroy corrupted objects.
>
>
> Usage: zdb [-CumdibcsDvhL] poolname [object...]
>        zdb [-div] dataset [object...]
>        zdb -m [-L] poolname [vdev [metaslab...]]
>        zdb -R poolname vdev:offset:size[:flags]
>        zdb -S poolname
>        zdb -l [-u] device
>        zdb -C
>
>     Dataset name must include at least one separator character '/' or '@'
>     If dataset name is specified, only that dataset is dumped
>     If object numbers are specified, only those objects are dumped
>
>     Options to control amount of output:
>         -u uberblock
>         -d dataset(s)
>         -i intent logs
>         -C config (or cachefile if alone)
>         -h pool history
>         -b block statistics
>         -m metaslabs
>         -c checksum all metadata (twice for all data) blocks
>         -s report stats on zdb's I/O
>         -D dedup statistics
>         -S simulate dedup to measure effect
>         -v verbose (applies to all others)
>         -l dump label contents
>         -L disable leak tracking (do not load spacemaps)
>         -R read and display block from a device
>
>     Below options are intended for use with other options (except -l):
>         -A ignore assertions (-A), enable panic recovery (-AA) or both (-AAA)
>         -F attempt automatic rewind within safe range of transaction groups
>         -U -- use alternate cachefile
>         -X attempt extreme rewind (does not work with dataset)
>         -e pool is exported/destroyed/has altroot/not in a cachefile
>         -p -- use one or more with -e to specify path to vdev dir
>         -P print numbers parsable
>         -t -- highest txg to use when searching for uberblocks
>     Specify an option more than once (e.g. -bb) to make only that option verbose
>     Default is to dump everything non-verbosely

I tried your suggestion and ran the command "zdb -ccv backups" to try and check the consistency of the troublesome "backups" pool. This is what I ended up with:

=====
Traversing all blocks to verify checksums and verify nothing leaked ...
[[...]]
leaked space: vdev 0, offset 0x900b5557600, size 82944
leaked space: vdev 0, offset 0x900b556e400, size 23040
leaked space: vdev 0, offset 0x900b5553a00, size 9216
leaked space: vdev 0, offset 0x900b5540800, size 23040
leaked space: vdev 0, offset 0x900b550ea00, size 16896
leaked space: vdev 0, offset 0x900b54e2c00, size 9216
leaked space: vdev 0, offset 0x900b50f5600, size 6144
leaked space: vdev 0, offset 0x900b558dc00, size 70656
leaked space: vdev 0, offset 0x900b5580400, size 44544
leaked space: vdev 0, offset 0x900b55bd000, size 82944
leaked space: vdev 0, offset 0x900b55d6200, size 15360
leaked space: vdev 0, offset 0x900b55dd400, size 33792
leaked space: vdev 0, offset 0x900b55d2c00, size 6144
leaked space: vdev 0, offset 0x900b55a0800, size 95232
leaked space: vdev 0, offset 0x900b55f5400, size 6144
leaked space: vdev 0, offset 0x900b5716c00, size 6144
leaked space: vdev 0, offset 0x900b56e8400, size 6144
leaked space: vdev 0, offset 0x900b573b800, size 6144
leaked space: vdev 0, offset 0x900b5748a00, size 10752
leaked space: vdev 0, offset 0x900b58b5e00, size 3072
leaked space: vdev 0, offset 0x900b589de00, size 6144
leaked space: vdev 0, offset 0x900b575fe00, size 7680
leaked space: vdev 0, offset 0x900b5734600, size 15360
leaked space: vdev 0, offset 0x900b55e8200, size 43008
leaked space: vdev 0, offset 0x900b58ca200, size 27648
leaked space: vdev 0, offset 0x900b591d600, size 3072
leaked space: vdev 0, offset 0x900b591fa00, size 12288
leaked space: vdev 0, offset 0x900b5904a00, size 6144
leaked space: vdev 0, offset 0x900b594f400, size 53760
leaked space: vdev 0, offset 0x900b5939200, size 3072
leaked space: vdev 0, offset 0x900b5960800, size 4608
leaked space: vdev 0, offset 0x900b5966e00, size 3072
leaked space: vdev 0, offset 0x900b5963200, size 9216
leaked space: vdev 0, offset 0x900b595de00, size 4608
leaked space: vdev 0, offset 0x900b5928400, size 3072
leaked space: vdev 0, offset 0x900c9a93200, size 4608
leaked space: vdev 0, offset 0x900c9a8d800, size 21504
leaked space: vdev 0, offset 0x900c9afa400, size 3072
leaked space: vdev 0, offset 0x900c9af4a00, size 21504
leaked space: vdev 0, offset 0x900b5977000, size 9216
leaked space: vdev 0, offset 0x900b58b7000, size 75264
leaked space: vdev 0, offset 0x900b5575600, size 38400
leaked space: vdev 0, offset 0x900b4b24a00, size 18432
leaked space: vdev 0, offset 0x900b37a4400, size 6144
leaked space: vdev 0, offset 0x9004e2e5600, size 1536
leaked space: vdev 0, offset 0x9003d14cc00, size 1536
leaked space: vdev 0, offset 0x9002ef99200, size 39936
leaked space: vdev 0, offset 0x90027485400, size 12288
leaked space: vdev 0, offset 0x9001010d600, size 39936
block traversal size 7227697021440 != alloc 7479385864704 (leaked 251688843264)

	bp count:        66257621
	bp logical:      7189198687232    avg: 108503
	bp physical:     4780682987008    avg: 72152     compression: 1.50
	bp allocated:    7227697021440    avg: 109084    compression: 0.99
	bp deduped:      0    ref>1: 0    deduplication: 1.00
	SPA allocated:   7479385864704    used: 62.55%
=====

(On a different pool that I checked using zdb I got the message "No leaks (block sum matches space maps exactly)" followed by several lines of pool statistics, which, I presume, is the message you get when the pool is okay.)
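
For anyone wanting to double-check the figures, here is a minimal Python sketch that parses output of this shape and confirms that the reported leak equals "alloc" minus "block traversal size". It is not part of zdb. The helper names (parse_leaks, parse_summary) are hypothetical, and the regular expressions assume the line formats shown in the output above, with "!=" as zdb prints it; other zdb versions may word these lines differently.

```python
import re

# Assumed format: "leaked space: vdev 0, offset 0x900b5557600, size 82944"
LEAK_RE = re.compile(
    r"leaked space: vdev (\d+), offset (0x[0-9a-fA-F]+), size (\d+)"
)
# Assumed format: "block traversal size N != alloc M (leaked K)"
SUMMARY_RE = re.compile(
    r"block traversal size (\d+) != alloc (\d+) \(leaked (-?\d+)\)"
)


def parse_leaks(text):
    """Return (vdev, offset, size) for every 'leaked space' line in text."""
    return [
        (int(vdev), int(offset, 16), int(size))
        for vdev, offset, size in LEAK_RE.findall(text)
    ]


def parse_summary(text):
    """Return (traversed, alloc, leaked) from the summary line, or None."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    traversed, alloc, leaked = (int(group) for group in match.groups())
    return traversed, alloc, leaked


# Two leak records and the summary line copied from the output above.
SAMPLE = """\
leaked space: vdev 0, offset 0x900b5557600, size 82944
leaked space: vdev 0, offset 0x900b556e400, size 23040
block traversal size 7227697021440 != alloc 7479385864704 (leaked 251688843264)
"""

if __name__ == "__main__":
    leaks = parse_leaks(SAMPLE)
    traversed, alloc, leaked = parse_summary(SAMPLE)
    listed = sum(size for _, _, size in leaks)
    print(f"{len(leaks)} leak records listed, {listed} bytes in total")
    print(f"alloc - traversed = {alloc - traversed} bytes")
    print("summary line is self-consistent:", alloc - traversed == leaked)
```

Because the listing above is elided at "[[...]]", the sizes of the listed records are not expected to add up to the total in the summary line; only the subtraction can be checked from what is shown.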

I'm presuming from the above that the space leaks mean the pool is corrupted, and that zdb has detected but not corrected this corruption. I assume that ZFS's way of fixing corruption is not an fsck but "destroy the pool and restore it from backup"? Does that sound about right?

Cheers,

Paul.