Date: Tue, 11 Oct 2011 08:51:40 -0400
From: Paul Mather <paul@gromit.dlib.vt.edu>
To: Olivier Smedts <olivier@gid0.org>
Cc: freebsd-current@freebsd.org
Subject: Re: Strange ZFS filesystem corruption
Message-ID: <A253C881-8148-401E-ABD2-A8289D89FBA0@gromit.dlib.vt.edu>
In-Reply-To: <CABzXLYOsXWxR8u2RvuVekQ4%2BgwPQKT6C1dSasQoAUdGY0VBMjQ@mail.gmail.com>
References: <8B59D754-9062-4499-9873-7C2167622032@gromit.dlib.vt.edu> <CABzXLYOsXWxR8u2RvuVekQ4%2BgwPQKT6C1dSasQoAUdGY0VBMjQ@mail.gmail.com>
On Oct 4, 2011, at 12:09 PM, Olivier Smedts wrote:

> 2011/10/3 Paul Mather <paul@gromit.dlib.vt.edu>:
>> I know ZFS does not have a fsck utility ("because it doesn't need one":), but does anyone know of any way of fixing this corruption short of destroying the pool, creating a new one, and restoring from backup? Is there some way of exporting and re-importing the pool that has the side-effect of doing some kind of fsck-like repairing of subtle corruption like this?
>
> But there is the ZFS debugger, "zdb" !
>
> I can't really help you with that because I never had a corrupted
> zpool, but if you search on the lists for up to a year or so, you'll
> find some useful commands to inspect and destroy corrupted objects.
>
>
> Usage: zdb [-CumdibcsDvhL] poolname [object...]
>        zdb [-div] dataset [object...]
>        zdb -m [-L] poolname [vdev [metaslab...]]
>        zdb -R poolname vdev:offset:size[:flags]
>        zdb -S poolname
>        zdb -l [-u] device
>        zdb -C
>
>     Dataset name must include at least one separator character '/' or '@'
>     If dataset name is specified, only that dataset is dumped
>     If object numbers are specified, only those objects are dumped
>
>     Options to control amount of output:
>         -u uberblock
>         -d dataset(s)
>         -i intent logs
>         -C config (or cachefile if alone)
>         -h pool history
>         -b block statistics
>         -m metaslabs
>         -c checksum all metadata (twice for all data) blocks
>         -s report stats on zdb's I/O
>         -D dedup statistics
>         -S simulate dedup to measure effect
>         -v verbose (applies to all others)
>         -l dump label contents
>         -L disable leak tracking (do not load spacemaps)
>         -R read and display block from a device
>
>     Below options are intended for use with other options (except -l):
>         -A ignore assertions (-A), enable panic recovery (-AA) or both (-AAA)
>         -F attempt automatic rewind within safe range of transaction groups
>         -U <cachefile_path> -- use alternate cachefile
>         -X attempt extreme rewind (does not work with dataset)
>         -e pool is exported/destroyed/has altroot/not in a cachefile
>         -p <path> -- use one or more with -e to specify path to vdev dir
>         -P print numbers parsable
>         -t <txg> -- highest txg to use when searching for uberblocks
>     Specify an option more than once (e.g. -bb) to make only that option verbose
>     Default is to dump everything non-verbosely

I tried your suggestion and ran the command "zdb -ccv backups" to try and check the consistency of the troublesome "backups" pool. This is what I ended up with:

=====
Traversing all blocks to verify checksums and verify nothing leaked ...
[[...]]
leaked space: vdev 0, offset 0x900b5557600, size 82944
leaked space: vdev 0, offset 0x900b556e400, size 23040
leaked space: vdev 0, offset 0x900b5553a00, size 9216
leaked space: vdev 0, offset 0x900b5540800, size 23040
leaked space: vdev 0, offset 0x900b550ea00, size 16896
leaked space: vdev 0, offset 0x900b54e2c00, size 9216
leaked space: vdev 0, offset 0x900b50f5600, size 6144
leaked space: vdev 0, offset 0x900b558dc00, size 70656
leaked space: vdev 0, offset 0x900b5580400, size 44544
leaked space: vdev 0, offset 0x900b55bd000, size 82944
leaked space: vdev 0, offset 0x900b55d6200, size 15360
leaked space: vdev 0, offset 0x900b55dd400, size 33792
leaked space: vdev 0, offset 0x900b55d2c00, size 6144
leaked space: vdev 0, offset 0x900b55a0800, size 95232
leaked space: vdev 0, offset 0x900b55f5400, size 6144
leaked space: vdev 0, offset 0x900b5716c00, size 6144
leaked space: vdev 0, offset 0x900b56e8400, size 6144
leaked space: vdev 0, offset 0x900b573b800, size 6144
leaked space: vdev 0, offset 0x900b5748a00, size 10752
leaked space: vdev 0, offset 0x900b58b5e00, size 3072
leaked space: vdev 0, offset 0x900b589de00, size 6144
leaked space: vdev 0, offset 0x900b575fe00, size 7680
leaked space: vdev 0, offset 0x900b5734600, size 15360
leaked space: vdev 0, offset 0x900b55e8200, size 43008
leaked space: vdev 0, offset 0x900b58ca200, size 27648
leaked space: vdev 0, offset 0x900b591d600, size 3072
leaked space: vdev 0, offset 0x900b591fa00, size 12288
leaked space: vdev 0, offset 0x900b5904a00, size 6144
leaked space: vdev 0, offset 0x900b594f400, size 53760
leaked space: vdev 0, offset 0x900b5939200, size 3072
leaked space: vdev 0, offset 0x900b5960800, size 4608
leaked space: vdev 0, offset 0x900b5966e00, size 3072
leaked space: vdev 0, offset 0x900b5963200, size 9216
leaked space: vdev 0, offset 0x900b595de00, size 4608
leaked space: vdev 0, offset 0x900b5928400, size 3072
leaked space: vdev 0, offset 0x900c9a93200, size 4608
leaked space: vdev 0, offset 0x900c9a8d800, size 21504
leaked space: vdev 0, offset 0x900c9afa400, size 3072
leaked space: vdev 0, offset 0x900c9af4a00, size 21504
leaked space: vdev 0, offset 0x900b5977000, size 9216
leaked space: vdev 0, offset 0x900b58b7000, size 75264
leaked space: vdev 0, offset 0x900b5575600, size 38400
leaked space: vdev 0, offset 0x900b4b24a00, size 18432
leaked space: vdev 0, offset 0x900b37a4400, size 6144
leaked space: vdev 0, offset 0x9004e2e5600, size 1536
leaked space: vdev 0, offset 0x9003d14cc00, size 1536
leaked space: vdev 0, offset 0x9002ef99200, size 39936
leaked space: vdev 0, offset 0x90027485400, size 12288
leaked space: vdev 0, offset 0x9001010d600, size 39936
block traversal size 7227697021440 != alloc 7479385864704 (leaked 251688843264)

        bp count:           66257621
        bp logical:    7189198687232      avg: 108503
        bp physical:   4780682987008      avg:  72152     compression:   1.50
        bp allocated:  7227697021440      avg: 109084     compression:   0.99
        bp deduped:                0    ref>1:      0   deduplication:   1.00
        SPA allocated: 7479385864704     used: 62.55%
=====

(On a different pool that I checked using zdb I got the message "No leaks (block sum matches space maps exactly)" followed by several lines of pool statistics, which, I presume, is the message you get when the pool is okay.)
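
For anyone wanting to double-check the figures in output like the above, here is a minimal Python sketch. It is not part of zdb: the function names and the two regular expressions are assumptions of mine, written to match only the line formats visible in this message. It pulls out the itemised "leaked space" entries and recomputes the leak in the summary line as alloc minus block traversal size.

```python
import re

# Patterns are assumptions based solely on the output format pasted above.
LEAK_RE = re.compile(
    r"leaked space: vdev (\d+), offset (0x[0-9a-fA-F]+), size (\d+)"
)
SUMMARY_RE = re.compile(
    r"block traversal size (\d+) != alloc (\d+) \(leaked (\d+)\)"
)


def parse_leaks(text):
    """Return a list of (vdev, offset, size) tuples, one per 'leaked space' entry."""
    return [
        (int(vdev), int(offset, 16), int(size))
        for vdev, offset, size in LEAK_RE.findall(text)
    ]


def check_summary(text):
    """Recompute the leak from the summary line; return None if no summary is found."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    traversed, alloc, reported = (int(g) for g in match.groups())
    return {
        "traversed": traversed,
        "alloc": alloc,
        "reported_leak": reported,
        "computed_leak": alloc - traversed,
    }


# A few lines copied from the output above, used as sample input.
sample = """\
leaked space: vdev 0, offset 0x900b5557600, size 82944
leaked space: vdev 0, offset 0x900b556e400, size 23040
leaked space: vdev 0, offset 0x900b5553a00, size 9216
block traversal size 7227697021440 != alloc 7479385864704 (leaked 251688843264)
"""

if __name__ == "__main__":
    leaks = parse_leaks(sample)
    summary = check_summary(sample)
    print("itemised entries:", len(leaks))
    print("itemised bytes:", sum(size for _, _, size in leaks))
    print("reported leak:", summary["reported_leak"])
    print("computed leak:", summary["computed_leak"])
```

Note that the itemised list in this message is elided ("[[...]]"), so the sum of the entries shown will not add up to the total in the summary line; only the alloc minus traversal difference can be checked from what is pasted here.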

I'm presuming from the above that the space leaks mean the pool is corrupted, and that zdb has detected but not corrected this corruption. I assume that ZFS's way of fixing corruption is not a fsck but a "destroy the pool and restore it from backup"? Does that sound about right?

Cheers,

Paul.