Date: Mon, 4 Sep 2017 19:12:50 +0200
From: Wiktor Niesiobedzki <bsd@vink.pl>
To: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Resolving errors with ZVOL-s
Message-ID: <CAH17caXiKWi8APaGSJpXCxuQhXzNX7-VOfog+OyOCzxZ_PvF7g@mail.gmail.com>
In-Reply-To: <CAH17caWn9gzUgiHEY5CGXv+Lv3CAp+TkaiBJ=J3giA9aHaG_iA@mail.gmail.com>
References: <CAH17caWLTOKvs-AoXGcUg8WkiBFZn1GFzMHeQwDUwmsWia7qBQ@mail.gmail.com>
 <CAH17caWn9gzUgiHEY5CGXv+Lv3CAp+TkaiBJ=J3giA9aHaG_iA@mail.gmail.com>
Hi,

I can follow up on my issue - the same problem has just happened on the second
ZVOL that I've created:

# zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 5h27m with 0 errors on Sat Sep  2 15:30:59 2017
config:

        NAME               STATE     READ WRITE CKSUM
        tank               ONLINE       0     0    14
          mirror-0         ONLINE       0     0    28
            gpt/tank1.eli  ONLINE       0     0    28
            gpt/tank2.eli  ONLINE       0     0    28

errors: Permanent errors have been detected in the following files:

        tank/docker-big:<0x1>
        <0x5095>:<0x1>

I suspect that these errors might be related to my recent upgrade to 11.1;
until the 19th of August I was running 11.0. I am considering rolling back to
11.0 right now.

For reference:

# zfs get all tank/docker-big
NAME             PROPERTY               VALUE                  SOURCE
tank/docker-big  type                   volume                 -
tank/docker-big  creation               Sat Sep  2 10:09 2017  -
tank/docker-big  used                   100G                   -
tank/docker-big  available              747G                   -
tank/docker-big  referenced             10.5G                  -
tank/docker-big  compressratio          4.58x                  -
tank/docker-big  reservation            none                   default
tank/docker-big  volsize                100G                   local
tank/docker-big  volblocksize           128K                   -
tank/docker-big  checksum               skein                  inherited from tank
tank/docker-big  compression            lz4                    inherited from tank
tank/docker-big  readonly               off                    default
tank/docker-big  copies                 1                      default
tank/docker-big  refreservation         100G                   local
tank/docker-big  primarycache           all                    default
tank/docker-big  secondarycache         all                    default
tank/docker-big  usedbysnapshots        0                      -
tank/docker-big  usedbydataset          10.5G                  -
tank/docker-big  usedbychildren         0                      -
tank/docker-big  usedbyrefreservation   89.7G                  -
tank/docker-big  logbias                latency                default
tank/docker-big  dedup                  off                    default
tank/docker-big  mlslabel               -
tank/docker-big  sync                   standard               default
tank/docker-big  refcompressratio       4.58x                  -
tank/docker-big  written                10.5G                  -
tank/docker-big  logicalused            47.8G                  -
tank/docker-big  logicalreferenced      47.8G                  -
tank/docker-big  volmode                dev                    local
tank/docker-big  snapshot_limit         none                   default
tank/docker-big  snapshot_count         none                   default
tank/docker-big  redundant_metadata     all                    default
tank/docker-big  com.sun:auto-snapshot  false                  local

Any ideas on what I should try before rolling back?

Cheers,

Wiktor

2017-09-02 19:17 GMT+02:00 Wiktor Niesiobedzki <bsd@vink.pl>:
> Hi,
>
> I have recently encountered errors on my ZFS pool on my 11.1-R system:
> $ uname -a
> FreeBSD kadlubek 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug 9
> 11:55:48 UTC 2017
> root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
>
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: scrub repaired 0 in 5h27m with 0 errors on Sat Sep  2 15:30:59 2017
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         tank               ONLINE       0     0    98
>           mirror-0         ONLINE       0     0   196
>             gpt/tank1.eli  ONLINE       0     0   196
>             gpt/tank2.eli  ONLINE       0     0   196
>
> errors: Permanent errors have been detected in the following files:
>
>         dkr-test:<0x1>
>
> dkr-test is a ZVOL that I use within bhyve, and indeed, within bhyve I have
> noticed I/O errors on this volume. This ZVOL did not have any snapshots.
>
> Following the advice given under "action:" I tried to restore the ZVOL:
> # zfs destroy tank/dkr-test
>
> But the errors are still reported in zpool status:
> errors: Permanent errors have been detected in the following files:
>
>         <0x5095>:<0x1>
>
> I can't find any reference to this dataset in zdb:
> # zdb -d tank | grep 5095
> # zdb -d tank | grep 20629
>
> I also tried getting statistics about the metadata in this pool:
> # zdb -b tank
>
> Traversing all blocks to verify nothing leaked ...
>
> loading space map for vdev 0 of 1, metaslab 159 of 174 ...
> No leaks (block sum matches space maps exactly)
>
>         bp count:        24426601
>         ganged count:           0
>         bp logical:     1983127334912   avg: 81187
>         bp physical:    1817897247232   avg: 74422   compression: 1.09
>         bp allocated:   1820446928896   avg: 74527   compression: 1.09
>         bp deduped:                 0   ref>1: 0     deduplication: 1.00
>         SPA allocated:  1820446928896   used: 60.90%
>
>         additional, non-pointer bps of type 0:  57981
>         Dittoed blocks on same vdev: 296490
>
> And then zdb got stuck using 100% CPU.
>
> And now to my questions:
> 1. Do I interpret this correctly, that this situation is probably due to an
> error during a write, and both copies of the block got checksums mismatching
> their data? And if it is a hardware problem, it is probably something other
> than the disks? (No, I don't use ECC RAM.)
>
> 2. Is there any way to remove the offending dataset and clean the pool of
> the errors?
>
> 3. Is my metadata OK? Or should I restore the entire pool from backup?
>
> 4. I also tried running zdb -bc tank, but this resulted in a kernel panic. I
> might try to get the stack trace once I get physical access to the machine
> next week. Also, checksum verification slows the process down from 1000MB/s
> to less than 1MB/s. Is this expected?
>
> 5. When I work with zdb (as above), should I try to limit writes to the
> pool (e.g. by unmounting the datasets)?
>
> Cheers,
>
> Wiktor Niesiobedzki
>
> PS. Sorry for previous partial message.
>
>
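PS. One thing I plan to try before rolling back, assuming the leftover
<0x5095>:<0x1> entry only points at the already-destroyed ZVOL and the
persistent error log simply needs to rotate (as far as I understand, it only
covers the last two scrubs), is roughly the following; treat it as a sketch,
I have not verified it yet:

# zpool scrub tank
# zpool status -v tank   (wait for the scrub to finish, re-check the error list)
# zpool scrub tank       (a second clean scrub should age the old entry out of the log)
# zpool clear tank       (reset the READ/WRITE/CKSUM counters on the vdevs)

If anyone can confirm whether this is enough to clear such an entry, or
whether something more drastic is needed, I'd appreciate it.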