Date: Sat, 2 Sep 2017 19:17:16 +0200
From: Wiktor Niesiobedzki <bsd@vink.pl>
To: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Resolving errors with ZVOL-s
Message-ID: <CAH17caWn9gzUgiHEY5CGXv%2BLv3CAp%2BTkaiBJ=J3giA9aHaG_iA@mail.gmail.com>
In-Reply-To: <CAH17caWLTOKvs-AoXGcUg8WkiBFZn1GFzMHeQwDUwmsWia7qBQ@mail.gmail.com>
References: <CAH17caWLTOKvs-AoXGcUg8WkiBFZn1GFzMHeQwDUwmsWia7qBQ@mail.gmail.com>

Hi,

I have recently encountered errors on my ZFS pool on my 11.1-R:

$ uname -a
FreeBSD kadlubek 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug  9
11:55:48 UTC 2017
root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 5h27m with 0 errors on Sat Sep  2 15:30:59 2017
config:

        NAME               STATE     READ WRITE CKSUM
        tank               ONLINE       0     0    98
          mirror-0         ONLINE       0     0   196
            gpt/tank1.eli  ONLINE       0     0   196
            gpt/tank2.eli  ONLINE       0     0   196

errors: Permanent errors have been detected in the following files:

        dkr-test:<0x1>

dkr-test is a ZVOL that I use within bhyve and indeed - within bhyve I
have noticed I/O errors on this volume. This ZVOL did not have any
snapshots.

Following the advice given in the action field, I tried to restore the
ZVOL:

# zfs destroy tank/dkr-test

But errors are still reported in zpool status:

errors: Permanent errors have been detected in the following files:

        <0x5095>:<0x1>

I can't find any reference to this dataset in zdb:

# zdb -d tank | grep 5095
# zdb -d tank | grep 20629

I also tried getting statistics about the metadata in this pool:

# zdb -b tank

Traversing all blocks to verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 159 of 174 ...

        No leaks (block sum matches space maps exactly)

        bp count:        24426601
        ganged count:           0
        bp logical:    1983127334912      avg:  81187
        bp physical:   1817897247232      avg:  74422     compression:   1.09
        bp allocated:  1820446928896      avg:  74527     compression:   1.09
        bp deduped:             0    ref>1:      0   deduplication:   1.00
        SPA allocated: 1820446928896     used: 60.90%

        additional, non-pointer bps of type 0:      57981
        Dittoed blocks on same vdev: 296490

After printing this, zdb got stuck using 100% CPU.

And now to my questions:

1. Am I right in thinking that this situation is probably due to an
error during a write, so that both copies of the block ended up with a
checksum that does not match their data? And if it is a hardware
problem, is it probably something other than the disks? (No, I don't
use ECC RAM.)

2. Is there any way to remove the offending dataset and clean the pool
of the errors?

3. Is my metadata OK? Or should I restore the entire pool from backup?

4. I also tried running zdb -bc tank, but this resulted in a kernel
panic. I might try to get the stack trace once I get physical access to
the machine next week. Also, checksum verification slows the process
down from 1000MB/s to less than 1MB/s. Is this expected?

5. When I work with zdb (as above), should I try to limit writes to the
pool (e.g. by unmounting the datasets)?

Cheers,

Wiktor Niesiobedzki

PS. Sorry for the previous partial message.
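
A note on the two grep patterns used with zdb -d above: 20629 is simply the decimal form of the hexadecimal ID 0x5095 that zpool status reports, which is presumably why both were tried. A minimal sketch of the conversion, where the variable names objset_hex and objset_dec are illustrative only and not part of any ZFS tool:

```shell
# Hexadecimal object-set ID as printed by "zpool status -v" (<0x5095>).
objset_hex=0x5095

# POSIX printf accepts C-style integer constants, so %d converts hex to decimal.
objset_dec=$(printf '%d' "$objset_hex")

echo "$objset_hex is $objset_dec in decimal"
```

Assuming zdb -d lists dataset IDs in decimal, the decimal value is the one more likely to match, and an empty result for both patterns is consistent with the dataset no longer being listed after the destroy.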