Date: Sat, 2 Sep 2017 19:17:16 +0200
From: Wiktor Niesiobedzki <bsd@vink.pl>
To: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: Resolving errors with ZVOL-s
Message-ID: <CAH17caWn9gzUgiHEY5CGXv%2BLv3CAp%2BTkaiBJ=J3giA9aHaG_iA@mail.gmail.com>
In-Reply-To: <CAH17caWLTOKvs-AoXGcUg8WkiBFZn1GFzMHeQwDUwmsWia7qBQ@mail.gmail.com>
References: <CAH17caWLTOKvs-AoXGcUg8WkiBFZn1GFzMHeQwDUwmsWia7qBQ@mail.gmail.com>
Hi,
I recently encountered errors on a ZFS pool on my 11.1-RELEASE system:
$ uname -a
FreeBSD kadlubek 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug 9 11:55:48 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 5h27m with 0 errors on Sat Sep 2 15:30:59 2017
config:

        NAME               STATE     READ WRITE CKSUM
        tank               ONLINE       0     0    98
          mirror-0         ONLINE       0     0   196
            gpt/tank1.eli  ONLINE       0     0   196
            gpt/tank2.eli  ONLINE       0     0   196

errors: Permanent errors have been detected in the following files:

        dkr-test:<0x1>
dkr-test is a ZVOL that I use with bhyve, and I had indeed noticed I/O
errors on this volume inside the bhyve guest. The ZVOL did not have any
snapshots. Following the advice in the "action" field, I destroyed the ZVOL
so that I could restore it:
# zfs destroy tank/dkr-test
But zpool status still reports errors:
errors: Permanent errors have been detected in the following files:
<0x5095>:<0x1>
I can't find any reference to this dataset in zdb:
# zdb -d tank | grep 5095
# zdb -d tank | grep 20629
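For reference, the two grep patterns above are the same number: zpool status prints the dataset and object IDs in hexadecimal, and 20629 is 0x5095 in decimal. A minimal Python sketch of that conversion, assuming the `<0xDATASET>:<0xOBJECT>` form shown in the status output (the helper name parse_error_ref is hypothetical and not part of any ZFS tool):

```python
import re


def parse_error_ref(ref):
    """Split a '<0xDATASET>:<0xOBJECT>' string into two decimal integers.

    Hypothetical helper for illustration; it only handles the anonymous
    form that zpool status shows once the dataset name is gone.
    """
    match = re.fullmatch(r"<0x([0-9a-fA-F]+)>:<0x([0-9a-fA-F]+)>", ref.strip())
    if match is None:
        raise ValueError("unexpected error reference: %r" % ref)
    # Both fields are hexadecimal; convert them to plain integers.
    return int(match.group(1), 16), int(match.group(2), 16)


dataset_id, object_id = parse_error_ref("<0x5095>:<0x1>")
print(dataset_id, object_id)  # prints "20629 1"
```

So both greps look for the same dataset ID, once in each notation.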
I also tried to get block statistics for this pool:
# zdb -b tank
Traversing all blocks to verify nothing leaked ...
loading space map for vdev 0 of 1, metaslab 159 of 174 ...
No leaks (block sum matches space maps exactly)
        bp count:        24426601
        ganged count:           0
        bp logical:    1983127334912      avg:  81187
        bp physical:   1817897247232      avg:  74422     compression:   1.09
        bp allocated:  1820446928896      avg:  74527     compression:   1.09
        bp deduped:                0    ref>1:      0   deduplication:   1.00
        SPA allocated: 1820446928896     used: 60.90%

        additional, non-pointer bps of type 0:      57981
        Dittoed blocks on same vdev: 296490
After printing this, zdb got stuck, using 100% CPU.
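The avg and compression figures in the zdb -b output can be re-derived from the raw totals as a quick sanity check. A small Python sketch, assuming that avg is the byte total divided by the bp count and that compression is the logical size divided by the physical (or allocated) size; this reading matches the printed numbers, but it is an assumption and not taken from the zdb source:

```python
# Totals copied from the zdb -b output above.
bp_count = 24426601
bp_logical = 1983127334912
bp_physical = 1817897247232
bp_allocated = 1820446928896


def average(total_bytes, count):
    """Average block size in bytes, truncated to an integer."""
    return total_bytes // count


def ratio(logical, stored):
    """Logical-to-stored size ratio, rounded to two decimals."""
    return round(logical / stored, 2)


for name, total in (("logical", bp_logical),
                    ("physical", bp_physical),
                    ("allocated", bp_allocated)):
    print(name, "avg:", average(total, bp_count),
          "compression:", ratio(bp_logical, total))
```

The logical line trivially gives a ratio of 1.0; the other two reproduce the 1.09 reported by zdb.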
And now to my questions:
1. Am I right that this situation is probably due to an error during a
write, so that both copies of the block ended up with a checksum that does
not match their data? And if it is a hardware problem, is it probably
something other than the disks? (No, I don't use ECC RAM.)
2. Is there any way to remove the offending dataset and clear the errors
from the pool?
3. Is my metadata OK? Or should I restore the entire pool from backup?
4. I also tried running zdb -bc tank, but this resulted in a kernel panic. I
might be able to get the stack trace once I have physical access to the
machine next week. Also, checksum verification slows the process down from
1000MB/s to less than 1MB/s. Is this expected?
5. When I work with zdb (as above), should I try to limit writes to the
pool (e.g. by unmounting the datasets)?
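To put the slowdown from question 4 in numbers, here is a rough estimate of how long one pass over the allocated space would take at each rate. This Python sketch assumes that MB means 10^6 bytes and that all 1820446928896 allocated bytes reported by zdb -b are read at a constant rate; both are simplifications, so treat the result as an order of magnitude only:

```python
# "SPA allocated" value from the zdb -b output, in bytes.
ALLOCATED_BYTES = 1820446928896


def traversal_seconds(total_bytes, mb_per_second):
    """Seconds needed to read total_bytes at a constant rate (1 MB = 10**6 bytes)."""
    return total_bytes / (mb_per_second * 1_000_000)


fast = traversal_seconds(ALLOCATED_BYTES, 1000)  # without checksum verification
slow = traversal_seconds(ALLOCATED_BYTES, 1)     # upper bound on the observed -c rate

print("at 1000 MB/s: about %.0f minutes" % (fast / 60))
print("at    1 MB/s: about %.1f days" % (slow / 86400))
```

Under these assumptions the difference is roughly half an hour versus three weeks, and since the observed rate was below 1MB/s, the real run would take even longer.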
Cheers,
Wiktor Niesiobedzki
PS. Sorry for the previous partial message.
