FreeBSD Mail Archives

Date:      Sat, 2 Sep 2017 19:17:16 +0200
From:      Wiktor Niesiobedzki <bsd@vink.pl>
To:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: Resolving errors with ZVOL-s
Message-ID:  <CAH17caWn9gzUgiHEY5CGXv+Lv3CAp+TkaiBJ=J3giA9aHaG_iA@mail.gmail.com>
In-Reply-To: <CAH17caWLTOKvs-AoXGcUg8WkiBFZn1GFzMHeQwDUwmsWia7qBQ@mail.gmail.com>
References:  <CAH17caWLTOKvs-AoXGcUg8WkiBFZn1GFzMHeQwDUwmsWia7qBQ@mail.gmail.com>



Hi,

I have recently encountered errors on a ZFS pool on my 11.1-RELEASE machine:
$ uname -a
FreeBSD kadlubek 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug  9 11:55:48 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 5h27m with 0 errors on Sat Sep  2 15:30:59 2017
config:

        NAME               STATE     READ WRITE CKSUM
        tank               ONLINE       0     0    98
          mirror-0         ONLINE       0     0   196
            gpt/tank1.eli  ONLINE       0     0   196
            gpt/tank2.eli  ONLINE       0     0   196

errors: Permanent errors have been detected in the following files:

        dkr-test:<0x1>

dkr-test is a ZVOL that I use with bhyve, and indeed, inside the bhyve guest
I have noticed I/O errors on this volume. This ZVOL did not have any snapshots.

Following the advice given in the action line, I tried to restore the ZVOL:
# zfs destroy tank/dkr-test

But errors are still reported in zpool status:
errors: Permanent errors have been detected in the following files:

        <0x5095>:<0x1>

I can't find any reference to this dataset in zdb:
 # zdb -d tank | grep 5095
 # zdb -d tank | grep 20629
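
The two searches above look for the same object ID: zpool status prints it in
hexadecimal (0x5095), and 20629 is the same number in decimal. A minimal
sketch of the conversion, assuming a POSIX shell (the variable names are only
illustrative):

```shell
# Object ID as printed by zpool status (hexadecimal).
obj_hex=0x5095

# printf accepts C-style hex constants for %d and prints them in decimal.
obj_dec=$(printf '%d' "$obj_hex")

echo "$obj_hex = $obj_dec"   # prints "0x5095 = 20629"
```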


I also tried to get statistics about the metadata in this pool:
# zdb -b tank

Traversing all blocks to verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 159 of 174 ...
        No leaks (block sum matches space maps exactly)

        bp count:        24426601
        ganged count:           0
        bp logical:    1983127334912      avg:  81187
        bp physical:   1817897247232      avg:  74422     compression:   1.09
        bp allocated:  1820446928896      avg:  74527     compression:   1.09
        bp deduped:             0    ref>1:      0   deduplication:   1.00
        SPA allocated: 1820446928896     used: 60.90%

        additional, non-pointer bps of type 0:      57981
        Dittoed blocks on same vdev: 296490

And then zdb got stuck, using 100% CPU.
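
As a sanity check, the avg and compression columns in the output above are
consistent with the raw byte counts. A rough sketch of the arithmetic,
assuming a shell with 64-bit integer arithmetic (the variable names are only
illustrative, and how zdb itself rounds is an assumption):

```shell
# Values copied from the zdb -b output above.
bp_count=24426601
bp_logical=1983127334912
bp_physical=1817897247232

# Average block sizes in bytes (integer division).
avg_logical=$((bp_logical / bp_count))
avg_physical=$((bp_physical / bp_count))

# Compression ratio, scaled by 100 because the shell has no floating point.
ratio_x100=$((bp_logical * 100 / bp_physical))

echo "avg logical:  $avg_logical"     # 81187
echo "avg physical: $avg_physical"    # 74422
echo "compression:  $ratio_x100/100"  # 109/100, i.e. 1.09
```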

And now to my questions:
1. Am I right that this situation was probably caused by an error during the
write, so that both copies of the block were stored with a checksum that does
not match their data? And if it is a hardware problem, is it probably
something other than the disks? (No, I don't use ECC RAM.)

2. Is there any way to remove the offending dataset and clear the errors from
the pool?

3. Is my metadata OK? Or should I restore the entire pool from backup?

4. I also tried running zdb -bc tank, but this resulted in a kernel panic. I
might try to get the stack trace once I have physical access to the machine
next week. Also, checksum verification slows the process down from 1000MB/s
to less than 1MB/s. Is this expected?

5. When I work with zdb (as above), should I try to limit writes to the
pool (e.g. by unmounting the datasets)?

Cheers,

Wiktor Niesiobedzki

PS. Sorry for the previous partial message.
