Date:      Wed, 27 Jan 2021 09:55:49 +0000
From:      Matt Churchyard <matt.churchyard@userve.net>
To:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   ZFS issues on 13-current snapshot
Message-ID:  <283b2b0b5df34c4f9b8c78b078a67381@SERVER.ad.usd-group.com>

Hello,

I'm testing a 13-current machine for future use as an encrypted offsite backup store. As it's near release, I was kind of hoping to get away with using this 13 snapshot for a few months and then switch to a RELEASE bootenv when it comes out.

However, I seem to be having a few issues.
First of all, I started noticing that the USED & REFER columns weren't equal for individual datasets. This system has so far simply received a single snapshot of a few datasets, and had readonly set immediately after. Some of them are showing several hundred MB linked to snapshots on datasets that haven't been touched, and I'm unable to send further snapshots without forcing a rollback first. Not the end of the world, but this isn't right and has never happened on previous ZFS systems. The most I've seen before is a few KB, because I forgot to set readonly and went into a few directories on a dataset with atime=on.
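
To be clear about that last point: the incremental send from the source box only goes through if the receive is allowed to roll the dataset back first, i.e. something along these lines (the source pool, host name and new snapshot below are just placeholders, not the actual setup):

    # incremental from the snapshot the backup box already has; -F forces
    # a rollback of the (supposedly untouched) destination dataset
    zfs send -i @26-01-2021 tank/secure/cms@27-01-2021 | \
        ssh offsite zfs receive -F offsite/secure/cms

Here is the current listing showing the unexpected snapshot usage: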

NAME                                 USED  AVAIL  REFER  MOUNTPOINT
offsite                              446G  6.36T   140K  /offsite
[...]
offsite/secure/cms                   359M  6.36T   341M  /offsite/secure/cms
offsite/secure/cms@26-01-2021       17.6M      -   341M  -
offsite/secure/company               225G  6.36T   224G  /offsite/secure/company
offsite/secure/company@25-01-2021    673M      -   224G  -

offsite/secure is an encrypted dataset using default options.
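
In case it helps anyone narrow it down, the per-dataset breakdown of where that space is charged can be pulled from the standard space properties, e.g.:

    zfs list -o space -r offsite/secure    # USEDSNAP vs USEDDS per dataset
    zfs get -r written offsite/secure      # data written since the most recent snapshot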
zfs diff will sit for a while (on small datasets - I gave up trying to run it on anything over a few GB) and eventually output nothing.
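For reference, the invocation is essentially the received snapshot against the live dataset, run locally on the backup box:

    zfs diff offsite/secure/cms@26-01-2021 offsite/secure/cms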

root@offsite:/etc # uname -a
FreeBSD offsite.backup 13.0-CURRENT FreeBSD 13.0-CURRENT #0 main-c255641-gf2b794e1e90: Thu Jan  7 06:25:26 UTC 2021     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
root@offsite:/etc # zpool version
zfs-0.8.0-1
zfs-kmod-0.8.0-1

I then thought I would run a scrub just to see if it found any obvious problems.
It started off running fine, estimating about 45-60 minutes for the whole process of scanning 446GB. (This is 4 basic SATA IronWolf 4TB disks in raidz2.)
However, it appeared to stall at 19.7%. It eventually hit 19.71% and does appear to be going up, but at this point it looks like it may take days to complete (it currently says 3 hours, but that is skewed by the initial fast progress and goes up every time I check).
gstat shows the disks at 100% busy doing anywhere between 10-50MB/s. (They were hitting anywhere up to 170MB/s to start off with. Obviously this varies when having to seek, but even at the rates currently seen I suspect it should be progressing faster than the zpool output shows.)

root@offsite:/etc # zpool status
  pool: offsite
 state: ONLINE
  scan: scrub in progress since Wed Jan 27 09:29:50 2021
        555G scanned at 201M/s, 182G issued at 65.8M/s, 921G total
        0B repaired, 19.71% done, 03:11:51 to go
config:

        NAME                   STATE     READ WRITE CKSUM
        offsite                ONLINE       0     0     0
          raidz2-0             ONLINE       0     0     0
            gpt/data-ZGY85VKX  ONLINE       0     0     0
            gpt/data-ZGY88MRY  ONLINE       0     0     0
            gpt/data-ZGY88NZJ  ONLINE       0     0     0
            gpt/data-ZGY88QKF  ONLINE       0     0     0

errors: No known data errors
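
For what it's worth, the "done" percentage appears to track the issued figure rather than the scanned one (182G of the 921G total is the 19.7%), so the sorted scrub seems to be scanning metadata far faster than it can issue the actual reads. This is roughly what I'm watching it with, plus what I'd try next (the vfs.zfs.scan_legacy name is an assumption based on the OpenZFS zfs_scan_legacy parameter - I haven't confirmed how it is exposed on this snapshot):

    gstat -p -I 1s                  # per-disk busy % and throughput
    zpool status offsite            # scanned vs issued vs total
    sysctl vfs.zfs | grep -i scan   # list the scan-related tunables actually available
    sysctl vfs.zfs.scan_legacy=1    # assumed name: fall back to the old block-order scrub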

Update: I've probably spent 30+ minutes writing this email, and it's reporting a few more GB read but not a single digit of movement in the progress percentage.

  scan: scrub in progress since Wed Jan 27 09:29:50 2021
        559G scanned at 142M/s, 182G issued at 46.2M/s, 921G total
        0B repaired, 19.71% done, 04:33:08 to go

It doesn't inspire a lot of confidence. ZFS has become pretty rock solid in FreeBSD in recent years and I have many systems running it. This should have the most efficient scrub code to date, yet it is currently taking about an hour to progress 0.01% on a new system with a fraction of the data it will eventually hold and 0% fragmentation.

As it stands at the moment, I will likely scrap this attempt and retry with FreeBSD 12.

Regards,
Matt Churchyard



