Date:      Tue, 18 Jan 2022 10:07:25 -0500
From:      Rich <rincebrain@gmail.com>
To:        Alan Somers <asomers@freebsd.org>
Cc:        Florent Rivoire <florent@rivoire.fr>, freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: [zfs] recordsize: unexpected increase of disk usage when increasing it
Message-ID:  <CAOeNLuoTdMzD0ooCq%2B6x=uEzcXBmdwPRjm_TfxgUpcnN%2BDhknA@mail.gmail.com>
In-Reply-To: <CAOtMX2g-0rkYz7Q%2BKO=W49OdF5_GnV%2B-VW6Rb5Eb4LokvaPUpA@mail.gmail.com>
References:  <CADzRhsEsZMGE-SoeWLMG9NTtkwhhy6OGQQ046m9AxGFbp5h_kQ@mail.gmail.com> <CAOeNLuopaY3j7P030KO4LMwU3BOU5tXiu6gRsSKsDrFEuGKuaA@mail.gmail.com> <CAOtMX2h=miZt=6__oAhPVzsK9ReShy6nG%2BaTiudvK_jp2sQKJQ@mail.gmail.com> <CAOeNLuoQLgKn673FVotxdoDC3HBr1_j%2BzY0t9-uVj7N%2BFkoe1Q@mail.gmail.com> <CAOtMX2g4KduvFA6W062m93jnrJcjQ9KSzkXjb42F1nvhPWaZsw@mail.gmail.com> <CAOeNLuppbdRbC-bsDEqKKUBMO8KKvaLpVs-OcSA2AF2tO5b03w@mail.gmail.com> <CAOtMX2g-0rkYz7Q%2BKO=W49OdF5_GnV%2B-VW6Rb5Eb4LokvaPUpA@mail.gmail.com>

Nope. I just retried it on my FBSD 13-RELEASE VM, too:
# uname -a
FreeBSD fbsd13rel 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
# zpool version
zfs-2.1.99-683_ga967e54c2
zfs-kmod-2.1.99-683_ga967e54c2
# zpool get all | grep 'feature@' | grep disabled
buildpool  feature@edonr                  disabled                  local
# dd if=/dev/urandom of=/buildpool/testme/2 bs=1179648 count=1
1+0 records in
1+0 records out
1179648 bytes transferred in 0.009827 secs (120041885 bytes/sec)
# du -sh /buildpool/testme/2
2.0M    /buildpool/testme/2
# zfs get all buildpool/testme | grep -v default
NAME              PROPERTY              VALUE                  SOURCE
buildpool/testme  type                  filesystem             -
buildpool/testme  creation              Tue Jan 18  4:46 2022  -
buildpool/testme  used                  4.03M                  -
buildpool/testme  available             277G                   -
buildpool/testme  referenced            4.03M                  -
buildpool/testme  compressratio         1.00x                  -
buildpool/testme  mounted               yes                    -
buildpool/testme  recordsize            1M                     local
buildpool/testme  compression           off                    local
buildpool/testme  atime                 off                    inherited from buildpool
buildpool/testme  createtxg             15030                  -
buildpool/testme  version               5                      -
buildpool/testme  utf8only              off                    -
buildpool/testme  normalization         none                   -
buildpool/testme  casesensitivity       sensitive              -
buildpool/testme  guid                  11057815587819738755   -
buildpool/testme  usedbysnapshots       0B                     -
buildpool/testme  usedbydataset         4.03M                  -
buildpool/testme  usedbychildren        0B                     -
buildpool/testme  usedbyrefreservation  0B                     -
buildpool/testme  objsetid              280                    -
buildpool/testme  refcompressratio      1.00x                  -
buildpool/testme  written               4.03M                  -
buildpool/testme  logicalused           4.01M                  -
buildpool/testme  logicalreferenced     4.01M                  -
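
For reference, the two du results in this thread line up with the following arithmetic, written as a rough Python sketch. The helper name allocated_bytes and its trim_tail switch are made up for illustration and are not anything in ZFS. The sketch counts data blocks only, assumes a 4 KiB sector (ashift=12), and only models files that span more than one record.

```python
def allocated_bytes(file_size, recordsize, trim_tail, sector=4096):
    """Rough data-only allocation estimate for a multi-record file.

    Hypothetical helper: ignores metadata, compression and raidz overhead.
    """
    full, tail = divmod(file_size, recordsize)
    if tail == 0:
        return full * recordsize
    if trim_tail:
        # Tail record rounded up to a whole number of sectors.
        tail_alloc = -(-tail // sector) * sector
    else:
        # Tail record padded out to a full record.
        tail_alloc = recordsize
    return full * recordsize + tail_alloc


size = 1179648          # the dd test file: 1 MiB + 128 KiB
recordsize = 1 << 20    # recordsize=1M

padded = allocated_bytes(size, recordsize, trim_tail=False)
trimmed = allocated_bytes(size, recordsize, trim_tail=True)
print(f"padded tail:  {padded / 2**20:.3f} MiB")
print(f"trimmed tail: {trimmed / 2**20:.3f} MiB")
```

The padded case comes out at two full records, which is what my du shows; the trimmed case comes out at the file's own size, which is what the 1.1M result quoted below would need.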

What version are you running?

- Rich

On Tue, Jan 18, 2022 at 10:00 AM Alan Somers <asomers@freebsd.org> wrote:

> That's not what I get.  Is your pool formatted using a very old
> version or something?
>
> somers@fbsd-head /u/h/somers [1]>
> dd if=/dev/random bs=1179648 of=/testpool/food/t/richfile count=1
> 1+0 records in
> 1+0 records out
> 1179648 bytes transferred in 0.003782 secs (311906705 bytes/sec)
> somers@fbsd-head /u/h/somers> du -sh  /testpool/food/t/richfile
> 1.1M    /testpool/food/t/richfile
>
> On Tue, Jan 18, 2022 at 7:51 AM Rich <rincebrain@gmail.com> wrote:
> >
> > 2.1M    /workspace/test1M/1
> >
> > - Rich
> >
> > On Tue, Jan 18, 2022 at 9:47 AM Alan Somers <asomers@freebsd.org> wrote:
> >>
> >> Yeah, it does.  Just check "du -sh <FILENAME>".  zdb there is showing
> >> you the logical size of the record, but it isn't showing how many disk
> >> blocks are actually allocated.
> >>
> >> On Tue, Jan 18, 2022 at 7:30 AM Rich <rincebrain@gmail.com> wrote:
> >> >
> >> > Really? I didn't know it would still trim the tails on files with compression off.
> >> >
> >> > ...
> >> >
> >> >         size    1179648
> >> >         parent  34
> >> >         links   1
> >> >         pflags  40800000004
> >> > Indirect blocks:
> >> >                0 L1  DVA[0]=<3:c02b96c000:1000> DVA[1]=<3:c810733000:1000> [L1 ZFS plain file] skein lz4 unencrypted LE contiguous unique double size=20000L/1000P birth=35675472L/35675472P fill=2 cksum=5cfba24b351a09aa:8bd9dfef87c5b625:906ed5c3252943db:bed77ce51ad540d4
> >> >                0  L0 DVA[0]=<2:a0827db4000:100000> [L0 ZFS plain file] skein uncompressed unencrypted LE contiguous unique single size=100000L/100000P birth=35675472L/35675472P fill=1 cksum=95b06edf60e5f54c:af6f6950775d0863:8fc28b0783fcd9d3:2e44676e48a59360
> >> >           100000  L0 DVA[0]=<2:a0827eb4000:100000> [L0 ZFS plain file] skein uncompressed unencrypted LE contiguous unique single size=100000L/100000P birth=35675472L/35675472P fill=1 cksum=62a1f05769528648:8197c8a05ca9f1fb:a750c690124dd2e0:390bddc4314cd4c3
> >> >
> >> > It seems not?
> >> >
> >> > - Rich
> >> >
> >> >
> >> > On Tue, Jan 18, 2022 at 9:23 AM Alan Somers <asomers@freebsd.org> wrote:
> >> >>
> >> >> On Tue, Jan 18, 2022 at 7:13 AM Rich <rincebrain@gmail.com> wrote:
> >> >> >
> >> >> > Compression would have made your life better here, and possibly also made it clearer what's going on.
> >> >> >
> >> >> > All records in a file are going to be the same size pre-compression - so if you set the recordsize to 1M and save a 131.1M file, it's going to take up 132M on disk before compression/raidz overhead/whatnot.
> >> >>
> >> >> Not true.  ZFS will trim the file's tails even without compression enabled.
> >> >>
> >> >> >
> >> >> > Usually compression saves you from the tail padding actually requiring allocation on disk, which is one reason I encourage everyone to at least use lz4 (or, if you absolutely cannot for some reason, I guess zle should also work for this one case...)
> >> >> >
> >> >> > But I would say it's probably the sum of last record padding across the whole dataset, if you don't have compression on.
> >> >> >
> >> >> > - Rich
> >> >> >
> >> >> > On Tue, Jan 18, 2022 at 8:57 AM Florent Rivoire <florent@rivoire.fr> wrote:
> >> >> >>
> >> >> >> TLDR: I rsync-ed the same data twice: once with 128K recordsize and
> >> >> >> once with 1M, and the allocated size on disk is ~3% bigger with 1M.
> >> >> >> Why not smaller ?
> >> >> >>
> >> >> >>
> >> >> >> Hello,
> >> >> >>
> >> >> >> I would like some help to understand how the disk usage evolves when I
> >> >> >> change the recordsize.
> >> >> >>
> >> >> >> I've read several articles/presentations/forums about recordsize in
> >> >> >> ZFS, and if I try to summarize, I mainly understood that:
> >> >> >> - recordsize is the "maximum" size of "objects" (so "logical blocks")
> >> >> >> that zfs will create for both data & metadata, then each object is
> >> >> >> compressed and allocated to one vdev, split into smaller (ashift
> >> >> >> size) "physical" blocks and written on disks
> >> >> >> - increasing recordsize is usually good when storing large files that
> >> >> >> are not modified, because it limits the number of metadata objects
> >> >> >> (block-pointers), which has a positive effect on performance
> >> >> >> - decreasing recordsize is useful for "database-like" workloads (i.e.
> >> >> >> small random writes inside existing objects), because it avoids write
> >> >> >> amplification (read-modify-write a large object for a small update)
> >> >> >>
> >> >> >> Today, I'm trying to observe the effect of increasing recordsize for
> >> >> >> *my* data (because I'm also considering defining special_small_blocks
> >> >> >> & using SSDs as "special", but not tested nor discussed here, just
> >> >> >> recordsize).
> >> >> >> So, I'm doing some benchmarks on my "documents" dataset (details in
> >> >> >> "notes" below), but the results are really strange to me.
> >> >> >>
> >> >> >> When I rsync the same data to a freshly-recreated zpool:
> >> >> >> A) with recordsize=128K : 226G allocated on disk
> >> >> >> B) with recordsize=1M : 232G allocated on disk => bigger than 128K ?!?
> >> >> >>
> >> >> >> I would clearly expect the other way around, because bigger recordsize
> >> >> >> generates less metadata so smaller disk usage, and there shouldn't be
> >> >> >> any overhead because 1M is just a maximum and not a forced size to
> >> >> >> allocate for every object.
> >> >>
> >> >> A common misconception.  The 1M recordsize applies to every newly
> >> >> created object, and every object must use the same size for all of its
> >> >> records (except possibly the last one).  But objects created before
> >> >> you changed the recsize will retain their old recsize, and file tails
> >> >> have a flexible recsize.
> >> >>
> >> >> >> I don't mind the increased usage (I can live with a few GB more), but
> >> >> >> I would like to understand why it happens.
> >> >>
> >> >> You might be seeing the effects of sparsity.  ZFS is smart enough not
> >> >> to store file holes (and if any kind of compression is enabled, it
> >> >> will find long runs of zeroes and turn them into holes).  If your data
> >> >> contains any holes that are >= 128 kB but < 1MB, then they can be
> >> >> stored as holes with a 128 kB recsize but must be stored as long runs
> >> >> of zeros with a 1MB recsize.
> >> >>
> >> >> However, I would suggest that you don't bother.  With a 128kB recsize,
> >> >> ZFS has something like a 1000:1 ratio of data:metadata.  In other
> >> >> words, increasing your recsize can save you at most 0.1% of disk
> >> >> space.  Basically, it doesn't matter.  What it _does_ matter for is
> >> >> the tradeoff between write amplification and RAM usage.  1000:1 is
> >> >> comparable to the disk:ram of many computers.  And performance is more
> >> >> sensitive to metadata access times than data access times.  So
> >> >> increasing your recsize can help you keep a greater fraction of your
> >> >> metadata in ARC.  OTOH, as you remarked increasing your recsize will
> >> >> also increase write amplification.
> >> >>
> >> >> So to summarize:
> >> >> * Adjust compression settings to save disk space.
> >> >> * Adjust recsize to save RAM.
> >> >>
> >> >> -Alan
> >> >>
> >> >> >>
> >> >> >> I tried to give all the details of my tests below.
> >> >> >> Did I do something wrong ? Can you explain the increase ?
> >> >> >>
> >> >> >> Thanks !
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> ===============================================
> >> >> >> A) 128K
> >> >> >> ==========
> >> >> >>
> >> >> >> # zpool destroy bench
> >> >> >> # zpool create -o ashift=12 bench /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
> >> >> >>
> >> >> >> # rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
> >> >> >> [...]
> >> >> >> sent 241,042,476,154 bytes  received 353,838 bytes  81,806,492.45 bytes/sec
> >> >> >> total size is 240,982,439,038  speedup is 1.00
> >> >> >>
> >> >> >> # zfs get recordsize bench
> >> >> >> NAME   PROPERTY    VALUE    SOURCE
> >> >> >> bench  recordsize  128K     default
> >> >> >>
> >> >> >> # zpool list -v bench
> >> >> >> NAME                                           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> >> >> >> bench                                         2.72T   226G  2.50T        -         -     0%     8%  1.00x    ONLINE  -
> >> >> >>   gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4  2.72T   226G  2.50T        -         -     0%  8.10%      -    ONLINE
> >> >> >>
> >> >> >> # zfs list bench
> >> >> >> NAME    USED  AVAIL     REFER  MOUNTPOINT
> >> >> >> bench   226G  2.41T      226G  /bench
> >> >> >>
> >> >> >> # zfs get all bench |egrep "(used|referenced|written)"
> >> >> >> bench  used                  226G                   -
> >> >> >> bench  referenced            226G                   -
> >> >> >> bench  usedbysnapshots       0B                     -
> >> >> >> bench  usedbydataset         226G                   -
> >> >> >> bench  usedbychildren        1.80M                  -
> >> >> >> bench  usedbyrefreservation  0B                     -
> >> >> >> bench  written               226G                   -
> >> >> >> bench  logicalused           226G                   -
> >> >> >> bench  logicalreferenced     226G                   -
> >> >> >>
> >> >> >> # zdb -Lbbbs bench > zpool-bench-rcd128K.zdb
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> ===============================================
> >> >> >> B) 1M
> >> >> >> ==========
> >> >> >>
> >> >> >> # zpool destroy bench
> >> >> >> # zpool create -o ashift=12 bench /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
> >> >> >> # zfs set recordsize=1M bench
> >> >> >>
> >> >> >> # rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
> >> >> >> [...]
> >> >> >> sent 241,042,476,154 bytes  received 353,830 bytes  80,173,899.88 bytes/sec
> >> >> >> total size is 240,982,439,038  speedup is 1.00
> >> >> >>
> >> >> >> # zfs get recordsize bench
> >> >> >> NAME   PROPERTY    VALUE    SOURCE
> >> >> >> bench  recordsize  1M       local
> >> >> >>
> >> >> >> # zpool list -v bench
> >> >> >> NAME                                           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
> >> >> >> bench                                         2.72T   232G  2.49T        -         -     0%     8%  1.00x    ONLINE  -
> >> >> >>   gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4  2.72T   232G  2.49T        -         -     0%  8.32%      -    ONLINE
> >> >> >>
> >> >> >> # zfs list bench
> >> >> >> NAME    USED  AVAIL     REFER  MOUNTPOINT
> >> >> >> bench   232G  2.41T      232G  /bench
> >> >> >>
> >> >> >> # zfs get all bench |egrep "(used|referenced|written)"
> >> >> >> bench  used                  232G                   -
> >> >> >> bench  referenced            232G                   -
> >> >> >> bench  usedbysnapshots       0B                     -
> >> >> >> bench  usedbydataset         232G                   -
> >> >> >> bench  usedbychildren        1.96M                  -
> >> >> >> bench  usedbyrefreservation  0B                     -
> >> >> >> bench  written               232G                   -
> >> >> >> bench  logicalused           232G                   -
> >> >> >> bench  logicalreferenced     232G                   -
> >> >> >>
> >> >> >> # zdb -Lbbbs bench > zpool-bench-rcd1M.zdb
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> ===============================================
> >> >> >> Notes:
> >> >> >> ==========
> >> >> >>
> >> >> >> - the source dataset contains ~50% of pictures (raw files and jpg),
> >> >> >> and also some music, various archived documents, zip, videos
> >> >> >> - no change on the source dataset while testing (cf size logged by rsync)
> >> >> >> - I repeated the tests twice (128K, then 1M, then 128K, then 1M), and
> >> >> >> same results
> >> >> >> - probably not important here, but:
> >> >> >> /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 is a Red 3TB CMR
> >> >> >> (WD30EFRX), and /mnt/tank/docs-florent/ is a 128K-recordsize dataset
> >> >> >> on another zpool that I never tweaked except ashift=12 (because using
> >> >> >> the same model of Red 3TB)
> >> >> >>
> >> >> >> # zfs --version
> >> >> >> zfs-2.0.6-1
> >> >> >> zfs-kmod-v2021120100-zfs_a8c7652
> >> >> >>
> >> >> >> # uname -a
> >> >> >> FreeBSD xxxxxxxxx 12.2-RELEASE-p11 FreeBSD 12.2-RELEASE-p11 75566f060d4(HEAD) TRUENAS  amd64
>
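
As a footnote to the data:metadata ratio quoted above, here is a rough sketch of where a figure near 1000:1 can come from. It assumes one 128-byte block pointer per record and ignores higher indirect levels, extra metadata copies, dnodes and metadata compression, so it is a back-of-the-envelope estimate rather than a measurement. The function name is made up for illustration.

```python
BLKPTR_BYTES = 128  # size of one ZFS block pointer


def data_to_metadata_ratio(recordsize):
    """Bytes of file data described by each byte of block-pointer metadata."""
    return recordsize // BLKPTR_BYTES


for label, rs in (("128K", 128 * 1024), ("1M", 1024 * 1024)):
    ratio = data_to_metadata_ratio(rs)
    share = 100 / ratio
    print(f"recordsize={label}: about {ratio}:1 ({share:.3f}% of data size)")
```

Under those assumptions the metadata share is a small fraction of a percent at either recordsize, so it is far too small to account for a 226G versus 232G difference in either direction.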

=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02.72T=C2=A0 =C2=A0226G=C2=A0 2.50T<br>
&gt;&gt; &gt;&gt; &gt;&gt;=C2=A0 =C2=A0-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-=
=C2=A0 =C2=A0 =C2=A00%=C2=A0 =C2=A0 =C2=A08%=C2=A0 1.00x=C2=A0 =C2=A0 ONLIN=
E=C2=A0 -<br>
&gt;&gt; &gt;&gt; &gt;&gt;=C2=A0 =C2=A0gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8=
cc3ad4=C2=A0 2.72T=C2=A0 =C2=A0226G=C2=A0 2.50T<br>
&gt;&gt; &gt;&gt; &gt;&gt;=C2=A0 =C2=A0-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-=
=C2=A0 =C2=A0 =C2=A00%=C2=A0 8.10%=C2=A0 =C2=A0 =C2=A0 -=C2=A0 =C2=A0 ONLIN=
E<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # zfs list bench<br>
&gt;&gt; &gt;&gt; &gt;&gt; NAME=C2=A0 =C2=A0 USED=C2=A0 AVAIL=C2=A0 =C2=A0 =
=C2=A0REFER=C2=A0 MOUNTPOINT<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 =C2=A0226G=C2=A0 2.41T=C2=A0 =C2=A0 =
=C2=A0 226G=C2=A0 /bench<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # zfs get all bench |egrep &quot;(used|reference=
d|written)&quot;<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 used=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 226G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 referenced=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 226G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 usedbysnapshots=C2=A0 =C2=A0 =C2=A0 =
=C2=A00B=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 usedbydataset=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0226G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 usedbychildren=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 1.80M=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 usedbyrefreservation=C2=A0 0B=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 written=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0226G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 logicalused=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0226G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 logicalreferenced=C2=A0 =C2=A0 =C2=
=A0226G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # zdb -Lbbbs bench &gt; zpool-bench-rcd128K.zdb<=
br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D<br>
&gt;&gt; &gt;&gt; &gt;&gt; B) 1M<br>
&gt;&gt; &gt;&gt; &gt;&gt; =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # zpool destroy bench<br>
&gt;&gt; &gt;&gt; &gt;&gt; # zpool create -o ashift=3D12 bench<br>
&gt;&gt; &gt;&gt; &gt;&gt; /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4<=
br>
&gt;&gt; &gt;&gt; &gt;&gt; # zfs set recordsize=3D1M bench<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # rsync -av --exclude &#39;.zfs&#39; /mnt/tank/d=
ocs-florent/ /bench<br>
&gt;&gt; &gt;&gt; &gt;&gt; [...]<br>
&gt;&gt; &gt;&gt; &gt;&gt; sent 241,042,476,154 bytes=C2=A0 received 353,83=
0 bytes=C2=A0 80,173,899.88 bytes/sec<br>
&gt;&gt; &gt;&gt; &gt;&gt; total size is 240,982,439,038=C2=A0 speedup is 1=
.00<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # zfs get recordsize bench<br>
&gt;&gt; &gt;&gt; &gt;&gt; NAME=C2=A0 =C2=A0PROPERTY=C2=A0 =C2=A0 VALUE=C2=
=A0 =C2=A0 SOURCE<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 recordsize=C2=A0 1M=C2=A0 =C2=A0 =C2=
=A0 =C2=A0local<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # zpool list -v bench<br>
&gt;&gt; &gt;&gt; &gt;&gt; NAME=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0SIZE=C2=A0 ALLOC=C2=A0 =C2=A0FREE<br>
&gt;&gt; &gt;&gt; &gt;&gt; CKPOINT=C2=A0 EXPANDSZ=C2=A0 =C2=A0FRAG=C2=A0 =
=C2=A0 CAP=C2=A0 DEDUP=C2=A0 =C2=A0 HEALTH=C2=A0 ALTROOT<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A02.72T=C2=A0 =C2=A0232G=C2=A0 2.49T<br>
&gt;&gt; &gt;&gt; &gt;&gt;=C2=A0 =C2=A0-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-=
=C2=A0 =C2=A0 =C2=A00%=C2=A0 =C2=A0 =C2=A08%=C2=A0 1.00x=C2=A0 =C2=A0 ONLIN=
E=C2=A0 -<br>
&gt;&gt; &gt;&gt; &gt;&gt;=C2=A0 =C2=A0gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8=
cc3ad4=C2=A0 2.72T=C2=A0 =C2=A0232G=C2=A0 2.49T<br>
&gt;&gt; &gt;&gt; &gt;&gt;=C2=A0 =C2=A0-=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-=
=C2=A0 =C2=A0 =C2=A00%=C2=A0 8.32%=C2=A0 =C2=A0 =C2=A0 -=C2=A0 =C2=A0 ONLIN=
E<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # zfs list bench<br>
&gt;&gt; &gt;&gt; &gt;&gt; NAME=C2=A0 =C2=A0 USED=C2=A0 AVAIL=C2=A0 =C2=A0 =
=C2=A0REFER=C2=A0 MOUNTPOINT<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 =C2=A0232G=C2=A0 2.41T=C2=A0 =C2=A0 =
=C2=A0 232G=C2=A0 /bench<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # zfs get all bench |egrep &quot;(used|reference=
d|written)&quot;<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 used=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 232G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 referenced=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 232G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 usedbysnapshots=C2=A0 =C2=A0 =C2=A0 =
=C2=A00B=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 usedbydataset=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0232G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 usedbychildren=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 1.96M=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 usedbyrefreservation=C2=A0 0B=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 written=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0232G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 logicalused=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0232G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt; bench=C2=A0 logicalreferenced=C2=A0 =C2=A0 =C2=
=A0232G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0-<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # zdb -Lbbbs bench &gt; zpool-bench-rcd1M.zdb<br=
>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D<br>
&gt;&gt; &gt;&gt; &gt;&gt; Notes:<br>
&gt;&gt; &gt;&gt; &gt;&gt; =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; - the source dataset contains ~50% of pictures (=
raw files and jpg),<br>
&gt;&gt; &gt;&gt; &gt;&gt; and also some music, various archived documents,=
 zip, videos<br>
&gt;&gt; &gt;&gt; &gt;&gt; - no change on the source dataset while testing =
(cf. size logged by rsync)<br>
&gt;&gt; &gt;&gt; &gt;&gt; - I repeated the tests twice (128K, then 1M, the=
n 128K, then 1M), and<br>
&gt;&gt; &gt;&gt; &gt;&gt; same results<br>
&gt;&gt; &gt;&gt; &gt;&gt; - probably not important here, but:<br>
&gt;&gt; &gt;&gt; &gt;&gt; /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 =
is a Red 3TB CMR<br>
&gt;&gt; &gt;&gt; &gt;&gt; (WD30EFRX), and /mnt/tank/docs-florent/ is a 128=
K-recordsize dataset<br>
&gt;&gt; &gt;&gt; &gt;&gt; on another zpool that I never tweaked except ash=
ift=3D12 (because using<br>
&gt;&gt; &gt;&gt; &gt;&gt; the same model of Red 3TB)<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # zfs --version<br>
&gt;&gt; &gt;&gt; &gt;&gt; zfs-2.0.6-1<br>
&gt;&gt; &gt;&gt; &gt;&gt; zfs-kmod-v2021120100-zfs_a8c7652<br>
&gt;&gt; &gt;&gt; &gt;&gt;<br>
&gt;&gt; &gt;&gt; &gt;&gt; # uname -a<br>
&gt;&gt; &gt;&gt; &gt;&gt; FreeBSD xxxxxxxxx 12.2-RELEASE-p11 FreeBSD 12.2-=
RELEASE-p11<br>
&gt;&gt; &gt;&gt; &gt;&gt; 75566f060d4(HEAD) TRUENAS=C2=A0 amd64<br>
</blockquote></div>

--00000000000001f4cb05d5dca1eb--
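
A rough model of the sparsity argument quoted above, in Python. It is only a sketch under the assumption described in that quoted reply: a region can be elided as a hole only when it covers a whole, aligned record. It ignores compression, any padding of the final record, and metadata, so it does not reproduce the 226G vs 232G figures from the tests. The function name `allocated_bytes` and the sample file layout are made up for illustration.

```python
KIB = 1024


def allocated_bytes(file_size, holes, recordsize):
    """Count the logical bytes that still need storing.

    Assumption (not taken from ZFS source): a record is skipped only if
    one hole covers it entirely; a partly covered record is stored in full.
    `holes` is a list of (start, end) byte offsets, end exclusive.
    """
    allocated = 0
    for start in range(0, file_size, recordsize):
        end = min(start + recordsize, file_size)
        fully_in_hole = any(h_start <= start and end <= h_end
                            for h_start, h_end in holes)
        if not fully_in_hole:
            allocated += end - start
    return allocated


# Hypothetical 4 MiB file with one 512 KiB hole starting at 1280 KiB.
file_size = 4 * 1024 * KIB
holes = [(1280 * KIB, 1792 * KIB)]

small = allocated_bytes(file_size, holes, 128 * KIB)
large = allocated_bytes(file_size, holes, 1024 * KIB)
print(small // KIB, "KiB stored with 128K records")
print(large // KIB, "KiB stored with 1M records")
```

In this model the 512 KiB hole lines up with four 128K records, so 3584 KiB are stored; with 1M records the hole never fills a whole record, so all 4096 KiB are stored.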


