Date: Tue, 18 Jan 2022 10:07:25 -0500
From: Rich <rincebrain@gmail.com>
To: Alan Somers <asomers@freebsd.org>
Cc: Florent Rivoire <florent@rivoire.fr>, freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: [zfs] recordsize: unexpected increase of disk usage when increasing it
Message-ID: <CAOeNLuoTdMzD0ooCq+6x=uEzcXBmdwPRjm_TfxgUpcnN+DhknA@mail.gmail.com>
In-Reply-To: <CAOtMX2g-0rkYz7Q+KO=W49OdF5_GnV+-VW6Rb5Eb4LokvaPUpA@mail.gmail.com>
References: <CADzRhsEsZMGE-SoeWLMG9NTtkwhhy6OGQQ046m9AxGFbp5h_kQ@mail.gmail.com>
 <CAOeNLuopaY3j7P030KO4LMwU3BOU5tXiu6gRsSKsDrFEuGKuaA@mail.gmail.com>
 <CAOtMX2h=miZt=6__oAhPVzsK9ReShy6nG+aTiudvK_jp2sQKJQ@mail.gmail.com>
 <CAOeNLuoQLgKn673FVotxdoDC3HBr1_j+zY0t9-uVj7N+Fkoe1Q@mail.gmail.com>
 <CAOtMX2g4KduvFA6W062m93jnrJcjQ9KSzkXjb42F1nvhPWaZsw@mail.gmail.com>
 <CAOeNLuppbdRbC-bsDEqKKUBMO8KKvaLpVs-OcSA2AF2tO5b03w@mail.gmail.com>
 <CAOtMX2g-0rkYz7Q+KO=W49OdF5_GnV+-VW6Rb5Eb4LokvaPUpA@mail.gmail.com>
Nope. I just retried it on my FBSD 13-RELEASE VM, too:

# uname -a
FreeBSD fbsd13rel 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
# zpool version
zfs-2.1.99-683_ga967e54c2
zfs-kmod-2.1.99-683_ga967e54c2
# zpool get all | grep 'feature@' | grep disabled
buildpool  feature@edonr         disabled               local
# dd if=/dev/urandom of=/buildpool/testme/2 bs=1179648 count=1
1+0 records in
1+0 records out
1179648 bytes transferred in 0.009827 secs (120041885 bytes/sec)
# du -sh /buildpool/testme/2
2.0M    /buildpool/testme/2
# zfs get all buildpool/testme | grep -v default
NAME              PROPERTY              VALUE                  SOURCE
buildpool/testme  type                  filesystem             -
buildpool/testme  creation              Tue Jan 18  4:46 2022  -
buildpool/testme  used                  4.03M                  -
buildpool/testme  available             277G                   -
buildpool/testme  referenced            4.03M                  -
buildpool/testme  compressratio         1.00x                  -
buildpool/testme  mounted               yes                    -
buildpool/testme  recordsize            1M                     local
buildpool/testme  compression           off                    local
buildpool/testme  atime                 off                    inherited from buildpool
buildpool/testme  createtxg             15030                  -
buildpool/testme  version               5                      -
buildpool/testme  utf8only              off                    -
buildpool/testme  normalization         none                   -
buildpool/testme  casesensitivity       sensitive              -
buildpool/testme  guid                  11057815587819738755   -
buildpool/testme  usedbysnapshots       0B                     -
buildpool/testme  usedbydataset         4.03M                  -
buildpool/testme  usedbychildren        0B                     -
buildpool/testme  usedbyrefreservation  0B                     -
buildpool/testme  objsetid              280                    -
buildpool/testme  refcompressratio      1.00x                  -
buildpool/testme  written               4.03M                  -
buildpool/testme  logicalused           4.01M                  -
buildpool/testme  logicalreferenced     4.01M                  -

What version are you running?

- Rich

On Tue, Jan 18, 2022 at 10:00 AM Alan Somers <asomers@freebsd.org> wrote:
> That's not what I get.  Is your pool formatted using a very old
> version or something?
>
> somers@fbsd-head /u/h/somers [1]>
> dd if=/dev/random bs=1179648 of=/testpool/food/t/richfile count=1
> 1+0 records in
> 1+0 records out
> 1179648 bytes transferred in 0.003782 secs (311906705 bytes/sec)
> somers@fbsd-head /u/h/somers> du -sh /testpool/food/t/richfile
> 1.1M    /testpool/food/t/richfile
>
> On Tue, Jan 18, 2022 at 7:51 AM Rich <rincebrain@gmail.com> wrote:
> >
> > 2.1M    /workspace/test1M/1
> >
> > - Rich
> >
> > On Tue, Jan 18, 2022 at 9:47 AM Alan Somers <asomers@freebsd.org> wrote:
> >>
> >> Yeah, it does.  Just check "du -sh <FILENAME>".  zdb there is showing
> >> you the logical size of the record, but it isn't showing how many disk
> >> blocks are actually allocated.
> >>
> >> On Tue, Jan 18, 2022 at 7:30 AM Rich <rincebrain@gmail.com> wrote:
> >> >
> >> > Really?  I didn't know it would still trim the tails on files with
> >> > compression off.
> >> >
> >> > ...
> >> >
> >> >         size    1179648
> >> >         parent  34
> >> >         links   1
> >> >         pflags  40800000004
> >> > Indirect blocks:
> >> >                0 L1  DVA[0]=<3:c02b96c000:1000> DVA[1]=<3:c810733000:1000> [L1 ZFS plain file] skein lz4 unencrypted LE contiguous unique double size=20000L/1000P birth=35675472L/35675472P fill=2 cksum=5cfba24b351a09aa:8bd9dfef87c5b625:906ed5c3252943db:bed77ce51ad540d4
> >> >                0  L0 DVA[0]=<2:a0827db4000:100000> [L0 ZFS plain file] skein uncompressed unencrypted LE contiguous unique single size=100000L/100000P birth=35675472L/35675472P fill=1 cksum=95b06edf60e5f54c:af6f6950775d0863:8fc28b0783fcd9d3:2e44676e48a59360
> >> >           100000  L0 DVA[0]=<2:a0827eb4000:100000> [L0 ZFS plain file] skein uncompressed unencrypted LE contiguous unique single size=100000L/100000P birth=35675472L/35675472P fill=1 cksum=62a1f05769528648:8197c8a05ca9f1fb:a750c690124dd2e0:390bddc4314cd4c3
> >> >
> >> > It seems not?
> >> >
> >> > - Rich
> >> >
> >> >
> >> > On Tue, Jan 18, 2022 at 9:23 AM Alan Somers <asomers@freebsd.org> wrote:
> >> >>
> >> >> On Tue, Jan 18, 2022 at 7:13 AM Rich <rincebrain@gmail.com> wrote:
> >> >> >
> >> >> > Compression would have made your life better here, and possibly
> >> >> > also made it clearer what's going on.
> >> >> >
> >> >> > All records in a file are going to be the same size pre-compression -
> >> >> > so if you set the recordsize to 1M and save a 131.1M file, it's going
> >> >> > to take up 132M on disk before compression/raidz overhead/whatnot.
> >> >>
> >> >> Not true.  ZFS will trim the file's tails even without compression enabled.
> >> >>
> >> >> >
> >> >> > Usually compression saves you from the tail padding actually requiring
> >> >> > allocation on disk, which is one reason I encourage everyone to at
> >> >> > least use lz4 (or, if you absolutely cannot for some reason, I guess
> >> >> > zle should also work for this one case...)
> >> >> >
> >> >> > But I would say it's probably the sum of last record padding across
> >> >> > the whole dataset, if you don't have compression on.
> >> >> >
> >> >> > - Rich
> >> >> >
> >> >> > On Tue, Jan 18, 2022 at 8:57 AM Florent Rivoire <florent@rivoire.fr> wrote:
> >> >> >>
> >> >> >> TLDR: I rsync-ed the same data twice: once with 128K recordsize and
> >> >> >> once with 1M, and the allocated size on disk is ~3% bigger with 1M.
> >> >> >> Why not smaller?
> >> >> >>
> >> >> >>
> >> >> >> Hello,
> >> >> >>
> >> >> >> I would like some help to understand how the disk usage evolves when I
> >> >> >> change the recordsize.
> >> >> >>
> >> >> >> I've read several articles/presentations/forums about recordsize in
> >> >> >> ZFS, and if I try to summarize, I mainly understood that:
> >> >> >> - recordsize is the "maximum" size of "objects" (so "logical blocks")
> >> >> >> that zfs will create for both data & metadata, then each object is
> >> >> >> compressed and allocated to one vdev, split into smaller (ashift
> >> >> >> size) "physical" blocks and written on disks
> >> >> >> - increasing recordsize is usually good when storing large files that
> >> >> >> are not modified, because it limits the number of metadata objects
> >> >> >> (block-pointers), which has a positive effect on performance
> >> >> >> - decreasing recordsize is useful for "database-like" workloads (ie:
> >> >> >> small random writes inside existing objects), because it avoids write
> >> >> >> amplification (read-modify-write of a large object for a small update)
> >> >> >>
> >> >> >> Today, I'm trying to observe the effect of increasing recordsize for
> >> >> >> *my* data (because I'm also considering defining special_small_blocks
> >> >> >> & using SSDs as "special", but that is not tested nor discussed here, just
> >> >> >> recordsize).
> >> >> >> So, I'm doing some benchmarks on my "documents" dataset (details in
> >> >> >> "notes" below), but the results are really strange to me.
> >> >> >>
> >> >> >> When I rsync the same data to a freshly-recreated zpool:
> >> >> >> A) with recordsize=128K : 226G allocated on disk
> >> >> >> B) with recordsize=1M   : 232G allocated on disk => bigger than 128K ?!?
> >> >> >>
> >> >> >> I would clearly expect the other way around, because a bigger recordsize
> >> >> >> generates less metadata and so smaller disk usage, and there shouldn't be
> >> >> >> any overhead because 1M is just a maximum and not a forced size to
> >> >> >> allocate for every object.
> >> >>
> >> >> A common misconception.  The 1M recordsize applies to every newly
> >> >> created object, and every object must use the same size for all of its
> >> >> records (except possibly the last one).  But objects created before
> >> >> you changed the recsize will retain their old recsize, and file tails
> >> >> have a flexible recsize.
> >> >>
> >> >> >> I don't mind the increased usage (I can live with a few GB more), but
> >> >> >> I would like to understand why it happens.
> >> >>
> >> >> You might be seeing the effects of sparsity.  ZFS is smart enough not
> >> >> to store file holes (and if any kind of compression is enabled, it
> >> >> will find long runs of zeroes and turn them into holes).  If your data
> >> >> contains any holes that are >= 128 kB but < 1 MB, then they can be
> >> >> stored as holes with a 128 kB recsize but must be stored as long runs
> >> >> of zeros with a 1 MB recsize.
> >> >>
> >> >> However, I would suggest that you don't bother.  With a 128 kB recsize,
> >> >> ZFS has something like a 1000:1 ratio of data:metadata.  In other
> >> >> words, increasing your recsize can save you at most 0.1% of disk
> >> >> space.  Basically, it doesn't matter.  What it _does_ matter for is
> >> >> the tradeoff between write amplification and RAM usage.  1000:1 is
> >> >> comparable to the disk:RAM ratio of many computers.  And performance is
> >> >> more sensitive to metadata access times than data access times.  So
> >> >> increasing your recsize can help you keep a greater fraction of your
> >> >> metadata in ARC.  OTOH, as you remarked, increasing your recsize will
> >> >> also increase write amplification.
> >> >>
> >> >> So to summarize:
> >> >> * Adjust compression settings to save disk space.
> >> >> * Adjust recsize to save RAM.
> >> >>
> >> >> -Alan
> >> >>
> >> >> >>
> >> >> >> I tried to give all the details of my tests below.
> >> >> >> Did I do something wrong? Can you explain the increase?
> >> >> >>
> >> >> >> Thanks!
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> ===============================================
> >> >> >> A) 128K
> >> >> >> ==========
> >> >> >>
> >> >> >> # zpool destroy bench
> >> >> >> # zpool create -o ashift=12 bench /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
> >> >> >>
> >> >> >> # rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
> >> >> >> [...]
> >> >> >> sent 241,042,476,154 bytes  received 353,838 bytes  81,806,492.45 bytes/sec
> >> >> >> total size is 240,982,439,038  speedup is 1.00
> >> >> >>
> >> >> >> # zfs get recordsize bench
> >> >> >> NAME   PROPERTY    VALUE    SOURCE
> >> >> >> bench  recordsize  128K     default
> >> >> >>
> >> >> >> # zpool list -v bench
> >> >> >> NAME                                          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
> >> >> >> bench                                        2.72T   226G  2.50T        -         -     0%     8%  1.00x  ONLINE  -
> >> >> >>   gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4  2.72T   226G  2.50T       -         -     0%  8.10%      -  ONLINE
> >> >> >>
> >> >> >> # zfs list bench
> >> >> >> NAME    USED  AVAIL     REFER  MOUNTPOINT
> >> >> >> bench   226G  2.41T      226G  /bench
> >> >> >>
> >> >> >> # zfs get all bench |egrep "(used|referenced|written)"
> >> >> >> bench  used                  226G   -
> >> >> >> bench  referenced            226G   -
> >> >> >> bench  usedbysnapshots       0B     -
> >> >> >> bench  usedbydataset         226G   -
> >> >> >> bench  usedbychildren        1.80M  -
> >> >> >> bench  usedbyrefreservation  0B     -
> >> >> >> bench  written               226G   -
> >> >> >> bench  logicalused           226G   -
> >> >> >> bench  logicalreferenced     226G   -
> >> >> >>
> >> >> >> # zdb -Lbbbs bench > zpool-bench-rcd128K.zdb
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> ===============================================
> >> >> >> B) 1M
> >> >> >> ==========
> >> >> >>
> >> >> >> # zpool destroy bench
> >> >> >> # zpool create -o ashift=12 bench /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
> >> >> >> # zfs set recordsize=1M bench
> >> >> >>
> >> >> >> # rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
> >> >> >> [...]
> >> >> >> sent 241,042,476,154 bytes  received 353,830 bytes  80,173,899.88 bytes/sec
> >> >> >> total size is 240,982,439,038  speedup is 1.00
> >> >> >>
> >> >> >> # zfs get recordsize bench
> >> >> >> NAME   PROPERTY    VALUE    SOURCE
> >> >> >> bench  recordsize  1M       local
> >> >> >>
> >> >> >> # zpool list -v bench
> >> >> >> NAME                                          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
> >> >> >> bench                                        2.72T   232G  2.49T        -         -     0%     8%  1.00x  ONLINE  -
> >> >> >>   gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4  2.72T   232G  2.49T       -         -     0%  8.32%      -  ONLINE
> >> >> >>
> >> >> >> # zfs list bench
> >> >> >> NAME    USED  AVAIL     REFER  MOUNTPOINT
> >> >> >> bench   232G  2.41T      232G  /bench
> >> >> >>
> >> >> >> # zfs get all bench |egrep "(used|referenced|written)"
> >> >> >> bench  used                  232G   -
> >> >> >> bench  referenced            232G   -
> >> >> >> bench  usedbysnapshots       0B     -
> >> >> >> bench  usedbydataset         232G   -
> >> >> >> bench  usedbychildren        1.96M  -
> >> >> >> bench  usedbyrefreservation  0B     -
> >> >> >> bench  written               232G   -
> >> >> >> bench  logicalused           232G   -
> >> >> >> bench  logicalreferenced     232G   -
> >> >> >>
> >> >> >> # zdb -Lbbbs bench > zpool-bench-rcd1M.zdb
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> ===============================================
> >> >> >> Notes:
> >> >> >> ==========
> >> >> >>
> >> >> >> - the source dataset contains ~50% pictures (raw files and jpg),
> >> >> >> and also some music, various archived documents, zip, videos
> >> >> >> - no change on the source dataset while testing (cf size logged by rsync)
> >> >> >> - I repeated the tests twice (128K, then 1M, then 128K, then 1M), and
> >> >> >> got the same results
> >> >> >> - probably not important here, but:
> >> >> >> /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 is a Red 3TB CMR
> >> >> >> (WD30EFRX), and /mnt/tank/docs-florent/ is a 128K-recordsize dataset
> >> >> >> on another zpool that I never tweaked except ashift=12 (because it uses
> >> >> >> the same model of Red 3TB)
> >> >> >>
> >> >> >> # zfs --version
> >> >> >> zfs-2.0.6-1
> >> >> >> zfs-kmod-v2021120100-zfs_a8c7652
> >> >> >>
> >> >> >> # uname -a
> >> >> >> FreeBSD xxxxxxxxx 12.2-RELEASE-p11 FreeBSD 12.2-RELEASE-p11
> >> >> >> 75566f060d4(HEAD) TRUENAS  amd64
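
The 1179648-byte (1 MiB + 128 KiB) experiment at the top of the thread is easy to rerun side by side with and without compression, which is the quickest way to see whether the partially-filled tail record is allocated at the full recordsize on a given system. This is only a rough sketch, not taken from the thread: it assumes an existing pool named "tank" mounted at /tank with default mountpoints, and the dataset names are arbitrary.

  # Two scratch datasets that differ only in compression (pool name "tank" is an assumption)
  zfs create -o recordsize=1M -o compression=off tank/tail-nocomp
  zfs create -o recordsize=1M -o compression=lz4 tank/tail-lz4

  # One 1 MiB + 128 KiB file of incompressible data in each
  dd if=/dev/urandom of=/tank/tail-nocomp/f bs=1179648 count=1
  dd if=/dev/urandom of=/tank/tail-lz4/f bs=1179648 count=1
  sync

  # Logical (apparent) size vs. blocks actually allocated
  ls -lh /tank/tail-nocomp/f /tank/tail-lz4/f
  du -sh /tank/tail-nocomp/f /tank/tail-lz4/f

If the tail record is trimmed (or its zero padding compressed away), du should report roughly 1.1M; if the last record is written out at the full 1M recordsize, it will be closer to 2.0M, which is the difference Rich and Alan are seeing above.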
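
Alan's sparsity point can be checked the same way: a region that is never written and spans at least one 128K record can be stored as a hole at recordsize=128K, but with compression off at recordsize=1M the embedded zeros become part of a larger block and get allocated. Another hedged sketch under the same assumptions (pool "tank", arbitrary dataset names):

  # Same file layout at two recordsizes; compression off so embedded zeros are not compressed away
  zfs create -o recordsize=128K -o compression=off tank/holes-128k
  zfs create -o recordsize=1M -o compression=off tank/holes-1m

  # Write 128 KiB of data at a 640 KiB offset; the first five 128 KiB records are never written
  for d in holes-128k holes-1m; do
      dd if=/dev/urandom of=/tank/$d/sparse bs=128k count=1 seek=5
  done
  sync

  # Both files are 768 KiB logically; compare how much is actually allocated
  ls -lh /tank/holes-128k/sparse /tank/holes-1m/sparse
  du -sh /tank/holes-128k/sparse /tank/holes-1m/sparse

If the source data contains holes (or long zero runs) in that size range, the 1M run pays for them on disk while the 128K run does not, which is one candidate explanation for the 226G vs 232G gap.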
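
Finally, the two zdb dumps that were already captured (zpool-bench-rcd128K.zdb and zpool-bench-rcd1M.zdb) are probably the fastest way to localize the extra ~6G without re-running anything: the block statistics produced by zdb -bb and friends include a per-object-type accounting table (Blocks, LSIZE, PSIZE, ASIZE per type), so comparing the "ZFS plain file" rows between the two runs shows whether the growth is in file data (consistent with tail padding) or in metadata. The exact layout of that table varies between OpenZFS versions, so treat this as a starting point only:

  # Pull the accounting rows for file data and the main metadata types from both dumps
  for t in 'ZFS plain file' 'ZFS directory' 'DMU dnode'; do
      echo "== $t =="
      grep "$t" zpool-bench-rcd128K.zdb
      grep "$t" zpool-bench-rcd1M.zdb
  done

If nearly all of the 6G shows up in the ASIZE of "ZFS plain file" blocks, the tail-padding explanation fits; if it shows up elsewhere, something else is going on.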