Date: Tue, 18 Jan 2022 14:56:46 +0100
From: Florent Rivoire <florent@rivoire.fr>
To: freebsd-fs@freebsd.org
Subject: [zfs] recordsize: unexpected increase of disk usage when increasing it
Message-ID: <CADzRhsEsZMGE-SoeWLMG9NTtkwhhy6OGQQ046m9AxGFbp5h_kQ@mail.gmail.com>
[-- Attachment #1 --]
TLDR: I rsync-ed the same data twice: once with 128K recordsize and
once with 1M, and the allocated size on disk is ~3% bigger with 1M.
Why not smaller?
Hello,
I would like some help understanding how disk usage evolves when I
change the recordsize.
I've read several articles/presentations/forum threads about recordsize in
ZFS, and if I try to summarize, my main understanding is:
- recordsize is the "maximum" size of the "objects" (i.e. "logical blocks")
that ZFS will create for both data & metadata; each object is then
compressed, allocated to one vdev, split into smaller (ashift-sized)
"physical" blocks and written to disk (a way to check this on a single
file is sketched right after this list)
- increasing recordsize is usually good when storing large files that
are not modified, because it limits the number of metadata objects
(block pointers), which has a positive effect on performance
- decreasing recordsize is useful for "database-like" workloads (i.e.
small random writes inside existing objects), because it avoids write
amplification (read-modify-writing a large object for a small update)
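(Side note: one way to check that understanding on a single file, if useful,
is to look up the file's object number with "ls -i" and dump its block
pointers with zdb; the path and object number below are only placeholders:)
# ls -i /bench/photos/example.jpg
123456 /bench/photos/example.jpg
# zdb -ddddd bench 123456
This prints the file's dnode (with its data block size) and every L0 block
pointer, so the actual record sizes used for that file are visible.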
Today, I'm trying to observe the effect of increasing recordsize on
*my* data (I'm also considering setting special_small_blocks
& using SSDs as a "special" vdev, but that is neither tested nor discussed
here; just recordsize).
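(For context only, the kind of setup I have in mind for later, not used in
these tests; the two SSD device names below are hypothetical:)
# zpool add bench special mirror /dev/ada1p1 /dev/ada2p1
# zfs set special_small_blocks=64K bench
With that, data blocks up to 64K would land on the SSD "special" vdev
instead of the HDD, which is why the small-blocks distribution of my data
matters to me.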
So, I'm doing some benchmarks on my "documents" dataset (details in
"notes" below), but the results are really strange to me.
When I rsync the same data to a freshly-recreated zpool:
A) with recordsize=128K : 226G allocated on disk
B) with recordsize=1M : 232G allocated on disk => bigger than 128K ?!?
I would clearly expect the opposite, because a bigger recordsize
generates less metadata and therefore smaller disk usage, and there
shouldn't be any overhead because 1M is just a maximum, not a size
that is forcibly allocated for every object.
I don't mind the increased usage (I can live with a few GB more), but
I would like to understand why it happens.
I tried to give all the details of my tests below.
Did I do something wrong? Can you explain the increase?
Thanks!
===============================================
A) 128K
==========
# zpool destroy bench
# zpool create -o ashift=12 bench /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
# rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
[...]
sent 241,042,476,154 bytes received 353,838 bytes 81,806,492.45 bytes/sec
total size is 240,982,439,038 speedup is 1.00
# zfs get recordsize bench
NAME PROPERTY VALUE SOURCE
bench recordsize 128K default
# zpool list -v bench
NAME                                         SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP    DEDUP  HEALTH  ALTROOT
bench                                        2.72T  226G   2.50T  -        -         0%    8%     1.00x  ONLINE  -
  gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 2.72T  226G   2.50T  -        -         0%    8.10%  -      ONLINE
# zfs list bench
NAME USED AVAIL REFER MOUNTPOINT
bench 226G 2.41T 226G /bench
# zfs get all bench |egrep "(used|referenced|written)"
bench used 226G -
bench referenced 226G -
bench usedbysnapshots 0B -
bench usedbydataset 226G -
bench usedbychildren 1.80M -
bench usedbyrefreservation 0B -
bench written 226G -
bench logicalused 226G -
bench logicalreferenced 226G -
# zdb -Lbbbs bench > zpool-bench-rcd128K.zdb
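(For reference on the zdb flags: -L disables leak detection to speed up the
traversal, repeating -b increases the verbosity of the block statistics, and
-s makes zdb report its own I/O, which is the capacity/operations/bandwidth
table at the end of the attached reports.)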
===============================================
B) 1M
==========
# zpool destroy bench
# zpool create -o ashift=12 bench /dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4
# zfs set recordsize=1M bench
# rsync -av --exclude '.zfs' /mnt/tank/docs-florent/ /bench
[...]
sent 241,042,476,154 bytes received 353,830 bytes 80,173,899.88 bytes/sec
total size is 240,982,439,038 speedup is 1.00
# zfs get recordsize bench
NAME PROPERTY VALUE SOURCE
bench recordsize 1M local
# zpool list -v bench
NAME                                         SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP    DEDUP  HEALTH  ALTROOT
bench                                        2.72T  232G   2.49T  -        -         0%    8%     1.00x  ONLINE  -
  gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 2.72T  232G   2.49T  -        -         0%    8.32%  -      ONLINE
# zfs list bench
NAME USED AVAIL REFER MOUNTPOINT
bench 232G 2.41T 232G /bench
# zfs get all bench |egrep "(used|referenced|written)"
bench used 232G -
bench referenced 232G -
bench usedbysnapshots 0B -
bench usedbydataset 232G -
bench usedbychildren 1.96M -
bench usedbyrefreservation 0B -
bench written 232G -
bench logicalused 232G -
bench logicalreferenced 232G -
# zdb -Lbbbs bench > zpool-bench-rcd1M.zdb
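(Not shown here, but a quick way to compare the two reports side by side,
using the files produced above:)
# grep -E "bp count|bp allocated|L0 +ZFS plain file" zpool-bench-rcd128K.zdb zpool-bench-rcd1M.zdb
That directly shows where the difference in block counts and allocated bytes
between the two runs comes from.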
===============================================
Notes:
==========
- the source dataset contains ~50% pictures (raw files and jpg), plus
some music, various archived documents, zip files and videos (a rough
file-size histogram of the source is sketched at the end of these notes)
- no change on the source dataset while testing (cf. the size logged by rsync)
- I repeated the tests twice (128K, then 1M, then 128K, then 1M) and
got the same results
- probably not important here, but:
/dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 is a WD Red 3TB CMR
(WD30EFRX), and /mnt/tank/docs-florent/ is a 128K-recordsize dataset
on another zpool that I never tweaked except for ashift=12 (because it
uses the same model of Red 3TB)
# zfs --version
zfs-2.0.6-1
zfs-kmod-v2021120100-zfs_a8c7652
# uname -a
FreeBSD xxxxxxxxx 12.2-RELEASE-p11 FreeBSD 12.2-RELEASE-p11 75566f060d4(HEAD) TRUENAS amd64
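(Since the file-size distribution of the source probably matters for the
explanation, here is a rough sketch of how I can get a histogram of file
sizes; the awk bucketing by powers of two is simplistic and FreeBSD's stat
syntax is assumed:)
# find /mnt/tank/docs-florent -type f -print0 | xargs -0 stat -f %z | \
    awk '$1>0 { b=2^int(log($1)/log(2)); n[b]++ } END { for (s in n) print s, n[s] }' | sort -n
Each output line is "<bucket size in bytes> <number of files>", the bucket
being the largest power of two not above the file size.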
[-- Attachment #2: zpool-bench-rcd1M.zdb (the 1M-recordsize run) --]
Traversing all blocks ...
bp count: 256008
ganged count: 0
bp logical: 249314841600 avg: 973855
bp physical: 248865674240 avg: 972101 compression: 1.00
bp allocated: 248943955968 avg: 972406 compression: 1.00
bp deduped: 0 ref>1: 0 deduplication: 1.00
Normal class: 245513138176 used: 8.21%
additional, non-pointer bps of type 0: 169
number of (compressed) bytes: number of bps
28: 8 ********
29: 3 ***
30: 0
31: 0
32: 1 *
33: 0
34: 0
35: 0
36: 0
37: 0
38: 0
39: 0
40: 2 **
41: 2 **
42: 0
43: 0
44: 0
45: 1 *
46: 0
47: 0
48: 1 *
49: 1 *
50: 1 *
51: 0
52: 0
53: 1 *
54: 8 ********
55: 12 ************
56: 0
57: 2 **
58: 0
59: 0
60: 0
61: 0
62: 4 ****
63: 0
64: 4 ****
65: 0
66: 3 ***
67: 2 **
68: 0
69: 1 *
70: 0
71: 0
72: 3 ***
73: 33 *********************************
74: 7 *******
75: 2 **
76: 5 *****
77: 3 ***
78: 1 *
79: 1 *
80: 2 **
81: 1 *
82: 3 ***
83: 1 *
84: 1 *
85: 0
86: 3 ***
87: 2 **
88: 1 *
89: 3 ***
90: 2 **
91: 1 *
92: 2 **
93: 0
94: 1 *
95: 2 **
96: 1 *
97: 0
98: 7 *******
99: 3 ***
100: 1 *
101: 1 *
102: 2 **
103: 1 *
104: 2 **
105: 1 *
106: 4 ****
107: 2 **
108: 2 **
109: 0
110: 2 **
111: 2 **
112: 1 *
Dittoed blocks on same vdev: 16330
Blocks LSIZE PSIZE ASIZE avg comp %Total Type
- - - - - - - unallocated
2 32K 8K 24K 12K 4.00 0.00 object directory
1 512 512 12K 12K 1.00 0.00 object array
1 16K 4K 12K 12K 4.00 0.00 packed nvlist
- - - - - - - packed nvlist size
- - - - - - - bpobj
- - - - - - - bpobj header
- - - - - - - SPA space map header
108 13.5M 476K 1.39M 13.2K 29.04 0.00 SPA space map
- - - - - - - ZIL intent log
1 128K 4K 8K 8K 32.00 0.00 L5 DMU dnode
1 128K 4K 8K 8K 32.00 0.00 L4 DMU dnode
1 128K 4K 8K 8K 32.00 0.00 L3 DMU dnode
1 128K 4K 8K 8K 32.00 0.00 L2 DMU dnode
2 256K 36K 76K 38K 7.11 0.00 L1 DMU dnode
628 9.81M 2.47M 5.07M 8.27K 3.97 0.00 L0 DMU dnode
634 10.6M 2.52M 5.18M 8.36K 4.19 0.00 DMU dnode
2 8K 8K 20K 10K 1.00 0.00 DMU objset
- - - - - - - DSL directory
- - - - - - - DSL directory child map
- - - - - - - DSL dataset snap map
- - - - - - - DSL props
- - - - - - - DSL dataset
- - - - - - - ZFS znode
- - - - - - - ZFS V0 ACL
63 1.97M 252K 504K 8K 8.00 0.00 L2 ZFS plain file
14.3K 458M 59.6M 119M 8.32K 7.69 0.05 L1 ZFS plain file
234K 232G 232G 232G 1014K 1.00 99.94 L0 ZFS plain file
248K 232G 232G 232G 956K 1.00 99.99 ZFS plain file
127 3.97M 508K 1016K 8K 8.00 0.00 L1 ZFS directory
864 5.21M 2.03M 5.92M 7.02K 2.57 0.00 L0 ZFS directory
991 9.18M 2.52M 6.91M 7.14K 3.64 0.00 ZFS directory
1 1K 1K 8K 8K 1.00 0.00 ZFS master node
- - - - - - - ZFS delete queue
- - - - - - - zvol object
- - - - - - - zvol prop
- - - - - - - other uint8[]
- - - - - - - other uint64[]
- - - - - - - other ZAP
- - - - - - - persistent error log
1 128K 4K 12K 12K 32.00 0.00 SPA history
- - - - - - - SPA history offsets
- - - - - - - Pool properties
- - - - - - - DSL permissions
- - - - - - - ZFS ACL
- - - - - - - ZFS SYSACL
- - - - - - - FUID table
- - - - - - - FUID table size
- - - - - - - DSL dataset next clones
- - - - - - - scan work queue
- - - - - - - ZFS user/group/project used
- - - - - - - ZFS user/group/project quota
- - - - - - - snapshot refcount tags
- - - - - - - DDT ZAP algorithm
- - - - - - - DDT statistics
- - - - - - - System attributes
- - - - - - - SA master node
1 1.50K 1.50K 8K 8K 1.00 0.00 SA attr registration
2 32K 8K 16K 8K 4.00 0.00 SA attr layouts
- - - - - - - scan translations
- - - - - - - deduplicated block
- - - - - - - DSL deadlist map
- - - - - - - DSL deadlist map hdr
- - - - - - - DSL dir clones
- - - - - - - bpobj subobj
- - - - - - - deferred free
- - - - - - - dedup ditto
11 170K 25K 108K 9.82K 6.82 0.00 other
1 128K 4K 8K 8K 32.00 0.00 L5 Total
1 128K 4K 8K 8K 32.00 0.00 L4 Total
1 128K 4K 8K 8K 32.00 0.00 L3 Total
64 2.09M 256K 512K 8K 8.38 0.00 L2 Total
14.4K 462M 60.1M 120M 8.32K 7.69 0.05 L1 Total
236K 232G 232G 232G 1008K 1.00 99.95 L0 Total
250K 232G 232G 232G 950K 1.00 100.00 Total
Block Size Histogram
block psize lsize asize
size Count Size Cum. Count Size Cum. Count Size Cum.
512: 139 69.5K 69.5K 139 69.5K 69.5K 0 0 0
1K: 251 301K 370K 251 301K 370K 0 0 0
2K: 164 432K 802K 164 432K 802K 0 0 0
4K: 15.3K 61.3M 62.1M 152 862K 1.62M 183 732K 732K
8K: 458 4.82M 66.9M 109 1.14M 2.77M 15.7K 126M 127M
16K: 64 1.41M 68.4M 955 15.3M 18.1M 449 9.38M 137M
32K: 91 4.17M 72.5M 14.6K 468M 486M 100 4.48M 141M
64K: 248 25.1M 97.6M 248 25.1M 511M 235 23.5M 165M
128K: 580 107M 205M 696 122M 633M 583 107M 271M
256K: 1.04K 382M 587M 1.04K 382M 1015M 1.04K 381M 653M
512K: 1.42K 1.05G 1.62G 1.42K 1.05G 2.04G 1.43K 1.06G 1.69G
1M: 230K 230G 232G 230K 230G 232G 230K 230G 232G
2M: 0 0 232G 0 0 232G 0 0 232G
4M: 0 0 232G 0 0 232G 0 0 232G
8M: 0 0 232G 0 0 232G 0 0 232G
16M: 0 0 232G 0 0 232G 0 0 232G
capacity operations bandwidth ---- errors ----
description used avail read write read write read write cksum
bench 229G 2.50T 251 0 11.7M 0 0 0 0
/dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 229G 2.50T 251 0 11.7M 0 0 0 0
[-- Attachment #3: zpool-bench-rcd128K.zdb (the 128K-recordsize run) --]
Traversing all blocks ...
bp count: 1869965
ganged count: 0
bp logical: 242844272640 avg: 129865
bp physical: 242280355328 avg: 129564 compression: 1.00
bp allocated: 242420023296 avg: 129638 compression: 1.00
bp deduped: 0 ref>1: 0 deduplication: 1.00
Normal class: 235463172096 used: 7.88%
additional, non-pointer bps of type 0: 170
number of (compressed) bytes: number of bps
28: 10 **********
29: 2 **
30: 0
31: 0
32: 1 *
33: 0
34: 0
35: 0
36: 0
37: 0
38: 0
39: 0
40: 3 ***
41: 1 *
42: 0
43: 0
44: 0
45: 1 *
46: 0
47: 0
48: 1 *
49: 0
50: 1 *
51: 0
52: 0
53: 0
54: 18 ******************
55: 3 ***
56: 0
57: 2 **
58: 0
59: 0
60: 0
61: 1 *
62: 3 ***
63: 0
64: 3 ***
65: 2 **
66: 1 *
67: 3 ***
68: 0
69: 1 *
70: 0
71: 0
72: 1 *
73: 35 ***********************************
74: 8 ********
75: 4 ****
76: 2 **
77: 2 **
78: 3 ***
79: 1 *
80: 3 ***
81: 0
82: 2 **
83: 1 *
84: 0
85: 2 **
86: 2 **
87: 4 ****
88: 1 *
89: 0
90: 1 *
91: 2 **
92: 2 **
93: 1 *
94: 1 *
95: 3 ***
96: 1 *
97: 1 *
98: 8 ********
99: 1 *
100: 2 **
101: 1 *
102: 1 *
103: 2 **
104: 0
105: 2 **
106: 2 **
107: 5 *****
108: 3 ***
109: 0
110: 1 *
111: 1 *
112: 1 *
Dittoed blocks on same vdev: 21891
Blocks LSIZE PSIZE ASIZE avg comp %Total Type
- - - - - - - unallocated
2 32K 8K 24K 12K 4.00 0.00 object directory
1 512 512 12K 12K 1.00 0.00 object array
1 16K 4K 12K 12K 4.00 0.00 packed nvlist
- - - - - - - packed nvlist size
- - - - - - - bpobj
- - - - - - - bpobj header
- - - - - - - SPA space map header
102 12.8M 452K 1.32M 13.3K 28.88 0.00 SPA space map
- - - - - - - ZIL intent log
1 128K 4K 8K 8K 32.00 0.00 L5 DMU dnode
1 128K 4K 8K 8K 32.00 0.00 L4 DMU dnode
1 128K 4K 8K 8K 32.00 0.00 L3 DMU dnode
1 128K 4K 8K 8K 32.00 0.00 L2 DMU dnode
2 256K 36K 76K 38K 7.11 0.00 L1 DMU dnode
621 9.70M 2.45M 4.99M 8.23K 3.97 0.00 L0 DMU dnode
627 10.5M 2.50M 5.10M 8.33K 4.19 0.00 DMU dnode
2 8K 8K 20K 10K 1.00 0.00 DMU objset
- - - - - - - DSL directory
- - - - - - - DSL directory child map
- - - - - - - DSL dataset snap map
- - - - - - - DSL props
- - - - - - - DSL dataset
- - - - - - - ZFS znode
- - - - - - - ZFS V0 ACL
266 8.31M 1.06M 2.12M 8.18K 7.82 0.00 L2 ZFS plain file
19.6K 626M 123M 245M 12.5K 5.10 0.11 L1 ZFS plain file
1.76M 226G 226G 226G 128K 1.00 99.89 L0 ZFS plain file
1.78M 226G 226G 226G 127K 1.00 99.99 ZFS plain file
127 3.97M 508K 1016K 8K 8.00 0.00 L1 ZFS directory
864 5.21M 2.02M 5.91M 7.01K 2.58 0.00 L0 ZFS directory
991 9.18M 2.52M 6.91M 7.14K 3.64 0.00 ZFS directory
1 1K 1K 8K 8K 1.00 0.00 ZFS master node
- - - - - - - ZFS delete queue
- - - - - - - zvol object
- - - - - - - zvol prop
- - - - - - - other uint8[]
- - - - - - - other uint64[]
- - - - - - - other ZAP
- - - - - - - persistent error log
1 128K 4K 12K 12K 32.00 0.00 SPA history
- - - - - - - SPA history offsets
- - - - - - - Pool properties
- - - - - - - DSL permissions
- - - - - - - ZFS ACL
- - - - - - - ZFS SYSACL
- - - - - - - FUID table
- - - - - - - FUID table size
- - - - - - - DSL dataset next clones
- - - - - - - scan work queue
- - - - - - - ZFS user/group/project used
- - - - - - - ZFS user/group/project quota
- - - - - - - snapshot refcount tags
- - - - - - - DDT ZAP algorithm
- - - - - - - DDT statistics
- - - - - - - System attributes
- - - - - - - SA master node
1 1.50K 1.50K 8K 8K 1.00 0.00 SA attr registration
2 32K 8K 16K 8K 4.00 0.00 SA attr layouts
- - - - - - - scan translations
- - - - - - - deduplicated block
- - - - - - - DSL deadlist map
- - - - - - - DSL deadlist map hdr
- - - - - - - DSL dir clones
- - - - - - - bpobj subobj
- - - - - - - deferred free
- - - - - - - dedup ditto
11 170K 24.5K 96K 8.73K 6.96 0.00 other
1 128K 4K 8K 8K 32.00 0.00 L5 Total
1 128K 4K 8K 8K 32.00 0.00 L4 Total
1 128K 4K 8K 8K 32.00 0.00 L3 Total
267 8.44M 1.07M 2.13M 8.18K 7.91 0.00 L2 Total
19.7K 630M 123M 247M 12.5K 5.11 0.11 L1 Total
1.76M 226G 226G 226G 128K 1.00 99.89 L0 Total
1.78M 226G 226G 226G 127K 1.00 100.00 Total
Block Size Histogram
block psize lsize asize
size Count Size Cum. Count Size Cum. Count Size Cum.
512: 138 69K 69K 138 69K 69K 0 0 0
1K: 251 301K 370K 251 301K 370K 0 0 0
2K: 164 432K 802K 164 432K 802K 0 0 0
4K: 14.0K 56.1M 56.8M 152 862K 1.62M 183 732K 732K
8K: 7.20K 74.0M 131M 109 1.14M 2.77M 14.4K 116M 117M
16K: 63 1.40M 132M 948 15.2M 18.0M 7.19K 148M 264M
32K: 91 4.17M 136M 20.0K 642M 660M 100 4.46M 269M
64K: 248 25.1M 162M 248 25.1M 686M 235 23.5M 292M
128K: 1.76M 225G 226G 1.76M 225G 226G 1.76M 225G 226G
256K: 0 0 226G 0 0 226G 0 0 226G
512K: 0 0 226G 0 0 226G 0 0 226G
1M: 0 0 226G 0 0 226G 0 0 226G
2M: 0 0 226G 0 0 226G 0 0 226G
4M: 0 0 226G 0 0 226G 0 0 226G
8M: 0 0 226G 0 0 226G 0 0 226G
16M: 0 0 226G 0 0 226G 0 0 226G
capacity operations bandwidth ---- errors ----
description used avail read write read write read write cksum
bench 219G 2.50T 330 0 8.91M 0 0 0 0
/dev/gptid/3c0f5cbc-b0ce-11ea-ab91-c8cbb8cc3ad4 219G 2.50T 330 0 8.91M 0 0 0 0
