Date: Fri, 18 May 2018 16:20:23 -0500
From: Eric Borisch <eborisch@gmail.com>
To: Paul Esson <paul.esson@redstor.com>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: Unexpected zvol usage
Message-ID: <CAMsT2=kybGc0os70yGmEPFeBZU1DMKB0wtDPzBC9rJJwPACuoQ@mail.gmail.com>
In-Reply-To: <HE1PR0102MB25880FD0731B56770F19D7F29E900@HE1PR0102MB2588.eurprd01.prod.exchangelabs.com>
References: <HE1PR0102MB25880FD0731B56770F19D7F29E900@HE1PR0102MB2588.eurprd01.prod.exchangelabs.com>
You're hitting the raidz-N layout rules: each individual allocation must be a multiple of (N+1) underlying blocks, or 3 for raidz-2 (ashift=12 -> 4k blocks). This is because each individual allocation carries its own parity, and also to avoid leaving holes in the drive when the allocation is removed:

https://www.delphix.com/blog/delphix-engineering/zfs-raidz-stripe-width-or-how-i-learned-stop-worrying-and-love-raidz

So for an 8K block on raidz-2, with D being data, P being parity, and X being padding, you're at:

    DDP,PXX    (',' marking allowable allocation multiples of 3)

which means 2/3 (66.7%) of the allocated space is used for parity + padding, compared with the 2/12 = 1/6 (16.7%) you likely expected for parity-2 with 12 drives. That is an "extra" 50% overhead.

The "extra" overheads for various volblocksizes in this layout (2 parity / 12 drives, ashift=12) are:

      4k: 50.0%
      8k: 50.0%
     16k: 16.7%
     32k: 16.7%
     64k:  7.1%
    128k:  7.1%

That's why Kevin is suggesting 128k. For this particular layout, you would have very similar space efficiency [2] with 64k, and improved latency. Or use 16k to significantly reduce the "extra" overhead with the lowest latency impact. I calculate these "extra" overheads here [1], if you're interested.

Larger block sizes also have more compression potential, which helps to counteract this overhead on compressible workloads, so a larger volblocksize is better on two fronts if you enable something like lz4, which has very low overhead. You really need to test with your workload to find your "optimal" choice, however.

I ran into this myself when spinning up some VMs, and put together the linked sheet (based on one from Matt Ahrens) to help myself and others when selecting array and zvol layouts / settings.

- Eric

[1] https://docs.google.com/spreadsheets/d/1kQJJpUtbWB_Poyc7jcO3mNFrFqeHSWuQ8U8Y5UC3dHY/edit?usp=sharing
[2] Similar, but I'm guessing not exact; there must be more overhead in the tracking of twice as many blocks, but it's fairly hidden from userland.
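For anyone who wants to check the "extra" overhead figures without opening the spreadsheet, here is a minimal Python sketch of the arithmetic. It assumes the allocation rule as described above and in the linked Delphix post: one set of parity sectors per row of data sectors, with the total rounded up to a multiple of (parity + 1). The function names `raidz_sectors` and `extra_overhead` are made up for this illustration and are not part of any ZFS tool; the sketch also ignores compression and metadata.

```python
import math


def raidz_sectors(block_bytes, ndisks, nparity, ashift):
    """Return (data_sectors, allocated_sectors) for one block.

    Assumed rule: each row of up to (ndisks - nparity) data sectors
    carries nparity parity sectors, and the whole allocation is padded
    up to a multiple of (nparity + 1) sectors.
    """
    sector = 1 << ashift                      # ashift=12 -> 4096-byte sectors
    data = math.ceil(block_bytes / sector)    # data sectors (D)
    rows = math.ceil(data / (ndisks - nparity))
    total = data + rows * nparity             # data + parity (D + P)
    mult = nparity + 1
    total = math.ceil(total / mult) * mult    # add padding (X)
    return data, total


def extra_overhead(block_bytes, ndisks=12, nparity=2, ashift=12):
    """Fraction of allocated space lost beyond the ideal nparity/ndisks."""
    data, total = raidz_sectors(block_bytes, ndisks, nparity, ashift)
    actual = (total - data) / total           # parity + padding share
    ideal = nparity / ndisks                  # 2/12 for this pool
    return actual - ideal


if __name__ == "__main__":
    for kib in (4, 8, 16, 32, 64, 128):
        pct = extra_overhead(kib * 1024) * 100
        print(f"{kib:>4}k: {pct:5.1f}%")
```

Changing `ndisks`, `nparity`, or `ashift` lets you try other layouts, but treat the result as an estimate and still test against your real workload.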
On Fri, May 18, 2018 at 10:42 AM, Paul Esson <paul.esson@redstor.com> wrote:

> Hi Folks,
>
> I have an 11.1-RELEASE system being used as a host for a bhyve guest.
> There is a large zpool on the host created from 12 x 10TB HDDs using raidz2
> redundancy with ashift12. I have created a sparse zvol within the pool
> using default settings and presented that to the bhyve vm as an ahci-hd
> disk type. The guest has a zpool and filesystem dataset built on this
> disk. When I start to write to the filesystem on the guest I am finding
> that the used/referenced on the host's zvol are more than double those on
> the guest. The logicalused/referenced values on the host zvol are more in
> line with the equivalent guest values, but my problem is that the host zvol
> is likely to fill before I have written all intended data to the guest.
>
>
> I have included below information from both the host and guest before and
> after writing. This output shows that the zvol uses a default 8K blocksize
> and that the guest zfs is therefore ashift13. I also tried creating the
> zvol with a 4K blocksize and the guest zfs ashift12 so that 4K blocks were
> consistent across hosts and guest, but still saw the amplification on
> writes to the zvol.
>
> Any insight greatly appreciated.
>
>
>
> HOST
>
> Zpool
> RAIDZ2 12 x HDDs, ashift 12
>
> NAME       PROPERTY              VALUE                   SOURCE
> dc1-hn-01  type                  filesystem              -
> dc1-hn-01  creation              Mon Apr 23 14:35 2018   -
> dc1-hn-01  used                  32.0G                   -
> dc1-hn-01  available             78.2T                   -
> dc1-hn-01  referenced            201K                    -
> dc1-hn-01  compressratio         1.00x                   -
> dc1-hn-01  mounted               yes                     -
> dc1-hn-01  quota                 none                    default
> dc1-hn-01  reservation           none                    default
> dc1-hn-01  recordsize            128K                    default
> dc1-hn-01  mountpoint            /export/data/dc1-hn-01  local
> dc1-hn-01  sharenfs              off                     default
> dc1-hn-01  checksum              on                      default
> dc1-hn-01  compression           off                     default
> dc1-hn-01  atime                 on                      default
> dc1-hn-01  devices               on                      default
> dc1-hn-01  exec                  on                      default
> dc1-hn-01  setuid                on                      default
> dc1-hn-01  readonly              off                     default
> dc1-hn-01  jailed                off                     default
> dc1-hn-01  snapdir               hidden                  default
> dc1-hn-01  aclmode               discard                 default
> dc1-hn-01  aclinherit            restricted              default
> dc1-hn-01  canmount              on                      default
> dc1-hn-01  xattr                 off                     temporary
> dc1-hn-01  copies                1                       default
> dc1-hn-01  version               5                       -
> dc1-hn-01  utf8only              off                     -
> dc1-hn-01  normalization         none                    -
> dc1-hn-01  casesensitivity       sensitive               -
> dc1-hn-01  vscan                 off                     default
> dc1-hn-01  nbmand                off                     default
> dc1-hn-01  sharesmb              off                     default
> dc1-hn-01  refquota              none                    default
> dc1-hn-01  refreservation        none                    default
> dc1-hn-01  primarycache          all                     default
> dc1-hn-01  secondarycache        all                     default
> dc1-hn-01  usedbysnapshots       0                       -
> dc1-hn-01  usedbydataset         201K                    -
> dc1-hn-01  usedbychildren        32.0G                   -
> dc1-hn-01  usedbyrefreservation  0                       -
> dc1-hn-01  logbias               latency                 default
> dc1-hn-01  dedup                 off                     default
> dc1-hn-01  mlslabel                                      -
> dc1-hn-01  sync                  standard                default
> dc1-hn-01  refcompressratio      1.00x                   -
> dc1-hn-01  written               201K                    -
> dc1-hn-01  logicalused           2.89G                   -
> dc1-hn-01  logicalreferenced     36.5K                   -
> dc1-hn-01  volmode               default                 default
> dc1-hn-01  filesystem_limit      none                    default
> dc1-hn-01  snapshot_limit        none                    default
> dc1-hn-01  filesystem_count      none                    default
> dc1-hn-01  snapshot_count        none                    default
> dc1-hn-01  redundant_metadata    all                     default
>
> NAME                               AVAIL  USED   USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> dc1-hn-01                          78.2T  32.0G  0         201K    0              32.0G
> dc1-hn-01/vm                       78.2T  31.9G  0         990M    0              30.9G
> dc1-hn-01/vm/dc1-olbp-sn-11        78.2T  30.9G  0         238K    0              30.9G
> dc1-hn-01/vm/dc1-olbp-sn-11/disk0  78.2T  30.9G  0         4.35G   26.6G          0
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  78.2T  4.50M  0         4.50M   0              0
>
> Sparse ZVOL - baseline
>
> NAME                               PROPERTY              VALUE                   SOURCE
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  type                  volume                  -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  creation              Fri May 18 15:36 2018   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  used                  4.50M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  available             78.2T                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  referenced            4.50M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  compressratio         1.00x                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  reservation           none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volsize               28T                     local
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volblocksize          8K                      -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  checksum              on                      default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  compression           off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  readonly              off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  copies                1                       default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  refreservation        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  primarycache          all                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  secondarycache        all                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbysnapshots       0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbydataset         4.50M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbychildren        0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbyrefreservation  0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logbias               latency                 default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  dedup                 off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  mlslabel                                      -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  sync                  standard                default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  refcompressratio      1.00x                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  written               4.50M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logicalused           1.89M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logicalreferenced     1.89M                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volmode               dev                     local
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  snapshot_limit        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  snapshot_count        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  redundant_metadata    all                     default
>
>
> GUEST - baseline
>
> 1 x vdisk from host ZVOL ashift 13
>
> NAME       AVAIL  USED   USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> dc1-sn-11  26.9T  632K   0         176K    0              456K
>
> NAME       PROPERTY              VALUE                   SOURCE
> dc1-sn-11  type                  filesystem              -
> dc1-sn-11  creation              Fri May 18 15:40 2018   -
> dc1-sn-11  used                  632K                    -
> dc1-sn-11  available             26.9T                   -
> dc1-sn-11  referenced            176K                    -
> dc1-sn-11  compressratio         1.00x                   -
> dc1-sn-11  mounted               yes                     -
> dc1-sn-11  quota                 none                    default
> dc1-sn-11  reservation           none                    default
> dc1-sn-11  recordsize            128K                    default
> dc1-sn-11  mountpoint            /export/data/dc1-sn-11  local
> dc1-sn-11  sharenfs              off                     default
> dc1-sn-11  checksum              on                      default
> dc1-sn-11  compression           off                     default
> dc1-sn-11  atime                 on                      default
> dc1-sn-11  devices               on                      default
> dc1-sn-11  exec                  on                      default
> dc1-sn-11  setuid                on                      default
> dc1-sn-11  readonly              off                     default
> dc1-sn-11  jailed                off                     default
> dc1-sn-11  snapdir               hidden                  default
> dc1-sn-11  aclmode               discard                 default
> dc1-sn-11  aclinherit            restricted              default
> dc1-sn-11  canmount              on                      default
> dc1-sn-11  xattr                 off                     temporary
> dc1-sn-11  copies                1                       default
> dc1-sn-11  version               5                       -
> dc1-sn-11  utf8only              off                     -
> dc1-sn-11  normalization         none                    -
> dc1-sn-11  casesensitivity       sensitive               -
> dc1-sn-11  vscan                 off                     default
> dc1-sn-11  nbmand                off                     default
> dc1-sn-11  sharesmb              off                     default
> dc1-sn-11  refquota              none                    default
> dc1-sn-11  refreservation        none                    default
> dc1-sn-11  primarycache          all                     default
> dc1-sn-11  secondarycache        all                     default
> dc1-sn-11  usedbysnapshots       0                       -
> dc1-sn-11  usedbydataset         176K                    -
> dc1-sn-11  usedbychildren        456K                    -
> dc1-sn-11  usedbyrefreservation  0                       -
> dc1-sn-11  logbias               latency                 default
> dc1-sn-11  dedup                 off                     default
> dc1-sn-11  mlslabel                                      -
> dc1-sn-11  sync                  standard                default
> dc1-sn-11  refcompressratio      1.00x                   -
> dc1-sn-11  written               176K                    -
> dc1-sn-11  logicalused           49K                     -
> dc1-sn-11  logicalreferenced     11.5K                   -
> dc1-sn-11  volmode               default                 default
> dc1-sn-11  filesystem_limit      none                    default
> dc1-sn-11  snapshot_limit        none                    default
> dc1-sn-11  filesystem_count      none                    default
> dc1-sn-11  snapshot_count        none                    default
> dc1-sn-11  redundant_metadata    all                     default
>
> After writing some data to the guest
>
> HOST ZVOL
>
> NAME                               PROPERTY              VALUE                   SOURCE
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  type                  volume                  -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  creation              Fri May 18 15:36 2018   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  used                  99.7G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  available             78.1T                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  referenced            99.7G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  compressratio         1.00x                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  reservation           none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volsize               28T                     local
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volblocksize          8K                      -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  checksum              on                      default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  compression           off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  readonly              off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  copies                1                       default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  refreservation        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  primarycache          all                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  secondarycache        all                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbysnapshots       0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbydataset         99.7G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbychildren        0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  usedbyrefreservation  0                       -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logbias               latency                 default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  dedup                 off                     default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  mlslabel                                      -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  sync                  standard                default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  refcompressratio      1.00x                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  written               99.7G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logicalused           43.6G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  logicalreferenced     43.6G                   -
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  volmode               dev                     local
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  snapshot_limit        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  snapshot_count        none                    default
> dc1-hn-01/vm/dc1-olbp-sn-11/disk1  redundant_metadata    all                     default
>
> GUEST ZFS
>
> NAME       PROPERTY              VALUE                   SOURCE
> dc1-sn-11  type                  filesystem              -
> dc1-sn-11  creation              Fri May 18 15:40 2018   -
> dc1-sn-11  used                  44.3G                   -
> dc1-sn-11  available             26.8T                   -
> dc1-sn-11  referenced            176K                    -
> dc1-sn-11  compressratio         1.00x                   -
> dc1-sn-11  mounted               no                      -
> dc1-sn-11  quota                 none                    default
> dc1-sn-11  reservation           none                    default
> dc1-sn-11  recordsize            128K                    default
> dc1-sn-11  mountpoint            /export/data/dc1-sn-11  local
> dc1-sn-11  sharenfs              off                     default
> dc1-sn-11  checksum              on                      default
> dc1-sn-11  compression           off                     default
> dc1-sn-11  atime                 on                      default
> dc1-sn-11  devices               on                      default
> dc1-sn-11  exec                  on                      default
> dc1-sn-11  setuid                on                      default
> dc1-sn-11  readonly              off                     default
> dc1-sn-11  jailed                off                     default
> dc1-sn-11  snapdir               hidden                  default
> dc1-sn-11  aclmode               discard                 default
> dc1-sn-11  aclinherit            restricted              default
> dc1-sn-11  canmount              on                      default
> dc1-sn-11  xattr                 on                      default
> dc1-sn-11  copies                1                       default
> dc1-sn-11  version               5                       -
> dc1-sn-11  utf8only              off                     -
> dc1-sn-11  normalization         none                    -
> dc1-sn-11  casesensitivity       sensitive               -
> dc1-sn-11  vscan                 off                     default
> dc1-sn-11  nbmand                off                     default
> dc1-sn-11  sharesmb              off                     default
> dc1-sn-11  refquota              none                    default
> dc1-sn-11  refreservation        none                    default
> dc1-sn-11  primarycache          all                     default
> dc1-sn-11  secondarycache        all                     default
> dc1-sn-11  usedbysnapshots       0                       -
> dc1-sn-11  usedbydataset         176K                    -
> dc1-sn-11  usedbychildren        44.3G                   -
> dc1-sn-11  usedbyrefreservation  0                       -
> dc1-sn-11  logbias               latency                 default
> dc1-sn-11  dedup                 off                     default
> dc1-sn-11  mlslabel                                      -
> dc1-sn-11  sync                  standard                default
> dc1-sn-11  refcompressratio      1.00x                   -
> dc1-sn-11  written               176K                    -
> dc1-sn-11  logicalused           44.2G                   -
> dc1-sn-11  logicalreferenced     11.5K                   -
> dc1-sn-11  volmode               default                 default
> dc1-sn-11  filesystem_limit      none                    default
> dc1-sn-11  snapshot_limit        none                    default
> dc1-sn-11  filesystem_count      none                    default
> dc1-sn-11  snapshot_count        none                    default
> dc1-sn-11  redundant_metadata    all                     default
>
>
> Regards,
>
>
> Paul Esson
> t +44 (0)118 951 5235 | m +44 (0)776 690 6514
> e paul.esson@redstor.com<mailto:paul.esson@redstor.com>
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>