Date: Wed, 8 May 2024 08:54:52 -0400
From: mike tancsa <mike@sentex.net>
To: Warner Losh <imp@bsdimp.com>
Cc: Matthew Grooms <mgrooms@shrew.net>, stable@freebsd.org
Subject: Re: how to tell if TRIM is working
Message-ID: <6e959ce7-b6f5-4776-99f6-df1c161b666d@sentex.net>
In-Reply-To: <CANCZdfqa0=1kJQYpbZQ3z2xdxt8x6L7iYJSjQUc7SGUap8KP5Q@mail.gmail.com>
References: <5e1b5097-c1c0-4740-a491-63c709d01c25@sentex.net>
 <67721332-fa1d-4b3c-aa57-64594ad5d77a@shrew.net>
 <77e203b3-c555-408b-9634-c452cb3a57ac@sentex.net>
 <CANCZdfqx_vhNb2BukbM0bxrf8NH_9sXPKW%2BUf=LdoXjw_2w=Dg@mail.gmail.com>
 <a6a53e96-a8ee-48c0-ae76-1e4150679f13@sentex.net>
 <CANCZdfqa0=1kJQYpbZQ3z2xdxt8x6L7iYJSjQUc7SGUap8KP5Q@mail.gmail.com>
On 5/2/2024 10:34 AM, Warner Losh wrote:
> On Thu, May 2, 2024 at 8:19 AM mike tancsa <mike@sentex.net> wrote:
>> On 5/2/2024 10:16 AM, Warner Losh wrote:
>>>
>>> When trims are fast, you want to send them to the drive as soon as you
>>> know the blocks are freed. UFS always does this (if trim is enabled at all).
>>> ZFS has a lot of knobs to control when / how / if this is done.
>>>
>>> vfs.zfs.vdev.trim_min_active: 1
>>> vfs.zfs.vdev.trim_max_active: 2
>>> vfs.zfs.trim.queue_limit: 10
>>> vfs.zfs.trim.txg_batch: 32
>>> vfs.zfs.trim.metaslab_skip: 0
>>> vfs.zfs.trim.extent_bytes_min: 32768
>>> vfs.zfs.trim.extent_bytes_max: 134217728
>>> vfs.zfs.l2arc.trim_ahead: 0
>>>
>>> I've not tried to tune these in the past, but you can see how they affect things.
>>
>> Thanks, Warner. I will try playing around with these values to see if they
>> impact things. BTW, do you know what / why things would be "skipped" during
>> trim events?
>>
>> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_failed: 0
>> kstat.zfs.zrootoffs.misc.iostats.trim_extents_failed: 0
>> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_skipped: 5968330752
>> kstat.zfs.zrootoffs.misc.iostats.trim_extents_skipped: 503986
>> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_written: 181593186304
>> kstat.zfs.zrootoffs.misc.iostats.trim_extents_written: 303115
>
> A quick look at the code suggests that it is when the extent to be trimmed
> is smaller than the extent_bytes_min parameter.
>
> The minimum seems to be a trade-off between sending too many trims to the
> drive and making sure that the trims you do send are maximally effective.
> By specifying a smaller size, you'll be freeing up more holes in the
> underlying NAND blocks. In some drives, this triggers more data copying
> (and more write amp), so you want to set it a bit higher for those. In
> other drives, it improves the efficiency of the GC algorithm, allowing each
> underlying block that is groomed to recover more space for future writes.
> In the past, I've found that ZFS's defaults are decent for 2018-ish SATA
> SSDs, but a bit too trim-avoidy for newer NVMe drives, even the cheap
> consumer ones. Though that's just a coarse generalization from my
> buildworld workload. Other workloads will have other data patterns, YMMV,
> so you need to measure it.
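As an aside, the knobs and counters above are all plain sysctls, so re-checking them while a test runs is straightforward. Something like the following is enough (a sketch only; zrootoffs stands in for whichever pool you care about):

    # the trim tunables listed above
    sysctl vfs.zfs.trim
    sysctl vfs.zfs.vdev.trim_min_active vfs.zfs.vdev.trim_max_active
    # per-pool trim accounting, including the skipped vs. written counters
    sysctl kstat.zfs.zrootoffs.misc.iostats | grep trim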
OK, some updates. Since a new version of ZFS was MFC'd into RELENG14, I thought I would try again, and to my pleasant surprise it is working *really* well. My test (zfs send ${a} | zfs recv ${a}2, then zfs destroy ${a}2, followed by zpool trim -w, where ${a} is a ~300G dataset with millions of files of various sizes; roughly sketched in the P.S. below) is now very predictable over the course of a dozen loops. Previously, this would start to slow down by a factor of 3 or 4 after 3 iterations, which corresponded roughly to the 1TB size of the drives.

zfs-2.2.4-FreeBSD_g256659204
zfs-kmod-2.2.4-FreeBSD_g256659204

FreeBSD r-14mfitest 14.1-STABLE FreeBSD 14.1-STABLE stable/14-45764d1d4 GENERIC amd64

Same hardware as before, same ZFS datasets. One thing that did change, and I am not sure whether it matters, is that I had booted TrueNAS's Linux variant in between to run the tests there, which also worked as expected. But looking at zpool history, I don't see anything obvious that would change the behaviour when I re-ran the tests. zpool history shows:

2024-04-25.10:16:35 zpool export quirk-test
2024-04-25.10:16:48 zpool import quirk-test
2024-04-26.11:44:25 zpool export quirk-test
2024-04-26.13:11:18 py-libzfs: zpool import 13273111966766428207 quirk-test
2024-04-26.13:11:22 py-libzfs: zfs inherit -r quirk-test/junk
2024-04-26.13:11:23 py-libzfs: zfs inherit -r quirk-test/bull1
2024-04-26.13:11:27 py-libzfs: zfs create -o mountpoint=legacy -o readonly=off -o snapdir=hidden -o xattr=sa quirk-test/.system
2024-04-26.13:11:27 py-libzfs: zfs create -o mountpoint=legacy -o readonly=off -o snapdir=hidden -o quota=1G -o xattr=sa quirk-test/.system/cores
2024-04-26.13:11:27 py-libzfs: zfs create -o mountpoint=legacy -o readonly=off -o snapdir=hidden -o xattr=sa quirk-test/.system/samba4
2024-04-26.13:11:27 py-libzfs: zfs create -o mountpoint=legacy -o readonly=off -o snapdir=hidden -o xattr=sa quirk-test/.system/configs-ae32c386e13840b2bf9c0083275e7941
2024-04-26.13:11:27 py-libzfs: zfs create -o mountpoint=legacy -o readonly=off -o snapdir=hidden -o xattr=sa quirk-test/.system/netdata-ae32c386e13840b2bf9c0083275e7941
2024-04-26.13:22:10 zfs snapshot quirk-test/bull1@snap1

Nothing else seems to have been done to the pool parameters. I tried the tests with vfs.zfs.trim.extent_bytes_min at its default as well as halved, but that didn't seem to make any difference. I have a stack of fresh 1TB and 2TB WD Blue SSDs that I might pop into the test box later this week to confirm all is still good, in case Linux did something to these disks, although the output of camcontrol identify doesn't show any difference from before the export/import, so nothing seems to have changed with the drives.

    ---Mike
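P.S. For anyone who wants to reproduce the loop, its shape is roughly the following sketch (the pool, dataset, and snapshot names are placeholders; substitute whatever ~300G dataset you are testing with):

    #!/bin/sh
    # Sketch of the send/recv/destroy/trim test loop described above.
    # quirk-test/bull1@snap1 stands in for the real source snapshot.
    src=quirk-test/bull1@snap1
    copy=quirk-test/bull1copy
    i=1
    while [ "$i" -le 12 ]; do
        echo "=== pass $i ==="
        # copy the dataset, then throw the copy away to create free space
        /usr/bin/time -h sh -c "zfs send $src | zfs recv $copy"
        zfs destroy -r $copy
        # trim the freed space and wait for the trim to complete
        /usr/bin/time -h zpool trim -w quirk-test
        i=$((i + 1))
    done

The extent_bytes_min comparison mentioned above was just the default (32768) versus half of it, i.e. sysctl vfs.zfs.trim.extent_bytes_min=16384 set before the run.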