Date: Fri, 15 Dec 2023 08:41:10 -0700
From: Warner Losh <imp@bsdimp.com>
To: freebsd-fs@freebsd.org
Subject: Re: measuring swap partition speed
Message-ID: <CANCZdfpgvE8UjX1XUgimbotpnrH-fXYWZU3kfAwD849bMrT2Vg@mail.gmail.com>
In-Reply-To: <ZXxis23iKT3iHDdt@int21h>
References: <ZXxis23iKT3iHDdt@int21h>

On Fri, Dec 15, 2023 at 7:29 AM void <void@f-m.fm> wrote:

> Hello list, I have on a rpi4 a usb3-connected disk partitioned like this:
>
> # gpart show
>
> =>          40  1953525088  da0  GPT  (932G)
>             40      532480    1  efi  (260M)
>         532520        2008       - free -  (1.0M)
>         534528     4194304    2  freebsd-swap  (2.0G)
>        4728832     4194304    4  freebsd-swap  (2.0G)
>        8923136     4194304    5  freebsd-swap  (2.0G)
>       13117440     4194304    6  freebsd-swap  (2.0G)
>       17311744     4194304    7  freebsd-swap  (2.0G)
>       21506048     4194304    8  freebsd-swap  (2.0G)
>       25700352  1927823360    3  freebsd-zfs  (920G)
>     1953523712        1416       - free -  (708K)
>
> If processes swap out, it runs like a slug [1]. I'd like to test if it's
> the disk on its way out. How would I test swap partitions? [2]
>
> [1] it didn't always run like a slug.

What's the underlying hardware?

> [2] would nfs-mounted swap be faster? (1G network)

Maybe.

> [3] swap is not encrypted

Good. You aren't CPU bound.

So the good news, kinda, is that if this is spinning rust, your swap
partitions are on the fastest part of the disk. I'd expect one partition
that's 12G would work better than six that are 2G, since you'd have less
head thrash. Parallelism with multiple swap partitions works best when
they are on separate spindles.

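If you did want to collapse them into a single 12G partition, a rough,
untested sketch based on the gpart output above (double-check the indices
and trim /etc/fstab to match before rebooting) would be:

swapoff -a                      # all swap devices in /etc/fstab; can fail if
                                # there isn't enough free RAM to page back in
gpart delete -i 4 da0           # drop the five extra swap partitions...
gpart delete -i 5 da0
gpart delete -i 6 da0
gpart delete -i 7 da0
gpart delete -i 8 da0
gpart resize -i 2 -s 12G da0    # ...and grow partition 2 into the freed space
swapon /dev/da0p2
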
The bad news is that your disk may be fine. I'd expect that as the ZFS
partition fills up, the seek distances will increase, since greater
distances have to be traversed to get back to the swap space. There's a
sweet spot of a few tens of GB that drives can usually seek across far
faster than longer throws...

But if it is an SSD, some comments. It makes no sense to have six swap
partitions; one will do the job (though this is on a rpi, so maybe you are
hitting some of our silly limits in the swap code on 32-bit architectures).
LBAs are LBAs; which ones you use doesn't matter at all (and I don't want
to hear about wear leveling: that doesn't matter at this level, since the
FTL does it behind the scenes in SSDs and NVMe drives). Your drive may be
wearing out if it has slowed down with time (though a certain amount, like
10-20%, may be expected in the first little bit of its life; the rate of
performance decline often slows for a large part of life before again
steeply declining). QLC SSDs do require a lot more care and feeding by the
firmware, including a lot more writes to deal with 'read disturb' in a
read-heavy workload, and a rewrite from the initial landing EB (typically
SLC, to be fast) to the longer-term storage (QLC, for the capacity). Many
workloads trigger a lot more housekeeping than on older TLC or MLC drives.
And the cheapest NAND in the marketplace tends to be QLC, so the cheapest
SSDs (and sometimes NVMe drives) tend to be QLC. For light use it doesn't
matter, but if you are starting to notice slowdowns, you are beyond the
light use these drives do almost OK at (I'm not a fan of QLC drives, if
you can't tell).

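If you want to check the wearing-out theory directly before benchmarking,
smartmontools can usually read the drive's health data even through a USB
bridge (assuming the bridge passes SMART through; plenty of cheap ones
don't):

pkg install smartmontools
smartctl -a /dev/da0            # add -d sat if the bridge needs the SAT
                                # pass-through spelled out

The wear / percentage-used attribute and the total LBAs written are the
interesting numbers; the exact attribute names vary by vendor.
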
If this is a thumb drive, you lose. Those are the cheapest of the cheap and
crappiest of the crap in terms of performance (there are a few notable
exceptions, but I'm playing the odds here). You are doomed to crappy
performance.

If it's really a micro-sd card behind a USB adapter, see my comments on
thumb drives :).

Now, having said all that, your best bet is to run an fio test. fio is my
go-to choice for benchmarking storage. Do a random workload with an 8k
write size (since that's the page size of aarch64) on one of the swap
partitions when it's not in active use. I suspect you have an SSD, and that
it will kinda suck, but be in line with the swap performance you are seeing.

I use the following template for my testing (128k should be reduced to 8k
for this test, though I've not looked at how much we cluster writes in our
swap code, so maybe that's too pessimistic). You might also try reducing
the number of I/O jobs, since I'm measuring, or trying to, the best
possible sustained throughput numbers (latency in this test tends to run
kinda high).

; SSD testing: 128k I/O 64 jobs 32 deep queue

[global]
direct=1
rw=randread
refill_buffers
norandommap
randrepeat=0
bs=128k
ioengine=posixaio
iodepth=32
numjobs=64
runtime=60
group_reporting
thread

[ssd128k]

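Concretely, that template cut down to 8k and pointed at one of your swap
partitions (swapped off first, e.g. swapoff /dev/da0p4; the device name is
just taken from your gpart output) would look something like:

; swap testing: 8k random I/O against an idle swap partition

[global]
direct=1
rw=randread
refill_buffers
norandommap
randrepeat=0
bs=8k
ioengine=posixaio
iodepth=32
numjobs=64
runtime=60
group_reporting
thread

[swap8k]
filename=/dev/da0p4

Save that as swap8k.fio and run 'fio swap8k.fio' (fio is in ports). Flipping
rw to randwrite gives you the write side of the picture, but only do that
while the partition is swapped off, since it will scribble over anything
paged out there.
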
Good luck.

Warner