Date: Fri, 15 Dec 2023 08:41:10 -0700
From: Warner Losh <imp@bsdimp.com>
To: freebsd-fs@freebsd.org
Subject: Re: measuring swap partition speed
Message-ID: <CANCZdfpgvE8UjX1XUgimbotpnrH-fXYWZU3kfAwD849bMrT2Vg@mail.gmail.com>
In-Reply-To: <ZXxis23iKT3iHDdt@int21h>
References: <ZXxis23iKT3iHDdt@int21h>

On Fri, Dec 15, 2023 at 7:29 AM void <void@f-m.fm> wrote:

> Hello list, I have on a rpi4 a usb3-connected disk partitioned like this:
>
> # gpart show
>
> =>          40  1953525088  da0  GPT  (932G)
>             40      532480    1  efi  (260M)
>         532520        2008       - free -  (1.0M)
>         534528     4194304    2  freebsd-swap  (2.0G)
>        4728832     4194304    4  freebsd-swap  (2.0G)
>        8923136     4194304    5  freebsd-swap  (2.0G)
>       13117440     4194304    6  freebsd-swap  (2.0G)
>       17311744     4194304    7  freebsd-swap  (2.0G)
>       21506048     4194304    8  freebsd-swap  (2.0G)
>       25700352  1927823360    3  freebsd-zfs  (920G)
>     1953523712        1416       - free -  (708K)
>
> If processes swap out, it runs like a slug [1]. I'd like to test if it's
> the disk on its way out. How would I test swap partitions? [2]
>
> [1] it didn't always run like a slug.

What's the underlying hardware?

> [2] would nfs-mounted swap be faster? (1G network)

Maybe.

> [3] swap is not encrypted

Good. You aren't CPU bound.

So the good news, kinda, is that if this is spinning rust, your swap
partitions are on the fastest part of the disk. I'd expect one partition
that's 12G would work better than six that are 2G, since you'd have less
head thrash. Parallelism with multiple swap partitions works best when
they are on separate spindles.

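If you did want to collapse them into a single 12G partition, a rough,
untested sketch based on the gpart output above (double-check the indices
and trim /etc/fstab to match before rebooting) would be:

swapoff -a                      # all swap devices in /etc/fstab; can fail if
                                # there isn't enough free RAM to page back in
gpart delete -i 4 da0           # drop the five extra swap partitions...
gpart delete -i 5 da0
gpart delete -i 6 da0
gpart delete -i 7 da0
gpart delete -i 8 da0
gpart resize -i 2 -s 12G da0    # ...and grow partition 2 into the freed space
swapon /dev/da0p2
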
The bad news is that your disk may be fine. I'd expect that as the ZFS
partition fills up, the seek distances will increase, since greater
distances have to be traversed to get back to the swap space. There's a
sweet spot of a few tens of GB that drives can usually seek across far
faster than longer throws...

But if it is an SSD, some comments. It makes no sense to have six swap
partitions; one will do the job (though this is on a rpi, so maybe you are
hitting some of our silly limits in the swap code on 32-bit architectures).
LBAs are LBAs; which ones you use doesn't matter at all (and I don't want
to hear about wear leveling: that doesn't matter at this level, since the
FTL does it behind the scenes in SSDs and NVMe drives). Your drive may be
wearing out if it has slowed down with time (though a certain amount, like
10-20%, may be expected in the first little bit of its life; the rate of
performance decline often slows for a large part of life before again
steeply declining). QLC SSDs do require a lot more care and feeding by the
firmware, including a lot more writes to deal with 'read disturb' in a
read-heavy workload, and a rewrite from the initial landing EB (typically
SLC, to be fast) to the longer-term storage (QLC, for the capacity). Many
workloads trigger a lot more housekeeping than on older TLC or MLC drives.
And the cheapest NAND in the marketplace tends to be QLC, so the cheapest
SSDs (and sometimes NVMe drives) tend to be QLC. For light use it doesn't
matter, but if you are starting to notice slowdowns, you are beyond the
light use these drives do almost OK at (I'm not a fan of QLC drives, if
you can't tell).

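If you want to check the wearing-out theory directly before benchmarking,
smartmontools can usually read the drive's health data even through a USB
bridge (assuming the bridge passes SMART through; plenty of cheap ones
don't):

pkg install smartmontools
smartctl -a /dev/da0            # add -d sat if the bridge needs the SAT
                                # pass-through spelled out

The wear / percentage-used attribute and the total LBAs written are the
interesting numbers; the exact attribute names vary by vendor.
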
If this is a thumb drive, you lose. Those are the cheapest of the cheap and
crappiest of the crap in terms of performance (there are a few notable
exceptions, but I'm playing the odds here). You are doomed to crappy
performance.

If it's really a micro-sd card behind a USB adapter, see my comments on
thumb drives :).

Now, having said all that, your best bet is to run an fio test. fio is my
go-to choice for benchmarking storage. Do a random workload with an 8k
write size (since that's the page size of aarch64) on one of the swap
partitions when it's not in active use. I suspect you have an SSD, and that
it will kinda suck, but be in line with the swap performance you are seeing.

I use the following template for my testing (128k should be reduced to 8k
for this test, though I've not looked at how much we cluster writes in our
swap code, so maybe that's too pessimistic). You might also try reducing
the number of I/O jobs, since I'm measuring, or trying to, the best
possible sustained throughput numbers (latency in this test tends to run
kinda high).

; SSD testing: 128k I/O 64 jobs 32 deep queue

[global]
direct=1
rw=randread
refill_buffers
norandommap
randrepeat=0
bs=128k
ioengine=posixaio
iodepth=32
numjobs=64
runtime=60
group_reporting
thread

[ssd128k]

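Concretely, that template cut down to 8k and pointed at one of your swap
partitions (swapped off first, e.g. swapoff /dev/da0p4; the device name is
just taken from your gpart output) would look something like:

; swap testing: 8k random I/O against an idle swap partition

[global]
direct=1
rw=randread
refill_buffers
norandommap
randrepeat=0
bs=8k
ioengine=posixaio
iodepth=32
numjobs=64
runtime=60
group_reporting
thread

[swap8k]
filename=/dev/da0p4

Save that as swap8k.fio and run 'fio swap8k.fio' (fio is in ports). Flipping
rw to randwrite gives you the write side of the picture, but only do that
while the partition is swapped off, since it will scribble over anything
paged out there.
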
Good luck.

Warner