From: Warner Losh <imp@bsdimp.com>
Date: Fri, 15 Dec 2023 08:41:10 -0700
Subject: Re: measuring swap partition speed
To: freebsd-fs@freebsd.org
On Fri, Dec 15, 2023 at 7:29 AM void <void@f-m.fm> wrote:

> Hello list, I have on a rpi4 a usb3-connected disk partitioned like this:
>
> # gpart show
>
> =>          40  1953525088  da0  GPT  (932G)
>             40      532480    1  efi  (260M)
>         532520        2008       - free -  (1.0M)
>         534528     4194304    2  freebsd-swap  (2.0G)
>        4728832     4194304    4  freebsd-swap  (2.0G)
>        8923136     4194304    5  freebsd-swap  (2.0G)
>       13117440     4194304    6  freebsd-swap  (2.0G)
>       17311744     4194304    7  freebsd-swap  (2.0G)
>       21506048     4194304    8  freebsd-swap  (2.0G)
>       25700352  1927823360    3  freebsd-zfs  (920G)
>     1953523712        1416       - free -  (708K)
>
> If processes swap out, it runs like a slug [1]. I'd like to test if it's
> the disk on its way out. How would I test swap partitions? [2]
>
> [1] it didn't always run like a slug.

What's the underlying hardware?

> [2] would nfs-mounted swap be faster? (1G network)

Maybe.

> [3] swap is not encrypted

Good. You aren't CPU bound.

So the good news, kinda, is that if this is spinning rust, your swap partitions
are on the fastest part of the disk. I'd expect one 12G partition to work better
than six 2G ones, since you'd have less head thrash. Parallelism with multiple
swap partitions works best when they are on separate spindles.

The bad news is that your disk may be fine. I'd expect that as the ZFS partition
fills up, seek distances will increase, since the heads have to travel farther
to get back to the swap space. There's a sweet spot of a few tens of GB that
drives can usually seek across far faster than longer throws...
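If you do want to collapse them, the six swap partitions happen to sit in one
contiguous 12G region, so it's a short job. This is a rough sketch only, with
the device and index numbers taken from your gpart listing above; double-check
them before running any of it, and trim /etc/fstab down to the one entry
afterwards:

# turn off all six swap devices (this can fail if more is swapped
# out than will fit back into RAM)
swapoff /dev/da0p2 /dev/da0p4 /dev/da0p5 /dev/da0p6 /dev/da0p7 /dev/da0p8

# delete partitions 4-8, which sit immediately after partition 2
for i in 8 7 6 5 4; do gpart delete -i $i da0; done

# grow partition 2 into the freed space (~12G total) and re-enable it
gpart resize -i 2 da0
swapon /dev/da0p2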
But if it is an SSD, some comments. It makes no sense to have six swap
partitions; one will do the job (though this is on a rpi, so maybe you are
hitting some of our silly limits in the swap code on 32-bit architectures).
LBAs are LBAs: which ones you use doesn't matter at all (and I don't want to
hear about wear leveling; that doesn't matter at this level, since the FTL does
it behind the scenes in SSDs and NVMe drives). Your drive may be wearing out if
it has slowed down over time (though a certain amount, like 10-20%, may be
expected in the first little bit of life; the rate of performance decline often
slows for a large part of life before again steeply declining). QLC SSDs do
require a lot more care and feeding by the drive firmware, including a lot more
writes to deal with 'read disturb' in a read-heavy workload, and a rewrite from
the initial landing EB (typically SLC, to be fast) to the longer-term storage
(QLC, for the capacity). Many workloads trigger a lot more housekeeping than on
older TLC or MLC drives. And the cheapest NAND in the marketplace tends to be
QLC, so the cheapest SSDs (and sometimes NVMe drives) tend to be QLC. For light
use it doesn't matter, but if you are starting to notice slowdowns, you are
beyond the light use these drives do almost OK at (I'm not a fan of QLC drives,
if you can't tell).

If this is a thumb drive, you lose. Those are the cheapest of the cheap and
crappiest of the crap in terms of performance (there are a few notable
exceptions, but I'm playing the odds here). You are doomed to crappy
performance.

If it's really a micro-sd card behind a USB adapter, see my comments on thumb
drives :).

Now, having said all that, your best bet is to run an fio test; fio is my go-to
choice for storage benchmarking. Do a random workload with an 8k I/O size
(since that's the page size of aarch64) on one of the swap partitions when it's
not in active use. I suspect you have an SSD, and that it will kinda suck, but
be in line with the swap performance you are seeing.

I use the following template for my testing (128k should be reduced to 8k for
this test, though I've not looked at how much we cluster writes in our swap
code, so maybe that's too pessimistic). You might also try reducing the number
of I/O jobs, since I'm measuring, or trying to, the best possible sustained
throughput; latency in this test tends to run kinda high.

; SSD testing: 128k I/O 64 jobs 32 deep queue
[global]
direct=1
rw=randread
refill_buffers
norandommap
randrepeat=0
bs=128k
ioengine=posixaio
iodepth=32
numjobs=64
runtime=60
group_reporting
thread

[ssd128k]
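Cut down for the swap test, it might look like the below. This is a sketch I
haven't run against your setup: the device name is taken from your gpart
listing, numjobs=4 is a guess, and you'll want to swapoff the partition before
pointing fio at it (and swapon it again after):

; 8k random read against one idle swap partition
; save as swap8k.fio and run: fio swap8k.fio
[global]
direct=1
rw=randread
refill_buffers
norandommap
randrepeat=0
bs=8k
ioengine=posixaio
iodepth=32
numjobs=4
runtime=60
group_reporting
thread

[swap8k]
filename=/dev/da0p4

At an 8k random I/O size, the IOPS number will tell you more than the MB/s
number.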
Good luck.

Warner
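P.S. If you also want a read on drive health that's independent of benchmarks,
smartctl from sysutils/smartmontools is worth a try, assuming the USB bridge
passes SMART commands through (many need the SAT translation layer, and some
pass nothing at all):

pkg install smartmontools
smartctl -a -d sat /dev/da0

Look for reallocated sectors or a wear indicator in the output, if the bridge
exposes them.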