Date:      Wed, 15 Nov 2023 20:50:21 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD Hackers <freebsd-hackers@freebsd.org>, FreeBSD Mailing List <freebsd-ports@freebsd.org>
Subject:   Re: Ryzen 9 7950X3D bulk -a times: adding an example with SMT disabled (so 16 hardware threads, not 32)
Message-ID:  <EA23FC42-B1A9-469E-B3BC-84945FB68A5A@yahoo.com>
In-Reply-To: <4596CD14-82EF-4213-9CD8-D065A2F7E073@yahoo.com>
References:  <88907269-7ECD-4539-AA3D-AD0A31B13CA7@yahoo.com> <4596CD14-82EF-4213-9CD8-D065A2F7E073@yahoo.com>

On Nov 12, 2023, at 18:00, Mark Millard <marklmi@yahoo.com> wrote:

> On Nov 9, 2023, at 17:26, Mark Millard <marklmi@yahoo.com> wrote:
>
>> Reading some benchmark results for compilation activity that showed some
>> SMT vs. not examples and also using my C++ variant of the old HINT
>> benchmark, I ended up curious how a non-SMT from-scratch bulk -a would
>> end up (ZFS context) compared to my prior SMT-based run.
>>
>> I use a high load average style of bulk -a activity that has USE_TMPFS=all
>> involved. The system has 96 GiBytes of RAM (total across the 2 DIMMs).
>> The original under-1.5-day time definitely had significant swap space use
>> (RAM+SWAP = 96 GiBytes + 364 GiBytes == 460 GiBytes == 471040 MiBytes).
>> The media was (and is) a PCIe based Optane 905P 1.5T. ZFS on a single
>> partition on the single drive, ZFS used just for bectl reasons, not other
>> typical use-ZFS reasons. I've not controlled the ARC size-range explicitly.
>>
>> So less swap partition use is part of the contribution to the results.
>>
>> The original bulk -a spent a couple of hours at the end where it was
>> just fetching and building textproc/stardict-quick . I have not cleared
>> out /usr/ports/distfiles or updated anything.
>>
>> So fetch time is also a difference here.
>> SMT (32 hardware threads, original bulk -a):
>>
>> [33:10:00] [32] [04:37:23] Finished emulators/libretro-mame | libretro-mame-20220124_1: Success
>> [35:36:51] [23] [03:44:04] Finished textproc/stardict-quick | stardict-quick-2.4.2_9: Success
>> . . .
>> [main-amd64-bulk_a-default] [2023-11-01_07h14m50s] [committing:] Queued: 34683 Built: 33826 Failed: 179   Skipped: 358   Ignored: 320   Fetched: 0     Tobuild: 0      Time: 35:37:55
>>
>> Swap-involved MaxObs (Max Observed) figures:
>> 173310Mi MaxObsUsed
>> 256332Mi MaxObs(Act+Lndry+SwapUsed)
>> 265551Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>> (So 265551Mi of 471040Mi RAM+SWAP.)
>>
>> Just-RAM MaxObs figures:
>> 81066Mi MaxObsActive
>> (Given the complications of getting usefully comparable wired figures for ZFS (ARC): omit.)
>> 94493Mi MaxObs(Act+Wir+Lndry)
>>
>> Note: MaxObs(A+B+C) <= MaxObs(A)+MaxObs(B)+MaxObs(C)
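That inequality is why the combined MaxObs figures cannot be recovered by summing the per-category maxima: the categories need not peak at the same time. A minimal sketch with made-up sample values (not data from these runs):

```python
# Hypothetical per-sample memory figures (MiBytes) for three categories,
# sampled at the same instants; illustrative values only.
act   = [100, 300, 200]   # Active
lndry = [250,  50, 100]   # Laundry
swap  = [ 20,  40, 300]   # SwapUsed

# MaxObs of the sum: maximize over per-sample totals.
max_obs_sum = max(a + l + s for a, l, s in zip(act, lndry, swap))

# Sum of the individual MaxObs figures.
sum_of_max_obs = max(act) + max(lndry) + max(swap)

print(max_obs_sum, sum_of_max_obs)  # 600 850
assert max_obs_sum <= sum_of_max_obs
```

Here the categories peak at different samples, so the summed maxima (850) overstate the true combined peak (600).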
>>
>> ALLOW_MAKE_JOBS=yes was used. No explicit restriction on PARALLEL_JOBS
>> or MAKE_JOBS_NUMBER (or analogous). So 32 builders allowed, each allowed
>> 32 make jobs. This explains the high load averages of the bulk -a :
>>
>> load averages . . . MaxObs: 360.70, 267.63, 210.84
>> (Those need not be all from the same time frame during the bulk -a .)
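For reference, those knobs live in poudriere's configuration (PARALLEL_JOBS and ALLOW_MAKE_JOBS in poudriere.conf; MAKE_JOBS_NUMBER via poudriere's make.conf). A sketch of how the load could instead be capped; the values are illustrative, and the runs above deliberately left the caps unset:

```shell
# /usr/local/etc/poudriere.conf (illustrative values, not the settings used above)
ALLOW_MAKE_JOBS=yes   # let each builder run parallel make jobs
PARALLEL_JOBS=8       # cap the number of simultaneous builders

# /usr/local/etc/poudriere.d/make.conf (illustrative)
MAKE_JOBS_NUMBER=4    # cap make jobs per builder: ~8*4 = 32-way load at most
```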
>>
>> As for the ports vintage:
>>=20
>> # ~/fbsd-based-on-what-commit.sh -C /usr/ports/
>> 6ec8e3450b29 (HEAD -> main, freebsd/main, freebsd/HEAD) devel/sdts++: Mark DEPRECATED
>> Author:     Muhammad Moinur Rahman <bofh@FreeBSD.org>
>> Commit:     Muhammad Moinur Rahman <bofh@FreeBSD.org>
>> CommitDate: 2023-10-21 19:01:38 +0000
>> branch: main
>> merge-base: 6ec8e3450b29462a590d09fb0b07ed214d456bd5
>> merge-base: CommitDate: 2023-10-21 19:01:38 +0000
>> n637598 (--first-parent --count for merge-base)
>>
>> I do have an environment that avoids the various LLVM builds taking
>> as long to build:
>>=20
>> llvm1[3-7]  : no MLIR, no FLANG
>> llvm1[4-7]  : use BE_NATIVE
>> other llvm* : use defaults (so, no avoidance)
>>
>> I also prevent the builds from using strip on most of the install
>> materials built (not just toolchain materials).
>>
>>
>> non-SMT (16 hardware threads):
>>
>> Note: one builder (math/fricas), the last still present, was
>> stuck and I had to kill processes to have it stop unless I
>> was willing to wait for my large timeout figures. The last
>> builder to finish normally was:
>>
>> [39:48:10] [09] [00:16:23] Finished devel/gcc-msp430-ti-toolchain | gcc-msp430-ti-toolchain-9.3.1.2.20210722_1: Success
>>
>> So, trying to place some bounds for comparing SMT (32 hw threads)
>> and non-SMT (16 hw threads):
>>
>> 33:10:00 SMT -> 39:48:10 non-SMT would be over 6.5 hrs longer for non-SMT
>> 35:36:51 SMT -> 39:48:10 non-SMT would be over 4 hrs longer for non-SMT
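Those bounds can be checked directly from the [HH:MM:SS] elapsed-time stamps; a small sketch using the figures quoted above:

```python
# Check the SMT vs. non-SMT elapsed-time deltas from the [HH:MM:SS] figures.
def to_seconds(hms: str) -> int:
    """Convert an elapsed "HH:MM:SS" string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

smt_libretro   = to_seconds("33:10:00")  # SMT: libretro-mame finished
smt_total      = to_seconds("35:36:51")  # SMT: whole bulk -a
non_smt_finish = to_seconds("39:48:10")  # non-SMT: last normal finish

print((non_smt_finish - smt_libretro) / 3600)  # ~6.64 hr longer
print((non_smt_finish - smt_total) / 3600)     # ~4.19 hr longer
```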
>>
>> As for SMT vs. non-SMT Maximum Observed figures:
>>
>> SMT     load averages . . . MaxObs: 360.70, 267.63, 210.84
>> non-SMT load averages . . . MaxObs: 152.89, 100.94,  76.28
>>
>> Swap-involved MaxObs figures for SMT (32 hw threads) vs not (16):
>> 173310Mi vs.  33003Mi MaxObsUsed
>> 256332Mi vs. 117221Mi MaxObs(Act+Lndry+SwapUsed)
>> 265551Mi vs. 124776Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>>
>> Just-RAM MaxObs figures for SMT (32 hw threads) vs not (16):
>> 81066Mi vs. 69763Mi MaxObsActive
>> (Given the complications of getting usefully comparable wired figures for ZFS (ARC): omit.)
>> 94493Mi vs. 94303Mi MaxObs(Act+Wir+Lndry)
>>
>
> I've added a section for a plot for the 7950X3D to the end of:
>
> https://github.com/markmi/acpphint/blob/master/Some_acpphint_curves_with_notes.md
>
> It is from a C++ variant of the old HINT benchmark and includes
> showing RAM caching consequences for the benchmark. The about
> 32 MiByte and about 96 MiByte cache sizes for the 2 CCDs are
> observable.
>
> I'll also note that for the devices present (active and not),
> at fully active the 7950X3D seems to use 225 Watts .. 235 Watts
> at the power cable for FreeBSD. Idle FreeBSD: more like 96
> Watts.
>
> (No video card. 2 forms of Optane 905P 1.5TB, one active. One
> Samsung 960 Pro 2TB, inactive. One Samsung 970 EVO Plus 2TB,
> inactive. 96 GiBytes of RAM total across 2 DIMMs. Fans and
> AIO cooling. Keyboard and mouse USB powered. USB3 Ethernet
> dongle. Monitor connection.)
>
>
> ThreadRipper 1950X "bulk -a" test in progress:
>
> I'm running a from-scratch USE_TMPFS=all "bulk -a" on the
> ThreadRipper 1950X (128 GiBytes of RAM). From what I've seen
> so far, it looks likely to take over 72 hr, so 2x+ as long
> as the 7950X3D. (Samsung 960 Pro 1TB system media and
> Optane 900 480 GB swap space media in use, 447 GiBytes as I
> remember. The ZFS partition on the 960 Pro has ashift=14.)
> It has a slightly modified copy of the ZFS from the 7950X3D
> as far as starting content goes. It does have openzfs-2.2
> compatibility fully enabled for its pool, including block
> cloning, unlike any other ZFS I have around
> (openzfs-2.1-freebsd).

ThreadRipper 1950X:

. . .
[85:21:50] [27] [02:06:01] Finished databases/mongodb60 | mongodb60-6.0.11: Success
[85:34:00] [28] [03:23:06] Finished biology/ncbi-cxx-toolkit | ncbi-cxx-toolkit-27.0.0_1: Success
[85:46:31] [30] [08:19:30] Finished cad/kicad-library-packages3d | kicad-library-packages3d-7.0.2_2: Success
[87:07:02] [03] [13:00:45] Finished emulators/libretro-mame | libretro-mame-20220124_1: Success

But one port that normally takes little time got stuck (in kqread,
apparently against a <defunct> child process), resulting in (later):

# poudriere status -b
[main-amd64-bulk_a-default] [2023-11-11_17h59m25s] [parallel_build:] Queued: 34683 Built: 33807 Failed: 173   Skipped: 382   Ignored: 320   Fetched: 0     Tobuild: 1      Time: 88:17:59
 ID  TOTAL        ORIGIN   PKGNAME                PHASE PHASE    TMPFS    CPU% MEM%
[05] 17:27:25 ftp/curlie | curlie-1.6.7_15 check-sanity 17:27:15 1.28 GiB
=>> Logs: /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-11-11_17h59m25s

So it looks like:

Ryzen 9 7950X3D     96 GiBytes RAM (5600 MT/s): 33 hr or so.
ThreadRipper 1950X 128 GiBytes RAM (2400 MT/s): 87 hr or so.

For reference (both 32 hardware threads):

Ryzen 9    7950X3D: 265551Mi MaxObs(Act+Wir+Lndry+SwapUsed)
ThreadRipper 1950X: 245564Mi MaxObs(Act+Wir+Lndry+SwapUsed)

(The 96 GiByte vs. 128 GiByte RAM size difference makes other
figures messier to compare.)
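From those rounded hour totals, the slowdown on the older system works out to roughly 2.6x (a trivial check using the figures above):

```python
# Ratio of the two from-scratch "bulk -a" times, using the rounded
# hour figures quoted above.
t_7950x3d_hr = 33.0  # Ryzen 9 7950X3D
t_1950x_hr = 87.0    # ThreadRipper 1950X
ratio = t_1950x_hr / t_7950x3d_hr
print(f"{ratio:.2f}x")  # 2.64x
```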

I have updated the 7950X3D UEFI and am rerunning the from-scratch
bulk -a test in the ZFS context to check on system stability
for such.

===
Mark Millard
marklmi at yahoo.com



