Date:      Sun, 12 Nov 2023 18:00:46 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD Hackers <freebsd-hackers@freebsd.org>, FreeBSD Mailing List <freebsd-ports@freebsd.org>
Subject:   Re: Ryzen 9 7950X3D bulk -a times: adding an example with SMT disabled (so 16 hardware threads, not 32)
Message-ID:  <4596CD14-82EF-4213-9CD8-D065A2F7E073@yahoo.com>
In-Reply-To: <88907269-7ECD-4539-AA3D-AD0A31B13CA7@yahoo.com>
References:  <88907269-7ECD-4539-AA3D-AD0A31B13CA7@yahoo.com>

On Nov 9, 2023, at 17:26, Mark Millard <marklmi@yahoo.com> wrote:

> Reading some benchmark results for compilation activity that showed some
> SMT vs. not examples and also using my C++ variant of the old HINT
> benchmark, I ended up curious how a non-SMT from scratch bulk -a would
> end up (ZFS context) compared to my prior SMT based run.
>
> I use a high load average style of bulk -a activity that has USE_TMPFS=all
> involved. The system has 96 GiBytes of RAM (total across the 2 DIMMs).
> The original under 1.5 day time definitely had significant swap space use
> (RAM+SWAP = 96 GiBytes + 364 GiBytes == 460 GiBytes == 471040 MiBytes).
> The media was (and is) a PCIe based Optane 905P 1.5T. ZFS on a single
> partition on the single drive, ZFS used just for bectl reasons, not other
> typical use-ZFS reasons. I've not controlled the ARC size-range explicitly.
>
> So less swap partition use is part of the contribution to the results.
>
> The original bulk -a spent a couple of hours at the end where it was
> just fetching and building textproc/stardict-quick . I have not cleared
> out /usr/ports/distfiles or updated anything.
>
> So fetch time is also a difference here.
>
> SMT (32 hardware threads, original bulk -a):
>
> [33:10:00] [32] [04:37:23] Finished emulators/libretro-mame | libretro-mame-20220124_1: Success
> [35:36:51] [23] [03:44:04] Finished textproc/stardict-quick | stardict-quick-2.4.2_9: Success
> . . .
> [main-amd64-bulk_a-default] [2023-11-01_07h14m50s] [committing:] Queued: 34683 Built: 33826 Failed: 179   Skipped: 358   Ignored: 320   Fetched: 0     Tobuild: 0      Time: 35:37:55
>
> Swap-involved MaxObs (Max Observed) figures:
> 173310Mi MaxObsUsed
> 256332Mi MaxObs(Act+Lndry+SwapUsed)
> 265551Mi MaxObs(Act+Wir+Lndry+SwapUsed)
> (So 265551Mi of 471040Mi RAM+SWAP.)
>
> Just-RAM MaxObs figures:
> 81066Mi MaxObsActive
> (Given the complications of getting usefully comparable wired figures for ZFS (ARC): omit.)
> 94493Mi MaxObs(Act+Wir+Lndry)
>
> Note: MaxObs(A+B+C) <= MaxObs(A)+MaxObs(B)+MaxObs(C)
>
> ALLOW_MAKE_JOBS=yes was used. No explicit restriction on PARALLEL_JOBS
> or MAKE_JOBS_NUMBER (or analogous). So 32 builders allowed, each allowed
> 32 make jobs. This explains the high load averages of the bulk -a :
>
> load averages . . . MaxObs: 360.70, 267.63, 210.84
> (Those need not be all from the same time frame during the bulk -a .)
>
> As for the ports vintage:
>
> # ~/fbsd-based-on-what-commit.sh -C /usr/ports/
> 6ec8e3450b29 (HEAD -> main, freebsd/main, freebsd/HEAD) devel/sdts++: Mark DEPRECATED
> Author:     Muhammad Moinur Rahman <bofh@FreeBSD.org>
> Commit:     Muhammad Moinur Rahman <bofh@FreeBSD.org>
> CommitDate: 2023-10-21 19:01:38 +0000
> branch: main
> merge-base: 6ec8e3450b29462a590d09fb0b07ed214d456bd5
> merge-base: CommitDate: 2023-10-21 19:01:38 +0000
> n637598 (--first-parent --count for merge-base)
>
> I do have an environment that avoids various LLVM builds taking
> as long to build :
>
> llvm1[3-7]  : no MLIR, no FLANG
> llvm1[4-7]  : use BE_NATIVE
> other llvm* : use defaults (so, no avoidance)
>
> I also prevent the builds from using strip on most of the install
> materials built (not just toolchain materials).
>
>
> non-SMT (16 hardware threads):
>
> Note one builder (math/fricas), the last still present, was
> stuck and I had to kill processes to have it stop unless I
> was willing to wait for my large timeout figures. The last
> builder normal-finish was:
>
> [39:48:10] [09] [00:16:23] Finished devel/gcc-msp430-ti-toolchain | gcc-msp430-ti-toolchain-9.3.1.2.20210722_1: Success
>
> So, trying to place some bounds for comparing to SMT (32 hw threads)
> and non-SMT (16 hw threads):
>
> 33:10:00 SMT -> 39:48:10 non-SMT would be over 6.5 hrs longer for non-SMT
> 35:36:51 SMT -> 39:48:10 non-SMT would be over 4 hrs longer for non-SMT
>
> As for SMT vs. non-SMT Maximum Observed figures:
>
> SMT     load averages . . . MaxObs: 360.70, 267.63, 210.84
> non-SMT load averages . . . MaxObs: 152.89, 100.94,  76.28
>
> Swap-involved MaxObs figures for SMT (32 hw threads) vs not (16):
> 173310Mi vs.  33003Mi MaxObsUsed
> 256332Mi vs. 117221Mi MaxObs(Act+Lndry+SwapUsed)
> 265551Mi vs. 124776Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>
> Just-RAM MaxObs figures for SMT (32 hw threads) vs not (16):
> 81066Mi vs. 69763Mi MaxObsActive
> (Given the complications of getting usefully comparable wired figures for ZFS (ARC): omit.)
> 94493Mi vs. 94303Mi MaxObs(Act+Wir+Lndry)
>
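As a cross-check of the quoted arithmetic, here is a minimal Python
sketch. It was not used for the runs, and the function names are just
illustrative choices. It recomputes the RAM+SWAP total in MiBytes and
the two SMT -> non-SMT elapsed-time gaps from the HH:MM:SS figures
quoted above.

```python
# Illustrative recheck of figures quoted above; helper names are hypothetical.

def hms_to_seconds(text):
    """Convert an 'HH:MM:SS' elapsed-time string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total):
    """Format a non-negative number of seconds as 'HH:MM:SS'."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def extra_time(smt_mark, non_smt_mark):
    """How much later the non-SMT mark is than the SMT mark."""
    return seconds_to_hms(hms_to_seconds(non_smt_mark) - hms_to_seconds(smt_mark))


def gib_to_mib(gib):
    """Convert GiBytes to MiBytes (1 GiByte == 1024 MiBytes)."""
    return gib * 1024


# 96 GiBytes of RAM plus 364 GiBytes of swap, as quoted.
ram_plus_swap_mib = gib_to_mib(96 + 364)

if __name__ == "__main__":
    print(ram_plus_swap_mib, "MiBytes RAM+SWAP")
    for smt_mark in ("33:10:00", "35:36:51"):
        print(smt_mark, "->", "39:48:10", "adds", extra_time(smt_mark, "39:48:10"))
```

With the quoted times this gives gaps of 06:38:10 and 04:11:19, which
match the "over 6.5 hrs" and "over 4 hrs" bounds, and 471040 MiBytes
for RAM+SWAP.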

I've added a section for a plot for the 7950X3D to the end of:

https://github.com/markmi/acpphint/blob/master/Some_acpphint_curves_with_notes.md

It is from a C++ variant of the old HINT benchmark and includes
showing RAM caching consequences for the benchmark. The about
32 MiByte and about 96 MiByte cache sizes for the 2 CCDs are
observable.

I'll also note that for the devices present (active and not),
when fully active the 7950X3D system seems to use 225 Watts .. 235
Watts at the power cable for FreeBSD. Idle FreeBSD: more like 96
Watts.

(No video card. 2 forms of Optane 905P 1.5TB, one active. One
Samsung 960 Pro 2TB, inactive. One Samsung 970 EVO Plus 2TB,
inactive. 96 GiBytes of RAM total across 2 DIMMs. Fans and
AIO cooling. Keyboard and mouse USB powered. USB3 Ethernet
dongle. Monitor connection.)


ThreadRipper 1950X "bulk -a" test in progress:

I'm running a from-scratch USE_TMPFS=all "bulk -a" on the
ThreadRipper 1950X (128 GiBytes of RAM). From what I've seen
so far, it looks likely to take over 72 hr, so 2x+ as long
as the 7950X3D. (Samsung 960 Pro 1TB system media and
Optane 900 480 GB swap space media in use, 447 GiByte as I
remember. The ZFS partition on the 960 Pro has ashift=14 .)
It has a slightly modified copy of the ZFS from the 7950X3D
as far as starting content goes. It does have openzfs-2.2
compatibility fully enabled for its pool, including block
cloning, unlike any other ZFS I have around
(openzfs-2.1-freebsd).
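
For scale, a rough Python sketch of the "2x+" comparison. It assumes
the 72 hr figure as a lower bound for the 1950X and takes the 35:37:55
total of the original SMT 7950X3D run as the baseline. Both the choice
of baseline and the variable names are my assumptions for illustration.

```python
# Rough ratio of the projected 1950X time to the 7950X3D SMT bulk -a time.
# Assumption: 72 hr is only a lower bound; the real run may take longer.

def hms_to_seconds(text):
    """Convert an 'HH:MM:SS' elapsed-time string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


threadripper_lower_bound_s = 72 * 3600
ryzen_smt_total_s = hms_to_seconds("35:37:55")

ratio = threadripper_lower_bound_s / ryzen_smt_total_s
print(f"at least {ratio:.2f}x the 7950X3D SMT time")
```

The ratio comes out a little over 2, consistent with the "2x+" wording.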

===
Mark Millard
marklmi at yahoo.com
