Date: Sat, 6 Dec 2025 11:50:08 +0100
From: Mateusz Guzik <mjguzik@gmail.com>
To: FreeBSD Current <freebsd-current@freebsd.org>
Subject: performance regressions in 15.0
Message-ID: <CAGudoHFUJ23yUWPq7_VS2ek0zoGQOS42HB00n-hWspA3Cb4-XQ@mail.gmail.com>

I got pointed at phoronix:
https://www.phoronix.com/review/freebsd-15-amd-epyc

While I don't treat their results as gospel, a FreeBSD vs FreeBSD test
showing a slowdown most definitely warrants a closer look.

They observed slowdowns when using iperf over localhost and when
compiling llvm.

I can confirm both problems and more.

I found the profiling tooling for userspace to be broken again, so I
did not investigate much and I'm not going to dig into it further.

Test box is AMD EPYC 9454 48-Core Processor, with the 2 systems
running as 8 core vms under kvm.

I. iperf

Package is: iperf3-3.19.1

Tested with: iperf3 -s + iperf3 -c localhost

While the rates fluctuate, 14.3 is overall faster:

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec  2.70 GBytes  23.1 Gbits/sec
[  5]   1.01-2.07   sec  1.92 GBytes  15.5 Gbits/sec
[  5]   2.07-3.01   sec  1.76 GBytes  16.1 Gbits/sec
[  5]   3.01-4.02   sec  1.86 GBytes  15.9 Gbits/sec
[  5]   4.02-5.01   sec  2.84 GBytes  24.5 Gbits/sec
[  5]   5.01-6.02   sec  2.54 GBytes  21.7 Gbits/sec
[  5]   6.02-7.07   sec  2.18 GBytes  17.8 Gbits/sec
[  5]   7.07-8.02   sec  1.76 GBytes  15.9 Gbits/sec
[  5]   8.02-9.01   sec  1.88 GBytes  16.3 Gbits/sec
[  5]   9.01-10.02  sec  1.90 GBytes  16.2 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.02  sec  21.3 GBytes  18.3 Gbits/sec  receiver

vs 15.0:

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec  1.85 GBytes  15.7 Gbits/sec
[  5]   1.01-2.02   sec  3.23 GBytes  27.5 Gbits/sec
[  5]   2.02-3.03   sec  1.84 GBytes  15.7 Gbits/sec
[  5]   3.03-4.01   sec  1.86 GBytes  16.3 Gbits/sec
[  5]   4.01-5.01   sec  1.64 GBytes  14.1 Gbits/sec
[  5]   5.01-6.07   sec  1.87 GBytes  15.1 Gbits/sec
[  5]   6.07-7.01   sec  1.23 GBytes  11.3 Gbits/sec
[  5]   7.01-8.01   sec  1.85 GBytes  15.8 Gbits/sec
[  5]   8.01-9.01   sec  1.42 GBytes  12.2 Gbits/sec
[  5]   9.01-10.01  sec  1.81 GBytes  15.5 Gbits/sec
[  5]  10.01-10.07  sec  99.9 MBytes  14.1 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.07  sec  18.7 GBytes  16.0 Gbits/sec  receiver

This is reliably repeatable.

II. compilation speed

The real and serious problem.

Both versions of the system ship the same clang version:

FreeBSD clang version 19.1.7 (https://github.com/llvm/llvm-project.git llvmorg-19.1.7-0-gcd708029e0b2)
Target: x86_64-unknown-freebsd14.3
Thread model: posix
InstalledDir: /usr/bin

FreeBSD clang version 19.1.7 (https://github.com/llvm/llvm-project.git llvmorg-19.1.7-0-gcd708029e0b2)
Target: x86_64-unknown-freebsd15.0
Thread model: posix
InstalledDir: /usr/bin

I found that compiling the will-it-scale suite takes about twice the
real time, along with twice the time spent in userspace.

will-it-scale needs a little bit of massaging to work, diff at the end.

check this out (repeatable):

while true; do gmake -s clean && time gmake -s -j 8; done

14.3:
gmake -s -j 8  8.93s user 2.03s system 769% cpu 1.42s (1.424) total
gmake -s -j 8  9.02s user 2.16s system 757% cpu 1.48s (1.475) total
gmake -s -j 8  9.29s user 1.95s system 774% cpu 1.45s (1.450) total
gmake -s -j 8  8.97s user 2.46s system 770% cpu 1.48s (1.484) total
gmake -s -j 8  9.13s user 2.30s system 773% cpu 1.48s (1.477) total

15.0:
gmake -s -j 8  19.90s user 3.02s system 773% cpu 2.96s (2.963) total
gmake -s -j 8  19.90s user 3.18s system 774% cpu 2.98s (2.979) total
gmake -s -j 8  20.24s user 2.90s system 770% cpu 3.00s (3.005) total
gmake -s -j 8  19.92s user 3.25s system 771% cpu 3.00s (3.003) total
gmake -s -j 8  20.25s user 2.95s system 772% cpu 3.01s (3.006) total

user time *skyrocketed*

This is not some weird scheduling anomaly either:

while true; do gmake -s clean && time cpuset -l 1 gmake -s ; done

14.3:
cpuset -l 1 gmake -s  8.88s user 1.11s system 99% cpu 10.00s (10.003) total
cpuset -l 1 gmake -s  8.94s user 1.12s system 99% cpu 10.07s (10.067) total
cpuset -l 1 gmake -s  9.00s user 1.06s system 99% cpu 10.07s (10.072) total
cpuset -l 1 gmake -s  8.88s user 1.17s system 99% cpu 10.07s (10.069) total
cpuset -l 1 gmake -s  8.88s user 1.23s system 99% cpu 10.13s (10.127) total

15.0:
cpuset -l 1 gmake -s  21.58s user 2.33s system 99% cpu 23.96s (23.961) total
cpuset -l 1 gmake -s  21.16s user 2.54s system 99% cpu 23.76s (23.759) total
cpuset -l 1 gmake -s  19.90s user 1.90s system 99% cpu 21.85s (21.854) total
cpuset -l 1 gmake -s  19.76s user 1.74s system 99% cpu 21.55s (21.554) total
cpuset -l 1 gmake -s  19.72s user 1.75s system 99% cpu 21.53s (21.526) total

Per my previous remark I found userspace profiling to be
non-operational and I did not try to fight it.

I did however do a few sanity checks, mostly with will-it-scale:
1. syscall rate is down over 7% (tested with getppid1_processes)
2. malloc also got a slowdown(!). there are 2 benches, one ends up
issuing syscalls, the other does not.

Results in ops/s:

malloc1_processes (malloc/free of 128MB):
14.3: 1960769
15.0: 1376087 (-30%)

malloc2_processes (malloc/free of 1kB):
14.3: 156034491
15.0: 51645759 (-67%)

Apart from that the kernel is overall slower, for example negative
path lookups also regressed (-12%).

Another issue is execve rate. To bench that I borrowed the following:
http://apollo.backplane.com/DFlyMisc/doexec.c

cc -O2 doexec.c
cpuset -l 1 ./a.out 1

In ops/s:
14.3: 4905
15.0: 3672 (-25%)

The clang thing might happen to be clang-specific.

Whatever it is, I think the total slowdown is serious enough that it
warrants investigation and an errata notice.

But you do you, I am *not* going to work on this.
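
The percentages quoted above can be cross-checked from the raw figures with a few lines of Python. This is only an illustrative sketch: the helper names (`pct_change`, `user_seconds`) are hypothetical, the numbers are copied from the results in this message, and the parser assumes the zsh-style `time` output format shown in the gmake runs.

```python
import re
from statistics import mean


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new in percent (negative means slower)."""
    return (new - old) / old * 100.0


def user_seconds(line: str) -> float:
    """Extract user-CPU seconds from one zsh-style `time` output line."""
    match = re.search(r"([\d.]+)s user", line)
    if match is None:
        raise ValueError(f"no user time found in: {line!r}")
    return float(match.group(1))


# ops/s figures copied from the message: (14.3, 15.0)
OPS = {
    "malloc1_processes": (1960769, 1376087),
    "malloc2_processes": (156034491, 51645759),
    "doexec": (4905, 3672),
}

# `gmake -s -j 8` runs copied from the message
RUNS_14_3 = [
    "gmake -s -j 8 8.93s user 2.03s system 769% cpu 1.42s (1.424) total",
    "gmake -s -j 8 9.02s user 2.16s system 757% cpu 1.48s (1.475) total",
    "gmake -s -j 8 9.29s user 1.95s system 774% cpu 1.45s (1.450) total",
    "gmake -s -j 8 8.97s user 2.46s system 770% cpu 1.48s (1.484) total",
    "gmake -s -j 8 9.13s user 2.30s system 773% cpu 1.48s (1.477) total",
]
RUNS_15_0 = [
    "gmake -s -j 8 19.90s user 3.02s system 773% cpu 2.96s (2.963) total",
    "gmake -s -j 8 19.90s user 3.18s system 774% cpu 2.98s (2.979) total",
    "gmake -s -j 8 20.24s user 2.90s system 770% cpu 3.00s (3.005) total",
    "gmake -s -j 8 19.92s user 3.25s system 771% cpu 3.00s (3.003) total",
    "gmake -s -j 8 20.25s user 2.95s system 772% cpu 3.01s (3.006) total",
]

# Mean user-CPU time per release across the five runs
USER_14_3 = mean([user_seconds(line) for line in RUNS_14_3])
USER_15_0 = mean([user_seconds(line) for line in RUNS_15_0])

if __name__ == "__main__":
    for name, (old, new) in OPS.items():
        print(f"{name}: {pct_change(old, new):+.1f}%")
    print(f"mean user time, 15.0 vs 14.3: {USER_15_0 / USER_14_3:.2f}x")
```

Rounded to whole percent, the computed changes match the -30%, -67% and -25% given above, and the mean user time for the -j 8 build comes out at roughly 2.2 times the 14.3 figure.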

will-it-scale howto:

pkg install gmake hwloc
git clone https://github.com/antonblanchard/will-it-scale

add this:

diff --git a/Makefile b/Makefile
index 8dd0717..d779705 100644
--- a/Makefile
+++ b/Makefile
@@ -1,9 +1,11 @@
-CFLAGS+=-Wall -O2 -g
-LDFLAGS+=-lhwloc
+CFLAGS+=-Wall -O2 -g -I/usr/local/include
+LDFLAGS+=-lhwloc -L/usr/local/lib
 
 processes := $(patsubst tests/%.c,%_processes,$(wildcard tests/*.c))
 threads := $(patsubst tests/%.c,%_threads,$(wildcard tests/*.c))
 
+threadspawn1_processes_FLAGS+=-lpthread
+
 all: processes threads
 
 processes: $(processes)
diff --git a/tests/malloc1.c b/tests/malloc1.c
index 14d4c3b..05737bb 100644
--- a/tests/malloc1.c
+++ b/tests/malloc1.c
@@ -12,6 +12,7 @@ void testcase(unsigned long long *iterations, unsigned long nr)
 	while (1) {
 		void *addr = malloc(SIZE);
 		assert(addr != NULL);
+		asm volatile("" :: "m" (addr));
 		free(addr);
 
 		(*iterations)++;
diff --git a/tests/malloc2.c b/tests/malloc2.c
index c24aceb..e769dd3 100644
--- a/tests/malloc2.c
+++ b/tests/malloc2.c
@@ -12,6 +12,7 @@ void testcase(unsigned long long *iterations, unsigned long nr)
 	while (1) {
 		void *addr = malloc(SIZE);
 		assert(addr != NULL);
+		asm volatile("" :: "m" (addr));
 		free(addr);
 
 		(*iterations)++;
