Date: Sat, 11 Dec 2021 20:07:18 +0100
From: Mateusz Guzik <mjguzik@gmail.com>
To: Piper H <potthua@gmail.com>
Cc: freebsd-current@freebsd.org
Subject: Re: Benchmarks: FreeBSD 13 vs. NetBSD 9.2 vs. OpenBSD 7 vs. DragonFlyBSD 6 vs. Linux
Message-ID: <CAGudoHFxKP5mELT3ckjG2hOd_BZAidK3W1Y26X43Tgmo_uTHSg@mail.gmail.com>
In-Reply-To: <CAGudoHHHTZ-8P_QaKwD%2Bys=mTmKkU%2BkUEPbGHByr%2Bj6THSigig@mail.gmail.com>
References: <CA%2BGLnbgVGghYAYPbQfu0H0cGvXxk-v0jAZTxLLz%2BhRn5eXjP0g@mail.gmail.com> <CAGudoHHg-yvoLefgyEo3vo_hy5fpC1WcVYGvjxTPdcavoWUUcA@mail.gmail.com> <CAGudoHHHTZ-8P_QaKwD%2Bys=mTmKkU%2BkUEPbGHByr%2Bj6THSigig@mail.gmail.com>

On 12/11/21, Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 12/11/21, Mateusz Guzik <mjguzik@gmail.com> wrote:
>> On 12/11/21, Piper H <potthua@gmail.com> wrote:
>>> I read this article from Reddit:
>>> https://www.phoronix.com/scan.php?page=article&item=bsd-linux-eo2021&num=1
>>>
>>> I am surprised to see that the BSD cluster today has much worse
>>> performance
>>> than Linux.
>>> What do you think of this?
>>>
>>
>> There is a lot to say here.
>>
>> One has to own up to Linux likely being a little bit (or even more so)
>> faster for some of the legitimate tests. One, there are certain
>> multicore scalability issues compared to Linux, which should be pretty
>> mild given the scale (16 cores/32 threads). A more important problem
>> is userspace which fails to take advantage of SIMD instructions for
>> core primitives like memset, memcpy et al. However, if the difference
>> is more than a few %, the result is likely bogus. Key thing to do when
>> benchmarking is being able to explain the result, most notably if you
>> run into huge discrepancies.
>>
>> I had a look at the most egregious result -- zstd and spoiler, it is a
>> bug in result reporting in zstd.
>>
>> I got FreeBSD and Linux (Ubuntu Focal) vms running on:
>> Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
>>
>> Their zstd test ultimately ends up spawning: zstd -T24 -S -i15 -b19
>> FreeBSD-12.2-RELEASE-amd64-memstick.img (yes, they compress a ~1GB
>> FreeBSD image).
>>
>> Side note, it does not matter, but I happen to have CURRENT kernel
>> running on the FreeBSD 13 vm right now.
>>
>> [16:37] freebsd13:~ # time zstd -T24 -S -i15 -b19
>> FreeBSD-12.2-RELEASE-amd64-memstick.img
>> 19#md64-memstick.img :1055957504 -> 692662162 (1.524), 3.97 MB/s ,2156.8
>> MB/s
>> zstd -T24 -S -i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
>> 274.10s user 12.90s system 763% cpu 37.602 total
>>
>> In contrast:
>>
>> [16:37] ubuntu:...tem/compress-zstd-1.5.0 (130) # time zstd -T24 -S
>> -i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
>> 19#md64-memstick.img :1055957504 -> 692662162 (1.524), 60.1 MB/s ,2030.6
>> MB/s
>> zstd -T24 -S -i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
>> 328.65s user 3.48s system 850% cpu 39.070 total
>>
>> This is repeatable. If anything, FreeBSD did it *faster*. Yet zstd
>> reports:
>> FreeBSD: 3.97 MB/s ,2156.8 MB/s [total real time of 37.602 seconds]
>> Linux: 60.1 MB/s ,2030.6 MB/s [total real time of 39.070 seconds]
>>
>> I don't know what these numbers are supposed to be, but it is pretty
>> clear Phoronix grabs the first one.
>>
>> I'll look into sorting this out some time later.
>>
>
> So I cloned https://github.com/facebook/zstd/ and got the v1.4.8 tag,
> as currently imported into FreeBSD. The diff is pretty minimal and
> deals with exposing extra symbols.
>
> zstd directly compiled from that source (with mere gmake) correctly
> shows 2-digit MB speeds, so it has to be something in the FreeBSD
> build which ends up messing with it. I ran out of curiosity (and more
> so time) at this point, but I invite someone else to get to the
> bottom of this.
>
> Bottom line though: there is no zstd performance problem on FreeBSD.
>

Well, I had another look and found it: the low number is computed from
the supposed total time spent on CPU. Compiling by hand gives C11
primitives to do it, while using the FreeBSD source tree lands on the
C90 fallback (clock()), which ends up giving bogus results. A hack which
I can't be bothered to productize is in the diff at the end of this mail.

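A rough sanity check on the quoted numbers, as a sketch rather than
anything taken from zstd itself: dividing the input size by the reported
rate gives the time a single compression pass would have had to take.
The names implied_seconds and print_sanity_check are made up for this
illustration, and 1 MB = 1e6 bytes is an assumption about zstd's units.

```c
#include <stdio.h>

/* Figures copied from the zstd and time(1) output quoted earlier in this mail. */
#define INPUT_BYTES      1055957504.0 /* size of the memstick image */
#define FREEBSD_MB_PER_S 3.97         /* first rate printed on FreeBSD */
#define LINUX_MB_PER_S   60.1         /* first rate printed on Ubuntu */
#define FREEBSD_REAL_S   37.602       /* wall time of the whole FreeBSD run */
#define FREEBSD_USER_S   274.10       /* user CPU time of the same run */

/*
 * Seconds a single compression pass would have had to take for zstd to
 * print the given rate. Assumes 1 MB = 1e6 bytes (a guess about the units).
 */
double implied_seconds(double bytes, double mb_per_s)
{
    return bytes / (mb_per_s * 1e6);
}

/* Hypothetical helper: print the implied pass times next to the measured ones. */
void print_sanity_check(void)
{
    printf("FreeBSD implied pass: %.1f s (run: %.3f s real, %.2f s user)\n",
           implied_seconds(INPUT_BYTES, FREEBSD_MB_PER_S),
           FREEBSD_REAL_S, FREEBSD_USER_S);
    printf("Linux implied pass:   %.1f s\n",
           implied_seconds(INPUT_BYTES, LINUX_MB_PER_S));
}
```

Calling print_sanity_check() from main() gives roughly 266 s for FreeBSD
against roughly 17.6 s for Linux. A pass cannot take 266 s inside a run
that finished in 37.6 s of real time, but 266 s is close to the 274 s of
user CPU time, which fits the clock() explanation.
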
I can't easily repeat the test with patched zstd on the same box, but
on another one this goes from supposed ~3.3MB/s to 70.2MB/s, which I
assume sorts it out.

diff --git a/sys/contrib/zstd/programs/timefn.c b/sys/contrib/zstd/programs/timefn.c
index 95460d0d971d..f5dcdf84186e 100644
--- a/sys/contrib/zstd/programs/timefn.c
+++ b/sys/contrib/zstd/programs/timefn.c
@@ -84,8 +84,7 @@ PTime UTIL_getSpanTimeNano(UTIL_time_t clockStart, UTIL_time_t clockEnd)
 
 /* C11 requires timespec_get, but FreeBSD 11 lacks it, while still claiming C11 compliance.
    Android also lacks it but does define TIME_UTC. */
-#elif (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) /* C11 */) \
-    && defined(TIME_UTC) && !defined(__ANDROID__)
+#else
 
 #include <stdlib.h>   /* abort */
 #include <stdio.h>    /* perror */
@@ -133,14 +132,6 @@ PTime UTIL_getSpanTimeNano(UTIL_time_t begin, UTIL_time_t end)
     return nano;
 }
 
-
-
-#else   /* relies on standard C90 (note : clock_t measurements can be wrong when using multi-threading) */
-
-UTIL_time_t UTIL_getTime(void) { return clock(); }
-PTime UTIL_getSpanTimeMicro(UTIL_time_t clockStart, UTIL_time_t clockEnd) { return 1000000ULL * (clockEnd - clockStart) / CLOCKS_PER_SEC; }
-PTime UTIL_getSpanTimeNano(UTIL_time_t clockStart, UTIL_time_t clockEnd) { return 1000000000ULL * (clockEnd - clockStart) / CLOCKS_PER_SEC; }
-
 #endif
 
 
diff --git a/sys/contrib/zstd/programs/timefn.h b/sys/contrib/zstd/programs/timefn.h
index 5d2818e8a1b7..2f0a3c58528d 100644
--- a/sys/contrib/zstd/programs/timefn.h
+++ b/sys/contrib/zstd/programs/timefn.h
@@ -57,17 +57,9 @@ extern "C" {
 
 /* C11 requires timespec_get, but FreeBSD 11 lacks it, while still claiming C11 compliance.
    Android also lacks it but does define TIME_UTC. */
-#elif (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) /* C11 */) \
-    && defined(TIME_UTC) && !defined(__ANDROID__)
-
+#else
     typedef struct timespec UTIL_time_t;
     #define UTIL_TIME_INITIALIZER { 0, 0 }
-
-#else   /* relies on standard C90 (note : clock_t measurements can be wrong when using multi-threading) */
-
-    typedef clock_t UTIL_time_t;
-    #define UTIL_TIME_INITIALIZER 0
-
 #endif
 
 

-- 
Mateusz Guzik <mjguzik gmail.com>