Subject: Re: performance regressions in 15.0 [The Microsoft Dev Kit 2023 buildworld took about 6 minutes less time for jemalloc 5.3.0, not more, for non-debug contexts]
From: Mark Millard
Date: Mon, 8 Dec 2025 09:23:52 -0800
To: Mateusz Guzik
Cc: Warner Losh, FreeBSD Current, FreeBSD-STABLE Mailing List, Konstantin Belousov

On Dec 8, 2025, at 04:46, Mateusz Guzik wrote:

> On Sun, Dec 7, 2025 at 5:19 PM Mark Millard wrote:
>>
>> On Dec 6, 2025, at 19:03, Mark Millard wrote:
>>
>>> On Dec 6, 2025, at 14:25, Warner Losh wrote:
>>>
>>>> On Sat, Dec 6, 2025, 3:06 PM Mark Millard wrote:
>>>>
>>>>> On Dec 6, 2025, at 06:14, Mark Millard wrote:
>>>>>
>>>>>> Mateusz Guzik wrote on
>>>>>> Date: Sat, 06 Dec 2025 10:50:08 UTC :
>>>>>>
>>>>>>> I got pointed at phoronix: https://www.phoronix.com/review/freebsd-15-amd-epyc
>>>>>>>
>>>>>>> While I don't treat their results as gospel, a FreeBSD vs FreeBSD test
>>>>>>> showing a slowdown most definitely warrants a closer look.
>>>>>>>
>>>>>>> They observed slowdowns when using iperf over localhost and when compiling llvm.
>>>>>>>
>>>>>>> I can confirm both problems and more.
>>>>>>>
>>>>>>> I found the profiling tooling for userspace to be broken again so I
>>>>>>> did not investigate much and I'm not going to dig into it further.
>>>>>>>
>>>>>>> Test box is AMD EPYC 9454 48-Core Processor, with the 2 systems
>>>>>>> running as 8 core vms under kvm.
>>>>>>> . . .
>>>>>>
>>>>>>
>>>>>> Both of the below are from ampere3 (aarch64) instead, its
>>>>>> 2 most recent "bulk -a" runs that completed, elapsed times
>>>>>> shown for qt6-webengine-6.9.3 builds:
>>>>>>
>>>>>> 150releng-arm64-quarterly qt6-webengine-6.9.3 53:33:46
>>>>>> 135arm64-default          qt6-webengine-6.9.3 38:43:36

A somewhat better comparison is now available from the active builds,
here using quarterly 14.3 to match against the quarterly 15.0 . . .

https://pkg-status.freebsd.org/ampere1/data/143arm64-quarterly/1081574d367d/logs/qt6-webengine-6.9.3.log

shows 14.3 quarterly getting the qt6-webengine-6.9.3 build timing
38:25:51 on ampere1, with:

Host OSVERSION: 1600004
Jail OSVERSION: 1403000

15.0 is definitely the larger one.

As far as I know, ampere1 and ampere3 match in their hardware
configurations. (Not that such information is public, so I do not have
great evidence.)

Given the similarity to 135arm64-default, I will generally not switch
to referencing 14.3's timing below, leaving that implicit.
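For a rough sense of the scale involved, converting those elapsed
times to seconds (the arithmetic is mine, not from the build logs):

53:33:46 = 53*3600 + 33*60 + 46 = 192826 s
38:43:36 = 38*3600 + 43*60 + 36 = 139416 s
38:25:51 = 38*3600 + 25*60 + 51 = 138351 s

So the 15.0 quarterly build took roughly 1.38 to 1.39 times as long as
either of the other two.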
>>>>>> For reference:
>>>>>>
>>>>>> Host OSVERSION: 1600000
>>>>>> Jail OSVERSION: 1500068
>>>>>>
>>>>>> vs.
>>>>>>
>>>>>> Host OSVERSION: 1600000
>>>>>> Jail OSVERSION: 1305000
>>>>>>
>>>>>> The difference for the above is in the Jail's world builds,
>>>>>> not in the boot's (kernel+world) builds.
>>>>>>
>>>>>>
>>>>>> For reference:
>>>>>>
>>>>>>
>>>>>> https://pkg-status.freebsd.org/ampere3/build.html?mastername=150releng-arm64-quarterly&build=88084f9163ae
>>>>>>
>>>>>> build of www/qt6-webengine | qt6-webengine-6.9.3 ended at Sun Nov 30 05:40:02 -00 2025
>>>>>> build time: 2D:05:33:52
>>>>>>
>>>>>>
>>>>>> https://pkg-status.freebsd.org/ampere3/build.html?mastername=135arm64-default&build=f5384fe59be6
>>>>>>
>>>>>> build of www/qt6-webengine | qt6-webengine-6.9.3 ended at Sat Nov 22 15:33:34 -00 2025
>>>>>> build time: 1D:14:43:41
>>>>>
>>>>>
>>>>> Expanding the notes to before and after jemalloc 5.3.0
>>>>> was merged to main: beefy18 was the main-amd64 builder
>>>>> before and somewhat after the jemalloc 5.3.0 merge from
>>>>> the vendor branch:
>>>>>
>>>>> Before: p2650762431ca_s51affb7e971 261:29:13 building 36074 port-packages, start 05 Aug 2025 01:10:59 GMT
>>>>>         (jemalloc 5.3.0 merge from vendor branch: 15 Aug 2025)
>>>>> After : p9652f95ce8e4_sb45a181a74c 428:49:20 building 36318 port-packages, start 19 Aug 2025 01:30:33 GMT
>>>>>
>>>>> (The log files are long gone for the port-packages built.)
>>>>>
>>>>> main-15 used a debug jail world but 15.0-RELEASE does not.
>>>>>
>>>>> I'm not aware of such a port-package builder context for a
>>>>> non-debug jail world before and after a jemalloc 5.3.0 merge.
>>>>>
>>>> A few months before I landed the jemalloc patches, I did 4 or 5 from-dirt
>>>> buildworlds. The elapsed time difference was, iirc, within 1 or 2%. Enough
>>>> to maybe see a diff with the small sample size, but not enough for ministat
>>>> to trigger at 95%. I don't recall keeping the data for this and can't find
>>>> it now. And I'm not even sure, in hindsight, that I ran a good experiment.
>>>> It might be related, or not, but it would be easy enough for someone to set
>>>> up two jails: one just before and one just after. Build the world from
>>>> scratch (same hash) on both. That would test it since you'd be holding all
>>>> other variables constant.
>>>>
>>>> When we imported the tip of FreeBSD main at work, we didn't get a cpu
>>>> change trigger from our tests that I recall...
>>>
>>>
>>> The range of commits looks like:
>>>
>>> • git: 9a7c512a6149 - main - ucred groups: restore a useful comment   Eric van Gyzen
>>> • git: bf6039f09a30 - main - jemalloc: Unthin contrib/jemalloc   Warner Losh
>>> • git: a0dfba697132 - main - jemalloc: Update jemalloc.xml.in per FreeBSD-diffs   Warner Losh
>>> • git: 718b13ba6c5d - main - jemalloc: Add FreeBSD's updates to jemalloc_preamble.h.in   Warner Losh
>>> • git: 6371645df7b0 - main - jemalloc: Add JEMALLOC_PRIVATE_NAMESPACE for the libc namespace   Warner Losh
>>> • git: da260ab23f26 - main - jemalloc: Only replace _pthread_mutex_init_calloc_cb in private namespace   Warner Losh
>>> • git: c43cad871720 - main - jemalloc: Merge from jemalloc 5.3.0 vendor branch   Warner Losh
>>> • git: 69af14a57c9e - main - jemalloc: Note update in UPDATING and RELNOTES   Warner Losh
>>>
>>> I've started a build of a non-debug 9a7c512a6149 world
>>> to later create a chroot to do a test buildworld in.
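For anyone wanting to reproduce that style of before/after test, the
rough shape of the procedure is sketched below. The paths, the -j8,
and the exact step sequence are illustrative placeholders rather than
a record of what I did, and details such as mounting devfs and getting
the 69af14a57c9e source tree into each chroot are omitted:

# build and install a world per chroot, once per jemalloc variant:
cd /usr/src && git checkout 9a7c512a6149   # just before the jemalloc commits
make -j8 buildworld                        # with the non-debug src.conf knobs in place
mkdir -p /chroots/pre-jemalloc
make installworld DESTDIR=/chroots/pre-jemalloc
make distribution DESTDIR=/chroots/pre-jemalloc
# repeat with 69af14a57c9e into, say, /chroots/post-jemalloc, then time
# the same buildworld (same source hash) inside each chroot:
chroot /chroots/pre-jemalloc sh -c "cd /usr/src && time make -j8 buildworld"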
>>>
>>> I'll also do a build of a non-debug 69af14a57c9e world
>>> to later create the other chroot to do a test
>>> buildworld in.
>>>
>>> non-debug means my use of:
>>>
>>> WITH_MALLOC_PRODUCTION=
>>> WITHOUT_ASSERT_DEBUG=
>>> WITHOUT_PTHREADS_ASSERTIONS=
>>> WITHOUT_LLVM_ASSERTIONS=
>>>
>>> I've used "env WITH_META_MODE=" as it cuts down on the
>>> volume and frequency of scrolling output. I'll do the
>>> same later.
>>>
>>> If there is anything you want controlled in a different
>>> way, let me know.
>>>
>>> The Windows Dev Kit 2023 is booted (world and kernel)
>>> with:
>>>
>>> # uname -apKU
>>> FreeBSD aarch64-main-pbase 16.0-CURRENT FreeBSD 16.0-CURRENT main-n281922-4872b48b175c GENERIC-NODEBUG arm64 aarch64 1600004 1600004
>>>
>>> which is from an official pkgbase distribution. So the
>>> boot-world is a debug world but the boot-kernel is not.
>>>
>>> The Windows Dev Kit 2023 will take some time for such
>>> -j8 builds and I may end up sleeping in the middle of
>>> the sequence someplace. So it may be a while before
>>> I have any comparison/contrast data to report.
>>>
>>
>>
>> Summary for jemalloc for before vs. at 5.3.0
>> for *non-debug* contexts doing the buildworld:
>>
>> before 5.3.0: 9754 seconds (about 2.7 hrs)
>> with 5.3.0:   9384 seconds (about 2.6 hrs)
>>
>
> While in principle this can accurately reflect the difference, the
> benchmark itself is not valid as is.

A reminder of what started this, as far as my specific messages go:

On ampere3:

150releng-arm64-quarterly qt6-webengine-6.9.3 53:33:46
135arm64-default          qt6-webengine-6.9.3 38:43:36

A fairly large scale factor.

The test was a cross check on that; at least, that is how I
interpreted Warner's request, and that was my purpose in agreeing to
do the test. I tried to do what Warner asked. It adds a little data to
what he reported.

I do not view the result as indicating much more than that the two
builds take approximately equal time. I would have no reason to care
if the timings were swapped, for example: the conclusion would be the
same for the comparison I was making. It would be highly unlikely for
repeated tests to show variability anywhere near the
qt6-webengine-6.9.3 scale factor difference.

> First, you can't just run it once -- the result needs to be proven
> repeatable and profiled. For a build of that duration, with so few
> resources,

For comparison to:

150releng-arm64-quarterly qt6-webengine-6.9.3 53:33:46
135arm64-default          qt6-webengine-6.9.3 38:43:36

and that size of scale factor, I'd say: yes I can, given the near
equality that I got. It is evidence that this type of test has missed
what is relevant, other than showing that there is no such systematic
scale factor for this type of test.

FYI: 32 GiBytes of RAM. 8 cores that are compatible with Cortex-A76
targeting: 4 are X1C and 4 are A78C. USB3 in use, with a U.2 1.4 TB
Optane as the media, via an adapter. UFS file system.

> for all I know the real factor was randomness from I/O.

Not for a change of scale similar to 53:33:46 vs. 38:43:36 for
building qt6-webengine-6.9.3, as far as I can see.

> That aside you need a sanitized baseline. From the description it is not
> clear to me at all if you are doing the build with the clang perf
> regression fixed or not.

My results indicate, in part, that this is not a good way to
investigate the 53:33:46 vs. 38:43:36 for building
qt6-webengine-6.9.3. I doubt I need a better baseline for that
judgment now. I'd need a different type of test activity.
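If the buildworld timings do get rerun several times each, the usual
comparison tool in the base system is ministat(1). A minimal sketch,
where the file names are placeholders and each file would hold one
elapsed-seconds figure per run:

# compare the two sample sets at the 95% confidence level
ministat -c 95 before-5.3.0-times.txt with-5.3.0-times.txt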
> Even that aside, I outlined 3 more regressions:
> - slower binary startup to begin with
> - slower syscalls which fail with an error
> - slower syscall interface in the first place
>
> Out of these, the first one is most important here.

Do you expect any combination of those to be a significant part of the
scale factor difference for 53:33:46 vs. 38:43:36 for building
qt6-webengine-6.9.3?

> If I was to work on this,

I would not claim that we are targeting the same issue, even with
Warner's request considered, which added what he was targeting.

> seeing that the question at hand is whether
> the jemalloc update is a problem,

I think the specifics of building qt6-webengine-6.9.3 would need to be
the investigative context for what was "at hand" for me. In part that
judgement is based on the test I did finding near equality for
jemalloc.

> I would bypass all of the above and
> instead take 14.3 (not stable/14!) as a baseline + jemalloc update on
> top. This eliminates all of the factors other than jemalloc itself.

I'll note that ampere1 with a 14.3 jail took 38:25:51 for its build of
qt6-webengine-6.9.3. That scale of timing is not specific to 13.5 jail
worlds.

> building world also seems a little fishy here and it is not clear to
> me at all what version you have built

The 9xxx sec timings were both building:

69af14a57c9e - main - jemalloc: Note update in UPDATING and RELNOTES   Warner Losh

(the end of the jemalloc commit sequence).

One build was 69af14a57c9e in a chroot rebuilding itself.

The other built 69af14a57c9e via:

9a7c512a6149 - main - ucred groups: restore a useful comment   Eric van Gyzen

(the commit from just before the jemalloc 5.3.0 related commits
started).

The 2 chroots differ just by which jemalloc version was in use.

> -- was the new jemalloc thing
> building new jemalloc and old jemalloc building old jemalloc? More
> importantly I would be worried some of the build picks up whatever
> jemalloc it finds to use during some of the build.
>
> I would benchmark this by building a big port (not timing dependencies
> of the port, just the port itself -- maybe even chromium or firefox).

Using qt6-webengine-6.9.3 would mean using a context known to have an
issue, at least for aarch64. But I cannot take weeks of time for such
an activity.

amd64 is messier for comparing official builds because of the lack of
uniformity across the builder machines and because each type of build
is done on its own builder machine: there are no examples of the same
machine building both.

> That's of course quite a bit of effort and if there is nobody to do
> that (or compatible), imo the pragmatic play is to revert the jemalloc
> update for the time being. This restores the known working state and
> should the update be a good thing it can land for 15.1, maybe fixed
> up.

150releng-arm64-quarterly on ampere3: llvm21-21.1.2 : 21:26:14
143arm64-quarterly        on ampere1: llvm21-21.1.2 : 15:24:24

Again a notable time ratio (roughly 1.39x). (default/latest would not
be an llvm version match.)

Some basic looking around does not suggest to me that
qt6-webengine-6.9.3 is somehow unique in having notable timing ratios
for quarterly on an ampere*.

But, as of yet, I've no good evidence for blaming jemalloc as a major
contributor to those timing ratios --or for blaming any other specific
part of 15.0.

===
Mark Millard
marklmi at yahoo.com