From: Mark Millard
To: freebsd-arm
Subject: Re: -mcpu= selections and the Windows Dev Kit 2023: example from-scratch buildkernel times (after kernel-toolchain)
Date: Sat, 13 May 2023 12:49:46 -0700
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm
In-Reply-To: <6196193E-4A75-464C-AB0B-AE2C3BC00D66@yahoo.com>
Message-Id: <049ED1F8-CA62-4564-8635-4EFCF008ED9D@yahoo.com>

On May 13, 2023, at 01:50, Mark Millard wrote:

> On May 13, 2023, at 01:28, Mark Millard wrote:
>
>> While the selections were guided by some benchmark like
>> explorations, the results for the Windows Dev Kit 2023
>> (WDK23 abbreviation) go like:
>>
>>
>> -mcpu=cortex-a72 code generation produced a (non-debug)
>> kernel/world that, in turn, got (from scratch buildkernel after
>> kernel-toolchain):
>>
>> Kernel(s) GENERIC-NODBG-CA72 built in 597 seconds, ncpu: 8, make -j8
>>
>> (The rest of the aarch64 that I've access to is nearly-all cortex-a72
>> based, the others being cortex-a53 these days. So I was seeing how
>> code tailored for the cortex-a72 context performed on the WDK23.
>> cortex-a72 was my starting point with the WDK23.)
>>
>>
>> -mcpu=cortex-x1c+flagm code generation produced a (non-debug)
>> kernel/world that, in turn, got (from scratch buildkernel after
>> kernel-toolchain):
>>
>> Kernel(s) GENERIC-NODBG-CA78C built in 584 seconds, ncpu: 8, make -j8
>>
>> NOTE: "+flagm" is because of various clang/gcc having an inaccurate
>> set of features that omit flagm --and I'm making sure I've got it
>> enabled. -mcpu=cortex-a78c is even worse: it has examples of +fp16fml
>> by default in some toolchains --but neither of the 2 types of core has
>> support for such. (The cortex-x1c and cortex-a78c actually have matching
>> features for code generation purposes, at least for all that I looked
>> at. Toolchain mismatches for default features are sufficient evidence
>> of an error in at least one case as far as I can tell.)
>>
>> This context is implicitly +lse+rcpc . At the time I was not being
>> explicit when defaults matched.
>>
>> Notes:
>> "lse" is the large system extension atomics, disabled below.
>> "rcpc" is the extension having load acquire and store release
>> instructions. (rcpc I was explicit about below, despite the
>> default matching.)
>>
>>
>> -mcpu=cortex-x1c+flagm+nolse+rcpc code generation produced a
>> (non-debug) kernel/world that, in turn, got (from scratch buildkernel
>> after kernel-toolchain):
>>
>> Kernel(s) GENERIC-NODBG-CA78CnoLSE built in 415 seconds, ncpu: 8, make -j
>>
>> Note: My explorations so far have tried the world combinations of
>> lse and rcpc status but with a kernel that was based on
>> -mcpu=cortex-x1c+flagm . I then updated the kernel to match the
>> -mcpu=cortex-x1c+flagm+nolse+rcpc and used it to produce the above.
>> So there is more exploring that I've not done yet. But I'm not
>> expecting decreases to notably below the 415 sec.
>>
>> The benchmark-like activity had shown that +lse+rcpc for the
>> world/benchmark builds led to notable negative consequences for
>> cpus 0..3 compared to the other 3 combinations of status. For
>> cpus 4..7, it showed that +nolse+rcpc for the world/benchmark
>> builds had a noticeable gain compared to the other 3 combinations.
>> This guided the buildkernel testing selections done so far. The
>> buildkernel tests were, in part, to be sure that the apparent
>> consequences were not just odd consequences for time measurements
>> that could mess up benchmark result comparisons being useful.
>>
>>
>> For comparison to a standard FreeBSD non-debug build, I used a
>> snapshot download of:
>>
>> http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/13.2/FreeBSD-13.2-STABLE-arm64-aarch64-ROCK64-20230504-7dea7445ba44-255298.img.xz
>>
>> and dd'd it to media, replaced the EFI/*/* with ones that
>> work for the Windows Dev Kit 2023, booted the WDK23 with the media,
>> copied over my /usr/*-src/ to the media, did a "make -j8 kernel-toolchain"
>> from the /usr/main-src/ copy and finally did a "make -j8 buildkernel"
>> (so, from-scratch, given the toolchain materials are already in place):
>>
>> Kernel(s) GENERIC built in 505 seconds, ncpu: 8, make -j8
>>
>> ( /usr/main-src/ has the source that the other buildkernel timings
>> were based on. )
>>
>>
>> Looks like -mcpu=cortex-a72 and -mcpu=cortex-x1c+flagm are far from
>> a good fit for buildkernel workloads to run under on the WDK23. FreeBSD
>> defaults and -mcpu=cortex-x1c+flagm+nolse+rcpc seem to be better fits
>> for such use.
>>
>>
>> Note: This testing was in a ZFS context, using bectl to advantage, in
>> case that somehow matters.
>>
>>
>> For reference:
>>
>> # grep mcpu= /usr/main-src/sys/arm64/conf/GENERIC-NODBG-CA78C
>> makeoptions CONF_CFLAGS="-mcpu=cortex-x1c+flagm+nolse+rcpc"
>>
>> # grep mcpu= ~/src.configs/*CA78C-nodbg*
>> XCFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
>> XCXXFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
>> ACFLAGS.arm64cpuid.S+= -mcpu=cortex-x1c
>> ACFLAGS.aesv8-armx.S+= -mcpu=cortex-x1c
>> ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-x1c
>>
>> # more /usr/local/etc/poudriere.d/main-CA78C-make.conf
>> CFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
>> CXXFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
>> CPPFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
>> RUSTFLAGS_CPU_FEATURES= -C target-cpu=cortex-x1c -C target-feature=+x1c,+flagm,-lse,+rcpc
>
> Note: RUSTFLAGS_CPU_FEATURES is something that I added to my
> environment to allow the experiment:
>
> # git -C /usr/ports/ diff Mk/Uses/cargo.mk
> diff --git a/Mk/Uses/cargo.mk b/Mk/Uses/cargo.mk
> index 50146372fee1..2f21453fd02b 100644
> --- a/Mk/Uses/cargo.mk
> +++ b/Mk/Uses/cargo.mk
> @@ -145,7 +145,9 @@ WITH_LTO= yes
>  . endif
>  # Adjust -C target-cpu if -march/-mcpu is set by bsd.cpu.mk
> -. if ${ARCH} == amd64 || ${ARCH} == i386
> +. if defined(RUSTFLAGS_CPU_FEATURES)
> +RUSTFLAGS+= ${RUSTFLAGS_CPU_FEATURES}
> +. elif ${ARCH} == amd64 || ${ARCH} == i386
>  RUSTFLAGS+= ${CFLAGS:M-march=*:S/-march=/-C target-cpu=/}
>  . elif ${ARCH:Mpowerpc*}
>  RUSTFLAGS+= ${CFLAGS:M-mcpu=*:S/-mcpu=/-C target-cpu=/:S/power/pwr/}
>
>> diff --git a/secure/lib/libcrypto/Makefile b/secure/lib/libcrypto/Makefile
>> index 8fde4f19d046..e13227d6450b 100644
>> --- a/secure/lib/libcrypto/Makefile
>> +++ b/secure/lib/libcrypto/Makefile
>> @@ -22,7 +22,7 @@ SRCS+= mem.c mem_dbg.c mem_sec.c o_dir.c o_fips.c o_fopen.c o_init.c
>>  SRCS+= o_str.c o_time.c threads_pthread.c uid.c
>>  .if defined(ASM_aarch64)
>>  SRCS+= arm64cpuid.S armcap.c
>> -ACFLAGS.arm64cpuid.S= -march=armv8-a+crypto
>> +ACFLAGS.arm64cpuid.S+= -march=armv8-a+crypto
>>  .elif defined(ASM_amd64)
>>  SRCS+= x86_64cpuid.S
>>  .elif defined(ASM_arm)
>> @@ -43,7 +43,7 @@ SRCS+= mem_clr.c
>>  SRCS+= aes_cbc.c aes_cfb.c aes_ecb.c aes_ige.c aes_misc.c aes_ofb.c aes_wrap.c
>>  .if defined(ASM_aarch64)
>>  SRCS+= aes_core.c aesv8-armx.S vpaes-armv8.S
>> -ACFLAGS.aesv8-armx.S= -march=armv8-a+crypto
>> +ACFLAGS.aesv8-armx.S+= -march=armv8-a+crypto
>>  .elif defined(ASM_amd64)
>>  SRCS+= aes_core.c aesni-mb-x86_64.S aesni-sha1-x86_64.S aesni-sha256-x86_64.S
>>  SRCS+= aesni-x86_64.S vpaes-x86_64.S
>> @@ -278,7 +278,7 @@ SRCS+= cbc128.c ccm128.c cfb128.c ctr128.c cts128.c gcm128.c ocb128.c
>>  SRCS+= ofb128.c wrap128.c xts128.c
>>  .if defined(ASM_aarch64)
>>  SRCS+= ghashv8-armx.S
>> -ACFLAGS.ghashv8-armx.S= -march=armv8-a+crypto
>> +ACFLAGS.ghashv8-armx.S+= -march=armv8-a+crypto

I'll probably not do any more exploring of kernel vs. world
cortex-x1c/cortex-a78c feature use vs. not combinations.
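
For a rough sense of scale, the four quoted buildkernel times can be
put relative to the stock GENERIC snapshot result (505 seconds). The
following is only a sketch of that arithmetic: the labels are shorthand
for the -mcpu= contexts quoted above, the function name is made up for
illustration, and each figure is a single timing, not an average.

```shell
#!/bin/sh
# Buildkernel wall-clock times (seconds) as quoted in this thread.
# Prints each one as a percentage difference from the GENERIC run.
# relative_times is a hypothetical helper name, not an existing tool.
relative_times() {
    awk 'BEGIN {
        base = 505   # GENERIC snapshot-based build, seconds
        n = split("cortex-a72:597 cortex-x1c+flagm:584 cortex-x1c+flagm+nolse+rcpc:415 GENERIC:505", runs, " ")
        for (i = 1; i <= n; i++) {
            split(runs[i], f, ":")
            printf "%-30s %4d s  %+6.1f%% vs GENERIC\n", f[1], f[2], (f[2] - base) * 100 / base
        }
    }'
}
relative_times
```

With those inputs the cortex-a72 and cortex-x1c+flagm contexts come out
about 18.2% and 15.6% slower than GENERIC, and the +nolse+rcpc context
about 17.8% faster.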

My "-mcpu=cortex-x1c+flagm context" based from-scratch build
of my ports took somewhat over 15 hrs on the WDK23:

[main-CA78C-default] [2023-05-10_01h26m04s] [committing:] Queued: 480 Built: 480 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 15:08:47

Beyond using a -mcpu=cortex-x1c+flagm+nolse+rcpc based context
now, I've also recently changed the build sequence to use 2
stages to help avoid a long tail of the build being largely one
(single-threaded) process at a time:

poudriere bulk -jmain-CA78C -w -f ~/origins/build-first.txt
poudriere bulk -jmain-CA78C -w -f ~/origins/CA78C-origins.txt

# more ~/origins/build-first.txt
devel/binutils
devel/boost-jam
devel/llvm16
devel/llvm15
lang/rust

(Actually my test was without boost-jam being listed. I added
that after the test. I also later added
PRIORITY_BOOST="boost-libs" to etc/poudriere.conf .
CA78C-origins.txt also lists those port origins, along with
the rest of the things I explicitly want built.)

The above, in my context, happens to lead to devel/boost-libs
building in parallel with other activity.

I use a high-load-average-allowed style of building ports
into packages: ALLOW_MAKE_JOBS=yes and the default number
of builders, so up to 8 on the WDK23. Also: USE_TMPFS=all
(based on about 118 GiBytes of swap, so RAM+SWAP approx=
150 GiBytes. Observed swap use got up to a little under
13 GiBytes but was not thrashing.)

(This style would not scale well at some point but works
for what I have access to, even the ThreadRipper 1950X
with its 128 GiBytes of RAM and 32 FreeBSD "cpus". It has
more swap configured.)
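
Collected in one place, the poudriere settings named above would look
roughly like the fragment below. Only these three values come from this
message; the comments are paraphrases, the rest of poudriere.conf is
whatever is already in place, and PRIORITY_BOOST was added only after
the timed test.

```shell
# Fragment of poudriere.conf (sh syntax); only the settings named above.

# Let individual port builds use multiple make jobs
# (the high-load-average-allowed style).
ALLOW_MAKE_JOBS=yes

# Run the builds in tmpfs; this relies on ample RAM+swap.
USE_TMPFS=all

# Raise the queue priority of boost-libs (added after the timed test).
PRIORITY_BOOST="boost-libs"
```

The number of builders is left at its default here, which matches the
"up to 8 on the WDK23" description.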

Those, combined with the -mcpu=cortex-x1c+flagm+nolse+rcpc
use, have from-scratch port builds down to slightly over
10 hours on the WDK23:

[main-CA78C-default] [2023-05-13_01h31m02s] [committing:] Queued: 99 Built: 99 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 05:53:58
[main-CA78C-default] [2023-05-13_07h25m03s] [committing:] Queued: 381 Built: 381 Failed: 0 Skipped: 0 Ignored: 0 Fetched: 0 Tobuild: 0 Time: 04:07:07

This context was ZFS. I've not done a UFS-context test yet.

===
Mark Millard
marklmi at yahoo.com
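
P.S. The "slightly over 10 hours" figure is just the sum of the two
poudriere "Time:" values reported for the 2-stage run. A small sketch
of that arithmetic follows; the helper names to_seconds and to_hms are
made up for illustration and are not poudriere features.

```shell
#!/bin/sh
# Convert an HH:MM:SS poudriere "Time:" value to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Format a number of seconds back to HH:MM:SS.
to_hms() {
    awk -v s="$1" 'BEGIN { printf "%02d:%02d:%02d\n", s / 3600, (s % 3600) / 60, s % 60 }'
}

# Single-stage run in the -mcpu=cortex-x1c+flagm context.
one_stage=$(to_seconds 15:08:47)
# Two-stage run in the -mcpu=cortex-x1c+flagm+nolse+rcpc context.
two_stage=$(( $(to_seconds 05:53:58) + $(to_seconds 04:07:07) ))

echo "two-stage total: $(to_hms "$two_stage")"
echo "difference from earlier run: $(to_hms $(( one_stage - two_stage )))"
```

That gives 10:01:05 for the two stages together, 05:07:42 less than the
earlier 15:08:47. The difference reflects both the -mcpu= change and
the staging change together, so it does not separate the two effects.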