From: Mark Millard
Date: Sat, 13 May 2023 01:28:18 -0700
To: freebsd-arm
Subject: -mcpu= selections and the Windows Dev Kit 2023: example from-scratch buildkernel times (after kernel-toolchain)
Message-Id: <3B5EB0DD-E9CB-41BD-9BCC-6549BBF0C0DA@yahoo.com>
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm

While the selections were guided by some benchmark-like explorations,
the results for the Windows Dev Kit 2023 (WDK23 abbreviation)
go like:


-mcpu=cortex-a72 code generation produced a (non-debug)
kernel/world that, in turn, got (from scratch buildkernel
after kernel-toolchain):

Kernel(s)  GENERIC-NODBG-CA72 built in 597 seconds, ncpu: 8, make -j8

(The rest of the aarch64 hardware that I have access to is nearly
all cortex-a72 based, the others being cortex-a53 these days. So I
was seeing how code tailored for the cortex-a72 context performed
on the WDK23. cortex-a72 was my starting point with the WDK23.)


-mcpu=cortex-x1c+flagm code generation produced a (non-debug)
kernel/world that, in turn, got (from scratch buildkernel
after kernel-toolchain):

Kernel(s)  GENERIC-NODBG-CA78C built in 584 seconds, ncpu: 8, make -j8

NOTE: "+flagm" is there because various clang/gcc versions have an
inaccurate set of default features that omits flagm --and I'm making
sure I've got it enabled. -mcpu=cortex-a78c is even worse: some
toolchains enable +fp16fml for it by default --but neither of the
2 types of core supports that. (The cortex-x1c and cortex-a78c
actually have matching features for code generation purposes, at
least for all that I looked at. Toolchains disagreeing about the
default features is sufficient evidence of an error in at least one
of them, as far as I can tell.)

This context is implicitly +lse+rcpc . At the time I was not being
explicit when defaults matched. Notes:

"lse" is the large system extension atomics, disabled below.
"rcpc" is the extension that adds the weaker (RCpc) load-acquire
       instructions (LDAPR), which pair with store-release
       instructions.
       (rcpc I was explicit about below, despite the default matching.)


-mcpu=cortex-x1c+flagm+nolse+rcpc code generation produced a
(non-debug) kernel/world that, in turn, got (from scratch buildkernel
after kernel-toolchain):

Kernel(s)  GENERIC-NODBG-CA78CnoLSE built in 415 seconds, ncpu: 8, make -j

Note: My explorations so far have tried the world combinations of
lse and rcpc status but with a kernel that was based on
-mcpu=cortex-x1c+flagm .
I then updated the kernel to match the
-mcpu=cortex-x1c+flagm+nolse+rcpc and used it to produce the above.
So there is more exploring that I've not done yet. But I'm not
expecting decreases to notably below the 415 sec.

The benchmark-like activity had shown that +lse+rcpc for the
world/benchmark builds led to notable negative consequences for
cpus 0..3 compared to the other 3 combinations of status. For
cpus 4..7, it showed that +nolse+rcpc for the world/benchmark
builds had a noticeable gain compared to the other 3 combinations.
This guided the buildkernel testing selections done so far.

The buildkernel tests were, in part, to be sure that the apparent
consequences were not just artifacts of the time measurements,
which would make the benchmark result comparisons less useful.


For comparison to a standard FreeBSD non-debug build, I used a
snapshot download of:

http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/13.2/FreeBSD-13.2-STABLE-arm64-aarch64-ROCK64-20230504-7dea7445ba44-255298.img.xz

and dd'd it to media, replaced the EFI/*/* with ones that work for
the Windows Dev Kit 2023, booted the WDK23 with the media, copied
over my /usr/*-src/ to the media, did a "make -j8 kernel-toolchain"
from the /usr/main-src/ copy and finally did a "make -j8 buildkernel"
(so, from-scratch, given the toolchain materials are already in
place):

Kernel(s)  GENERIC built in 505 seconds, ncpu: 8, make -j8

( /usr/main-src/ has the source that the other buildkernel timings
were based on. )


Looks like -mcpu=cortex-a72 and -mcpu=cortex-x1c+flagm are far
from a good fit for buildkernel workloads to run under on the
WDK23. FreeBSD defaults and -mcpu=cortex-x1c+flagm+nolse+rcpc
seem to be better fits for such use.


Note: This testing was in a ZFS context, using bectl to
advantage, in case that somehow matters.

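As a rough aid for reading the four buildkernel timings reported
above, here is a minimal Python sketch that ranks them and shows each
one relative to the stock GENERIC snapshot run. The labels, the names
BUILD_SECONDS / BASELINE / percent_vs / report, and the choice of the
stock run as the baseline are assumptions made only for this
illustration; the seconds are the figures from this message. Each
figure is a single run, so the percentages are indicative only.

```python
# Build times in seconds, copied from the "built in N seconds" lines
# of this message. The dictionary keys are illustrative labels only.
BUILD_SECONDS = {
    "FreeBSD defaults (13.2-STABLE snapshot)": 505,
    "-mcpu=cortex-a72": 597,
    "-mcpu=cortex-x1c+flagm": 584,
    "-mcpu=cortex-x1c+flagm+nolse+rcpc": 415,
}

# Assumption for this sketch: compare everything to the stock build.
BASELINE = "FreeBSD defaults (13.2-STABLE snapshot)"


def percent_vs(baseline_s, other_s):
    """Signed percent difference of other_s relative to baseline_s.

    Negative means the other build finished faster than the baseline.
    """
    return 100.0 * (other_s - baseline_s) / baseline_s


def report(times, baseline_label):
    """Return (label, seconds, percent) rows, fastest build first."""
    base = times[baseline_label]
    ordered = sorted(times.items(), key=lambda item: item[1])
    return [(label, secs, round(percent_vs(base, secs), 1))
            for label, secs in ordered]


if __name__ == "__main__":
    for label, secs, pct in report(BUILD_SECONDS, BASELINE):
        print(f"{secs:4d} s  {pct:+6.1f}%  {label}")
```

Running it lists the builds fastest first, with the signed percent
difference from the stock build next to each.
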
For reference:

# grep mcpu= /usr/main-src/sys/arm64/conf/GENERIC-NODBG-CA78C
makeoptions CONF_CFLAGS="-mcpu=cortex-x1c+flagm+nolse+rcpc"

# grep mcpu= ~/src.configs/*CA78C-nodbg*
XCFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
XCXXFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
ACFLAGS.arm64cpuid.S+= -mcpu=cortex-x1c
ACFLAGS.aesv8-armx.S+= -mcpu=cortex-x1c
ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-x1c

# more /usr/local/etc/poudriere.d/main-CA78C-make.conf
CFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
CXXFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
CPPFLAGS+= -mcpu=cortex-x1c+flagm+nolse+rcpc
RUSTFLAGS_CPU_FEATURES= -C target-cpu=cortex-x1c -C target-feature=+x1c,+flagm,-lse,+rcpc

diff --git a/secure/lib/libcrypto/Makefile b/secure/lib/libcrypto/Makefile
index 8fde4f19d046..e13227d6450b 100644
--- a/secure/lib/libcrypto/Makefile
+++ b/secure/lib/libcrypto/Makefile
@@ -22,7 +22,7 @@ SRCS+= mem.c mem_dbg.c mem_sec.c o_dir.c o_fips.c o_fopen.c o_init.c
 SRCS+= o_str.c o_time.c threads_pthread.c uid.c
 .if defined(ASM_aarch64)
 SRCS+= arm64cpuid.S armcap.c
-ACFLAGS.arm64cpuid.S= -march=armv8-a+crypto
+ACFLAGS.arm64cpuid.S+= -march=armv8-a+crypto
 .elif defined(ASM_amd64)
 SRCS+= x86_64cpuid.S
 .elif defined(ASM_arm)
@@ -43,7 +43,7 @@ SRCS+= mem_clr.c
 SRCS+= aes_cbc.c aes_cfb.c aes_ecb.c aes_ige.c aes_misc.c aes_ofb.c aes_wrap.c
 .if defined(ASM_aarch64)
 SRCS+= aes_core.c aesv8-armx.S vpaes-armv8.S
-ACFLAGS.aesv8-armx.S= -march=armv8-a+crypto
+ACFLAGS.aesv8-armx.S+= -march=armv8-a+crypto
 .elif defined(ASM_amd64)
 SRCS+= aes_core.c aesni-mb-x86_64.S aesni-sha1-x86_64.S aesni-sha256-x86_64.S
 SRCS+= aesni-x86_64.S vpaes-x86_64.S
@@ -278,7 +278,7 @@ SRCS+= cbc128.c ccm128.c cfb128.c ctr128.c cts128.c gcm128.c ocb128.c
 SRCS+= ofb128.c wrap128.c xts128.c
 .if defined(ASM_aarch64)
 SRCS+= ghashv8-armx.S
-ACFLAGS.ghashv8-armx.S= -march=armv8-a+crypto
+ACFLAGS.ghashv8-armx.S+= -march=armv8-a+crypto

===
Mark Millard
marklmi at yahoo.com