Date: Mon, 28 Sep 2020 08:36:40 -0700
From: Mark Millard <marklmi@yahoo.com>
To: freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: RPi4B buildworld buildkernel times for already installed system being -mcpu=cortex-a72 vs. -mcpu=cortex-a53 based
Message-ID: <9CF3675E-072B-4845-A510-691508DCEF3C@yahoo.com>
In-Reply-To: <C73994A3-625C-413B-B220-CE271AD92B2E@yahoo.com>
References: <4E155E94-3AA0-464D-A1E9-45A7827537ED@yahoo.com> <A59A0896-5A24-4C05-AD15-DCC27F25B927@yahoo.com> <C73994A3-625C-413B-B220-CE271AD92B2E@yahoo.com>

[Turns out, when sdram_freq_min=3200 is effective, -j4 builds are
faster than -j3 builds by about an hour (holding other configuration
conditions constant).]

On 2020-Sep-27, at 11:07, Mark Millard <marklmi at yahoo.com> wrote:

> On 2020-Sep-20, at 18:40, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2020-Sep-20, at 18:32, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> The following are from scratch buildworld buildkernel rebuilds
>>> on a RPi4B (head -r363590 context).
>>>
>>> ENVIRONMENT: -mcpu=cortex-a72 based world and kernel running already, RPi4B @ 2G Hz,
>>> Restricted to 3 GiByte RAM, -j3:
>>>
>>> World built in 37469 seconds, ncpu: 4, make -j3
>>> Kernel(s) GENERIC-NODBG built in 2474 seconds, ncpu: 4, make -j3
>>>
>>> ENVIRONMENT: -mcpu=cortex-a53 based kernel running, RPi4B @ 2G Hz,
>>> Restricted to 3 GiByte RAM, -j3:
>>>
>>> World built in 44034 seconds, ncpu: 4, make -j3
>>> Kernel(s) GENERIC-NODBG built in 2895 seconds, ncpu: 4, make -j3
>>>
>>> So a little under 11.1 hr total vs. a little over 13.0 hr total,
>>> a somewhat over 50 min improvement.
>>
>> "a somewhat over 1hr 50 min improvement" is what I should have
>> managed to type.
>>
>>> (A xhci patch finally allowed me to boot -mcpu=cortex-a72
>>> based kernel builds on the RPi4B: The xhci event ring
>>> initialization code was missing a usb_bus_mem_flush_all
>>> call previously.)
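
As a cross-check of the quoted totals, the "built in N seconds" summary lines can be summed mechanically. What follows is only a minimal Python sketch and not something from the original exchange: BUILT_RE, total_seconds, and hours_minutes are hypothetical helper names, and the only inputs are the four summary lines quoted above.

```python
import re

# Matches the seconds figure in the quoted summary lines
# ("World built in N seconds", "Kernel(s) ... built in N seconds").
BUILT_RE = re.compile(r"built in (\d+) seconds")

def total_seconds(text):
    """Sum every 'built in N seconds' figure found in text."""
    return sum(int(s) for s in BUILT_RE.findall(text))

def hours_minutes(seconds):
    """Convert seconds to (whole hours, remaining whole minutes)."""
    return divmod(seconds // 60, 60)

# Summary lines as quoted in this thread.
A72 = """\
World built in 37469 seconds, ncpu: 4, make -j3
Kernel(s) GENERIC-NODBG built in 2474 seconds, ncpu: 4, make -j3
"""
A53 = """\
World built in 44034 seconds, ncpu: 4, make -j3
Kernel(s) GENERIC-NODBG built in 2895 seconds, ncpu: 4, make -j3
"""

a72 = total_seconds(A72)
a53 = total_seconds(A53)
h, m = hours_minutes(a53 - a72)
print(f"a72: {a72 / 3600:.3f} hr, a53: {a53 / 3600:.3f} hr, saved: {h} hr {m} min")
```

With these inputs the totals are 39943 and 46929 seconds, so the difference is 1 hr 56 min, consistent with the corrected "somewhat over 1hr 50 min" wording.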
>>>
>>>
>>> Supporting details:
>>>
>>> (e-mail based spacing changes expected below)
>>>
>>> # diff -u ~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host ~/src.configs/src.conf.cortexA53-clang-bootstrap.aarch64-host
>>> --- /root/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host 2020-03-13 22:29:25.470155000 -0700
>>> +++ /root/src.configs/src.conf.cortexA53-clang-bootstrap.aarch64-host 2020-03-13 22:29:25.469455000 -0700
>>> @@ -49,9 +49,9 @@
>>> # Use of the .clang 's here avoids
>>> # interfering with other C<?>FLAGS
>>> # usage, such as ?= usage.
>>> -CFLAGS.clang+= -mcpu=cortex-a72
>>> -CXXFLAGS.clang+= -mcpu=cortex-a72
>>> -CPPFLAGS.clang+= -mcpu=cortex-a72
>>> -ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a72+crypto
>>> -ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a72+crypto
>>> -ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a72+crypto
>>> +CFLAGS.clang+= -mcpu=cortex-a53
>>> +CXXFLAGS.clang+= -mcpu=cortex-a53
>>> +CPPFLAGS.clang+= -mcpu=cortex-a53
>>> +ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a53+crypto
>>> +ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a53+crypto
>>> +ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a53+crypto
>>>
>>>
>>> The .amd64-host files are similar for doing cross builds.
>>>
>>> I also use += in secure/lib/libcrypto/Makefile :
>>>
>>> # svnlite diff /usr/src/secure/lib/libcrypto/Makefile
>>> Index: /usr/src/secure/lib/libcrypto/Makefile
>>> ===================================================================
>>> --- /usr/src/secure/lib/libcrypto/Makefile (revision 365919)
>>> +++ /usr/src/secure/lib/libcrypto/Makefile (working copy)
>>> @@ -20,7 +20,7 @@
>>> SRCS+= o_str.c o_time.c threads_pthread.c uid.c
>>> .if defined(ASM_aarch64)
>>> SRCS+= arm64cpuid.S armcap.c
>>> -ACFLAGS.arm64cpuid.S= -march=armv8-a+crypto
>>> +ACFLAGS.arm64cpuid.S+= -march=armv8-a+crypto
>>> .elif defined(ASM_amd64)
>>> SRCS+= x86_64cpuid.S
>>> .elif defined(ASM_arm)
>>> @@ -35,7 +35,7 @@
>>> SRCS+= aes_cbc.c aes_cfb.c aes_ecb.c aes_ige.c aes_misc.c aes_ofb.c aes_wrap.c
>>> .if defined(ASM_aarch64)
>>> SRCS+= aes_core.c aesv8-armx.S vpaes-armv8.S
>>> -ACFLAGS.aesv8-armx.S= -march=armv8-a+crypto
>>> +ACFLAGS.aesv8-armx.S+= -march=armv8-a+crypto
>>> .elif defined(ASM_amd64)
>>> SRCS+= aes_core.c aesni-mb-x86_64.S aesni-sha1-x86_64.S aesni-sha256-x86_64.S
>>> SRCS+= aesni-x86_64.S vpaes-x86_64.S
>>> @@ -242,7 +242,7 @@
>>> SRCS+= ofb128.c wrap128.c xts128.c
>>> .if defined(ASM_aarch64)
>>> SRCS+= ghashv8-armx.S
>>> -ACFLAGS.ghashv8-armx.S= -march=armv8-a+crypto
>>> +ACFLAGS.ghashv8-armx.S+= -march=armv8-a+crypto
>>> .elif defined(ASM_amd64)
>>> SRCS+= aesni-gcm-x86_64.S ghash-x86_64.S
>>> .elif defined(ASM_arm)
>>>
>>> The RPi4B is using:
>>>
>>> over_voltage=6
>>> arm_freq=2000
>>>
>>> and was booted via uefi/ACPI.
>>>
>>> I have not repeated the -j4 or other -jN comparisons that
>>> I reported in the past. The -mcpu=cortex-a53 figures are
>>> from the past.
>
> The following new timing is based on head -r365932 rebuilding
> itself where the 8 GiByte RPi4B config.txt ended with:
>
> over_voltage=6
> arm_freq=2000
> sdram_freq_min=3200
>
> and the boot was via u-boot, no RAM restriction. (The
> sdram_freq_min assignment does not seem to do anything
> for rpi4-uefi-devel v1.20 uefi/ACPI based booting.)
> /etc/sysctl.conf has: dev.cpu.0.freq=2000 . No use of
> powerd or other such.
>
>
> ENVIRONMENT: -mcpu=cortex-a72 based world and kernel running already,
> 8 GiByte RPi4B @ 2G Hz with sdram_freq_min=3200, u-boot style boot, -j3:
>
> World built in 31852 seconds, ncpu: 4, make -j3
> Kernel(s) GENERIC-NODBG built in 2059 seconds, ncpu: 4, make -j3
>
> So somewhat under 9.5 hr overall.
>
>
> That means somewhat over 3.5 hours faster than a -mcpu=cortex-a53
> based system without sdram_freq_min=3200 using 3 GiByte RAM
> but still RPi4B @ 2G Hz (uefi/ACPI boot):
>
> World built in 44034 seconds, ncpu: 4, make -j3
> Kernel(s) GENERIC-NODBG built in 2895 seconds, ncpu: 4, make -j3
>
> (Same as reported in prior messages.)
>
> But the prior -r363590 vs. the now -r365932 means there is more varying
> than in my previous comparisons. For example, clang 10 vs. clang 11.
>
> I'm probably going to run a -j4 build to see how it compares in
> this context.

ENVIRONMENT: -mcpu=cortex-a72 based world and kernel running already,
8 GiByte RPi4B dev.cpu.0.freq=2000 with sdram_freq_min=3200, u-boot style boot, -j4:

World built in 28526 seconds, ncpu: 4, make -j4
Kernel(s) GENERIC-NODBG built in 1841 seconds, ncpu: 4, make -j4

So somewhat under 8.5 hr overall.

That means somewhat over 4.5 hours faster than a -mcpu=cortex-a53
based system without sdram_freq_min=3200 using 3 GiByte RAM
but still RPi4B @ 2G Hz (uefi/ACPI boot).
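
For reference, the overall figures can be re-derived from the reported seconds. The following is a rough Python sketch under stated assumptions: RUNS, BASELINE, totals, and report are illustrative names introduced here, the labels are shorthand for the ENVIRONMENT descriptions, and the numbers are copied from the World/Kernel lines in this thread.

```python
# (world_seconds, kernel_seconds), copied from the figures in this thread.
# The labels are shorthand chosen for this sketch.
RUNS = {
    "a53, 3 GiByte, uefi/ACPI, -j3": (44034, 2895),
    "a72, 8 GiByte, u-boot, sdram_freq_min=3200, -j3": (31852, 2059),
    "a72, 8 GiByte, u-boot, sdram_freq_min=3200, -j4": (28526, 1841),
}
BASELINE = "a53, 3 GiByte, uefi/ACPI, -j3"

def totals(runs):
    """Return {label: world + kernel seconds} for each run."""
    return {label: world + kernel for label, (world, kernel) in runs.items()}

def report(runs, baseline):
    """Return (label, total_hours, hours_saved_vs_baseline) rows."""
    t = totals(runs)
    base = t[baseline]
    return [(label, secs / 3600, (base - secs) / 3600)
            for label, secs in t.items()]

for label, hours, saved in report(RUNS, BASELINE):
    print(f"{hours:6.2f} hr total, {saved:5.2f} hr saved: {label}")
```

On these figures the -j3 and -j4 totals are 33911 and 30367 seconds, a gap of 3544 seconds, which is the "about an hour" noted at the top of this message.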

> I've not run a default arm_freq/sdram_freq_min/dev.cpu.0.freq buildworld
> buildkernel in a long time and so do not have reasonable comparison
> figures relative to that type of context. I do not plan on such an
> experiment.
>
>
> I'll note that I run these tests with a monitor connected that sits
> with a static login prompt display after booting. I do not test
> with X11 or other use that might significantly compete for more power.
> The serial port console is usually used. I have used ssh sometimes in
> the past.
>
> ~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host is still
> unchanged:
>
> # more ~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host
> TO_TYPE=aarch64
> #
> KERNCONF=GENERIC-NODBG
> TARGET=arm64
> .if ${.MAKE.LEVEL} == 0
> TARGET_ARCH=${TO_TYPE}
> .export TARGET_ARCH
> .endif
> #
> #WITH_CROSS_COMPILER=
> WITH_SYSTEM_COMPILER=
> WITH_SYSTEM_LINKER=
> #
> WITH_LIBCPLUSPLUS=
> #WITH_LLD_BOOTSTRAP=
> WITHOUT_BINUTILS_BOOTSTRAP=
> WITH_ELFTOOLCHAIN_BOOTSTRAP=
> #Disables avoiding bootstrap: WITHOUT_LLVM_TARGET_ALL=
> WITH_LLVM_TARGET_AARCH64=
> WITH_LLVM_TARGET_ARM=
> WITHOUT_LLVM_TARGET_MIPS=
> WITHOUT_LLVM_TARGET_POWERPC=
> WITHOUT_LLVM_TARGET_RISCV=
> WITHOUT_LLVM_TARGET_X86=
> #WITH_CLANG_BOOTSTRAP=
> WITH_CLANG=
> WITH_CLANG_IS_CC=
> WITH_CLANG_FULL=
> WITH_CLANG_EXTRAS=
> WITH_LLD=
> WITH_LLD_IS_LD=
> WITHOUT_BINUTILS=
> WITH_LLDB=
> #
> WITH_BOOT=
> WITHOUT_LIB32=
> #
> #
> NO_WERROR=
> #WERROR=
> MALLOC_PRODUCTION=
> #
> # Avoid stripping but do not control host -g status as well:
> DEBUG_FLAGS+=
> #
> WITH_REPRODUCIBLE_BUILD=
> WITH_DEBUG_FILES=
> #
> # Use of the .clang 's here avoids
> # interfering with other C<?>FLAGS
> # usage, such as ?= usage.
> CFLAGS.clang+= -mcpu=cortex-a72
> CXXFLAGS.clang+= -mcpu=cortex-a72
> CPPFLAGS.clang+= -mcpu=cortex-a72
> ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a72+crypto
> ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a72+crypto
> ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a72+crypto
>

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away
in early 2018-Mar)