Date: Sat, 15 Aug 2020 00:23:58 -0700
From: Mark Millard <marklmi@yahoo.com>
To: freebsd-arm <freebsd-arm@freebsd.org>
Subject: RPi4B and self-hosted buildworld buildkernel times: using more than -j3 is a waste in my tests.
Message-ID: <D1FE2E6A-F83A-41AE-87FE-44BBA1CF09A8@yahoo.com>
References: <D1FE2E6A-F83A-41AE-87FE-44BBA1CF09A8.ref@yahoo.com>
Self-hosted, from-scratch buildworld buildkernel times
(head -r363590 non-debug build; more context notes later):

RPi4B set for 3072 MiByte context:

-j4 buildworld:  44783 sec (a little under 12.5 hours)
-j3 buildworld:  44034 sec (a little under 12.3 hours)
-j2 buildworld:  49070 sec (a little under 13.7 hours)
-j1 buildworld:  71083 sec (a little under 19.8 hours)

-j4 buildkernel:  2876 sec (a little under 48 minutes)
-j3 buildkernel:  2895 sec (a little under 49 minutes)
-j2 buildkernel:  3289 sec (a little under 55 minutes)
-j1 buildkernel:  4866 sec (a little under 82 minutes)

So: -j4 does not meaningfully cut the time required compared
to -j3 (buildworld was slightly slower and buildkernel was
under 1% faster). It appears that larger -jN figures would
also not cut the time compared to -j3.

Context notes:

Build commands had "buildworld buildkernel" on the command lines.

UEFI/ACPI based boot (v1.17) for the RPi4B.

Each "buildworld buildkernel" was from scratch and used the
same src.conf and make.conf files (under other names).

The file system is on a USB3 SSD and no sdcard is involved.

The context is limited to 3072 MiByte in order to avoid the
DMA handling problems that would otherwise happen.

over_voltage=6 and arm_freq=2000 were in use. This makes the
Cortex-A72 clock rate match that of the MACCHIATObin Double Shot
that I have access to (2 GHz).

The MACCHIATObin got:

-j4 buildworld:  18789 sec (a little under 5.3 hours)
-j1 buildworld:  54331 sec (a little under 15.1 hours)

-j4 buildkernel:  1296 sec (a little under 22 minutes)
-j1 buildkernel:  3800 sec (a little over 63 minutes)

So: much less time required compared to the RPi4B at the
same clock rate. (The MACCHIATObin has a SATA SSD but
buildworld buildkernel is not I/O bound.)

There are huge differences between the two boards in the
effectiveness of the RAM caches and possibly in other aspects
related to RAM access. I looked at this with a benchmark
program that exposes some overall effects of such variations
and allows testing with various thread counts.
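As an aside, the scaling implied by the timings above can be made explicit by dividing each -j1 time by the corresponding -jN time. The following is only a small sketch for that arithmetic: the seconds are copied from the figures reported above, while the names TIMES, speedup, and efficiency are made up for this illustration and are not part of any build tooling.

```python
# Measured wall-clock times in seconds, copied from the figures above.
# Keys are (board, target); inner keys are the -jN job count.
TIMES = {
    ("RPi4B", "buildworld"): {1: 71083, 2: 49070, 3: 44034, 4: 44783},
    ("RPi4B", "buildkernel"): {1: 4866, 2: 3289, 3: 2895, 4: 2876},
    ("MACCHIATObin", "buildworld"): {1: 54331, 4: 18789},
    ("MACCHIATObin", "buildkernel"): {1: 3800, 4: 1296},
}


def speedup(times, jobs):
    """Speedup of the -j<jobs> build relative to the -j1 build."""
    return times[1] / times[jobs]


def efficiency(times, jobs):
    """Speedup divided by job count (1.0 would be perfect scaling)."""
    return speedup(times, jobs) / jobs


if __name__ == "__main__":
    for (board, target), times in TIMES.items():
        for jobs in sorted(times):
            print(f"{board:13s} {target:11s} -j{jobs}: "
                  f"speedup {speedup(times, jobs):.2f}, "
                  f"efficiency {efficiency(times, jobs):.2f}")
```

With these figures the RPi4B buildworld comes out at roughly 1.6x for both -j3 and -j4, while the MACCHIATObin reaches roughly 2.9x at -j4.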
In the benchmarking, over the range of problem sizes that fit
in the L1 and L2 caches, the RPi4B and MACCHIATObin were a close
match. But as problem sizes grew to much larger than the caches,
the difference became large, especially for the likes of -j4.

(An OverDrive 1000 with its Cortex-A57 @ 1.7 GHz takes even less
time: again RAM caches and/or other aspects related to RAM access
greatly contribute.)

For reference:

# more ~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host
TO_TYPE=aarch64
#
KERNCONF=GENERIC-NODBG
TARGET=arm64
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_SYSTEM_COMPILER=
WITH_SYSTEM_LINKER=
#
WITH_LIBCPLUSPLUS=
WITHOUT_BINUTILS_BOOTSTRAP=
WITH_ELFTOOLCHAIN_BOOTSTRAP=
#Disables avoiding bootstrap: WITHOUT_LLVM_TARGET_ALL=
WITH_LLVM_TARGET_AARCH64=
WITH_LLVM_TARGET_ARM=
WITHOUT_LLVM_TARGET_MIPS=
WITHOUT_LLVM_TARGET_POWERPC=
WITHOUT_LLVM_TARGET_RISCV=
WITHOUT_LLVM_TARGET_X86=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
WITH_LLD_IS_LD=
WITHOUT_BINUTILS=
WITH_LLDB=
#
WITH_BOOT=
WITHOUT_LIB32=
#
NO_WERROR=
#WERROR=
MALLOC_PRODUCTION=
#
# Avoid stripping but do not control host -g status as well:
DEBUG_FLAGS+=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
# Use of the .clang 's here avoids
# interfering with other C<?>FLAGS
# usage, such as ?= usage.
CFLAGS.clang+= -mcpu=cortex-a72
CXXFLAGS.clang+= -mcpu=cortex-a72
CPPFLAGS.clang+= -mcpu=cortex-a72
ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a72+crypto
ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a72+crypto
ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a72+crypto

# more ~/src.configs/make.conf
CFLAGS.gcc+= -v

(But gcc was not in use.)
# more /usr/src/sys/arm64/conf/GENERIC-NODBG
#
# GENERIC -- Custom configuration for the arm64/aarch64
#

include "GENERIC"

ident           GENERIC-NODBG

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

options         ALT_BREAK_TO_DEBUGGER

options         KDB                     # Enable kernel debugger support

# For minimum debugger support (stable branch) use:
#options        KDB_TRACE               # Print a stack trace for a panic
options         DDB                     # Enable the kernel debugger

# Extra stuff:
#options        VERBOSE_SYSINIT=0       # Enable verbose sysinit messages
#options        BOOTVERBOSE=1
#options        BOOTHOWTO=RB_VERBOSE
#options        KTR
#options        KTR_MASK=KTR_TRAP
##options       KTR_CPUMASK=0xF
#options        KTR_VERBOSE

# Disable any extra checking for. . .
nooptions       DEADLKRES               # Enable the deadlock resolver
nooptions       INVARIANTS              # Enable calls of extra sanity checking
nooptions       INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
nooptions       WITNESS                 # Enable checks to detect deadlocks and cycles
nooptions       WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
nooptions       DIAGNOSTIC
nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones
nooptions       BUF_TRACKING
nooptions       FULL_BUF_TRACKING

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)