Date:      Sat, 15 Aug 2020 00:23:58 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   RPi4B and self-hosted buildworld buildkernel times: using more than -j3 is a waste in my tests.
Message-ID:  <D1FE2E6A-F83A-41AE-87FE-44BBA1CF09A8@yahoo.com>
References:  <D1FE2E6A-F83A-41AE-87FE-44BBA1CF09A8.ref@yahoo.com>

Self hosted, from scratch, buildworld buildkernel times
(head -r363590 non-debug build, more context notes
later):

RPi4B limited to 3072 MiByte of RAM:

-j4 buildworld:  44783 sec (a little under 12.5 hours)
-j3 buildworld:  44034 sec (a little under 12.3 hours)
-j2 buildworld:  49070 sec (a little under 13.7 hours)
-j1 buildworld:  71083 sec (a little under 19.8 hours)

-j4 buildkernel:  2876 sec (a little under 48 minutes)
-j3 buildkernel:  2895 sec (a little under 49 minutes)
-j2 buildkernel:  3289 sec (a little under 55 minutes)
-j1 buildkernel:  4866 sec (a little under 82 minutes)

So: -j4 does not cut the time required compared to -j3.
It appears that larger -jN figures would also not cut
the time compared to -j3.
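
The same point can be read off with simple arithmetic on the
figures above. As a rough sketch (the table and function names
are only for illustration, they are not part of any build
tooling), speedup relative to -j1 and per-job efficiency can be
computed like this:

```python
# Reported RPi4B wall-clock times in seconds, keyed by the -jN value.
RPI4B_BUILDWORLD = {1: 71083, 2: 49070, 3: 44034, 4: 44783}
RPI4B_BUILDKERNEL = {1: 4866, 2: 3289, 3: 2895, 4: 2876}


def speedups(times):
    """Return {jobs: (speedup vs -j1, speedup / jobs)} for a timing table."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}


if __name__ == "__main__":
    for name, table in (("buildworld", RPI4B_BUILDWORLD),
                        ("buildkernel", RPI4B_BUILDKERNEL)):
        for jobs, (speedup, efficiency) in speedups(table).items():
            print(f"-j{jobs} {name}: {speedup:.2f}x, "
                  f"{efficiency:.0%} per-job efficiency")
```

On these numbers the speedup tops out at roughly 1.6x to 1.7x,
and -j3 and -j4 are within about 2% of each other for both steps.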


Context notes:

Build commands had "buildworld buildkernel" on the
command lines.

UEFI/ACPI based boot (v1.17) for the RPi4B.

Each "buildworld buildkernel" was from-scratch and using
the same src.conf and make.conf files (under other names).

The file system is on a USB3 SSD and no sdcard is involved.
RAM use is limited to 3072 MiByte in order to avoid the
DMA handling problems that would otherwise happen.

over_voltage=6 and arm_freq=2000 were in use. This makes
the cortex-A72 clock rate match the MACCHIATObin Double
Shot that I have access to (2 GHz). The MACCHIATObin got:

-j4 buildworld:  18789 sec (a little under  5.3 hours)
-j1 buildworld:  54331 sec (a little under 15.1 hours)

-j4 buildkernel:  1296 sec (a little under 22 minutes)
-j1 buildkernel:  3800 sec (a little over  63 minutes)

So: much less time required compared to the RPi4B at the
same clock rate. (The MACCHIATObin has a SATA SSD but
buildworld buildkernel is not I/O bound.)
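
To put a number on "much less time", here is a small sketch
that divides the RPi4B times by the MACCHIATObin times for the
two -jN values measured on both machines. The names TIMES and
slowdown are made up for this illustration.

```python
# Reported wall-clock times in seconds: (RPi4B, MACCHIATObin), both at 2 GHz.
TIMES = {
    ("buildworld", 1): (71083, 54331),
    ("buildworld", 4): (44783, 18789),
    ("buildkernel", 1): (4866, 3800),
    ("buildkernel", 4): (2876, 1296),
}


def slowdown(rpi4b_sec, macchiatobin_sec):
    """How many times longer the RPi4B took than the MACCHIATObin."""
    return rpi4b_sec / macchiatobin_sec


if __name__ == "__main__":
    for (step, jobs), (rpi, mcbin) in TIMES.items():
        print(f"-j{jobs} {step}: RPi4B took {slowdown(rpi, mcbin):.2f}x as long")
```

The gap is about 1.3x at -j1 but grows to about 2.2x to 2.4x
at -j4, so the RPi4B falls behind mostly when several jobs run
at once.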

There are huge differences in the effectiveness of the
RAM caches and possibly other aspects related to RAM access.
I looked with a benchmark program that exposes some overall
effects of such variations, including allowing testing
various thread counts.

In the benchmarking, over the range of problem sizes that fit
in the L1 & L2 caches, the RPi4B and MACCHIATObin were a close
match. But as problem sizes grew to much larger than the
caches, the difference became large, especially for the
likes of -j4.
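
The buildworld times themselves show a similar pattern. The
sketch below is not the benchmark program mentioned above. It
applies the Karp-Flatt metric (an assumption on my part as a
suitable summary, with illustrative names) to the reported
times: e = (1/S - 1/N) / (1 - 1/N), where S is the speedup over
-j1 with N jobs.

```python
def karp_flatt(t1, tn, n):
    """Experimentally determined serial fraction for n jobs (Karp-Flatt)."""
    speedup = t1 / tn
    return (1 / speedup - 1 / n) / (1 - 1 / n)


# Reported buildworld wall-clock seconds, keyed by the -jN value.
RPI4B = {1: 71083, 2: 49070, 3: 44034, 4: 44783}
MACCHIATOBIN = {1: 54331, 4: 18789}

if __name__ == "__main__":
    for name, times in (("RPi4B", RPI4B), ("MACCHIATObin", MACCHIATOBIN)):
        for jobs in sorted(times):
            if jobs > 1:
                e = karp_flatt(times[1], times[jobs], jobs)
                print(f"{name} -j{jobs}: apparent serial fraction {e:.2f}")
```

For the RPi4B the value rises from about 0.38 at -j2 to about
0.51 at -j4, while the MACCHIATObin is at about 0.13 for -j4.
A value that grows with the job count suggests overhead that
increases with parallelism, such as contention for a shared
resource, rather than a fixed serial part of the build. That
is consistent with the RAM-access explanation but does not by
itself prove it.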

(An OverDrive 1000 with its cortex-a57 @1.7 GHz takes
even less time: again RAM caches and/or other aspects
related to RAM-access greatly contribute.)

For reference:

# more ~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host
TO_TYPE=aarch64
#
KERNCONF=GENERIC-NODBG
TARGET=arm64
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_SYSTEM_COMPILER=
WITH_SYSTEM_LINKER=
#
WITH_LIBCPLUSPLUS=
WITHOUT_BINUTILS_BOOTSTRAP=
WITH_ELFTOOLCHAIN_BOOTSTRAP=
#Disables avoiding bootstrap: WITHOUT_LLVM_TARGET_ALL=
WITH_LLVM_TARGET_AARCH64=
WITH_LLVM_TARGET_ARM=
WITHOUT_LLVM_TARGET_MIPS=
WITHOUT_LLVM_TARGET_POWERPC=
WITHOUT_LLVM_TARGET_RISCV=
WITHOUT_LLVM_TARGET_X86=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
WITH_LLD_IS_LD=
WITHOUT_BINUTILS=
WITH_LLDB=
#
WITH_BOOT=
WITHOUT_LIB32=
#
NO_WERROR=
#WERROR=
MALLOC_PRODUCTION=
#
# Avoid stripping but do not control host -g status as well:
DEBUG_FLAGS+=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
# Use of the .clang 's here avoids
# interfering with other C<?>FLAGS
# usage, such as ?= usage.
CFLAGS.clang+= -mcpu=cortex-a72
CXXFLAGS.clang+= -mcpu=cortex-a72
CPPFLAGS.clang+= -mcpu=cortex-a72
ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a72+crypto
ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a72+crypto
ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a72+crypto

# more ~/src.configs/make.conf
CFLAGS.gcc+= -v

(But gcc was not in use.)

# more /usr/src/sys/arm64/conf/GENERIC-NODBG
#
# GENERIC -- Custom configuration for the arm64/aarch64
#

include "GENERIC"

ident   GENERIC-NODBG

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

options         ALT_BREAK_TO_DEBUGGER

options         KDB                     # Enable kernel debugger support

# For minimum debugger support (stable branch) use:
#options        KDB_TRACE               # Print a stack trace for a panic
options         DDB                     # Enable the kernel debugger

# Extra stuff:
#options        VERBOSE_SYSINIT=0       # Enable verbose sysinit messages
#options        BOOTVERBOSE=1
#options        BOOTHOWTO=RB_VERBOSE
#options        KTR
#options        KTR_MASK=KTR_TRAP
##options       KTR_CPUMASK=0xF
#options        KTR_VERBOSE

# Disable any extra checking for. . .
nooptions       DEADLKRES               # Enable the deadlock resolver
nooptions       INVARIANTS              # Enable calls of extra sanity checking
nooptions       INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
nooptions       WITNESS                 # Enable checks to detect deadlocks and cycles
nooptions       WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
nooptions       DIAGNOSTIC
nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones
nooptions       BUF_TRACKING
nooptions       FULL_BUF_TRACKING

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



