From: Mark Millard
To: freebsd-arm
Subject: RPi4B and self-hosted buildworld buildkernel times: using more than -j3 is a waste in my tests.
Date: Sat, 15 Aug 2020 00:23:58 -0700

Self hosted, from scratch, buildworld buildkernel times
(head -r363590 non-debug build, more context notes later):

RPi4B set for 3072 MiByte context:

-j4 buildworld:  44783 sec (a little under 12.5 hours)
-j3 buildworld:  44034 sec (a little under 12.3 hours)
-j2 buildworld:  49070 sec (a little under 13.7 hours)
-j1 buildworld:  71083 sec (a little under 19.8 hours)

-j4 buildkernel:  2876 sec (a little under 48 minutes)
-j3 buildkernel:  2895 sec (a little under 49 minutes)
-j2 buildkernel:  3289 sec (a little under 55 minutes)
-j1 buildkernel:  4866 sec (a little under 82 minutes)

So: -j4 does not meaningfully cut the time required compared
to -j3 (749 sec slower for buildworld, 19 sec faster for
buildkernel).

It appears that larger -jN figures would also not cut the
time compared to -j3.

Context notes:

Build commands had "buildworld buildkernel" on the command lines.

UEFI/ACPI based boot (v1.17) for the RPi4B.

Each "buildworld buildkernel" was from-scratch and used
the same src.conf and make.conf files (under other
names).

The file system is on a USB3 SSD and no sdcard is
involved.

The context is limited to 3072 MiByte in order to avoid the
DMA handling problems that would otherwise happen.

over_voltage=6 and arm_freq=2000 were in use. This makes the
cortex-A72 clock rate match the MACCHIATObin Double Shot
that I have access to (2 GHz).

The MACCHIATObin got:

-j4 buildworld:  18789 sec (a little under  5.3 hours)
-j1 buildworld:  54331 sec (a little under 15.1 hours)

-j4 buildkernel:  1296 sec (a little under 22 minutes)
-j1 buildkernel:  3800 sec (a little under 64 minutes)

So: much less time required compared to the RPi4B at the
same clock rate. (The MACCHIATObin has a SATA SSD but
buildworld buildkernel is not I/O bound.)

There are huge differences in the effectiveness of the
RAM caches and possibly other aspects related to RAM
access.

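As a rough cross-check of the figures above, the times can be turned into
speedup relative to -j1 and per-job efficiency. This is only an
illustrative sketch: the function and variable names are made up here,
and the numbers are simply copied from the tables above.

```python
# Wall-clock seconds copied from the tables above, keyed by the N in -jN.
RPI4B = {
    "buildworld": {1: 71083, 2: 49070, 3: 44034, 4: 44783},
    "buildkernel": {1: 4866, 2: 3289, 3: 2895, 4: 2876},
}
MACCHIATOBIN = {
    "buildworld": {1: 54331, 4: 18789},
    "buildkernel": {1: 3800, 4: 1296},
}


def scaling(times):
    """Return {jobs: (speedup vs -j1, efficiency per job)} for one target."""
    base = times[1]
    return {j: (base / t, base / t / j) for j, t in sorted(times.items())}


def report(name, machine):
    """Print one line per target and job count for the given machine."""
    for target, times in machine.items():
        for jobs, (speedup, eff) in scaling(times).items():
            print(f"{name:13s} {target:11s} -j{jobs}: "
                  f"speedup {speedup:4.2f}x, efficiency {eff:4.0%}")


if __name__ == "__main__":
    report("RPi4B", RPI4B)
    report("MACCHIATObin", MACCHIATOBIN)
```

With these inputs the RPi4B -j4 buildworld comes out at about 1.59x over
-j1, while the MACCHIATObin -j4 buildworld is about 2.89x over its own
-j1 time.
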
I checked with a benchmark program that exposes some
overall effects of such variations and allows testing
various thread counts. For problem sizes that fit in the
L1 & L2 caches, the RPi4B and MACCHIATObin were a close
match. But as problem sizes grew to much larger than the
caches, the difference became large, especially for the
likes of -j4 (4 threads).

(An OverDrive 1000 with its cortex-a57 @1.7 GHz takes
even less time: again RAM caches and/or other aspects
related to RAM access contribute greatly.)

For reference:

# more ~/src.configs/src.conf.cortexA72-clang-bootstrap.aarch64-host
TO_TYPE=aarch64
#
KERNCONF=GENERIC-NODBG
TARGET=arm64
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_SYSTEM_COMPILER=
WITH_SYSTEM_LINKER=
#
WITH_LIBCPLUSPLUS=
WITHOUT_BINUTILS_BOOTSTRAP=
WITH_ELFTOOLCHAIN_BOOTSTRAP=
#Disables avoiding bootstrap: WITHOUT_LLVM_TARGET_ALL=
WITH_LLVM_TARGET_AARCH64=
WITH_LLVM_TARGET_ARM=
WITHOUT_LLVM_TARGET_MIPS=
WITHOUT_LLVM_TARGET_POWERPC=
WITHOUT_LLVM_TARGET_RISCV=
WITHOUT_LLVM_TARGET_X86=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
WITH_LLD=
WITH_LLD_IS_LD=
WITHOUT_BINUTILS=
WITH_LLDB=
#
WITH_BOOT=
WITHOUT_LIB32=
#
NO_WERROR=
#WERROR=
MALLOC_PRODUCTION=
#
# Avoid stripping but do not control host -g status as well:
DEBUG_FLAGS+=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=
#
# Use of the .clang 's here avoids
# interfering with other CFLAGS
# usage, such as ?= usage.
CFLAGS.clang+= -mcpu=cortex-a72
CXXFLAGS.clang+= -mcpu=cortex-a72
CPPFLAGS.clang+= -mcpu=cortex-a72
ACFLAGS.arm64cpuid.S+= -mcpu=cortex-a72+crypto
ACFLAGS.aesv8-armx.S+= -mcpu=cortex-a72+crypto
ACFLAGS.ghashv8-armx.S+= -mcpu=cortex-a72+crypto

# more ~/src.configs/make.conf
CFLAGS.gcc+= -v

(But gcc was not in use.)

# more /usr/src/sys/arm64/conf/GENERIC-NODBG
#
# GENERIC -- Custom configuration for the arm64/aarch64
#

include "GENERIC"

ident       GENERIC-NODBG

makeoptions DEBUG=-g                # Build kernel with gdb(1) debug symbols

options     ALT_BREAK_TO_DEBUGGER

options     KDB                     # Enable kernel debugger support

# For minimum debugger support (stable branch) use:
#options    KDB_TRACE               # Print a stack trace for a panic
options     DDB                     # Enable the kernel debugger

# Extra stuff:
#options    VERBOSE_SYSINIT=0       # Enable verbose sysinit messages
#options    BOOTVERBOSE=1
#options    BOOTHOWTO=RB_VERBOSE
#options    KTR
#options    KTR_MASK=KTR_TRAP
##options   KTR_CPUMASK=0xF
#options    KTR_VERBOSE

# Disable any extra checking for. . .
nooptions   DEADLKRES               # Enable the deadlock resolver
nooptions   INVARIANTS              # Enable calls of extra sanity checking
nooptions   INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
nooptions   WITNESS                 # Enable checks to detect deadlocks and cycles
nooptions   WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
nooptions   DIAGNOSTIC
nooptions   MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones
nooptions   BUF_TRACKING
nooptions   FULL_BUF_TRACKING

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)