Date:      Thu, 29 Feb 2024 11:00:30 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        freebsd-hackers <freebsd-hackers@freebsd.org>, FreeBSD ARM List <freebsd-arm@freebsd.org>
Subject:   Re: How to investigate an unexpected port build time-taken relationship in an aarch64 context?
Message-ID:  <5FA20383-1469-439C-9F6E-6707C3447F2B@yahoo.com>
In-Reply-To: <F5D20FC4-BF9D-4790-B3D3-0FA215B3F254@yahoo.com>
References:  <7D640D5A-7514-480E-8D5B-58003DB558E1@yahoo.com> <900B51D1-2005-4281-BFBD-96B49B147A13@yahoo.com> <F5D20FC4-BF9D-4790-B3D3-0FA215B3F254@yahoo.com>

[The armv7 bulk build variation of that test proves to
be an interesting contrast with a prior result.]

On Feb 29, 2024, at 10:04, Mark Millard <marklmi@yahoo.com> wrote:

> [A different test case also gets the shorter time frame.]
>
> On Feb 28, 2024, at 22:59, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Feb 28, 2024, at 18:46, Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> Example HW Context: Windows Development Kit 2023
>>> 8 cores: 4 cortex-A78C's and 4 cortex-X1C's
>>> Headless: serial console and ssh access, no x11 or the like installed.
>>> UFS use.
>>>
>>> Note: cortex-A76's are missing 3 or so instruction set
>>> features compared to the A78C/X1C parts. Use of
>>> -mcpu=cortex-a76 generated code is compatible (and would allow
>>> the code to run on a cortex-a76 system, such as an RPi5 once
>>> supported).
>>>
>>> I've been doing poudriere-devel bulk timing experiments based on:
>>>
>>> A) PkgBase-based system software (kernel and world) and
>>> general use of default code generation for ports and
>>> such.
>>>
>>> B) A personal -mcpu=cortex-a76 based kernel, world, and port builds
>>> (into packages via poudriere-devel).
>>>
>>> C) Also use of an armv7 poudriere jail based on armv7 PkgBase
>>> and default armv7 code generation. This was used in both the
>>> (A) and (B) contexts. These also show what I'm curious about.
>>>
>>> Using the armv7 poudriere jail context for illustration:
>>>
>>> For (B) used via the armv7 context:
>>>
>>> [05:40:24] [03] [04:55:38] Finished lang/rust | rust-1.76.0: Success
>>> . . .
>>> [05:45:58] [01] [05:01:12] Finished devel/llvm18@default | llvm18-18.1.0.r3: Success
>>> [05:46:00] [01] [00:00:00] Building devel/boost-libs | boost-libs-1.84.0
>>> [06:59:23] [01] [01:13:23] Finished devel/boost-libs | boost-libs-1.84.0: Success
>>>
>>> For (A) used via the armv7 poudriere jail context:
>>>=20
>>> [06:33:21] [01] [05:40:48] Finished lang/rust | rust-1.76.0: Success
>>> . . .
>>> [06:40:05] [05] [05:48:09] Finished devel/llvm18@default | llvm18-18.1.0.r3: Success
>>> [06:40:07] [01] [00:00:00] Building devel/boost-libs | boost-libs-1.84.0
>>> [06:57:48] [01] [00:17:41] Finished devel/boost-libs | boost-libs-1.84.0: Success
>>>
>>> The curiosity is about the 01:13:23 vs. 00:17:41 boost-libs: The
>>> ratio is large and in the opposite direction to most time trends.
>>>
>>> Notes: Almost all the time llvm18 and rust were building, both were
>>> building but little else did and the load average was 16+ from the
>>> llvm18/rust build activity. When boost-libs was building it was the
>>> only thing building and it looked to be single threaded when I
>>> was watching.
>>
>> I should have been explicit that the 01:13:23 was mostly
>> the "stage" phase (not the "build" phase), and I was referring to
>> the "stage" phase as far as single-threaded is concerned.
>>
>>> (A) and (B) without use of the armv7 context got similar results
>>> when I first noticed this, but I'm going back and recording times
>>> for some variations. I do not yet have those other pairs of
>>> results to report.
>>>
>>> (In the armv7 poudriere jail context reported:)
>>> (B) takes less time for llvm18 and rust than (A) does.
>>> (A) takes vastly less time for boost-libs than (B) does,
>>> approximately a factor of 4 for the time-ratio.
>>>
>>> I'd be curious to get a clue what contributes to the boost-libs
>>> time ratio being so extreme once I have figures for other
>>> combinations of poudriere jail content vs. the system's content.
>>
>> Turns out that for the aarch64 jail (PkgBase system and
>> default code generation), stage started about 10 min into
>> the boost-libs activity. Package started somewhat under 5
>> minutes later. End result:
>>
>> [05:55:56] [03] [05:33:12] Finished lang/rust | rust-1.76.0: Success
>> . . .
>> [06:04:37] [01] [05:41:53] Finished devel/llvm18@default | llvm18-18.1.0.r3: Success
>> [06:04:39] [01] [00:00:00] Building devel/boost-libs | boost-libs-1.84.0
>> [06:20:50] [01] [00:16:11] Finished devel/boost-libs | boost-libs-1.84.0: Success
>>
>> So, very similar to the armv7 jail result for a PkgBase
>> system context (aarch64 boot and aarch64/armv7 PkgBase
>> jail, default code generation).
>>
>> It appears that the boost-libs "stage" phase is the context
>> for my question.
>>
>> For the jail's code generation being -mcpu=cortex-a76 based
>> but the boot having been PkgBase based:
>>
>> Stage started about 11.5 min into the boost-libs activity.
>> Package started around 48 minutes later. End result
>> (showing only boost-libs):
>>
>> [01:07:01] [01] [01:06:31] Finished devel/boost-libs | boost-libs-1.84.0: Success
>>
>> I'll note that bjam stays around 100% CPU in top during this
>> much longer "stage" phase. Definitely less than the 01:13:23
>> time. MWCHAN "-", STAT RJ, PRI 135 when I looked.
>>
>> For the jail code generation and boot context both being based
>> on -mcpu=cortex-a76 code generation:
>>
>> [05:16:38] [01] [00:49:22] Finished devel/boost-libs | boost-libs-1.83.0_1: Success
>>
>> Also definitely less than the 01:13:23 time. (I'm showing here an
>> earlier test when it was boost-libs v1.83.)
>>
>> A ZFS context (instead of UFS context) showed:
>>
>> [04:37:47] [03] [04:03:16] Finished lang/rust | rust-1.76.0: Success
>> . . .
>> [04:43:47] [01] [04:09:16] Finished devel/llvm18@default | llvm18-18.1.0.r3: Success
>> [04:43:48] [01] [00:00:00] Building devel/boost-libs | boost-libs-1.84.0
>> [05:41:46] [01] [00:57:58] Finished devel/boost-libs | boost-libs-1.84.0: Success
>
> Another test case . . .
>
> So I booted the -mcpu=cortex-a76 kernel and world (kernel
> configured to use LSE atomics) and chroot'd into an aarch64
> PkgBase world that has a PkgBase jail for poudriere use and
> that uses default code generation.
>
> Stage started about 9 min into the boost-libs activity.
> Package started around 4.5 minutes later. End result
> (showing only boost-libs):
>
> [00:15:56] [01] [00:15:48] Finished devel/boost-libs | boost-libs-1.84.0: Success
>

Given that, I then tried the armv7 poudriere jail from
inside the chroot into the PkgBase aarch64, the boot kernel
and world still being the -mcpu=cortex-a76 and LSE
atomics context:

[00:15:08] [01] [00:14:57] Finished devel/boost-libs | boost-libs-1.84.0: Success

But my earlier reported test without that chroot involved
(so from a -mcpu=cortex-a76 world) was 01:13:23 instead of
00:14:57:

[06:59:23] [01] [01:13:23] Finished devel/boost-libs | boost-libs-1.84.0: Success

So it appears that having the poudriere armv7 jail matched
with a closest-containing PkgBase aarch64 world (chroot
here), and a kernel using LSE atomics, gets the shorter
time frame for the armv7 context. (I've never seen the
shorter time frame on a cortex-a72/a53, for which LSE
atomics is not an option, no matter whether armv7 was
involved or not.)

===
Mark Millard
marklmi at yahoo.com



