Date:      Mon, 5 Aug 2024 00:42:01 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        mmel@freebsd.org
Cc:        FreeBSD Toolchain <freebsd-toolchain@freebsd.org>, FreeBSD ARM List <freebsd-arm@freebsd.org>
Subject:   Re: Any known way to build devel/llvm* ( such as devel/llvm19 ) with --threads=1 for its linker activity during the build?
Message-ID:  <E65B240F-D503-46C7-A512-F60BDB18F55A@yahoo.com>
In-Reply-To: <E8E2166F-06DD-42FF-B54E-215BC507B3C3@yahoo.com>
References:  <4FFD603F-E67C-4B62-B91B-8BE365EAA050@yahoo.com> <82E78798-C376-45C4-80FE-96AD14229419@yahoo.com> <dcfa36c0-8ba6-4e8f-937d-17a99d8b23cf@freebsd.org> <F65EFFEF-FD93-49AB-B0E0-7BF880760EA8@yahoo.com> <E8E2166F-06DD-42FF-B54E-215BC507B3C3@yahoo.com>

On Aug 5, 2024, at 00:27, Mark Millard <marklmi@yahoo.com> wrote:

> On Aug 5, 2024, at 00:15, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Aug 4, 2024, at 22:53, Michal Meloun <meloun.michal@gmail.com> wrote:
>>
>>> On 04.08.2024 23:31, Mark Millard wrote:
>>>> On Aug 3, 2024, at 23:07, Mark Millard <marklmi@yahoo.com> wrote:
>>>>> My recent attempts to build devel/llvm18 and devel/llvm19 in an
>>>>> armv7 context (native or aarch64-as-armv7) have had /usr/bin/ld
>>>>> failures that stop the build and report as:
>>>>>
>>>>> LLVM ERROR: out of memory
>>>>> Allocation failed
>>>>>
>>>>> (no system OOM activity or notices, so just a process
>>>>> size/fragmentation issue, or so I would expect).
>>>>>
>>>>> On native armv7 I also had rust 1.79.0 fail that way, but
>>>>> aarch64-as-armv7 built it okay.
>>>>>
>>>>> I'm curious if --threads=1 use for the linker might allow the
>>>>> devel/llvm* builds to complete at this point. Similarly for rust.
>>>>> (top showed that the ld activity was multi-threaded.)
>>>>>
>>>>> Note: The structure of the poudriere-devel based native build
>>>>> attempts is historical and it used to work. Similarly for the
>>>>> aarch64-as-armv7 based build attempts. For now I'd just be
>>>>> exploring changes that might allow much of my historical overall
>>>>> structure to still work. But I expect that things are just growing
>>>>> to the point that building is starting to be problematic with
>>>>> process address spaces that are bounded by a limit somewhat under
>>>>> 4 GiBytes.
>>>>>
>>>>>
>>>>> Native armv7 was a 2 GiByte OrangePi+ 2ed (4 cores) that had
>>>>> at boot time:
>>>>>
>>>>> AVAIL_RAM+SWAP == 1958Mi+3685Mi == 5643Mi
>>>>>
>>>>> and later had "Max(imum)Obs(erved)" figures:
>>>>>
>>>>> Mem: . . .,
>>>>> 1728Mi MaxObsActive, 275192Ki MaxObsWired, 1952Mi MaxObs(Act+Wir+Lndry)
>>>>>
>>>>> Swap: 3685Mi Total, . . .,
>>>>> 1535Mi MaxObsUsed, 3177Mi MaxObs(Act+Lndry+SwapUsed),
>>>>> 3398Mi MaxObs(A+Wir+L+SU), 3449Mi (A+W+L+SU+InAct)
>>>>>
>>>>>
>>>>> The aarch64-as-armv7 was a Win DevKit 2023 that has 8 cores and:
>>>>>
>>>>> AVAIL_RAM+SWAP == 31311Mi+120831Mi == 152142Mi
>>>>>
>>>>> So lots of 4 GiByte or smaller processes would fit.
>>>>>
>>>> Absent finding a way to get --threads=1 to be what is used, I
>>>> made the following crude way to test, built it, installed it
>>>> in the armv7 directory tree used for aarch64-as-armv7, and
>>>> then started an aarch64-as-armv7 test of building devel/llvm19
>>>> to see what the consequences are (leading whitespace details
>>>> might not be preserved):
>>>> # git -C /usr/main-src/ diff contrib/llvm-project/
>>>> diff --git a/contrib/llvm-project/lld/ELF/Driver.cpp b/contrib/llvm-project/lld/ELF/Driver.cpp
>>>> index 8b2c32b15348..299daf7dd6fa 100644
>>>> --- a/contrib/llvm-project/lld/ELF/Driver.cpp
>>>> +++ b/contrib/llvm-project/lld/ELF/Driver.cpp
>>>> @@ -1587,6 +1587,9 @@ static void readConfigs(opt::InputArgList &args) {
>>>>            arg->getValue() + "'");
>>>>    parallel::strategy = hardware_concurrency(threads);
>>>>    config->thinLTOJobs = v;
>>>> +  } else if (sizeof(void*) <= 4) {
>>>> +    log("set maximum concurrency to 1, specify --threads= to change");
>>>> +    parallel::strategy = hardware_concurrency(1);
>>>>  } else if (parallel::strategy.compute_thread_count() > 16) {
>>>>    log("set maximum concurrency to 16, specify --threads= to change");
>>>>    parallel::strategy = hardware_concurrency(16);
>>>> Basically, if the process address space has to be "small", avoid
>>>> any default memory use tradeoffs that multi-threading the linker
>>>> might involve --even if that means taking more time.
>>>> We will see if:
>>>> [00:00:33] [07] [00:00:00] Building   devel/llvm19@default | llvm19-19.1.0.r1
>>>> still fails to build as armv7 vs. if the change leads it to
>>>> manage to build as armv7.
>>>> ===
>>>> Mark Millard
>>>> marklmi at yahoo.com
>>>
>>> I can build llvm18 and rust 1.79 on native armv7 without problems -
>>> on Tegra TK1, without poudriere and on the UFS filesystem. IMHO
>>> poudriere is unusable on 32-bit systems.
>>
>> On Windows DevKit 2023 in an armv7 chroot I can build rust 1.79.0
>> as well. I've not tried a recent devel/llvm18 in that context,
>> just devel/llvm19 . An armv7 process in this context can use
>> about 1 GiByte more memory space than on the OrangePi+ 2ed. (See
>> later program example outputs.)
>>
>> Previously, devel/llvm18-18.1.7 had built fine some time back.
>> So I'm trying the modern 18.1.8_1 now on the Windows DevKit 2023.
>> But this is with forcing of --threads=1 for lld: same context as
>> the recent devel/llvm19 exploration.
>>
>> Note: UFS context, not ZFS.
>>
>> How does the Tegra TK1 context compare for the following
>> program and the example command?
>>
>> OrangePi+ 2ed (so: armv7 native with 2 GiBytes of RAM):
>>
>> # more process_size.c
>> // cc -std=c11 process_size.c
>> // ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>>
>> #include <malloc.h>
>> #include <errno.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <limits.h>
>>
>> int main(int argc, char *argv[])
>> {
>> size_t totalsize = 0u;
>> for (int i = 1; i < argc; ++i) {
>>  errno = 0;
>>  size_t size = strtoul(argv[i],NULL,0);
>>  void *p = malloc(size);
>>  if (p) totalsize += size;
>>  printf("malloc(%zu) = %p [errno = %d]\n", size, p, errno);
>> }
>> printf("approx. total, a lower bound: %zu MiBytes\n", totalsize/1024u/1024u);
>> return 0;
>> }
>> # cc -std=c11 process_size.c
>> # ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>> malloc(268435456) = 0x20800180 [errno = 0]
>> malloc(268435456) = 0x30801980 [errno = 0]
>> malloc(268435456) = 0x40802640 [errno = 0]
>> malloc(268435456) = 0x50803600 [errno = 0]
>> malloc(268435456) = 0x608048c0 [errno = 0]
>> malloc(268435456) = 0x70805140 [errno = 0]
>> malloc(268435456) = 0x80806580 [errno = 0]
>> malloc(268435456) = 0x90807780 [errno = 0]
>> malloc(268435456) = 0xa0808700 [errno = 0]
>> malloc(268435456) = 0x0 [errno = 12]
>> malloc(268435456) = 0x0 [errno = 12]
>> malloc(268435456) = 0x0 [errno = 12]
>> malloc(268435456) = 0x0 [errno = 12]
>> malloc(134217728) = 0xb0809a00 [errno = 0]
>> malloc(67108864) = 0x0 [errno = 12]
>> malloc(33554432) = 0xb880a5c0 [errno = 0]
>> malloc(16777216) = 0xba80b0c0 [errno = 0]
>> malloc(8388608) = 0x0 [errno = 12]
>> malloc(4194304) = 0x0 [errno = 12]
>> malloc(2097152) = 0xbb80c180 [errno = 0]
>> malloc(1048576) = 0xbba0de80 [errno = 0]
>> approx. total, a lower bound: 2483 MiBytes
>>
>>
>> Same program with same command on Windows DevKit 2023 in
>> armv7 chroot (aarch64-as-armv7 with 32 GiBytes of RAM):
>>
>> # ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>> malloc(268435456) = 0x20800b00 [errno = 0]
>> malloc(268435456) = 0x30801600 [errno = 0]
>> malloc(268435456) = 0x40802cc0 [errno = 0]
>> malloc(268435456) = 0x50803c80 [errno = 0]
>> malloc(268435456) = 0x608042c0 [errno = 0]
>> malloc(268435456) = 0x70805b00 [errno = 0]
>> malloc(268435456) = 0x808063c0 [errno = 0]
>> malloc(268435456) = 0x90807580 [errno = 0]
>> malloc(268435456) = 0xa0808b40 [errno = 0]
>> malloc(268435456) = 0xb0809980 [errno = 0]
>> malloc(268435456) = 0xc080abc0 [errno = 0]
>> malloc(268435456) = 0xd080ba00 [errno = 0]
>> malloc(268435456) = 0xe080cc80 [errno = 0]
>> malloc(134217728) = 0xf080d700 [errno = 0]
>> malloc(67108864) = 0x0 [errno = 12]
>> malloc(33554432) = 0xf880eb40 [errno = 0]
>> malloc(16777216) = 0xfa80fc00 [errno = 0]
>> malloc(8388608) = 0x0 [errno = 12]
>> malloc(4194304) = 0xfb810840 [errno = 0]
>> malloc(2097152) = 0xfbc117c0 [errno = 0]
>> malloc(1048576) = 0xfbe12940 [errno = 0]
>> approx. total, a lower bound: 3511 MiBytes
>>
>>
>> Note: If the Tegra TK1 in question has more than
>> 4 GiBytes of RAM, the command line should explore
>> more than the example that I used.
>>
>>
>> Note: I've used the program for other patterns of
>> allocations. That is why it is not just a fixed
>> exploration algorithm.
>>
>>
>> As for poudriere-devel, I find it useful, even on
>> the OrangePi+ 2ed. But mostly that is a rare run
>> that is checking on how well the handling goes for
>> the 2 GiByte of RAM context (with notable SWAP for
>> the size of RAM). In other words, monitoring the
>> growth in a context that will break sooner than
>> my other contexts generally would. The tests take
>> days overall, most of the time being for rust and
>> a llvm* .
>>
>> Historically I've been able to have 2 builders,
>> each with MAKE_JOBS_NUMBER_LIMIT=2 , so all 4
>> cores in use building lang/rust and a devel/llvm*
>> at the same time successfully in poudriere-devel
>> on the 2 GiByte OrangePi+ 2ed. (This was before
>> recently imposing --threads=1 experiments,
>> given the recent build failures.)
>
> I should have noted that my normal devel/llvm* builds
> on aarch64 and armv7 avoid building: BE_AMDGPU and
> MLIR . They also target BE_NATIVE instead of
> BE_STANDARD . (aarch64 BE_NATIVE includes armv7 as
> well.)


Looking around, I see that my Windows DevKit 2023 context
still has /etc/sysctl.conf containing:

# Help armv7 effectively have more address space:
kern.maxssiz=67108864
kern.maxdsiz=536870912

That actually dates back to before some related commit(s)
were done for the armv7 process size issue --and might
not be useful any more.

===
Mark Millard
marklmi at yahoo.com



