Date: Mon, 5 Aug 2024 02:09:24 -0700
From: Mark Millard <marklmi@yahoo.com>
To: mmel@freebsd.org
Cc: FreeBSD Toolchain <freebsd-toolchain@freebsd.org>, FreeBSD ARM List <freebsd-arm@freebsd.org>
Subject: Re: Any known way to build devel/llvm* ( such as devel/llvm19 ) with --threads=1 for its linker activity during the build?
Message-ID: <EFEE572A-B34D-4A02-AA15-D7E15F12A826@yahoo.com>
In-Reply-To: <0b3b532c-ae94-439c-81aa-9e80a08af43f@freebsd.org>
References: <4FFD603F-E67C-4B62-B91B-8BE365EAA050@yahoo.com> <82E78798-C376-45C4-80FE-96AD14229419@yahoo.com> <dcfa36c0-8ba6-4e8f-937d-17a99d8b23cf@freebsd.org> <F65EFFEF-FD93-49AB-B0E0-7BF880760EA8@yahoo.com> <E8E2166F-06DD-42FF-B54E-215BC507B3C3@yahoo.com> <0b3b532c-ae94-439c-81aa-9e80a08af43f@freebsd.org>

On Aug 5, 2024, at 00:44, meloun.michal@gmail.com wrote:

> On 05.08.2024 9:27, Mark Millard wrote:
>> On Aug 5, 2024, at 00:15, Mark Millard <marklmi@yahoo.com> wrote:
>>> On Aug 4, 2024, at 22:53, Michal Meloun <meloun.michal@gmail.com> wrote:
>>>
>>>> On 04.08.2024 23:31, Mark Millard wrote:
>>>>> On Aug 3, 2024, at 23:07, Mark Millard <marklmi@yahoo.com> wrote:
>>>>>> My recent attempts to build devel/llvm18 and devel/llvm19 in an armv7 context (native or aarch64-as-armv7) have had /usr/bin/ld failures that stop the build and report as:
>>>>>>
>>>>>> LLVM ERROR: out of memory
>>>>>> Allocation failed
>>>>>>
>>>>>> (no system OOM activity or notices, so just a process size/fragmentation issue, or so I would expect).
>>>>>>
>>>>>> On native armv7 I also had rust 1.79.0 fail that way --but aarch64-as-armv7 built it okay.
>>>>>>
>>>>>> I'm curious if --threads=1 use for the linker might allow the devel/llvm* builds to complete at this point. Similarly for rust. (top showed that the ld activity was multi-threaded.)
>>>>>>
>>>>>> Note: The structure of the poudriere-devel based native build attempts is historical and it used to work. Similarly for the aarch64-as-armv7 based build attempts. For now I'd just be exploring changes that might allow much of my historical overall structure to still work. But I expect that things are just growing to the point building is starting to be problematical with process address spaces that are bounded by a limit somewhat under 4 GiBytes.
>>>>>>
>>>>>>
>>>>>> Native armv7 was a 2 GiByte OrangePi+ 2ed (4 cores) that had
>>>>>> at boot time:
>>>>>>
>>>>>> AVAIL_RAM+SWAP == 1958Mi+3685Mi == 5643Mi
>>>>>>
>>>>>> and later had "Max(imum)Obs(erved)" figures:
>>>>>>
>>>>>> Mem: . . .,
>>>>>> 1728Mi MaxObsActive, 275192Ki MaxObsWired, 1952Mi MaxObs(Act+Wir+Lndry)
>>>>>>
>>>>>> Swap: 3685Mi Total, . . .,
>>>>>> 1535Mi MaxObsUsed, 3177Mi MaxObs(Act+Lndry+SwapUsed),
>>>>>> 3398Mi MaxObs(A+Wir+L+SU), 3449Mi (A+W+L+SU+InAct)
>>>>>>
>>>>>>
>>>>>> The aarch64-as-armv7 was a Win DevKit 2023 that has 8 cores and:
>>>>>>
>>>>>> AVAIL_RAM+SWAP == 31311Mi+120831Mi == 152142Mi
>>>>>>
>>>>>> So lots of 4 GiByte or smaller processes would fit.
>>>>>>
>>>>> Absent finding a way to get --threads=1 to be what is used, I
>>>>> made the following crude way to test, built it, installed it
>>>>> in the armv7 directory tree used for aarch64-as-armv7, and
>>>>> then started an aarch64-as-armv7 test of building devel/llvm19
>>>>> to see what the consequences are (leading whitespace details
>>>>> might not be preserved):
>>>>> # git -C /usr/main-src/ diff contrib/llvm-project/
>>>>> diff --git a/contrib/llvm-project/lld/ELF/Driver.cpp b/contrib/llvm-project/lld/ELF/Driver.cpp
>>>>> index 8b2c32b15348..299daf7dd6fa 100644
>>>>> --- a/contrib/llvm-project/lld/ELF/Driver.cpp
>>>>> +++ b/contrib/llvm-project/lld/ELF/Driver.cpp
>>>>> @@ -1587,6 +1587,9 @@ static void readConfigs(opt::InputArgList &args) {
>>>>>                arg->getValue() + "'");
>>>>>      parallel::strategy = hardware_concurrency(threads);
>>>>>      config->thinLTOJobs = v;
>>>>> +  } else if (sizeof(void*) <= 4) {
>>>>> +    log("set maximum concurrency to 1, specify --threads= to change");
>>>>> +    parallel::strategy = hardware_concurrency(1);
>>>>>    } else if (parallel::strategy.compute_thread_count() > 16) {
>>>>>      log("set maximum concurrency to 16, specify --threads= to change");
>>>>>      parallel::strategy = hardware_concurrency(16);
>>>>> Basically, if the process address space has to be "small", avoid
>>>>> any default memory use tradeoffs that multi-threading the linker
>>>>> might involve --even if that means taking more time.
>>>>> We will see if:
>>>>> [00:00:33] [07] [00:00:00] Building devel/llvm19@default | llvm19-19.1.0.r1
>>>>> still fails to build as armv7 vs. if the change leads it to
>>>>> manage to build as armv7.
>>>>> ===
>>>>> Mark Millard
>>>>> marklmi at yahoo.com
>>>>
>>>> I can build llvm18 and rust 1.79 on native armv7 without problems - on Tegra TK1, without poudriere and on the ufs filesystem. IMHO poudriere is unusable on 32bit systems.
>>>
>>> On Windows DevKit 2023 in an armv7 chroot I can build rust 1.79.0
>>> as well. I've not tried a recent devel/llvm18 in that context,
>>> just devel/llvm19 . An armv7 process in this context can use
>>> about 1 GiByte more memory space than on the OrangePi+ 2ed. (See
>>> later program example outputs.)
>>>
>>> Previously, devel/llvm18-18.1.7 had built fine some time back.
>>> So I'm trying the modern 18.1.8_1 now on the Windows DevKit 2023.
>>> But this is with forcing of --threads=1 for lld: same context as
>>> the recent devel/llvm19 exploration.
>>>
>>> Note: UFS context, not ZFS.
>>>
>>> How does the Tegra TK1 context compare for the following
>>> program and the example command?
>>>
>>> OrangePi+ 2ed (so: armv7 native with 2 GiBytes of RAM):
>>>
>>> # more process_size.c
>>> // cc -std=c11 process_size.c
>>> // ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>>>
>>> #include <malloc.h>
>>> #include <errno.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <limits.h>
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>     size_t totalsize = 0u;
>>>     for (int i = 1; i < argc; ++i) {
>>>         errno = 0;
>>>         size_t size = strtoul(argv[i],NULL,0);
>>>         void *p = malloc(size);
>>>         if (p) totalsize += size;
>>>         printf("malloc(%zu) = %p [errno = %d]\n", size, p, errno);
>>>     }
>>>     printf("approx. total, a lower bound: %zu MiBytes\n", totalsize/1024u/1024u);
>>>     return 0;
>>> }
>>> # cc -std=c11 process_size.c
>>> # ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>>> malloc(268435456) = 0x20800180 [errno = 0]
>>> malloc(268435456) = 0x30801980 [errno = 0]
>>> malloc(268435456) = 0x40802640 [errno = 0]
>>> malloc(268435456) = 0x50803600 [errno = 0]
>>> malloc(268435456) = 0x608048c0 [errno = 0]
>>> malloc(268435456) = 0x70805140 [errno = 0]
>>> malloc(268435456) = 0x80806580 [errno = 0]
>>> malloc(268435456) = 0x90807780 [errno = 0]
>>> malloc(268435456) = 0xa0808700 [errno = 0]
>>> malloc(268435456) = 0x0 [errno = 12]
>>> malloc(268435456) = 0x0 [errno = 12]
>>> malloc(268435456) = 0x0 [errno = 12]
>>> malloc(268435456) = 0x0 [errno = 12]
>>> malloc(134217728) = 0xb0809a00 [errno = 0]
>>> malloc(67108864) = 0x0 [errno = 12]
>>> malloc(33554432) = 0xb880a5c0 [errno = 0]
>>> malloc(16777216) = 0xba80b0c0 [errno = 0]
>>> malloc(8388608) = 0x0 [errno = 12]
>>> malloc(4194304) = 0x0 [errno = 12]
>>> malloc(2097152) = 0xbb80c180 [errno = 0]
>>> malloc(1048576) = 0xbba0de80 [errno = 0]
>>> approx. total, a lower bound: 2483 MiBytes
>>>
>>>
>>> Same program with same command on Windows DevKit 2023 in
>>> armv7 chroot (aarch64-as-armv7 with 32 GiBytes of RAM):
>>>
>>> # ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>>> malloc(268435456) = 0x20800b00 [errno = 0]
>>> malloc(268435456) = 0x30801600 [errno = 0]
>>> malloc(268435456) = 0x40802cc0 [errno = 0]
>>> malloc(268435456) = 0x50803c80 [errno = 0]
>>> malloc(268435456) = 0x608042c0 [errno = 0]
>>> malloc(268435456) = 0x70805b00 [errno = 0]
>>> malloc(268435456) = 0x808063c0 [errno = 0]
>>> malloc(268435456) = 0x90807580 [errno = 0]
>>> malloc(268435456) = 0xa0808b40 [errno = 0]
>>> malloc(268435456) = 0xb0809980 [errno = 0]
>>> malloc(268435456) = 0xc080abc0 [errno = 0]
>>> malloc(268435456) = 0xd080ba00 [errno = 0]
>>> malloc(268435456) = 0xe080cc80 [errno = 0]
>>> malloc(134217728) = 0xf080d700 [errno = 0]
>>> malloc(67108864) = 0x0 [errno = 12]
>>> malloc(33554432) = 0xf880eb40 [errno = 0]
>>> malloc(16777216) = 0xfa80fc00 [errno = 0]
>>> malloc(8388608) = 0x0 [errno = 12]
>>> malloc(4194304) = 0xfb810840 [errno = 0]
>>> malloc(2097152) = 0xfbc117c0 [errno = 0]
>>> malloc(1048576) = 0xfbe12940 [errno = 0]
>>> approx. total, a lower bound: 3511 MiBytes
>>>
>>>
>>> Note: If the Tegra TK1 in question has more than
>>> 4 GiBytes of RAM, the command line should explore
>>> more than the example that I used.
>>>
>>>
>>> Note: I've used the program for other patterns of
>>> allocations. That is why it is not just a fixed
>>> exploration algorithm.
>>>
>>>
>>> As for poudriere-devel, I find it useful, even on
>>> the OrangePi+ 2ed. But mostly that is a rare run
>>> that is checking on how well the handling goes for
>>> the 2 GiByte of RAM context (with notable SWAP for
>>> the size of RAM). In other words, monitoring the
>>> growth in a context that will break sooner than
>>> my other contexts generally would. The tests take
>>> days overall, most of the time being for rust and
>>> a llvm* .
>>>
>>> Historically I've been able to have 2 builders,
>>> each with MAKE_JOBS_NUMBER_LIMIT=2 , so all 4
>>> cores in use building lang/rust and a devel/llvm*
>>> at the same time successfully in poudriere-devel
>>> on the 2 GiByte OrangePi+ 2ed. (This was before
>>> recently imposing --threads=1 experiments,
>>> given the recent build failures.)
>> I should have noted that my normal devel/llvm* builds
>> on aarch64 and armv7 avoid building: BE_AMDGPU and
>> MLIR . They also target BE_NATIVE instead of
>> BE_STANDARD . (aarch64 BE_NATIVE includes armv7 as
>> well.)
>> ===
>> Mark Millard
>> marklmi at yahoo.com

> Tegra has 4 Cortex-A15 cores and 2 GB of RAM.

OrangePi+ 2ed: Cortex-A7 with 4 cores and 2 GiBytes of RAM.

I wonder if the 2483 MiBytes would end up being about the
same on the Tegra variation indicated.

> All ports are built with default options. The only non-standard item is the swap size -> I have 16GB of swap on a swap partition on the SSD.

Wow, 16 GiBytes of swap space for 2 GiBytes of RAM. I guess
when the swap is added that you get a notice-pair of the
structure:

QUOTE
warning: total configured swap (. . . pages) exceeds maximum recommended amount (. . . pages).
warning: increase kern.maxswzone or reduce amount of swap.
END QUOTE

with a rather large difference between the two ". . ." figures.

Do you make other adjustments to deal with the
otherwise-reported potential mistuning? Increasing
kern.maxswzone appears to involve tradeoffs in the kernel's
internal memory handling, if I understand right.

> But I guess that's not important in this case.

At least for my context, it appears that memory allocations
are failing to find a big enough free area inside the
process's address space --without running out of system
RAM+SWAP space overall.

For the OrangePi+ 2ed ( and devel/llvm18 18.1.7 ) it was
during the earlier linker run for:

FAILED: bin/lli-child-target
. . .
LLVM ERROR: out of memory
Allocation failed

That much finished just fine on the Windows DevKit 2023
used via an armv7 jail ( devel/llvm18 18.1.8_1 ). The
failure point was in a later link ( matching what I saw
via devel/llvm19 ).

> I just started build of llvm19 - but it takes a few hours to complete.

Probably fewer hours than on the OrangePi+ 2ed but more
than on the Windows DevKit 2023 (if they were completing,
anyway).

===
Mark Millard
marklmi at yahoo.com