Date: Mon, 5 Aug 2024 09:44:56 +0200
From: meloun.michal@gmail.com
To: Mark Millard <marklmi@yahoo.com>
Cc: FreeBSD Toolchain <freebsd-toolchain@freebsd.org>, FreeBSD ARM List <freebsd-arm@freebsd.org>
Subject: Re: Any known way to build devel/llvm* ( such as devel/llvm19 ) with --threads=1 for its linker activity during the build?
Message-ID: <0b3b532c-ae94-439c-81aa-9e80a08af43f@freebsd.org>
In-Reply-To: <E8E2166F-06DD-42FF-B54E-215BC507B3C3@yahoo.com>
References: <4FFD603F-E67C-4B62-B91B-8BE365EAA050@yahoo.com> <82E78798-C376-45C4-80FE-96AD14229419@yahoo.com> <dcfa36c0-8ba6-4e8f-937d-17a99d8b23cf@freebsd.org> <F65EFFEF-FD93-49AB-B0E0-7BF880760EA8@yahoo.com> <E8E2166F-06DD-42FF-B54E-215BC507B3C3@yahoo.com>
On 05.08.2024 9:27, Mark Millard wrote:
> On Aug 5, 2024, at 00:15, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Aug 4, 2024, at 22:53, Michal Meloun <meloun.michal@gmail.com> wrote:
>>
>>> On 04.08.2024 23:31, Mark Millard wrote:
>>>> On Aug 3, 2024, at 23:07, Mark Millard <marklmi@yahoo.com> wrote:
>>>>> My recent attempts to build devel/llvm18 and devel/llvm19 in an armv7 context (native or aarch64-as-armv7) have had /usr/bin/ld failures that stop the build and report as:
>>>>>
>>>>> LLVM ERROR: out of memory
>>>>> Allocation failed
>>>>>
>>>>> (no system OOM activity or notices, so just a process size/fragmentation issue, or so I would expect).
>>>>>
>>>>> On native armv7 I also had rust 1.79.0 fail that way --but aarch64-as-armv7 built it okay.
>>>>>
>>>>> I'm curious if --threads=1 use for the linker might allow the devel/llvm* builds to complete at this point. Similarly for rust. (top showed that the ld activity was multi-threaded.)
>>>>>
>>>>> Note: The structure of the poudriere-devel based native build attempts is historical and it used to work. Similarly for the aarch64-as-armv7 based build attempts. For now I'd just be exploring changes that might allow much of my historical overall structure to still work. But I expect that things are just growing to the point building is starting to be problematical with process address spaces that are bounded by a limit somewhat under 4 GiBytes.
>>>>>
>>>>>
>>>>> Native armv7 was a 2 GiByte OrangePi+ 2ed (4 cores) that had
>>>>> at boot time:
>>>>>
>>>>> AVAIL_RAM+SWAP == 1958Mi+3685Mi == 5643Mi
>>>>>
>>>>> and later had "Max(imum)Obs(erved)" figures:
>>>>>
>>>>> Mem: . . .,
>>>>> 1728Mi MaxObsActive, 275192Ki MaxObsWired, 1952Mi MaxObs(Act+Wir+Lndry)
>>>>>
>>>>> Swap: 3685Mi Total, . . .,
>>>>> 1535Mi MaxObsUsed, 3177Mi MaxObs(Act+Lndry+SwapUsed),
>>>>> 3398Mi MaxObs(A+Wir+L+SU), 3449Mi (A+W+L+SU+InAct)
>>>>>
>>>>>
>>>>> The aarch64-as-armv7 was a Win DevKit 2023 that has 8 cores and:
>>>>>
>>>>> AVAIL_RAM+SWAP == 31311Mi+120831Mi == 152142Mi
>>>>>
>>>>> So lots of 4 GiByte or smaller processes would fit.
>>>>>
>>>> Absent finding a way to get --threads=1 to be what is used, I
>>>> made the following crude way to test, built it, installed it
>>>> in the armv7 directory tree used for aarch64-as-armv7, and
>>>> then started an aarch64-as-armv7 test of building devel/llvm19
>>>> to see what the consequences are (leading whitespace details
>>>> might not be preserved):
>>>> # git -C /usr/main-src/ diff contrib/llvm-project/
>>>> diff --git a/contrib/llvm-project/lld/ELF/Driver.cpp b/contrib/llvm-project/lld/ELF/Driver.cpp
>>>> index 8b2c32b15348..299daf7dd6fa 100644
>>>> --- a/contrib/llvm-project/lld/ELF/Driver.cpp
>>>> +++ b/contrib/llvm-project/lld/ELF/Driver.cpp
>>>> @@ -1587,6 +1587,9 @@ static void readConfigs(opt::InputArgList &args) {
>>>> arg->getValue() + "'");
>>>> parallel::strategy = hardware_concurrency(threads);
>>>> config->thinLTOJobs = v;
>>>> + } else if (sizeof(void*) <= 4) {
>>>> + log("set maximum concurrency to 1, specify --threads= to change");
>>>> + parallel::strategy = hardware_concurrency(1);
>>>> } else if (parallel::strategy.compute_thread_count() > 16) {
>>>> log("set maximum concurrency to 16, specify --threads= to change");
>>>> parallel::strategy = hardware_concurrency(16);
>>>> Basically, if the process address space has to be "small", avoid
>>>> any default memory use tradeoffs that multi-threading the linker
>>>> might involve --even if that means taking more time.
>>>> We will see if:
>>>> [00:00:33] [07] [00:00:00] Building devel/llvm19@default | llvm19-19.1.0.r1
>>>> still fails to build as armv7 vs. if the change leads it to
>>>> manage to build as armv7.
>>>> ===
>>>> Mark Millard
>>>> marklmi at yahoo.com
>>>
>>> I can build llvm18 and rust 1.79 on native armv7 without problems - on Tegra TK1, without poudriere and on the ufs filesystem. IMHO poudriere is unusable on 32bit systems.
>>
>> On Windows DevKit 2023 in an armv7 chroot I can build rust 1.79.0
>> as well. I've not tried a recent devel/llvm18 in that context,
>> just devel/llvm19 . An armv7 process in this context can use
>> about 1 GiByte more memory space than on the OrangePi+ 2ed. (See
>> later program example outputs.)
>>
>> Previously, devel/llvm18-18.1.7 had built fine some time back.
>> So I'm trying the modern 18.1.8_1 now on the Windows DevKit 2023.
>> But this is with forcing of --threads=1 for lld: same context as
>> the recent devel/llvm19 exploration.
>>
>> Note: UFS context, not ZFS.
>>
>> How does the Tegra TK1 context compare for the following
>> program and the example command?
>>
>> OrangePi+ 2ed (so: armv7 native with 2 GiBytes of RAM):
>>
>> # more process_size.c
>> // cc -std=c11 process_size.c
>> // ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>>
>> #include <malloc.h>
>> #include <errno.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <limits.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     size_t totalsize= 0u;
>>     for (int i = 1; i < argc; ++i) {
>>         errno = 0;
>>         size_t size = strtoul(argv[i],NULL,0);
>>         void *p = malloc(size);
>>         if (p) totalsize += size;
>>         printf("malloc(%zu) = %p [errno = %d]\n", size, p, errno);
>>     }
>>     printf("approx. total, a lower bound: %zu MiBytes\n", totalsize/1024u/1024u);
>>     return 0;
>> }
>> # cc -std=c11 process_size.c
>> # ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>> malloc(268435456) = 0x20800180 [errno = 0]
>> malloc(268435456) = 0x30801980 [errno = 0]
>> malloc(268435456) = 0x40802640 [errno = 0]
>> malloc(268435456) = 0x50803600 [errno = 0]
>> malloc(268435456) = 0x608048c0 [errno = 0]
>> malloc(268435456) = 0x70805140 [errno = 0]
>> malloc(268435456) = 0x80806580 [errno = 0]
>> malloc(268435456) = 0x90807780 [errno = 0]
>> malloc(268435456) = 0xa0808700 [errno = 0]
>> malloc(268435456) = 0x0 [errno = 12]
>> malloc(268435456) = 0x0 [errno = 12]
>> malloc(268435456) = 0x0 [errno = 12]
>> malloc(268435456) = 0x0 [errno = 12]
>> malloc(134217728) = 0xb0809a00 [errno = 0]
>> malloc(67108864) = 0x0 [errno = 12]
>> malloc(33554432) = 0xb880a5c0 [errno = 0]
>> malloc(16777216) = 0xba80b0c0 [errno = 0]
>> malloc(8388608) = 0x0 [errno = 12]
>> malloc(4194304) = 0x0 [errno = 12]
>> malloc(2097152) = 0xbb80c180 [errno = 0]
>> malloc(1048576) = 0xbba0de80 [errno = 0]
>> approx. total, a lower bound: 2483 MiBytes
>>
>>
>> Same program with same command on Windows DevKit 2023 in
>> armv7 chroot (aarch64-as-armv7 with 32 GiBytes of RAM):
>>
>> # ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>> malloc(268435456) = 0x20800b00 [errno = 0]
>> malloc(268435456) = 0x30801600 [errno = 0]
>> malloc(268435456) = 0x40802cc0 [errno = 0]
>> malloc(268435456) = 0x50803c80 [errno = 0]
>> malloc(268435456) = 0x608042c0 [errno = 0]
>> malloc(268435456) = 0x70805b00 [errno = 0]
>> malloc(268435456) = 0x808063c0 [errno = 0]
>> malloc(268435456) = 0x90807580 [errno = 0]
>> malloc(268435456) = 0xa0808b40 [errno = 0]
>> malloc(268435456) = 0xb0809980 [errno = 0]
>> malloc(268435456) = 0xc080abc0 [errno = 0]
>> malloc(268435456) = 0xd080ba00 [errno = 0]
>> malloc(268435456) = 0xe080cc80 [errno = 0]
>> malloc(134217728) = 0xf080d700 [errno = 0]
>> malloc(67108864) = 0x0 [errno = 12]
>> malloc(33554432) = 0xf880eb40 [errno = 0]
>> malloc(16777216) = 0xfa80fc00 [errno = 0]
>> malloc(8388608) = 0x0 [errno = 12]
>> malloc(4194304) = 0xfb810840 [errno = 0]
>> malloc(2097152) = 0xfbc117c0 [errno = 0]
>> malloc(1048576) = 0xfbe12940 [errno = 0]
>> approx. total, a lower bound: 3511 MiBytes
>>
>>
>> Note: If the Tegra TK1 in question has more than
>> 4 GiBytes of RAM, the command line should explore
>> more than the example that I used.
>>
>>
>> Note: I've used the program for other patterns of
>> allocations. That is why it is not just a fixed
>> exploration algorithm.
>>
>>
>> As for poudriere-devel, I find it useful, even on
>> the OrangePi+ 2ed. But mostly that is a rare run
>> that is checking on how well the handling goes for
>> the 2 GiByte of RAM context (with notable SWAP for
>> the size of RAM).
>> In other words, monitoring the
>> growth in a context that will break sooner than
>> my other contexts generally would. The tests take
>> days overall, most of the time being for rust and
>> a llvm* .
>>
>> Historically I've been able to have 2 builders,
>> each with MAKE_JOBS_NUMBER_LIMIT=2 , so all 4
>> cores in use building lang/rust and a devel/llvm*
>> at the same time successfully in poudriere-devel
>> on the 2 GiByte OrangePi+ 2ed. (This was before
>> recently imposing --threads=1 experiments,
>> given the recent build failures.)
>
> I should have noted that my normal devel/llvm* builds
> on aarch64 and armv7 avoid building: BE_AMDGPU and
> MLIR . They also target BE_NATIVE instead of
> BE_STANDARD . (aarch64 BE_NATIVE includes armv7 as
> well.)
>
>
> ===
> Mark Millard
> marklmi at yahoo.com
>

Tegra has 4 Cortex-A15 cores and 2 GB of RAM. All ports are built with default options. The only non-standard item is the swap size: I have 16 GB of swap on a swap partition on the SSD, but I guess that's not important in this case.
I just started a build of llvm19, but it takes a few hours to complete.

Michal
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?0b3b532c-ae94-439c-81aa-9e80a08af43f>