Date: Tue, 25 Jan 2022 13:23:51 -0800
From: Mark Millard <marklmi@yahoo.com>
To: bob prohaska <fbsd@www.zefox.net>
Cc: Free BSD <freebsd-arm@freebsd.org>
Subject: Re: Troubles building world on stable/13
Message-ID: <47B71886-ADDD-43C4-A0E0-0DB066E3E9D2@yahoo.com>
In-Reply-To: <35046946-7FE4-4E44-950F-BF9CCA72D8F0@yahoo.com>
References: <20220121031601.GA26308@www.zefox.net>
 <FA290367-D4B6-463D-AC67-64F224B3C227@yahoo.com>
 <FBD31544-6D8F-40DB-BC36-F0B2BBA78A14@yahoo.com>
 <8595CFBD-DC65-4472-A0A1-8A7BE1C031D6@yahoo.com>
 <20220124165449.GA39982@www.zefox.net>
 <5FAC2B2C-7740-435E-A183-FB3EF1FCE7F9@yahoo.com>
 <1CB4EDCD-0998-4363-8CEA-14854EB76FA3@yahoo.com>
 <20220125162245.GA43635@www.zefox.net>
 <61A3CF79-552C-4884-A8EA-85003B249856@yahoo.com>
 <20220125180823.GB43635@www.zefox.net>
 <35046946-7FE4-4E44-950F-BF9CCA72D8F0@yahoo.com>

On 2022-Jan-25, at 12:49, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Jan-25, at 10:08, bob prohaska <fbsd@www.zefox.net> wrote:
>
>> On Tue, Jan 25, 2022 at 09:13:08AM -0800, Mark Millard wrote:
>>>
>>> -DBATCH ? I'm not aware of there being any use of that symbol.
>>> Do you have a documentation reference for it so that I could
>>> read about it?
>>>
>> It's a switch to turn off dialog4ports. I can't find the reference
>> now. Perhaps it's been deprecated? A name like -DUSE_DEFAULTS would
>> be easier to understand anyway.
>
> I've never had buildworld buildkernel or the like try to use
> dialog4ports. I've only had port building use it. buildworld
> and buildkernel can be done with no ports installed at all.
> dialog4ports is a port.
>
> I think -DBATCH was ignored for the activity at hand.

Actual evidence for my claim:

# grep -r "\<BATCH\>" /usr/main-src/Makefile* /usr/main-src/share/mk/ | more
# grep -r "\<BATCH\>" /usr/main-src/Makefile* /usr/ports/Mk/ | more
/usr/ports/Mk/bsd.licenses.mk:.if ${_LICENSE_STATUS} == "ask" && defined(BATCH)
/usr/ports/Mk/bsd.licenses.mk:IGNORE= License ${_LICENSE} needs confirmation, but BATCH is defined
/usr/ports/Mk/Uses/perl5.mk:. if defined(BATCH) && !defined(IS_INTERACTIVE)
/usr/ports/Mk/Uses/perl5.mk:. endif # defined(BATCH) && !defined(IS_INTERACTIVE)
/usr/ports/Mk/Uses/cmake.mk:# Default: not set, unless BATCH or PACKAGE_BUILDING is defined
/usr/ports/Mk/Uses/cmake.mk:.if defined(BATCH) || defined(PACKAGE_BUILDING)
/usr/ports/Mk/bsd.port.mk:# to skip this port by setting ${BATCH}, or compiling only
/usr/ports/Mk/bsd.port.mk:.if defined(BATCH)
/usr/ports/Mk/bsd.port.mk:.if defined(BATCH)
/usr/ports/Mk/bsd.port.mk:SCRIPTS_ENV+= BATCH=yes
/usr/ports/Mk/bsd.port.mk:# If we're in BATCH mode and the port is interactive, or we're
/usr/ports/Mk/bsd.port.mk:# one might want to leave a build in BATCH mode running
/usr/ports/Mk/bsd.port.mk:.if (defined(IS_INTERACTIVE) && defined(BATCH))
/usr/ports/Mk/bsd.port.mk: defined(PACKAGE_BUILDING) || defined(BATCH))

>> On a whim, I tried building devel/llvm13 on a Pi4 running -current with
>> 8 GB of RAM and 8 GB of swap. To my surprise, that stopped with:
>> nemesis.zefox.com kernel log messages:
>> +FreeBSD 14.0-CURRENT #26 main-5025e85013: Sun Jan 23 17:25:31 PST 2022
>> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1873450, size: 4096
>> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 521393, size: 4096
>> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 209826, size: 12288
>> +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1717218, size: 24576
>> +pid 56508 (c++), jid 0, uid 0, was killed: failed to reclaim memory
>>
>> On an 8GB machine, that seems strange.
>
> -j<What?> build? -j4 ?
>
> Were you watching the swap usage in top (or some such)?
>
> Note: The "was killed" related notices have been improved
> in main, but there is a misnomer case about "out of swap"
> (last I checked).
>
> An environment that gets "swap_pager: indefinite wait buffer"
> notices is problematical and the I/O delays for the virtual
> memory subsystem can lead to kills, if I understand right.
>
> But, if I remember right, the actual message for a directly
> I/O related kill is now different.
>
> I think that being able to reproduce this case could be
> important. I probably can not because I'd not get the
> "swap_pager: indefinite wait buffer" in my hardware
> context.
>
>> Per the failure message I restarted the build of devel/llvm13 with
>> make -DBATCH MAKE_JOBS_UNSAFE=YES > make.log &
>
> Just like -DBATCH is for ports, not buildworld buildkernel,
> MAKE_JOBS_UNSAFE= is for ports, not buildworld buildkernel,
> at least if I understand right.
>
> In other words, it probably would have been the same result
> without the two arguments.

Actual evidence for my claim for MAKE_JOBS_UNSAFE :

# grep -r MAKE_JOBS_UNSAFE /usr/main-src/Makefile* /usr/main-src/share/mk/ | more
# grep -r MAKE_JOBS_UNSAFE /usr/ports/Mk/
/usr/ports/Mk/bsd.port.mk:# MAKE_JOBS_UNSAFE
/usr/ports/Mk/bsd.port.mk:.if defined(DISABLE_MAKE_JOBS) || defined(MAKE_JOBS_UNSAFE)
/usr/ports/Mk/bsd.port.mk:BUILD_FAIL_MESSAGE+= Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to the maintainer.
/usr/ports/Mk/bsd.gecko.mk:.if defined(DISABLE_MAKE_JOBS) || defined(MAKE_JOBS_UNSAFE)

>> It seems to be running with only one thread so far, not sure if that's
>> by design or happenstance.
>>
>>>> However, restarting buildworld using -j1 appears to have worked past
>>>> the former point of failure.
>>>
>>> Hmm. That usually means one (or both) of two things was involved
>>> in the failure:
>>>
>>> A) a build race where something is not (fully) ready when
>>> it is used
>>>
>>> B) running out of resources, such as RAM+SWAP
>>>
>>
>> The stable/13 machine is short of swap; it has only 2 GB, which
>> used to be enough.
>
> So RAM+SWAP is 1 GiByte + 2 GiByte, so 3 GiByte on that
> RPi3*? (That would have been good to know earlier, such
> as for my attempts at reproduction.)
>
> -j<What?> for the RPi3* when it was failing?
>
> Did you have failures with the .cpp and .sh (so no
> make use involved) in the RAM+SWAP context?
>
>> Maybe that's the problem, but having an error
>> report that says it's a segfault is a confusing diagnostic.
>>
>>> But, as I understand, you were able to use a .cpp and
>>> .sh file pair that had been produced to repeat the
>>> problem on the RPi3B --and that would not have been a
>>> parallel-activity context.
>>>
>>
>> To be clear, the reproduction was on the same stable/13 that
>> reported the original failure. An attempt at reproduction
>> on a different Pi3 running -current ran without any errors.
>> Come to think of it, that machine had more swap, too.
>
> How much swap?
>
>>>> It's in the building libraries phase now.
>>>> Based on log size I'd guess it's about halfway through buildworld.
>>>>
>>>
>>> Well, hopefully you will not be stuck with -j1 builds in
>>> the future as well.
>>>
>> Indeed!
>
> At this point, I expect that the failure was tied to the
> RAM+SWAP totaling to 3 GiBytes.
>
> Knowing that context we might have a reproducible report
> that can be made based on the .cpp and .sh files, where
> restricting the RAM+SWAP use allowed is part of the
> report.

===
Mark Millard
marklmi at yahoo.com
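
For repeating the grep checks above with other make variables, here is a minimal sketch in sh. The function name symbol_used is hypothetical (it is not part of FreeBSD), and it assumes a grep that supports -r, -w and -q. The -w option stands in for the \<...\> word anchors used in the BATCH search.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): report whether a make
# variable is referenced as a whole word anywhere under the given paths.
# Usage: symbol_used SYMBOL PATH...
symbol_used() {
    sym=$1
    shift
    # -r: recurse into directories
    # -w: whole-word match, so BATCH does not match PACKAGE_BATCHING
    # -q: no output, only the exit status matters
    if grep -rqw -- "$sym" "$@" 2>/dev/null; then
        echo "$sym: referenced"
    else
        echo "$sym: not referenced"
    fi
}
```

It could be run as, for example, `symbol_used BATCH /usr/main-src/Makefile* /usr/main-src/share/mk/` or `symbol_used MAKE_JOBS_UNSAFE /usr/ports/Mk/`. Because grep errors are discarded, a path that does not exist is also reported as "not referenced", so the paths should be checked first.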