Date:      Sat, 3 Jul 2021 17:43:51 -0700
From:      Mark Millard via freebsd-ports <freebsd-ports@freebsd.org>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        FreeBSD ports <freebsd-ports@freebsd.org>, freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Toolchain <freebsd-toolchain@freebsd.org>
Subject:   Re: llvm10 build failure on Rpi3
Message-ID:  <B836EE78-0534-4D8D-A0DD-486193FBF511@yahoo.com>
In-Reply-To: <20210703215445.GA18768@www.zefox.net>
References:  <C64D1A3F-A42E-42E3-8491-4DE9F6A96CFB@yahoo.com> <43513842-6FC0-4A89-8F0C-9EB2B328A5ED@yahoo.com> <9CFE71E2-23C3-4072-A8AD-74EDB339A146@yahoo.com> <A4669E1F-6DA9-492C-B06C-12AABE60FCEB@yahoo.com> <F2A8E1C3-EAAD-448A-9A97-979CC9ED9BE7@yahoo.com> <60EEFD09-97DE-4B4F-BAFD-61B96EF60E27@yahoo.com> <F727FF9A-CDFB-4C9C-8333-0FEA6C54976A@yahoo.com> <77A35ACF-275F-44C8-AEEE-4EFE5B5CBEA4@yahoo.com> <20210703182546.GA17871@www.zefox.net> <380184FB-6BA1-4C2D-9C6B-E249C2CF1317@yahoo.com> <20210703215445.GA18768@www.zefox.net>

On 2021-Jul-3, at 14:54, bob prohaska <fbsd at www.zefox.net> wrote:

> On Sat, Jul 03, 2021 at 01:15:19PM -0700, Mark Millard wrote:
>>
>> So you still have not tried an artifacts or snapshot kernel+world?
>>
> Not yet.
>
>>> Eventually I resorted to running make in devel/llvm10; to my surprise it
>>> ran to completion.
>>
>> Interesting.
>>
>> Was this -j4? -j1? -j2? Any other interesting characteristics
>> for how it was run?
>>
> Nothing special was done. IIRC, it was make -DBATCH > make.log in
> the background. From top's screen it looked like -j4.
>
>> It would be interesting to see if building in a chroot
>> in that make style also worked (or a non-poudriere jail).
>>
>=20
> Can you point me to instructions for doing the experiment?

I'll deal with this in a separate reply.

>>> It also ran make package successfully. Again I tried to
>>> build just devel/llvm10 using poudriere, again getting "expected expression".
>>>
>>> At that point I resized the swap partitions to 1 GB each and tried poudriere
>>> on devel/llvm10. That got rid of the excessive swap warnings, but didn't help.
>>> Finally I placed
>>> MAKE_JOBS_NUMBER=2
>>> in /usr/local/etc/poudriere.d/make.conf and tried again. That still failed,
>>> still with "expected expression".
>>
>> I'll note that the running build shows load averages
>> of under 3. So the MAKE_JOBS_NUMBER=2 seems to be working.
>>
>>> Since devel/llvm10 had created a package successfully, I tried slipping a copy
>>> into poudriere's package directory, hoping it would find and use the package
>>> to make further progress. Unfortunately, poudriere seems to remember the failure
>>> and won't use the proffered package.
>>
> [large snip which convinced me to give up on tricking poudriere into
> using a package constructed by make]
>>
>> Going in a different direction, one way to force a build to
>> start over after a failure is to: rm -fr PATH/.building
>> before starting a new bulk build. This might be appropriate
> I'm missing something here: what does PATH represent? There's
> nothing called .building under /usr/local/poudriere, at least
> after the run finishes.

Part of how this works is that .building/ is initially
populated with a shadow copy of the already existing
.latest/ mostly via use of hard links, with some top
level files actually copied.

If the bulk run reaches the stopped:done: status, then
.building/ is mv'd (renamed) to a new name of the form
.real_*/ , the links are adjusted to point to the new
.real_*/ , and the old .real_*/ is removed. In your
context, this happens inside:

/usr/local/poudriere/data/packages/main-default/

So, yes, your run that reached stopped:done: no longer
has a .building/
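
A simplified model of that sequence, as two POSIX sh
functions, may make it more concrete. It is a sketch under
stated assumptions, not poudriere's actual code: the
function names, the single All/ subdirectory, .latest as
the only link, and the .real_<seconds> naming are all
hypothetical simplifications.

```shell
#!/bin/sh
# Simplified, hypothetical model of poudriere's package-directory
# bookkeeping; NOT poudriere's real code. $1 stands in for a directory
# such as /usr/local/poudriere/data/packages/main-default

# Start of a bulk run: shadow the current .latest/ into .building/
# using hard links, so no package file contents are copied.
start_building() {
    pkgdir=$1
    mkdir -p "$pkgdir/.building/All"
    for f in "$pkgdir/.latest/All/"*; do
        [ -e "$f" ] || continue          # empty All/: glob did not match
        ln "$f" "$pkgdir/.building/All/"
    done
}

# Reaching stopped:done: rename .building/ to a new .real_*/, repoint
# the .latest link at it, then remove the old .real_*/.
finish_building() {
    pkgdir=$1
    old_real=$(readlink "$pkgdir/.latest")
    new_real=".real_$(date +%s)"
    mv "$pkgdir/.building" "$pkgdir/$new_real"
    ln -sfn "$new_real" "$pkgdir/.latest"   # -n: replace the link itself
    if [ -n "$old_real" ] && [ "$old_real" != "$new_real" ]; then
        rm -fr "${pkgdir:?}/$old_real"      # hard-linked files survive
    fi
}
```

If a run stops between the two steps (^C, or
stopped:crashed:), .building/ is simply left beside the
old .real_*/ , which is the case below.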

By contrast, say you ^C the bulk run or it reaches the
stopped:crashed: state instead of stopped:done: . Then
.building/ would still be present, as would the pre-existing
.real_*/ and the links that use it. This is the context
for the next bulk run reporting:

"Using packages from previously failed build: ${PACKAGES}/.building"


>> if one suspects a problem of a kind that did not stop a
>> build but produced something for a build that fails to operate
>> correctly.
>>
> Such as a corrupt llvm-tblgen?

Yep, possibly via it depending on something else that
has problems.
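
Going back to the rm -fr PATH/.building suggestion: PATH
is the per-jail package directory, in your context
/usr/local/poudriere/data/packages/main-default . A guarded
sketch as a sh function (the name is hypothetical, not a
poudriere command); it should only be used when no bulk
run is active:

```shell
#!/bin/sh
# Hypothetical helper (not part of poudriere): discard a leftover
# .building/ so the next bulk run does not reuse packages from the
# failed run. Only use this while no bulk run is in progress.
discard_building() {
    pkgdir=$1   # e.g. /usr/local/poudriere/data/packages/main-default
    if [ -d "$pkgdir/.building" ]; then
        rm -fr "$pkgdir/.building"
        echo "removed $pkgdir/.building"
    else
        echo "no .building/ in $pkgdir (last run reached stopped:done:?)"
    fi
}
```

Usage: discard_building /usr/local/poudriere/data/packages/main-default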

>> So lang/rust finished. That is interesting because it includes an
>> llvm build internally.
>>
>
> Does that build invoke the same llvm-tblgen?

Every devel/llvm* build builds its own llvm-tblgen .
lang/rust would build its own too. And the system
llvm support builds its own as well.

> [snip]
>> Again, poudriere does not control memory initialization in
>> the processes in the builders.
>>
>
> For some reason I got the idea that whatever asked for memory to use
> was responsible for initializing it.

Part of the point of memory management libraries having a
way to be told to fill in bytes such as 0xA5u is to get
hints about contexts that end up using memory not
explicitly initialized by the requesting program.

That is why I had you try the contrasting junk:false
case in /etc/malloc.conf . The results showed the values
the memory allocation library initialized with, rather than
something specific to the code requesting the allocation.
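
For reference, FreeBSD's malloc(3) (jemalloc) reads its
options both from the name of the /etc/malloc.conf symbolic
link and from the MALLOC_CONF environment variable. In the
jemalloc versions FreeBSD uses, junk:true fills newly
allocated bytes with 0xa5 and freed bytes with 0x5a, while
junk:false disables the filling. A small sketch for applying
the setting to a single command instead of system-wide (the
helper name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: run one command with jemalloc junk filling set
# via MALLOC_CONF, leaving the system-wide /etc/malloc.conf untouched.
# The system-wide form uses the symlink's name, e.g.:
#     ln -fs 'junk:false' /etc/malloc.conf
with_junk() {
    mode=$1          # true, false, alloc, or free
    shift
    MALLOC_CONF="junk:$mode" "$@"   # exported only to this command
}
```

For example: with_junk false make -DBATCH > make.log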

> Certainly not the kernel.....

The kernel fills in bytes in some user-space memory
as part of doing various requested operations. In such
cases it is possible for the kernel to fail to fill in
the memory as it should have.

It is also possible for the kernel to replace bytes in
user-space memory that it should not touch. There is an
ongoing example of this issue with the 32-bit powerpc
kernels used on old PowerMacs.

>>> The fact that the stoppage reported looks like
>>> a syntax error specific to devel/llvm10 which is unaffected by swap pressure
>>> makes it seem unrelated to kernel or swap constraints.
>>
>> The files with the syntax errors are ones generated by llvm-tblgen
>> during the build and it is the output of llvm-tblgen that is corrupt,
>> showing evidence of having used memory not initialized like it should
>> have been.
>>
>
> Wouldn't that point suspicion at llvm-tblgen, from whatever version
> of LLVM is actually doing the work?

It points at llvm-tblgen and/or something(s) that llvm-tblgen
depends on. Either way, the observed failure is from the
llvm-tblgen output being incorrect and later complained about.

devel/llvm10 builds its own llvm-tblgen for its own use. Each
devel/llvm* does. (As does the system's llvm*.)

There is also the variability in which llvm-tblgen output is
messed up: it is always some example of:

lib/Target/*/*GenGlobalISel.inc

but which value for the *'s tends to vary from build attempt
to build attempt. It suggests that some sort of race condition
is involved.
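
To compare attempts, one option is to pull the distinct
*GenGlobalISel.inc paths out of the compiler error lines in
each failed build's log. A sketch, assuming the errors use
the usual clang diagnostic form FILE:LINE:COL: error: ...
(the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: list the distinct lib/Target/*/*GenGlobalISel.inc
# files that compiler error lines in a build log point at.
# Assumes clang-style diagnostics: FILE:LINE:COL: error: MESSAGE
gisel_errors_in_log() {
    grep 'error:' "$1" |
        grep -o 'lib/Target/[A-Za-z0-9]*/[A-Za-z0-9]*GenGlobalISel\.inc' |
        sort -u
}
```

Running it on the logs from several failed attempts and
comparing the outputs would show whether the target varies.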

>>> AIUI, the hardware of the Pi4 is considerably different from the Pi3 in terms
>>> of memory management, noted from an interview with Eben Upton on YouTube.
>>
>> Why would Eben Upton be talking about FreeBSD's memory management?
>>
> He was talking about the Pi4 hardware and how it differed from the Pi3

Which is not memory management as such.

>> I suspect that the talk is not about what you think it is about,
>> but some narrower aspects than the overall memory management.
>>
>
> I thought it had something to do with added DMA capability. The video is at
> https://www.youtube.com/watch?v=hyj-7mTnumI
> In light of the discussion about llvm-tblgen I'm doubtful it's relevant,
> but it's not the worst way to waste an hour.
>
>>
>>> Is there any sort of sanity test for the poudriere system? If I delete and
>>> re-create the existing jail can the existing package library be preserved
>>> and re-used? If not, that's OK, I'd just like to know beforehand.
>>>
>>
>> # poudriere jail -jNAME -d
>> # poudriere jail -c -jNAME -m null -M /WORLDPATH -S /SRCPATH -v 14.0-CURRENT
>>
>> should work fine. But really all that you are
>> doing (using an example from my environment)
>> is deleting and rewriting a few very small files
>> in a directory with the jail's name:
>>
> So, in my case /usr/local/poudriere/poudriere-system?

After the delete would be:

poudriere jail -c -jNAME -m null -M /usr/local/poudriere/poudriere-system -S /usr/src -v 14.0-CURRENT

Same as in your: http://www.zefox.org/~bob/readme

> (using the nomenclature in your sample instructions).
> That would leave /usr/local/poudriere/data intact....

Yep. The delete does have an option (-C ???) for causing
more to be deleted under /usr/local/poudriere/data/ .

(Despite the documentation claiming otherwise, it did not
seem to delete packages when requested.)

> I'm starting to understand why you think it unlikely
> to help.
>
>> The deletion/replacement of timestamp may have rebuild
>> consequences, from it appearing to have changed (or just
>> being missing).
>>
> If timestamps guide decisions on what to make and when,
> that might be significant. Not sure how I might've screwed
> them up, but in my hands anything is possible 8-)

I took a quick look and did not notice any timestamp
comparisons controlling anything.

>> Nothing about any of those is going to change how memory
>> initialization is working in llvm-tblgen's operation
>> for generating any *GenGlobalISel.inc files, other than
>> if the timestamp forces some sort of rebuild from scratch
>> of some build dependencies first.
>>
> Maybe this should be obvious, but which llvm-tblgen is in
> action? The one from the system (12.0.1), or something
> else?
>=20

devel/llvm10 builds its own llvm-tblgen and uses it.
Every devel/llvm* build builds its own llvm-tblgen .

Looking in the .log file for a build there are lines
containing commands that start out with (from my
example devel/llvm10 build context):

/wrkdirs/usr/ports/devel/llvm10/work/.build/bin/llvm-tblgen

Before any of those, there are commands associated with
building that bin/llvm-tblgen .
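
A quick way to confirm from a .log file which llvm-tblgen
binary a build actually ran is to extract the distinct
paths ending in /bin/llvm-tblgen . A sketch, assuming (as in
the example above) that each invocation is logged with the
binary's full path; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print each distinct llvm-tblgen binary path that
# appears in a build log, e.g.
#   /wrkdirs/usr/ports/devel/llvm10/work/.build/bin/llvm-tblgen
# Paths from the commands that build the binary may also show up.
tblgen_paths_in_log() {
    grep -o '[^ ]*/bin/llvm-tblgen' "$1" | sort -u
}
```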

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



