Date:      Fri, 5 Feb 2021 10:34:27 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        Justin Hibbits <chmeeedalf@gmail.com>
Cc:        freebsd-ppc <freebsd-ppc@freebsd.org>
Subject:   Re: main (14-CURRENT) may be unstable on powerpc64
Message-ID:  <52899C49-E0DF-4E19-A89F-8B9376B8F1F5@yahoo.com>
In-Reply-To: <20210205113544.349ee77e@ralga.knownspace>
References:  <28E64465-8A99-43CF-8B4F-044533EA03C4.ref@yahoo.com> <28E64465-8A99-43CF-8B4F-044533EA03C4@yahoo.com> <20210205113544.349ee77e@ralga.knownspace>


On 2021-Feb-5, at 09:35, Justin Hibbits <chmeeedalf at gmail.com> wrote:

> On Fri, 5 Feb 2021 04:05:55 -0800
> Mark Millard via freebsd-ppc <freebsd-ppc@freebsd.org> wrote:
>
>> I am running on a 2-socket/1-core-each PowerMac G5
>> (8 GiByte RAM) based on:
>>
>> # ~/fbsd-based-on-what-freebsd-main.sh
>> merge-base: 847dfd2803f6c8b077e3ebc68e35adff2c79a65f
>> merge-base: CommitDate: 2021-02-03 21:24:22 +0000
>> 325d7069b027 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context.
>> 847dfd2803f6 (freebsd/main, freebsd/HEAD, pure-src, main) readelf: do not trucate section name with -W
>> FreeBSD FBSDG5L2 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n244624-325d7069b027 GENERIC64vtsc-NODBG-dcons  powerpc powerpc64 1400003 1400003
>>
>> I attempted to rebuild the ports to get FreeBSD:14 based
>> versions but got the below oddity in the process:
>>
>> # poudriere bulk -jFBSDpowerpc64 -c -w -f ~/origins/powerpc64-origins.txt
>> . . .
>> =======================<phase: package        >============================
>> ===>  Building package for gettext-tools-0.21
>> Child process pid=44950 terminated abnormally: Segmentation fault
>> Child process pid=44956 terminated abnormally: Segmentation fault
>> actual-package-depends: dependency on /usr/local/lib/libtextstyle.so not registered (normal if it belongs to base)
>> Child process pid=44958 terminated abnormally: Segmentation fault
>> Child process pid=44962 terminated abnormally: Segmentation fault
>> Child process pid=44971 terminated abnormally: Segmentation fault
>> actual-package-depends: dependency on /usr/local/lib/libintl.so not registered (normal if it belongs to base)
>> Child process pid=44973 terminated abnormally: Segmentation fault
>> Child process pid=44977 terminated abnormally: Segmentation fault
>> Child process pid=44980 terminated abnormally: Segmentation fault
>> actual-package-depends: dependency on /usr/local/bin/indexinfo not registered (normal if it belongs to base)
>> Child process pid=44982 terminated abnormally: Segmentation fault
>> . . .
>>
>> Unfortunately, at the package phase, the above sort of thing
>> does not lead to a saved copy of the work/ area for the port
>> in poudriere and was classified as a Success. I do have the
>> console report:
>>
>> Feb  4 03:14:17 FBSDG5L2 kernel: pid 44950 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb  4 03:14:17 FBSDG5L2 kernel: pid 44956 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb  4 03:14:17 FBSDG5L2 kernel: pid 44958 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb  4 03:14:17 FBSDG5L2 kernel: pid 44962 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb  4 03:14:17 FBSDG5L2 kernel: pid 44971 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb  4 03:14:17 FBSDG5L2 kernel: pid 44973 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb  4 03:14:17 FBSDG5L2 kernel: pid 44977 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb  4 03:14:17 FBSDG5L2 kernel: pid 44980 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb  4 03:14:17 FBSDG5L2 kernel: pid 44982 (pkg-static), jid 4, uid 0: exited on signal 11
>>
>> so which program got the failures is known but I
>> did not end up with core files or other such. Also
>> they all seem to have happened with the same
>> reported time (second scale). (The messages above
>> do not show "(core dumped)" either, so even with
>> a copy of the work/ area there probably would not
>> have been evidence.)
>>
>> One point is that the time frame means that the once-a-day
>> checking activity (defaults) was likely running in parallel
>> with the poudriere activity.
>>
>> The above left the "deps" information missing for
>> gettext-tools-0.21 .
>>
>> When the poudriere run finished, the status was 7 failures
>> and 153 skipped because of lack of "deps" information for
>> gettext-tools-0.21 :
>>
>> [FBSDpowerpc64-default] [2021-02-04_02h19m21s] [committing:] Queued:
>> 476 Built: 316 Failed: 7   Skipped: 153 Ignored: 0   Tobuild: 0
>> Time: 24:28:45
>>
>> For reference, from early in the build:
>>
>> [00:24:17] [02] [00:00:00] Building devel/gettext-tools | gettext-tools-0.21
>> . . .
>> [00:55:34] [02] [00:31:17] Finished devel/gettext-tools | gettext-tools-0.21: Success
>>
>>
>> I then tried:
>>
>> # poudriere bulk -jFBSDpowerpc64 -i -C -w devel/gettext-tools
>>
>> and it built just fine this time:
>>
>> [FBSDpowerpc64-default] [2021-02-05_02h50m31s] [committing:] Queued:
>> 1  Built: 1  Failed: 0  Skipped: 0  Ignored: 0  Tobuild: 0   Time:
>> 00:23:26
>>
>>
>> In all cases, each poudriere job was allowed to have an active
>> process per cpu (so 2 active processes per job). The retry,
>> of course, was just one poudriere job.
>>
>> So far I've no evidence of problems with the other 315 of 316
>> built ports from the first run, including no more pkg-static
>> failures.
>>
>>
>> I have started up an attempted build of the failed and skipped
>> ports.
>>
>>
>> I have no known way to repeat the problem on demand and no
>> evidence for specifically where pkg-static was executing when
>> it failed.
>>
>> ===
>> Mark Millard
>> marklmi at yahoo.com
>> ( dsl-only.net went
>> away in early 2018-Mar)
>
> This is probably fallout from 710e45c4b, which has since been reverted.
> 710e45c4b broke other things like swig as well, which caused a lot of
> poudriere fallout for me (devel/llvm* failed because swig crashed).
>
> Try updating past 33f0540b1 and testing again.
>

The above is based on 847dfd2803f6, which is after 33f0540b1 . . .

https://cgit.freebsd.org/src/log/?qt=range&q=33f0540b1~1..847dfd2803f6

shows:

Commit message                                                    Author         Age       Files  Lines
* readelf: do not trucate section name with -W                    Ed Maste       45 hours  1      -4/+9
* readelf: decode LA48 and ASG_DISABLE feature flags              Ed Maste       45 hours  1      -0/+2
* Add a VM flag to prevent reclaim on a failed contig allocation  Ryan Stone     45 hours  3      -2/+11
* dwmmc: Multiple busdma fixes.                                   Michal Meloun  46 hours  1      -15/+32
* linux: remove locks around callout_drain in timerfd_close()     shu            46 hours  1      -2/+0
* Revert "Reimplement strlen"                                     Mateusz Guzik  47 hours  2      -53/+108

(I probably should have shown that in the original
message, given the difficulty in determining the
relative order of referenced commits.)
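
For anyone wanting to check the relative order of two commits locally
rather than via cgit, something like the following sketch might do. It
relies only on "git merge-base --is-ancestor"; the function name
is_ancestor and the /usr/src clone location in the usage comment are
just assumptions for illustration.

```shell
# is_ancestor OLDER NEWER [REPO]
# Exit status 0 if commit OLDER is an ancestor of (or the same as) NEWER,
# non-zero otherwise. REPO defaults to the current directory.
is_ancestor() {
    git -C "${3:-.}" merge-base --is-ancestor "$1" "$2"
}

# Example use, assuming a FreeBSD src clone at /usr/src (hypothetical path):
#   is_ancestor 33f0540b1 847dfd2803f6 /usr/src &&
#       echo "33f0540b1 is in the history of 847dfd2803f6"
```

Abbreviated hashes only work while they stay unambiguous in the
repository, so full hashes are the safer input.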

Before updating to be 847dfd2803f6 based, I had also
previously hit the swig issue with the llvm10 build.
That problem failed reliably until after I'd updated
past the revert. (Not trusting the other things built
is why I did a -c poudriere bulk after updating to an
environment based on after the revert.)

The variability in the pkg-static behavior this time
suggests race conditions are involved, though not
frequent failures.
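
To watch for recurrences, a rough sketch like the one below could
summarize the kernel's "exited on signal" console messages by
timestamp, program and signal. It assumes the message layout shown in
the console report quoted earlier and program names without spaces;
the function name count_signal_exits is made up for this example.

```shell
# Summarize "exited on signal" kernel messages read from stdin.
# Prints one line per (timestamp, program, signal): COUNT Mon DD HH:MM:SS prog signal N
count_signal_exits() {
    awk '
    /exited on signal/ {
        # Layout: Mon DD HH:MM:SS host kernel: pid N (prog), jid J, uid U: exited on signal S
        prog = ""; sig = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "pid")    { prog = $(i + 2) }   # e.g. "(pkg-static),"
            if ($i == "signal") { sig  = $(i + 1) }   # e.g. "11"
        }
        gsub(/[(),]/, "", prog)                       # strip the parentheses and comma
        count[$1 " " $2 " " $3 " " prog " signal " sig]++
    }
    END { for (k in count) print count[k], k }
    ' | sort
}

# Example use (the log path is the usual syslog default, adjust as needed):
#   count_signal_exits < /var/log/messages
```

Fed the nine console lines from the original report, this should
collapse them into a single entry with a count of 9 for pkg-static at
03:14:17, which makes the same-second clustering easy to spot.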


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



