Date: Fri, 5 Feb 2021 10:34:27 -0800
From: Mark Millard <marklmi@yahoo.com>
To: Justin Hibbits <chmeeedalf@gmail.com>
Cc: freebsd-ppc <freebsd-ppc@freebsd.org>
Subject: Re: main (14-CURRENT) may be unstable on powerpc64
Message-ID: <52899C49-E0DF-4E19-A89F-8B9376B8F1F5@yahoo.com>
In-Reply-To: <20210205113544.349ee77e@ralga.knownspace>
References: <28E64465-8A99-43CF-8B4F-044533EA03C4.ref@yahoo.com> <28E64465-8A99-43CF-8B4F-044533EA03C4@yahoo.com> <20210205113544.349ee77e@ralga.knownspace>

On 2021-Feb-5, at 09:35, Justin Hibbits <chmeeedalf at gmail.com> wrote:

> On Fri, 5 Feb 2021 04:05:55 -0800
> Mark Millard via freebsd-ppc <freebsd-ppc@freebsd.org> wrote:
>
>> I am running on a 2-socket/1-core-each PowerMac G5
>> (8 GiByte RAM) based on:
>>
>> # ~/fbsd-based-on-what-freebsd-main.sh
>> merge-base: 847dfd2803f6c8b077e3ebc68e35adff2c79a65f
>> merge-base: CommitDate: 2021-02-03 21:24:22 +0000
>> 325d7069b027 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context.
>> 847dfd2803f6 (freebsd/main, freebsd/HEAD, pure-src, main) readelf: do not trucate section name with -W
>> FreeBSD FBSDG5L2 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n244624-325d7069b027 GENERIC64vtsc-NODBG-dcons powerpc powerpc64 1400003 1400003
>>
>> I attempted to rebuild the ports to get FreeBSD:14 based
>> versions but got the below oddity in the process:
>>
>> # poudriere bulk -jFBSDpowerpc64 -c -w -f ~/origins/powerpc64-origins.txt
>> . . .
>> =======================<phase: package >============================
>> ===> Building package for gettext-tools-0.21
>> Child process pid=44950 terminated abnormally: Segmentation fault
>> Child process pid=44956 terminated abnormally: Segmentation fault
>> actual-package-depends: dependency on /usr/local/lib/libtextstyle.so not registered (normal if it belongs to base)
>> Child process pid=44958 terminated abnormally: Segmentation fault
>> Child process pid=44962 terminated abnormally: Segmentation fault
>> Child process pid=44971 terminated abnormally: Segmentation fault
>> actual-package-depends: dependency on /usr/local/lib/libintl.so not registered (normal if it belongs to base)
>> Child process pid=44973 terminated abnormally: Segmentation fault
>> Child process pid=44977 terminated abnormally: Segmentation fault
>> Child process pid=44980 terminated abnormally: Segmentation fault
>> actual-package-depends: dependency on /usr/local/bin/indexinfo not registered (normal if it belongs to base)
>> Child process pid=44982 terminated abnormally: Segmentation fault
>> . . .
>>
>> Unfortunately, at the package phase, the above sort of thing
>> does not lead to a saved copy of the work/ area for the port
>> in poudriere and was classified as a Success. I do have the
>> console report:
>>
>> Feb 4 03:14:17 FBSDG5L2 kernel: pid 44950 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb 4 03:14:17 FBSDG5L2 kernel: pid 44956 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb 4 03:14:17 FBSDG5L2 kernel: pid 44958 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb 4 03:14:17 FBSDG5L2 kernel: pid 44962 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb 4 03:14:17 FBSDG5L2 kernel: pid 44971 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb 4 03:14:17 FBSDG5L2 kernel: pid 44973 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb 4 03:14:17 FBSDG5L2 kernel: pid 44977 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb 4 03:14:17 FBSDG5L2 kernel: pid 44980 (pkg-static), jid 4, uid 0: exited on signal 11
>> Feb 4 03:14:17 FBSDG5L2 kernel: pid 44982 (pkg-static), jid 4, uid 0: exited on signal 11
>>
>> so which program got the failures is known but I
>> did not end up with core files or other such. Also
>> they all seem to have happened with the same
>> reported time (second scale). (The messages above
>> do not show "(core dumped)" either, so even with
>> a copy of the work/ area there probably would not
>> have been evidence.)
>>
>> One point is that the time frame means that the once-a-day
>> checking activity (defaults) was likely running in parallel
>> with the poudriere activity.
>>
>> The above left the "deps" information missing for
>> gettext-tools-0.21 .
>>
>> When the poudriere run finished, the status was 7 failures
>> and 153 skipped because of lack of "deps" information for
>> gettext-tools-0.21 :
>>
>> [FBSDpowerpc64-default] [2021-02-04_02h19m21s] [committing:] Queued: 476 Built: 316 Failed: 7 Skipped: 153 Ignored: 0 Tobuild: 0 Time: 24:28:45
>>
>> For reference, from early in the build:
>>
>> [00:24:17] [02] [00:00:00] Building devel/gettext-tools | gettext-tools-0.21
>> . . .
>> [00:55:34] [02] [00:31:17] Finished devel/gettext-tools | gettext-tools-0.21: Success
>>
>>
>> I then tried:
>>
>> # poudriere bulk -jFBSDpowerpc64 -i -C -w devel/gettext-tools
>>
>> and it built just fine this time:
>>
>> [FBSDpowerpc64-default] [2021-02-05_02h50m31s] [committing:] Queued: 1 Built: 1 Failed: 0 Skipped: 0 Ignored: 0 Tobuild: 0 Time: 00:23:26
>>
>>
>> In all cases, each poudriere job was allowed to have an active
>> process per cpu (so 2 active processes per job). The retry,
>> of course, was just one poudriere job.
>>
>> So far I've no evidence of problems with the other 315 of 316
>> built ports from the first run, including no more pkg-static
>> failures.
>>
>>
>> I have started up an attempted build of the failed and skipped
>> ports.
>>
>>
>> I have no known way to repeat the problem on demand and no
>> evidence for specifically where pkg-static was executing when
>> it failed.
>>
>> ===
>> Mark Millard
>> marklmi at yahoo.com
>> ( dsl-only.net went
>> away in early 2018-Mar)
>
> This is probably fallout from 710e45c4b, which has since been reverted.
> 710e45c4b broke other things like swig as well, which caused a lot of
> poudriere fallout for me (devel/llvm* failed because swig crashed).
>
> Try updating past 33f0540b1 and testing again.
>

The above is based on 847dfd2803f6, which is after 33f0540b1 . . .

https://cgit.freebsd.org/src/log/?qt=range&q=33f0540b1~1..847dfd2803f6

shows:

Commit message (Expand)                                           Author         Age       Files  Lines
* readelf: do not trucate section name with -W                    Ed Maste       45 hours  1      -4/+9
* readelf: decode LA48 and ASG_DISABLE feature flags              Ed Maste       45 hours  1      -0/+2
* Add a VM flag to prevent reclaim on a failed contig allocation  Ryan Stone     45 hours  3      -2/+11
* dwmmc: Multiple busdma fixes.                                   Michal Meloun  46 hours  1      -15/+32
* linux: remove locks around callout_drain in timerfd_close()     shu            46 hours  1      -2/+0
* Revert "Reimplement strlen"                                     Mateusz Guzik  47 hours  2      -53/+108

(I probably should have shown that in the original message,
given the difficulty in determining the relative order of
referenced commits.)

Before updating to be 847dfd2803f6 based, I had also previously
hit the swig issue with the llvm10 build. That problem failed
reliably until after I'd updated past the revert. (Not trusting
the other things built is why I did a -c poudriere bulk after
updating to an environment based on after the revert.)

The variability in the pkg-static behavior this time suggests
race conditions are involved, though not frequent failures.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
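
The console report quoted above can be summarized mechanically, for example to confirm that all of the pkg-static exits share one timestamp and that none reported "(core dumped)". The following is only a minimal Python sketch, assuming console lines of the shape shown in the quoted report; the function name summarize_signal_exits and the regular expression are illustrative assumptions, not part of poudriere, pkg, or any FreeBSD tool.

```python
import re
from collections import Counter

# Assumed line shape, copied from the console report quoted in this message:
#   Feb 4 03:14:17 FBSDG5L2 kernel: pid 44950 (pkg-static), jid 4, uid 0: exited on signal 11
# An optional trailing " (core dumped)" is also recognized.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"pid\s+(?P<pid>\d+)\s+\((?P<prog>[^)]+)\),.*?"
    r"exited on signal\s+(?P<sig>\d+)(?P<core>\s+\(core dumped\))?"
)


def summarize_signal_exits(lines):
    """Count signal exits per (timestamp, program, signal, core_dumped)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a "exited on signal" console line
        stamp = " ".join(m["stamp"].split())  # collapse syslog's padding spaces
        counts[(stamp, m["prog"], int(m["sig"]), m["core"] is not None)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Feb 4 03:14:17 FBSDG5L2 kernel: pid 44950 (pkg-static), jid 4, uid 0: exited on signal 11",
        "Feb 4 03:14:17 FBSDG5L2 kernel: pid 44956 (pkg-static), jid 4, uid 0: exited on signal 11",
    ]
    for (stamp, prog, sig, core), n in summarize_signal_exits(sample).items():
        print(f"{stamp} {prog} signal {sig} core_dumped={core} count={n}")
```

A single key in the result would indicate that every failure involved the same program, signal, and second, which is what the quoted report shows for the nine pkg-static lines.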