Date: Mon, 25 Nov 2024 18:05:12 -0800
From: Mark Millard <marklmi@yahoo.com>
To: Guido Falsi <mad@madpilot.net>, Dimitry Andric <dim@FreeBSD.org>,
    Dag-Erling Smørgrav <des@FreeBSD.org>, Yasuhiro Kimura <yasu@FreeBSD.org>
Cc: ports@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: port binary dumping core on recent head in poudriere
Message-ID: <A2820AEA-AB92-425F-AE91-2AF9629B3020@yahoo.com>
In-Reply-To: <E77AF0C3-5210-41C7-B8B8-02A8E22DB23D@yahoo.com>
References: <aa597431-54a8-4cde-8d4f-b75040b59bae@madpilot.net>
    <46E3A370-A3E0-4BAF-B707-87F94F98E248@FreeBSD.org>
    <5ee47c3d-f80e-4d50-9b6a-acb3c98e80e0@madpilot.net>
    <f9e32784-226a-4e1e-a24b-62f5e6d3d765@madpilot.net>
    <E4616829-D2DE-4EAF-B971-1EDA8B447F13@FreeBSD.org>
    <7c9c3cf5-bbd1-4642-8d04-33aa07a4db02@madpilot.net>
    <9df256a8-c6ed-46d9-b955-fc2657c12d36@madpilot.net>
    <5c502054-7353-4a1e-8350-c403482e9c0d@madpilot.net>
    <a203a89f-2eb7-4220-8dfb-648cd46fc6bb@madpilot.net>
    <3127C3BA-FC93-4636-ADDB-89518DE9C60D@FreeBSD.org>
    <86ed2zsp6l.fsf@ltc.des.dev>
    <5f24a570-26e0-4c0a-817f-591a234fd07b@madpilot.net>
    <5918C6A1-8FDB-40CA-8C86-EB7B7BE75A2E@yahoo.com>
    <86ed2zc8r5.fsf@ltc.des.dev>
    <45098ccf-4dc6-426c-849a-c923805d6723@madpilot.net>
    <F64DB4E9-A210-4E1F-B333-C597F3DBED54@yahoo.com>
    <38658C0D-CA33-4010-BBE1-E68D253A3DF7@FreeBSD.org>
    <1004a753-9a3c-4aa2-bfa8-4a0c471fe3ea@madpilot.net>
    <D14FF56C-506F-4168-91BC-1F10937B943F@yahoo.com>
    <E77AF0C3-5210-41C7-B8B8-02A8E22DB23D@yahoo.com>
Top posting, going in a different direction that established a way
to control the behavior in my context . . .

I changed USE_TMPFS=all to USE_TMPFS=no :

USE_TMPFS=all gets the failure
vs.
USE_TMPFS=no works just fine

So it is a FreeBSD system error associated with use of tmpfs .

Now back to what I looked at before trying the above . . .

On Nov 25, 2024, at 17:05, Mark Millard <marklmi@yahoo.com> wrote:

> On Nov 25, 2024, at 15:21, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Nov 25, 2024, at 14:23, Guido Falsi <mad@madpilot.net> wrote:
>>
>>> On 25/11/24 23:15, Dimitry Andric wrote:
>>>> On 25 Nov 2024, at 23:12, Mark Millard <marklmi@yahoo.com> wrote:
>>>>>
>>>>> On Nov 25, 2024, at 13:27, Guido Falsi <mad@madpilot.net> wrote:
>>>>>
>>>>>> On 25/11/24 22:18, Dag-Erling Smørgrav wrote:
>>>>>>> Mark Millard <marklmi@yahoo.com> writes:
>>>>>>>> Guido Falsi <mad@madpilot.net> writes:
>>>>>>>>> On 25/11/24 09:17, Dag-Erling Smørgrav wrote:
>>>>>>>>>> Dimitry Andric <dim@FreeBSD.org> writes:
>>>>>>>>>>> Probably best to create a bugzilla ticket, but as I said before, I
>>>>>>>>>>> cannot reproduce this.
>>>>>>>>>> I can. My builder is running 15 and sees segfaults while building
>>>>>>>>>> packages for 14 and 15 but not for 13.
>>>>>>>>> BTW removing optimizations (CPUTYPE) for only the affected ports made
>>>>>>>>> guile2 work again. Did not solve the issue with sassc though. [...]
>>>>>>>>> I'm also using ccache, but that does not look relevant.
>>>>>>>> I've never used ccache or analogous and get the libsass.so.1.0.0
>>>>>>>> .got.plt corruption that I've reported on the lists anyway.
>>>>>>> I don't use ccache or optimizations.
>>>>>>> Here's an example of sassc
>>>>>>> segfaulting in a 14.1-RELEASE-p6 jail:
>>>>>>> https://pkg.des.dev/logs/data/14amd64-default/2024-11-24_19h29m04s/logs/errors/plasma5-breeze-gtk-5.27.11.log
>>>>>>> which matches the following entry from `/var/log/messages`:
>>>>>>> Nov 24 21:23:06 pkg kernel: pid 71277 (sassc), jid 253, uid 65534: exited on signal 11 (core dumped)
>>>>>>> The poudriere host is a bhyve VM with 48 cores and 192 GB RAM on a
>>>>>>> 32c/64t AMD EPYC 7502P with 256 GB RAM.
>>>>>>
>>>>>> I sincerely hope this is not relevant but my CPU is also AMD: AMD Ryzen 5 5600G
>>>>>
>>>>> The amd64 system type that I have access to and used
>>>>> for my testing:
>>>>>
>>>>> AMD 7950X3D (16 core, 32 thread, so 32 FreeBSD-cpus) with 192 GiBytes of RAM
>>>> I'm on Intel, and I don't see any crashes at all. So, are we looking at some CPU specific issue here?
>>>
>>> We can't say for sure, but we definitely have all people reporting the issue on the same CPU brand, so it's some indication I guess.
>>>
>>> I was hoping it would not come to this because I suspect such issues are quite difficult to diagnose.
>>
>> Unfortunately, for amd64 I only have access to:
>>
>> ) An old ThreadRipper 1950X system (untested so far)
>> ) The 7950X3D system
>>
>> No Intel systems.
>>
>> If someone had both AMD and Intel and could have
>> boot&operate media that should work for both, say
>> USB that can be simply moved between machines,
>> running tests on both would be appropriate.
>> (Implication: the media not being tailored to the
>> cpu specifics so the same system software is
>> tested in both places.)
>>
>> I'll note that the media in my context is PCIe Optane,
>> ZFS based. I could try a U.2 Optane in a PCIe adaptor
>> that has UFS instead for building textproc/libsass .
>> (The U.2 content is basically an rsync of the ZFS
>> Optane media's live directory tree, with node naming
>> and such adjusted afterwards.)
>>
>> What do other folks have for the file system(s)
>> involved?
>
> I get the sassc failure from a pure UFS live-context as
> well.
>
> Interestingly, there is variation in the .got.plt oddity.
>
> Earlier:
>
> Bad .got.plt:
>
> Contents of section .got.plt:
>  2bed60 00000000 00000000 00000000 00000000  ................
> . . .
>  2befc0 00000000 00000000 00000000 00000000  ................
>  2befd0 00000000 00000000 00000000 00000000  ................
>  2befe0 00000000 00000000 00000000 00000000  ................
>  2beff0 00000000 00000000 00000000 00000000  ................
>  2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
>  2bf010 b6ab2a00 00000000 c6ab2a00 00000000  ..*.......*.....
>  2bf020 d6ab2a00 00000000 e6ab2a00 00000000  ..*.......*.....
>  2bf030 f6ab2a00 00000000 06ac2a00 00000000  ..*.......*.....
> . . .

Interestingly, a later retest of the ZFS context did not
get the above. Instead it ended up like the below bad case.

I'll also note that scrubbing reports:

# zpool status
  pool: zoptb
 state: ONLINE
  scan: scrub repaired 0B in 00:00:47 with 0 errors on Mon Nov 25 17:50:44 2024
config:

	NAME           STATE     READ WRITE CKSUM
	zoptb          ONLINE       0     0     0
	  gpt/OptBzfs  ONLINE       0     0     0

errors: No known data errors

This should mean that the unexpected zeros were present
before zfs did its checksum prior to writing the data.

> The new bad .got.plt ended up with a bigger zero area,
> the nonzero area again being nicely aligned for where
> it starts. (The .got.plt starts at the same address
> as above.)
>
> Contents of section .got.plt:
>  2bed60 00000000 00000000 00000000 00000000  ................
> . . .
>  2befc0 00000000 00000000 00000000 00000000  ................
>  2befd0 00000000 00000000 00000000 00000000  ................
>  2befe0 00000000 00000000 00000000 00000000  ................
>  2beff0 00000000 00000000 00000000 00000000  ................
>  2bf000 00000000 00000000 00000000 00000000  ................
>  2bf010 00000000 00000000 00000000 00000000  ................
>  2bf020 00000000 00000000 00000000 00000000  ................
>  2bf030 00000000 00000000 00000000 00000000  ................
> . . .
>  2bffc0 00000000 00000000 00000000 00000000  ................
>  2bffd0 00000000 00000000 00000000 00000000  ................
>  2bffe0 00000000 00000000 00000000 00000000  ................
>  2bfff0 00000000 00000000 00000000 00000000  ................
>  2c0000 96cb2a00 00000000 a6cb2a00 00000000  ..*.......*.....
>  2c0010 b6cb2a00 00000000 c6cb2a00 00000000  ..*.......*.....
>  2c0020 d6cb2a00 00000000 e6cb2a00 00000000  ..*.......*.....
>  2c0030 f6cb2a00 00000000 06cc2a00 00000000  ..*.......*.....
> . . .
>

Adding the comparison of the good .got.plt from the PkgBase
based chroot with the official packages installed:

Contents of section .got.plt:
 2bed60 78ba2b00 00000000 00000000 00000000  x.+.............
 2bed70 00000000 00000000 86a62a00 00000000  ..........*.....
 2bed80 96a62a00 00000000 a6a62a00 00000000  ..*.......*.....
 2bed90 b6a62a00 00000000 c6a62a00 00000000  ..*.......*.....
. . .
 2befc0 16ab2a00 00000000 26ab2a00 00000000  ..*.....&.*.....
 2befd0 36ab2a00 00000000 46ab2a00 00000000  6.*.....F.*.....
 2befe0 56ab2a00 00000000 66ab2a00 00000000  V.*.....f.*.....
 2beff0 76ab2a00 00000000 86ab2a00 00000000  v.*.......*.....
 2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
 2bf010 b6ab2a00 00000000 c6ab2a00 00000000  ..*.......*.....
 2bf020 d6ab2a00 00000000 e6ab2a00 00000000  ..*.......*.....
 2bf030 f6ab2a00 00000000 06ac2a00 00000000  ..*.......*.....
. . .
 2bffc0 16cb2a00 00000000 26cb2a00 00000000  ..*.....&.*.....
 2bffd0 36cb2a00 00000000 46cb2a00 00000000  6.*.....F.*.....
 2bffe0 56cb2a00 00000000 66cb2a00 00000000  V.*.....f.*.....
 2bfff0 76cb2a00 00000000 86cb2a00 00000000  v.*.......*.....
 2c0000 96cb2a00 00000000 a6cb2a00 00000000  ..*.......*.....
 2c0010 b6cb2a00 00000000 c6cb2a00 00000000  ..*.......*.....
 2c0020 d6cb2a00 00000000 e6cb2a00 00000000  ..*.......*.....
 2c0030 f6cb2a00 00000000 06cc2a00 00000000  ..*.......*.....
. . .

The contents of the non-zero parts of any pair of the
examples agree.

===
Mark Millard
marklmi at yahoo.com
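
A minimal sketch, not part of the original analysis, of how the "nicely aligned" observation about the zeroed .got.plt areas could be checked mechanically. It parses rows in the hex-dump layout shown above and reports the address of the first nonzero byte. The helper name first_nonzero_address, the sample constants, and the 4 KiB page size are assumptions made for illustration; only rows that appear in this message are used as sample input.

```python
import re

PAGE_SIZE = 0x1000  # assumption: 4 KiB base pages, as on amd64

# One dump row: an address followed by up to four 8-hex-digit groups.
ROW = re.compile(r"^\s*([0-9a-f]+)((?:\s[0-9a-f]{8}){1,4})")


def first_nonzero_address(dump_text):
    """Return the address of the first nonzero byte in the dump, or None."""
    for line in dump_text.splitlines():
        line = line.lstrip("> ")  # tolerate e-mail quote markers
        m = ROW.match(line)
        if not m:
            continue  # skips headers and ". . ." elision markers
        addr = int(m.group(1), 16)
        data = bytes.fromhex("".join(m.group(2).split()))
        for offset, byte in enumerate(data):
            if byte:
                return addr + offset
    return None


# Sample rows copied from the three dumps in this message.
BAD_EARLIER = """\
> 2bed60 00000000 00000000 00000000 00000000 ................
> 2beff0 00000000 00000000 00000000 00000000 ................
> 2bf000 96ab2a00 00000000 a6ab2a00 00000000 ..*.......*.....
"""
BAD_LATER = """\
> 2bed60 00000000 00000000 00000000 00000000 ................
> 2bfff0 00000000 00000000 00000000 00000000 ................
> 2c0000 96cb2a00 00000000 a6cb2a00 00000000 ..*.......*.....
"""
GOOD = """\
2bed60 78ba2b00 00000000 00000000 00000000 x.+.............
2bed70 00000000 00000000 86a62a00 00000000 ..........*.....
"""

for name, dump in (("earlier bad", BAD_EARLIER),
                   ("later bad", BAD_LATER),
                   ("good", GOOD)):
    addr = first_nonzero_address(dump)
    aligned = addr % PAGE_SIZE == 0
    print(f"{name}: first nonzero byte at {addr:#x}, page aligned: {aligned}")
```

Under the 4 KiB assumption, both bad dumps start their nonzero data on a page boundary while the good dump has data from the start of the section. That fits whole pages being lost, but the sketch only restates the observation and does not identify a cause.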