Date:      Mon, 25 Nov 2024 18:05:12 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        Guido Falsi <mad@madpilot.net>, Dimitry Andric <dim@FreeBSD.org>, Dag-Erling Smørgrav <des@FreeBSD.org>, Yasuhiro Kimura <yasu@FreeBSD.org>
Cc:        ports@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: port binary dumping core on recent head in poudriere
Message-ID:  <A2820AEA-AB92-425F-AE91-2AF9629B3020@yahoo.com>
In-Reply-To: <E77AF0C3-5210-41C7-B8B8-02A8E22DB23D@yahoo.com>
References:  <aa597431-54a8-4cde-8d4f-b75040b59bae@madpilot.net> <46E3A370-A3E0-4BAF-B707-87F94F98E248@FreeBSD.org> <5ee47c3d-f80e-4d50-9b6a-acb3c98e80e0@madpilot.net> <f9e32784-226a-4e1e-a24b-62f5e6d3d765@madpilot.net> <E4616829-D2DE-4EAF-B971-1EDA8B447F13@FreeBSD.org> <7c9c3cf5-bbd1-4642-8d04-33aa07a4db02@madpilot.net> <9df256a8-c6ed-46d9-b955-fc2657c12d36@madpilot.net> <5c502054-7353-4a1e-8350-c403482e9c0d@madpilot.net> <a203a89f-2eb7-4220-8dfb-648cd46fc6bb@madpilot.net> <3127C3BA-FC93-4636-ADDB-89518DE9C60D@FreeBSD.org> <86ed2zsp6l.fsf@ltc.des.dev> <5f24a570-26e0-4c0a-817f-591a234fd07b@madpilot.net> <5918C6A1-8FDB-40CA-8C86-EB7B7BE75A2E@yahoo.com> <86ed2zc8r5.fsf@ltc.des.dev> <45098ccf-4dc6-426c-849a-c923805d6723@madpilot.net> <F64DB4E9-A210-4E1F-B333-C597F3DBED54@yahoo.com> <38658C0D-CA33-4010-BBE1-E68D253A3DF7@FreeBSD.org> <1004a753-9a3c-4aa2-bfa8-4a0c471fe3ea@madpilot.net> <D14FF56C-506F-4168-91BC-1F10937B943F@yahoo.com> <E77AF0C3-5210-41C7-B8B8-02A8E22DB23D@yahoo.com>

Top posting because this goes in a different direction:
I found a way to control the behavior in my
context . . .

I changed USE_TMPFS=all to USE_TMPFS=no :

USE_TMPFS=all gets the failure
vs.
USE_TMPFS=no works just fine

So it is a FreeBSD system error associated with
use of tmpfs.


Now back to what I looked at before trying the
above . . .

On Nov 25, 2024, at 17:05, Mark Millard <marklmi@yahoo.com> wrote:

> On Nov 25, 2024, at 15:21, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Nov 25, 2024, at 14:23, Guido Falsi <mad@madpilot.net> wrote:
>>
>>> On 25/11/24 23:15, Dimitry Andric wrote:
>>>> On 25 Nov 2024, at 23:12, Mark Millard <marklmi@yahoo.com> wrote:
>>>>>
>>>>> On Nov 25, 2024, at 13:27, Guido Falsi <mad@madpilot.net> wrote:
>>>>>
>>>>>> On 25/11/24 22:18, Dag-Erling Smørgrav wrote:
>>>>>>> Mark Millard <marklmi@yahoo.com> writes:
>>>>>>>> Guido Falsi <mad@madpilot.net> writes:
>>>>>>>>> On 25/11/24 09:17, Dag-Erling Smørgrav wrote:
>>>>>>>>>> Dimitry Andric <dim@FreeBSD.org> writes:
>>>>>>>>>>> Probably best to create a bugzilla ticket, but as I said before, I
>>>>>>>>>>> cannot reproduce this.
>>>>>>>>>> I can.  My builder is running 15 and sees segfaults while building
>>>>>>>>>> packages for 14 and 15 but not for 13.
>>>>>>>>> BTW removing optimizations (CPUTYPE) for only the affected ports made
>>>>>>>>> guile2 work again. Did not solve the issue with sassc though.  [...]
>>>>>>>>> I'm also using ccache, but that does not look relevant.
>>>>>>>> I've never used ccache or analogous and get the libsass.so.1.0.0
>>>>>>>> .got.plt corruption that I've reported on the lists anyway.
>>>>>>> I don't use ccache or optimizations.  Here's an example of sassc
>>>>>>> segfaulting in a 14.1-RELEASE-p6 jail:
>>>>>>> https://pkg.des.dev/logs/data/14amd64-default/2024-11-24_19h29m04s/logs/errors/plasma5-breeze-gtk-5.27.11.log
>>>>>>> which matches the following entry from `/var/log/messages`:
>>>>>>> Nov 24 21:23:06 pkg kernel: pid 71277 (sassc), jid 253, uid 65534: exited on signal 11 (core dumped)
>>>>>>> The poudriere host is a bhyve VM with 48 cores and 192 GB RAM on a
>>>>>>> 32c/64t AMD EPYC 7502P with 256 GB RAM.
>>>>>>
>>>>>> I sincerely hope this is not relevant but my CPU is also AMD: AMD Ryzen 5 5600G
>>>>>
>>>>> The amd64 system type that I have access to and used
>>>>> for my testing:
>>>>>
>>>>> AMD 7950X3D (16 core, 32 thread, so 32 FreeBSD-cpus) with 192 GiBytes of RAM
>>>> I'm on Intel, and I don't see any crashes at all. So, are we looking at some CPU specific issue here?
>>>
>>> We can't say for sure, but we definitely have all people reporting the issue on the same CPU brand, so it's some indication I guess.
>>>
>>> I was hoping it would not come to this because I suspect such issues are quite difficult to diagnose.
>>
>> Unfortunately, for amd64 I only have access to:
>>
>> ) An old ThreadRipper 1950X system (untested so far)
>> ) The 7950X3D system
>>
>> No Intel systems.
>>
>> If someone had both AMD and Intel and could have
>> boot&operate media that should work for both, say
>> USB that can be simply moved between machines,
>> running tests on both would be appropriate.
>> (Implication: the media not being tailored to the
>> cpu specifics so the same system software is
>> tested in both places.)
>>
>> I'll note that the media in my context is PCIe Optane,
>> ZFS based. I could try a U.2 Optane in a PCIe adaptor
>> that has UFS instead for building textproc/libsass .
>> (The U.2 content is basically an rsync of the ZFS
>> Optane media's live directory tree, with node naming
>> and such adjusted afterwards.)
>>
>> What do other folks have for the file system(s)
>> involved?
>
> I get the sassc failure from a pure UFS live-context as
> well.
>
> Interestingly, there is variation in the .got.plt oddity.
>
> Earlier:
>
> Bad .got.plt:
>
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000  ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000  ................
> 2befd0 00000000 00000000 00000000 00000000  ................
> 2befe0 00000000 00000000 00000000 00000000  ................
> 2beff0 00000000 00000000 00000000 00000000  ................
> 2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
> 2bf010 b6ab2a00 00000000 c6ab2a00 00000000  ..*.......*.....
> 2bf020 d6ab2a00 00000000 e6ab2a00 00000000  ..*.......*.....
> 2bf030 f6ab2a00 00000000 06ac2a00 00000000  ..*.......*.....
> . . .

Interestingly, a later retest of the ZFS
context did not get the above. Instead it
ended up like the below bad case.

I'll also note that scrubbing reports:

# zpool status
  pool: zoptb
 state: ONLINE
  scan: scrub repaired 0B in 00:00:47 with 0 errors on Mon Nov 25 17:50:44 2024
config:

	NAME           STATE     READ WRITE CKSUM
	zoptb          ONLINE       0     0     0
	  gpt/OptBzfs  ONLINE       0     0     0

errors: No known data errors

This should mean that the unexpected zeros were present
before zfs did its checksum prior to writing the data.

> The new bad .got.plt ended up with a bigger zero area,
> the nonzero area again being nicely aligned for where
> it starts. (The .got.plt starts at the same address
> as above.)
>
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000  ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000  ................
> 2befd0 00000000 00000000 00000000 00000000  ................
> 2befe0 00000000 00000000 00000000 00000000  ................
> 2beff0 00000000 00000000 00000000 00000000  ................
> 2bf000 00000000 00000000 00000000 00000000  ................
> 2bf010 00000000 00000000 00000000 00000000  ................
> 2bf020 00000000 00000000 00000000 00000000  ................
> 2bf030 00000000 00000000 00000000 00000000  ................
> . . .
> 2bffc0 00000000 00000000 00000000 00000000  ................
> 2bffd0 00000000 00000000 00000000 00000000  ................
> 2bffe0 00000000 00000000 00000000 00000000  ................
> 2bfff0 00000000 00000000 00000000 00000000  ................
> 2c0000 96cb2a00 00000000 a6cb2a00 00000000  ..*.......*.....
> 2c0010 b6cb2a00 00000000 c6cb2a00 00000000  ..*.......*.....
> 2c0020 d6cb2a00 00000000 e6cb2a00 00000000  ..*.......*.....
> 2c0030 f6cb2a00 00000000 06cc2a00 00000000  ..*.......*.....
> . . .
>

For comparison, here is the good .got.plt from
the PkgBase based chroot with the official packages
installed:

Contents of section .got.plt:
 2bed60 78ba2b00 00000000 00000000 00000000  x.+.............
 2bed70 00000000 00000000 86a62a00 00000000  ..........*.....
 2bed80 96a62a00 00000000 a6a62a00 00000000  ..*.......*.....
 2bed90 b6a62a00 00000000 c6a62a00 00000000  ..*.......*.....
. . .
 2befc0 16ab2a00 00000000 26ab2a00 00000000  ..*.....&.*.....
 2befd0 36ab2a00 00000000 46ab2a00 00000000  6.*.....F.*.....
 2befe0 56ab2a00 00000000 66ab2a00 00000000  V.*.....f.*.....
 2beff0 76ab2a00 00000000 86ab2a00 00000000  v.*.......*.....
 2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
 2bf010 b6ab2a00 00000000 c6ab2a00 00000000  ..*.......*.....
 2bf020 d6ab2a00 00000000 e6ab2a00 00000000  ..*.......*.....
 2bf030 f6ab2a00 00000000 06ac2a00 00000000  ..*.......*.....
. . .
 2bffc0 16cb2a00 00000000 26cb2a00 00000000  ..*.....&.*.....
 2bffd0 36cb2a00 00000000 46cb2a00 00000000  6.*.....F.*.....
 2bffe0 56cb2a00 00000000 66cb2a00 00000000  V.*.....f.*.....
 2bfff0 76cb2a00 00000000 86cb2a00 00000000  v.*.......*.....
 2c0000 96cb2a00 00000000 a6cb2a00 00000000  ..*.......*.....
 2c0010 b6cb2a00 00000000 c6cb2a00 00000000  ..*.......*.....
 2c0020 d6cb2a00 00000000 e6cb2a00 00000000  ..*.......*.....
 2c0030 f6cb2a00 00000000 06cc2a00 00000000  ..*.......*.....
. . .

Wherever any pair of the examples are both non-zero,
their contents agree.
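
The two observations above (the non-zero data starting at a nicely
aligned address, and the non-zero parts agreeing between dumps) can be
checked mechanically. What follows is only a rough sketch, not a tool
used in this thread. It assumes the dump text has the hex layout shown
above (an address followed by up to four 8-digit hex words), and it
assumes a 4 KiB page size, as is usual for base pages on amd64. The
names parse_dump, first_nonzero_address and nonzero_rows_agree are made
up for illustration. The sample rows are copied from the dumps in this
message. Because the dumps are elided with ". . .", the check only sees
the rows that are actually listed.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB base pages (amd64)

# An address, then one to four groups of 8 hex digits; the ASCII column is ignored.
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+)((?:\s[0-9a-fA-F]{8}){1,4})")


def parse_dump(text):
    """Return {address: bytes} for each hex row found in the dump text."""
    rows = {}
    for line in text.splitlines():
        line = line.lstrip("> ")  # drop e-mail quoting and leading spaces
        m = LINE_RE.match(line)
        if m:
            rows[int(m.group(1), 16)] = bytes.fromhex(m.group(2).replace(" ", ""))
    return rows


def first_nonzero_address(rows):
    """Address of the first non-zero byte among the listed rows, or None."""
    for addr in sorted(rows):
        for i, b in enumerate(rows[addr]):
            if b:
                return addr + i
    return None


def nonzero_rows_agree(a, b):
    """True if every row that is non-zero in both dumps has identical bytes."""
    for addr in a.keys() & b.keys():
        if any(a[addr]) and any(b[addr]) and a[addr] != b[addr]:
            return False
    return True


# Rows copied from the first bad dump and from the good dump above.
BAD_1 = """\
> 2beff0 00000000 00000000 00000000 00000000  ................
> 2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
"""
GOOD = """\
 2bed60 78ba2b00 00000000 00000000 00000000  x.+.............
 2beff0 76ab2a00 00000000 86ab2a00 00000000  v.*.......*.....
 2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
"""

if __name__ == "__main__":
    bad, good = parse_dump(BAD_1), parse_dump(GOOD)
    start = first_nonzero_address(bad)
    print("first non-zero byte in bad dump:", hex(start))
    print("page aligned:", start % PAGE_SIZE == 0)
    print("non-zero rows agree with good dump:", nonzero_rows_agree(bad, good))
```

Feeding it the full output of a section dump for each libsass.so.1.0.0
variant, instead of the short excerpts, would be the more meaningful
use, since the excerpts skip the elided rows.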

===
Mark Millard
marklmi at yahoo.com



