Date:      Tue, 26 Nov 2024 00:21:40 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        "jah@freebsd.org" <jah@FreeBSD.org>, dougm@freebsd.org, asomers@freebsd.org, Mark Johnston <markj@FreeBSD.org>, FreeBSD Current <freebsd-current@freebsd.org>
Cc:        Dimitry Andric <dim@FreeBSD.org>, Guido Falsi <mad@madpilot.net>, Dag-Erling Smørgrav <des@FreeBSD.org>, Yasuhiro Kimura <yasu@FreeBSD.org>, ports@freebsd.org
Subject:   Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]
Message-ID:  <3660625A-0EE8-40DA-A248-EC18C734718C@yahoo.com>
In-Reply-To: <0690CFB1-6A6D-4B63-916C-BAB7F6256000@yahoo.com>
References:  <aa597431-54a8-4cde-8d4f-b75040b59bae@madpilot.net> <46E3A370-A3E0-4BAF-B707-87F94F98E248@FreeBSD.org> <5ee47c3d-f80e-4d50-9b6a-acb3c98e80e0@madpilot.net> <f9e32784-226a-4e1e-a24b-62f5e6d3d765@madpilot.net> <E4616829-D2DE-4EAF-B971-1EDA8B447F13@FreeBSD.org> <7c9c3cf5-bbd1-4642-8d04-33aa07a4db02@madpilot.net> <9df256a8-c6ed-46d9-b955-fc2657c12d36@madpilot.net> <5c502054-7353-4a1e-8350-c403482e9c0d@madpilot.net> <a203a89f-2eb7-4220-8dfb-648cd46fc6bb@madpilot.net> <3127C3BA-FC93-4636-ADDB-89518DE9C60D@FreeBSD.org> <86ed2zsp6l.fsf@ltc.des.dev> <5f24a570-26e0-4c0a-817f-591a234fd07b@madpilot.net> <5918C6A1-8FDB-40CA-8C86-EB7B7BE75A2E@yahoo.com> <86ed2zc8r5.fsf@ltc.des.dev> <45098ccf-4dc6-426c-849a-c923805d6723@madpilot.net> <F64DB4E9-A210-4E1F-B333-C597F3DBED54@yahoo.com> <38658C0D-CA33-4010-BBE1-E68D253A3DF7@FreeBSD.org> <1004a753-9a3c-4aa2-bfa8-4a0c471fe3ea@madpilot.net> <D14FF56C-506F-4168-91BC-1F10937B943F@yahoo.com> <E77AF0C3-5210-41C7-B8B8-02A8E22DB23D@yahoo.com> <A2820AEA-AB92-425F-AE91-2AF9629B3020@yahoo.com> <0690CFB1-6A6D-4B63-916C-BAB7F6256000@yahoo.com>

On Nov 25, 2024, at 22:10, Mark Millard <marklmi@yahoo.com> wrote:

> On Nov 25, 2024, at 18:05, Mark Millard <marklmi@yahoo.com> wrote:
>
>> Top posting going in a different direction that
>> established a way to control the behavior in my
>> context . . .
>
> For folks new to the discoveries: the context here
> is poudriere bulk builds, for USE_TMPFS=all vs.
> USE_TMPFS=no . My test context is amd64 on a
> 7950X3D system with 192 GiBytes of RAM. Others have
> other contexts, including an Intel system.
>
>> I changed USE_TMPFS=all to USE_TMPFS=no :
>>
>> USE_TMPFS=all gets the failure
>
> Note: The test case is corruption of parts of files, such as
> the .got.plt section in libsass.so.1.0.0 from textproc/libsass .
> The corruptions are blocks of zeros, aligned to 4 KiByte
> boundaries, showing up where the data should not be zeros.
>
> 2 examples of bad libsass.so.1.0.0 builds have:
>
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000  ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000  ................
> 2befd0 00000000 00000000 00000000 00000000  ................
> 2befe0 00000000 00000000 00000000 00000000  ................
> 2beff0 00000000 00000000 00000000 00000000  ................
> 2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
> 2bf010 b6ab2a00 00000000 c6ab2a00 00000000  ..*.......*.....
> 2bf020 d6ab2a00 00000000 e6ab2a00 00000000  ..*.......*.....
> 2bf030 f6ab2a00 00000000 06ac2a00 00000000  ..*.......*.....
> . . .
>
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000  ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000  ................
> 2befd0 00000000 00000000 00000000 00000000  ................
> 2befe0 00000000 00000000 00000000 00000000  ................
> 2beff0 00000000 00000000 00000000 00000000  ................
> 2bf000 00000000 00000000 00000000 00000000  ................
> 2bf010 00000000 00000000 00000000 00000000  ................
> 2bf020 00000000 00000000 00000000 00000000  ................
> 2bf030 00000000 00000000 00000000 00000000  ................
> . . .
> 2bffc0 00000000 00000000 00000000 00000000  ................
> 2bffd0 00000000 00000000 00000000 00000000  ................
> 2bffe0 00000000 00000000 00000000 00000000  ................
> 2bfff0 00000000 00000000 00000000 00000000  ................
> 2c0000 96cb2a00 00000000 a6cb2a00 00000000  ..*.......*.....
> 2c0010 b6cb2a00 00000000 c6cb2a00 00000000  ..*.......*.....
> 2c0020 d6cb2a00 00000000 e6cb2a00 00000000  ..*.......*.....
> 2c0030 f6cb2a00 00000000 06cc2a00 00000000  ..*.......*.....
> . . .
>
> So: Where the zeros end varies, but the start of
> good data ends up at some 0x...000 offset: a
> multiple of 4 KiBytes.
>
>> vs.
>> USE_TMPFS=no works just fine
>>
>> So it is a FreeBSD system error associated with
>> use of tmpfs .
>
> Recent work on tmpfs includes:
>=20
> Mon, 09 Sep 2024
> • git: 8fa5e0f21fd1 - main - tmpfs: Account for whiteouts during rename/rmdir Jason A. Harmening
> Fri, 04 Oct 2024
> • git: 75734c4360fc - main - tmpfs: check residence in data_locked Doug Moore
> Sun, 13 Oct 2024
> • git: ec22e705c266 - main - tmpfs: remove duplicate flags check in tmpfs_rmdir Alan Somers
> Thu, 24 Oct 2024
> • git: db08b0b04dec - main - tmpfs_vnops: move swap work to swap_pager Doug Moore
>
> swap_pager (given the reference to it above):
>
> Tue, 08 Oct 2024
>    • git: d0b225d16418 - main - swap_pager: use iterators in swp_pager_meta_build Doug Moore
> Fri, 11 Oct 2024
>    • git: 1107834090be - main - swap_pager: swapoff detecting object death Doug Moore
> Thu, 24 Oct 2024
>    • git: 34951b0b9e78 - main - swap_pager: move scan_all_shadowed, use iterators Doug Moore
>    • git: 02e85d1c8a41 - main - swap_pager: fix assert in seek_data Doug Moore
>    • git: faa9356f97d2 - main - swap_pager: fix seek_hole assert Doug Moore
> Sat, 26 Oct 2024
>    • git: 39f6d1e7f835 - main - swap_pager: iter in haspage, lookup, getpages Doug Moore
> Wed, 13 Nov 2024
>    • git: d11d407aee48 - main - swap_pager: Ensure that swapoff puts swapped-in pages in page queues Mark Johnston
>
> I do not know at this time when the corruptions started. The
> above is only suggestive.

With a bulk -i active but from outside the bulk -i :

# df -m | sort -k6,6 | grep ^tmpfs
tmpfs  182907     0 182907     0%    /usr/local/poudriere/data/.m/main-amd64-default
tmpfs  184770  1863 182907     1%    /usr/local/poudriere/data/.m/main-amd64-default/ref
tmpfs    2048    45   2002     2%    /usr/local/poudriere/data/.m/main-amd64-default/ref/.p
tmpfs  182907     0 182907     0%    /usr/local/poudriere/data/.m/main-amd64-default/ref/var/db/ports

Note: bulk -i lands one in /usr/local/poudriere/data/.m/main-amd64-default/ref/


From inside a bulk -i, where I ran a manual make command
after it had built and installed libsass.so.1.0.0 (the
manual make produced a /wrkdirs/ ):

# find -s / -name libsass.so.1.0.0 -exec ls -ilodT {} \;
6417 -rwxr-xr-x  1 root wheel - 42444424 Nov 26 07:24:37 2024 /usr/local/lib/libsass.so.1.0.0
11872 -rwxr-xr-x  1 root wheel - 42444424 Nov 26 07:26:48 2024 /wrkdirs/usr/ports/textproc/libsass/work/libsass-3.6.6/src/.libs/libsass.so.1.0.0
12294 -rwxr-xr-x  1 root wheel - 42444424 Nov 26 07:26:48 2024 /wrkdirs/usr/ports/textproc/libsass/work/stage/usr/local/lib/libsass.so.1.0.0

# objdump -hs /wrkdirs/usr/ports/textproc/libsass/work/libsass-3.6.6/src/.libs/libsass.so.1.0.0 | less
. . .
 2bed60 78ba2b00 00000000 00000000 00000000  x.+.............
 2bed70 00000000 00000000 86a62a00 00000000  ..........*.....
 2bed80 96a62a00 00000000 a6a62a00 00000000  ..*.......*.....
 2bed90 b6a62a00 00000000 c6a62a00 00000000  ..*.......*.....
. . .

So the original creation looks okay. But . . .

# objdump -hs /wrkdirs/usr/ports/textproc/libsass/work/stage/usr/local/lib/libsass.so.1.0.0 | less
. . .
 2bed60 00000000 00000000 00000000 00000000  ................
 2bed70 00000000 00000000 00000000 00000000  ................
 2bed80 00000000 00000000 00000000 00000000  ................
 2bed90 00000000 00000000 00000000 00000000  ................
. . .
 2befc0 00000000 00000000 00000000 00000000  ................
 2befd0 00000000 00000000 00000000 00000000  ................
 2befe0 00000000 00000000 00000000 00000000  ................
 2beff0 00000000 00000000 00000000 00000000  ................
 2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
 2bf010 b6ab2a00 00000000 c6ab2a00 00000000  ..*.......*.....
 2bf020 d6ab2a00 00000000 e6ab2a00 00000000  ..*.......*.....
 2bf030 f6ab2a00 00000000 06ac2a00 00000000  ..*.......*.....
. . .

So: The later, staged copy is a bad copy. Both are in the
tmpfs. So copying to the staging area makes a corrupted
copy inside the same tmpfs. After that, further copies of
staging's bad copy can be expected to be messed up.
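
For anyone wanting to check other file pairs for the same pattern, here
is a minimal sketch (Python, not something used in the testing above) that
compares a known-good file against a suspect copy one 4 KiByte block at a
time and reports blocks that are all zeros only in the copy. The function
names and the 4096-byte block size are assumptions for illustration. The
reported values are file offsets, which need not equal the section
addresses that objdump prints.

```python
PAGE = 4096  # assumed block size, matching the 4 KiByte alignment noted above


def zeroed_pages(good: bytes, bad: bytes, page: int = PAGE):
    """Return file offsets of blocks that are all zeros in `bad`
    while the same block in `good` holds some non-zero data."""
    offsets = []
    for off in range(0, min(len(good), len(bad)), page):
        g = good[off:off + page]
        b = bad[off:off + page]
        # any() over bytes is True when at least one byte is non-zero.
        if any(g) and not any(b):
            offsets.append(off)
    return offsets


def compare_files(good_path: str, bad_path: str, page: int = PAGE):
    """Read both files fully and return the offsets found by zeroed_pages().
    Hypothetical helper; fine for files of a few tens of MiBytes."""
    with open(good_path, "rb") as f:
        good = f.read()
    with open(bad_path, "rb") as f:
        bad = f.read()
    return zeroed_pages(good, bad, page)
```

Example use would be along the lines of
compare_files(".../src/.libs/libsass.so.1.0.0", ".../stage/usr/local/lib/libsass.so.1.0.0")
followed by printing each offset with hex(). Blocks that are legitimately
all zeros in both files are not reported.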


===
Mark Millard
marklmi at yahoo.com



