Date: Tue, 26 Nov 2024 00:21:40 -0800
From: Mark Millard <marklmi@yahoo.com>
To: "jah@freebsd.org" <jah@FreeBSD.org>, dougm@freebsd.org, asomers@freebsd.org, Mark Johnston <markj@FreeBSD.org>, FreeBSD Current <freebsd-current@freebsd.org>
Cc: Dimitry Andric <dim@FreeBSD.org>, Guido Falsi <mad@madpilot.net>, Dag-Erling Smørgrav <des@FreeBSD.org>, Yasuhiro Kimura <yasu@FreeBSD.org>, ports@freebsd.org
Subject: Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]
Message-ID: <3660625A-0EE8-40DA-A248-EC18C734718C@yahoo.com>
In-Reply-To: <0690CFB1-6A6D-4B63-916C-BAB7F6256000@yahoo.com>
References: <aa597431-54a8-4cde-8d4f-b75040b59bae@madpilot.net> <46E3A370-A3E0-4BAF-B707-87F94F98E248@FreeBSD.org> <5ee47c3d-f80e-4d50-9b6a-acb3c98e80e0@madpilot.net> <f9e32784-226a-4e1e-a24b-62f5e6d3d765@madpilot.net> <E4616829-D2DE-4EAF-B971-1EDA8B447F13@FreeBSD.org> <7c9c3cf5-bbd1-4642-8d04-33aa07a4db02@madpilot.net> <9df256a8-c6ed-46d9-b955-fc2657c12d36@madpilot.net> <5c502054-7353-4a1e-8350-c403482e9c0d@madpilot.net> <a203a89f-2eb7-4220-8dfb-648cd46fc6bb@madpilot.net> <3127C3BA-FC93-4636-ADDB-89518DE9C60D@FreeBSD.org> <86ed2zsp6l.fsf@ltc.des.dev> <5f24a570-26e0-4c0a-817f-591a234fd07b@madpilot.net> <5918C6A1-8FDB-40CA-8C86-EB7B7BE75A2E@yahoo.com> <86ed2zc8r5.fsf@ltc.des.dev> <45098ccf-4dc6-426c-849a-c923805d6723@madpilot.net> <F64DB4E9-A210-4E1F-B333-C597F3DBED54@yahoo.com> <38658C0D-CA33-4010-BBE1-E68D253A3DF7@FreeBSD.org> <1004a753-9a3c-4aa2-bfa8-4a0c471fe3ea@madpilot.net> <D14FF56C-506F-4168-91BC-1F10937B943F@yahoo.com> <E77AF0C3-5210-41C7-B8B8-02A8E22DB23D@yahoo.com> <A2820AEA-AB92-425F-AE91-2AF9629B3020@yahoo.com> <0690CFB1-6A6D-4B63-916C-BAB7F6256000@yahoo.com>
On Nov 25, 2024, at 22:10, Mark Millard <marklmi@yahoo.com> wrote:

> On Nov 25, 2024, at 18:05, Mark Millard <marklmi@yahoo.com> wrote:
>
>> Top posting going in a different direction that
>> established a way to control the behavior in my
>> context . . .
>
> For folks new to the discoveries: the context here
> is poudriere bulk builds, for USE_TMPFS=all vs.
> USE_TMPFS=no . My test context is amd64 on a
> 7950X3D system with 192 GiBytes of RAM. Others have
> other contexts, including an Intel system.
>
>> I changed USE_TMPFS=all to USE_TMPFS=no :
>>
>> USE_TMPFS=all gets the failure
>
> Note: The test case is corruptions of the likes of parts of
> the .got.plt in libsass.so.1.0.0 from textproc/libsass .
> The corruptions are 4 KiByte-aligned blocks of zeros
> showing up in the files where there should not be zeros.
>
> 2 examples of bad libsass.so.1.0.0 builds have:
>
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000  ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000  ................
> 2befd0 00000000 00000000 00000000 00000000  ................
> 2befe0 00000000 00000000 00000000 00000000  ................
> 2beff0 00000000 00000000 00000000 00000000  ................
> 2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
> 2bf010 b6ab2a00 00000000 c6ab2a00 00000000  ..*.......*.....
> 2bf020 d6ab2a00 00000000 e6ab2a00 00000000  ..*.......*.....
> 2bf030 f6ab2a00 00000000 06ac2a00 00000000  ..*.......*.....
> . . .
>
> Contents of section .got.plt:
> 2bed60 00000000 00000000 00000000 00000000  ................
> . . .
> 2befc0 00000000 00000000 00000000 00000000  ................
> 2befd0 00000000 00000000 00000000 00000000  ................
> 2befe0 00000000 00000000 00000000 00000000  ................
> 2beff0 00000000 00000000 00000000 00000000  ................
> 2bf000 00000000 00000000 00000000 00000000  ................
> 2bf010 00000000 00000000 00000000 00000000  ................
> 2bf020 00000000 00000000 00000000 00000000  ................
> 2bf030 00000000 00000000 00000000 00000000  ................
> . . .
> 2bffc0 00000000 00000000 00000000 00000000  ................
> 2bffd0 00000000 00000000 00000000 00000000  ................
> 2bffe0 00000000 00000000 00000000 00000000  ................
> 2bfff0 00000000 00000000 00000000 00000000  ................
> 2c0000 96cb2a00 00000000 a6cb2a00 00000000  ..*.......*.....
> 2c0010 b6cb2a00 00000000 c6cb2a00 00000000  ..*.......*.....
> 2c0020 d6cb2a00 00000000 e6cb2a00 00000000  ..*.......*.....
> 2c0030 f6cb2a00 00000000 06cc2a00 00000000  ..*.......*.....
> . . .
>
> So: Where the zeros end varies, but the start of
> good data ends up at some 0x...000 offset: a
> multiple of 4 KiBytes.
>
>> vs.
>> USE_TMPFS=no works just fine
>>
>> So it is a FreeBSD system error associated with
>> use of tmpfs .
>
> Recent work on tmpfs includes:
>
> Mon, 09 Sep 2024
> • git: 8fa5e0f21fd1 - main - tmpfs: Account for whiteouts during rename/rmdir Jason A. Harmening
> Fri, 04 Oct 2024
> • git: 75734c4360fc - main - tmpfs: check residence in data_locked Doug Moore
> Sun, 13 Oct 2024
> • git: ec22e705c266 - main - tmpfs: remove duplicate flags check in tmpfs_rmdir Alan Somers
> Thu, 24 Oct 2024
> • git: db08b0b04dec - main - tmpfs_vnops: move swap work to swap_pager Doug Moore
>
> swap_pager (given the reference to it above):
>
> Tue, 08 Oct 2024
> • git: d0b225d16418 - main - swap_pager: use iterators in swp_pager_meta_build Doug Moore
> Fri, 11 Oct 2024
> • git: 1107834090be - main - swap_pager: swapoff detecting object death Doug Moore
> Thu, 24 Oct 2024
> • git: 34951b0b9e78 - main - swap_pager: move scan_all_shadowed, use iterators Doug Moore
> • git: 02e85d1c8a41 - main - swap_pager: fix assert in seek_data Doug Moore
> • git: faa9356f97d2 - main - swap_pager: fix seek_hole assert Doug Moore
> Sat, 26 Oct 2024
> • git: 39f6d1e7f835 - main - swap_pager: iter in haspage, lookup, getpages Doug Moore
> Wed, 13 Nov 2024
> • git: d11d407aee48 - main - swap_pager: Ensure that swapoff puts swapped-in pages in page queues Mark Johnston
>
> I do not know at this time when the corruptions started. The
> above is only suggestive.

With a bulk -i active but from outside the bulk -i :

# df -m | sort -k6,6 | grep ^tmpfs
tmpfs      182907     0  182907     0%    /usr/local/poudriere/data/.m/main-amd64-default
tmpfs      184770  1863  182907     1%    /usr/local/poudriere/data/.m/main-amd64-default/ref
tmpfs        2048    45    2002     2%    /usr/local/poudriere/data/.m/main-amd64-default/ref/.p
tmpfs      182907     0  182907     0%    /usr/local/poudriere/data/.m/main-amd64-default/ref/var/db/ports

Note: bulk -i lands one in /usr/local/poudriere/data/.m/main-amd64-default/ref/

From inside a bulk -i where I did a manual make command after it built and installed libsass.so.1.0.0 .
The manual make produced a /wrkdirs/ :

# find -s / -name libsass.so.1.0.0 -exec ls -ilodT {} \;
 6417 -rwxr-xr-x  1 root wheel - 42444424 Nov 26 07:24:37 2024 /usr/local/lib/libsass.so.1.0.0
11872 -rwxr-xr-x  1 root wheel - 42444424 Nov 26 07:26:48 2024 /wrkdirs/usr/ports/textproc/libsass/work/libsass-3.6.6/src/.libs/libsass.so.1.0.0
12294 -rwxr-xr-x  1 root wheel - 42444424 Nov 26 07:26:48 2024 /wrkdirs/usr/ports/textproc/libsass/work/stage/usr/local/lib/libsass.so.1.0.0

# objdump -hs /wrkdirs/usr/ports/textproc/libsass/work/libsass-3.6.6/src/.libs/libsass.so.1.0.0 | less
. . .
 2bed60 78ba2b00 00000000 00000000 00000000  x.+.............
 2bed70 00000000 00000000 86a62a00 00000000  ..........*.....
 2bed80 96a62a00 00000000 a6a62a00 00000000  ..*.......*.....
 2bed90 b6a62a00 00000000 c6a62a00 00000000  ..*.......*.....
. . .

So the original creation looks okay. But . . .

# objdump -hs /wrkdirs/usr/ports/textproc/libsass/work/stage/usr/local/lib/libsass.so.1.0.0 | less
. . .
 2bed60 00000000 00000000 00000000 00000000  ................
 2bed70 00000000 00000000 00000000 00000000  ................
 2bed80 00000000 00000000 00000000 00000000  ................
 2bed90 00000000 00000000 00000000 00000000  ................
. . .
 2befc0 00000000 00000000 00000000 00000000  ................
 2befd0 00000000 00000000 00000000 00000000  ................
 2befe0 00000000 00000000 00000000 00000000  ................
 2beff0 00000000 00000000 00000000 00000000  ................
 2bf000 96ab2a00 00000000 a6ab2a00 00000000  ..*.......*.....
 2bf010 b6ab2a00 00000000 c6ab2a00 00000000  ..*.......*.....
 2bf020 d6ab2a00 00000000 e6ab2a00 00000000  ..*.......*.....
 2bf030 f6ab2a00 00000000 06ac2a00 00000000  ..*.......*.....
. . .

So: The later, staged copy is a bad copy. Both are in the tmpfs. So copying to the staging area makes a corrupted copy inside the same tmpfs. After that, further copies of staging's bad copy can be expected to be messed up.

===
Mark Millard
marklmi at yahoo.com
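The good-vs-staged comparison above can also be done without paging through objdump output. Below is a minimal sketch of a hypothetical sh helper, pagediff (not part of poudriere or FreeBSD), built on cmp -l and awk. For each 4 KiB page where a known-good file and a suspect copy differ, it prints the first and last differing byte offsets and whether every differing byte in the suspect copy is zero. It assumes the two files have the same size (as the three 42444424-byte copies listed above do) and a 4096-byte page, matching the 4 KiByte alignment seen in the dumps.

```shell
#!/bin/sh
# pagediff GOOD SUSPECT
# Hypothetical helper (not part of poudriere or FreeBSD): summarize where
# two same-size files differ, one line per 4 KiB page, to check whether
# the damage is zero-filled and confined to page-aligned blocks.
# Returns 0 if the files are identical, 1 if they differ, 2 on misuse.
pagediff() {
	if [ $# -ne 2 ] || [ ! -f "$1" ] || [ ! -f "$2" ]; then
		echo "usage: pagediff good-file suspect-file" >&2
		return 2
	fi
	# cmp -l prints one line per differing byte:
	#   <1-based byte number> <byte in file 1, octal> <byte in file 2, octal>
	cmp -l "$1" "$2" | awk -v page=4096 '
	{
		off = $1 - 1              # 0-based offset of the differing byte
		p = int(off / page)       # index of the 4 KiB page holding it
		if (!(p in first)) {      # first difference seen in this page
			first[p] = off
			zero[p] = 1
			order[n++] = p
		}
		last[p] = off
		if ($3 != 0)              # suspect byte is not zero
			zero[p] = 0
	}
	END {
		if (n == 0) {
			print "files are identical"
			exit 0
		}
		for (i = 0; i < n; i++) {
			p = order[i]
			printf "page 0x%x: bytes differ at 0x%x..0x%x, suspect bytes %s\n",
			    p * page, first[p], last[p],
			    (zero[p] ? "all zero" : "not all zero")
		}
		exit 1
	}'
}
```

For example, from inside the bulk -i one could run pagediff /wrkdirs/usr/ports/textproc/libsass/work/libsass-3.6.6/src/.libs/libsass.so.1.0.0 /wrkdirs/usr/ports/textproc/libsass/work/stage/usr/local/lib/libsass.so.1.0.0 . If the damage matches the dumps above, one would expect only pages reported as "all zero", with good data resuming at the next 0x...000 boundary; a page reported as "not all zero" would point at something other than zero-filled blocks. Per-page grouping is used rather than merging consecutive differing bytes because the good data itself contains zero bytes, so the differing bytes inside a zeroed block are not contiguous.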