Date: Thu, 28 Nov 2024 12:29:09 -0800
From: Mark Millard <marklmi@yahoo.com>
To: "scf@freebsd.org" <scf@FreeBSD.org>
Cc: FreeBSD Current <freebsd-current@freebsd.org>, FreeBSD Mailing List <freebsd-ports@freebsd.org>
Subject: Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]
Message-ID: <4D541C32-7DF4-4CAC-B31E-D4DD17977154@yahoo.com>
References: <4D541C32-7DF4-4CAC-B31E-D4DD17977154.ref@yahoo.com>

Sean C. Farley <scf_at_FreeBSD.org> wrote on
Date: Thu, 28 Nov 2024 18:16:16 UTC :

> On Mon, 25 Nov 2024, Mark Millard wrote:
>
> > On Nov 25, 2024, at 18:05, Mark Millard <marklmi@yahoo.com> wrote:
> >
> >> Top posting going in a different direction that
> >> established a way to control the behavior in my
> >> context . . .
> >
> > For folks new to the discoveries: the context here
> > is poudriere bulk builds, for USE_TMPFS=all vs.
> > USE_TMPFS=no . My test context is amd64 on a
> > 7950X3D system with 192 GiBytes of RAM. Others have
> > other contexts, including an Intel system.
>
> I have been seeing some odd behavior from Firefox as well as with
> poudriere builds on my system. Both of which are touching a tmpfs
> system as I have setup /tmp as tmpfs, which Firefox uses, and
> USE_TMPFS=all.
>
> The system has been an experiment, for me, with undervolting. I have
> been attributing any flakiness to the undervolting, but I have reduced
> that a lot while the instability has been consistent as in it has stayed
> rare. I cannot tell how many times I have run memtest86 on this system.
>
> System setup:
> - FreeBSD 14.2-STABLE

The context that I investigated --and what was fixed by
a commit only applies to-- main [so: 15 as stands], not stable/14 .
stable/14 has no commits mentioning "tmpfs" after 2024-Jun-04.

> - i7-14700K (latest BIOS which *should* fix Intel power-related bugs)
> - 128 GiB RAM
> - ZFS (mirrored drives)

The primary test context was ZFS but no redundancy or such.
(Only really used for bectl activity.) But testing on a UFS
copy of the live directory tree also got the problem. The
actual problem was in tmpfs support.

> - 2 encrypted swap partitions (64 GiB each, lightly used)

No encryption involved in my context at all.

> - Lightly undervolted (-0.06 offset to Global Core SVID Voltage)

Nothing analogous in my context.

> - /tmp is tmpfs

I have no default areas that are tmpfs: so only what
poudriere temporarily created during the bulk builds.

> - ${HOME}/.cache is tmpfs

No use of ccache or the like.

> - Poudriere:
> - USE_TMPFS=all

I also use TMPFS_BLACKLIST . My personal environment causes use
of -gline-tables-only as debug information normally. (That option
is clang/clang++ specific. gcc* and clang* do not seem to have a
common notation for analogous settings on the command line.)

> - ccache

No use of ccache or the like.

> - jail version in sync with host

True for my context. But the issue that was fixed was in
the kernel code, not the world code.

> - /usr/ports is mounted with nullfs

Also true for my context.

> I have wondered if it was swap-related, but recently I noticed a build
> failure with games/veloren-weekly where swap was available but zero
> bytes were used. The system was under little load at the time so less
> chance of undervolting being an issue.
>
> Build failure:
> -----------------------------
>
> portpicker = { path = '/wrkdirs/usr/ports/games/veloren-weekly/work/portpicker-rs-df6b37872f3586ac3b21d08b56c8ec7cd92fb172' }
> ===> Updating Cargo.lock
> error: checksum for `windows_x86_64_msvc v0.42.2` changed between lock files
>
> this could be indicative of a few possible errors:
>
> * the lock file is corrupt
> * a replacement source in use (e.g., a mirror) returned a different checksum
> * the source itself may be corrupt in one way or another
>
> unable to verify that `windows_x86_64_msvc v0.42.2` is the same as when the lockfile was generated
>
> *** Error code 101
>
> -----------------------------
>
> Restarting the build finished successfully.
>
> >> I changed USE_TMPFS=all to USE_TMPFS=no :
> >>
> >> USE_TMPFS=all gets the failure
>
> *snip*
>
> >> vs.
> >> USE_TMPFS=no works just fine
> >>
> >> So it is a FreeBSD system error associated with
> >> use of tmpfs .
> >
> > Recent work on tmpfs includes:

None of this is directly stable/14 : all main [so: 15 as stands].
stable/14 has no commits mentioning "tmpfs" after 2024-Jun-04.
So none of these changes are involved for stable/14 .

> >
> > Mon, 09 Sep 2024
> > • git: 8fa5e0f21fd1 - main - tmpfs: Account for whiteouts during rename/rmdir Jason A. Harmening
> > Fri, 04 Oct 2024
> > • git: 75734c4360fc - main - tmpfs: check residence in data_locked Doug Moore
> > Sun, 13 Oct 2024
> > • git: ec22e705c266 - main - tmpfs: remove duplicate flags check in tmpfs_rmdir Alan Somers
> > Thu, 24 Oct 2024
> > • git: db08b0b04dec - main - tmpfs_vnops: move swap work to swap_pager Doug Moore
> >
> > swap_pager (given the reference to it above):
> >
> > Tue, 08 Oct 2024
> > • git: d0b225d16418 - main - swap_pager: use iterators in swp_pager_meta_build Doug Moore
> > Fri, 11 Oct 2024
> > • git: 1107834090be - main - swap_pager: swapoff detecting object death Doug Moore
> > Thu, 24 Oct 2024
> > • git: 34951b0b9e78 - main - swap_pager: move scan_all_shadowed, use iterators Doug Moore
> > • git: 02e85d1c8a41 - main - swap_pager: fix assert in seek_data Doug Moore
> > • git: faa9356f97d2 - main - swap_pager: fix seek_hole assert Doug Moore
> > Sat, 26 Oct 2024
> > • git: 39f6d1e7f835 - main - swap_pager: iter in haspage, lookup, getpages Doug Moore
> > Wed, 13 Nov 2024
> > • git: d11d407aee48 - main - swap_pager: Ensure that swapoff puts swapped-in pages in page queues Mark Johnston
> >
> > I do not know at this time when the corruptions started. The
> > above is only suggestive.
>
> Thank you for listing those.
>
> I need to find some time to look over those changes although I am no
> kernel guru by a long shot. However, I see now that it looks like much
> more knowledgeable people are already looking on the current mailing
> list at the issue.

None of them were applied to stable/14 .

===
Mark Millard
marklmi at yahoo.com