Date:      Thu, 28 Nov 2024 12:29:09 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        "scf@freebsd.org" <scf@FreeBSD.org>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>, FreeBSD Mailing List <freebsd-ports@freebsd.org>
Subject:   Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]
Message-ID:  <4D541C32-7DF4-4CAC-B31E-D4DD17977154@yahoo.com>
References:  <4D541C32-7DF4-4CAC-B31E-D4DD17977154.ref@yahoo.com>

Sean C. Farley <scf_at_FreeBSD.org> wrote on
Date: Thu, 28 Nov 2024 18:16:16 UTC :

> On Mon, 25 Nov 2024, Mark Millard wrote:
>
> > On Nov 25, 2024, at 18:05, Mark Millard <marklmi@yahoo.com> wrote:
> >
> >> Top posting going in a different direction that
> >> established a way to control the behavior in my
> >> context . . .
> >
> > For folks new to the discoveries: the context here
> > is poudriere bulk builds, for USE_TMPFS=all vs.
> > USE_TMPFS=no . My test context is amd64 on a
> > 7950X3D system with 192 GiBytes of RAM. Others have
> > other contexts, including an Intel system.
>
> I have been seeing some odd behavior from Firefox as well as from
> poudriere builds on my system. Both touch a tmpfs file system: I have
> set up /tmp as tmpfs, which Firefox uses, and I use USE_TMPFS=all.
>
> The system has been an experiment, for me, with undervolting. I have
> been attributing any flakiness to the undervolting, but I have reduced
> the undervolting a lot while the instability has stayed consistent, in
> that it has stayed rare. I cannot tell how many times I have run
> memtest86 on this system.
>
> System setup:
> - FreeBSD 14.2-STABLE

The context that I investigated, and the problem that a commit fixed,
apply only to main [so: 15 as things stand], not to stable/14 .

stable/14 has no commits mentioning "tmpfs" after 2024-Jun-04.

> - i7-14700K (latest BIOS which *should* fix Intel power-related bugs)
> - 128 GiB RAM
> - ZFS (mirrored drives)

My primary test context was ZFS, but with no redundancy or the like.
(ZFS is only really used there for bectl activity.) Testing on a UFS
copy of the live directory tree also showed the problem. The actual
problem was in the tmpfs support.

> - 2 encrypted swap partitions (64 GiB each, lightly used)

No encryption involved in my context at all.

> - Lightly undervolted (-0.06 offset to Global Core SVID Voltage)

Nothing analogous in my context.

> - /tmp is tmpfs

I have no default areas that are tmpfs, so the only tmpfs use was
what poudriere temporarily created during the bulk builds.

> - ${HOME}/.cache is tmpfs

No use of ccache or the like.

> - Poudriere:
>    - USE_TMPFS=all

I also use TMPFS_BLACKLIST .

My personal environment normally causes -gline-tables-only to be used
for debug information. (That option is clang/clang++ specific. gcc*
and clang* do not seem to have a common command-line notation for
analogous settings.)

>    - ccache

No use of ccache or the like.

>    - jail version in sync with host

True for my context. But the issue that was fixed was
in the kernel code, not the world code.

> - /usr/ports is mounted with nullfs

Also true for my context.

> I have wondered if it was swap-related, but recently I noticed a build
> failure with games/veloren-weekly where swap was available but zero
> bytes were used. The system was under little load at the time, so there
> was less chance of undervolting being an issue.
>
> Build failure:
> -----------------------------
>
> portpicker = { path = '/wrkdirs/usr/ports/games/veloren-weekly/work/portpicker-rs-df6b37872f3586ac3b21d08b56c8ec7cd92fb172' }
> ===> Updating Cargo.lock
> error: checksum for `windows_x86_64_msvc v0.42.2` changed between lock files
>
> this could be indicative of a few possible errors:
>
> * the lock file is corrupt
> * a replacement source in use (e.g., a mirror) returned a different checksum
> * the source itself may be corrupt in one way or another
>
> unable to verify that `windows_x86_64_msvc v0.42.2` is the same as when the lockfile was generated
>
> *** Error code 101
>
> -----------------------------
>
> Restarting the build finished successfully.
>
> >> I changed USE_TMPFS=all to USE_TMPFS=no :
> >>
> >> USE_TMPFS=all gets the failure
>
> *snip*
>
> >> vs.
> >> USE_TMPFS=no works just fine
> >>
> >> So it is a FreeBSD system error associated with
> >> use of tmpfs .
> >
> > Recent work on tmpfs includes:

None of this is in stable/14 : it is all main
[so: 15 as things stand].

stable/14 has no commits mentioning "tmpfs" after 2024-Jun-04, so
none of these changes are involved in stable/14 .

> >
> > Mon, 09 Sep 2024
> > • git: 8fa5e0f21fd1 - main - tmpfs: Account for whiteouts during rename/rmdir Jason A. Harmening
> > Fri, 04 Oct 2024
> > • git: 75734c4360fc - main - tmpfs: check residence in data_locked Doug Moore
> > Sun, 13 Oct 2024
> > • git: ec22e705c266 - main - tmpfs: remove duplicate flags check in tmpfs_rmdir Alan Somers
> > Thu, 24 Oct 2024
> > • git: db08b0b04dec - main - tmpfs_vnops: move swap work to swap_pager Doug Moore
> >
> > swap_pager (given the reference to it above):
> >
> > Tue, 08 Oct 2024
> > • git: d0b225d16418 - main - swap_pager: use iterators in swp_pager_meta_build Doug Moore
> > Fri, 11 Oct 2024
> > • git: 1107834090be - main - swap_pager: swapoff detecting object death Doug Moore
> > Thu, 24 Oct 2024
> > • git: 34951b0b9e78 - main - swap_pager: move scan_all_shadowed, use iterators Doug Moore
> > • git: 02e85d1c8a41 - main - swap_pager: fix assert in seek_data Doug Moore
> > • git: faa9356f97d2 - main - swap_pager: fix seek_hole assert Doug Moore
> > Sat, 26 Oct 2024
> > • git: 39f6d1e7f835 - main - swap_pager: iter in haspage, lookup, getpages Doug Moore
> > Wed, 13 Nov 2024
> > • git: d11d407aee48 - main - swap_pager: Ensure that swapoff puts swapped-in pages in page queues Mark Johnston
> >
> > I do not know at this time when the corruptions started. The
> > above is only suggestive.
>
> Thank you for listing those.
>
> I need to find some time to look over those changes, although I am no
> kernel guru by a long shot. However, I see now that much more
> knowledgeable people are already looking at the issue on the current
> mailing list.

None of them were applied to stable/14 .


===
Mark Millard
marklmi at yahoo.com
