Date:      Sat, 28 Oct 2023 18:25:18 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Glen Barber <gjb@FreeBSD.org>
Cc:        Colin Percival <cperciva@tarsnap.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: 15-aarch64-RPI-snap
Message-ID:  <6CF6E677-CF8F-4DE9-9781-754003FCE0B6@yahoo.com>
In-Reply-To: <183A9CD0-42DB-4A0C-982D-FC6D3980163A@yahoo.com>
References:  <0100018b6a9d257c-b35e4157-ba97-4aa7-988c-aba797c6d2ca-000000@email.amazonses.com> <ACBCBC83-DD61-4E0A-89DC-9DDD1B71B8DE@freebsd.org> <13B64416-4334-4070-8588-71F7D938350B@yahoo.com> <3B40F89C-7E5E-427F-A7A1-2D37CCC06A6F@yahoo.com> <183A9CD0-42DB-4A0C-982D-FC6D3980163A@yahoo.com>

On Oct 28, 2023, at 09:40, Mark Millard <marklmi@yahoo.com> wrote:

> On Oct 27, 2023, at 23:00, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Oct 27, 2023, at 22:24, Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> On Oct 27, 2023, at 21:34, Glen Barber <gjb@FreeBSD.org> wrote:
>>>
>>>>>> . . .
>>>>>>                                                                   ^
>>>>>> ./offset.inc:16:19: error: null character ignored [-Werror,-Wnull-character]
>>>>>> <U+0000><U+0000><U+0000>[... long run of <U+0000> bytes, wrapped in the original mail, elided here ...]<U+0000><U+0000>#undef _SA
>>>>>>                   ^
>>>
>>> Are the above from a ZFS file system? UFS? Something else?
>>>
>>> Back in 2021-Nov (15..21) I had problems where ZFS was leading
>>> to blocks of such null bytes on aarch64 (not specifically on
>>> RPi*'s), in various files but not the same ones from test to
>>> test. When I updated past some ZFS updates on the 23rd, the
>>> problem stopped.
>>>
>>> I also have notes from 2022-Mar (19..22) about replicating
>>> another example of the problem, where someone's files were
>>> ending up with such blocks of bytes, but that testing was on
>>> the ThreadRipper 1950X. (The replication showed that ccache
>>> did not need to be involved, since I've never used it.) Again,
>>> ZFS was part of the environment in which the replication
>>> happened. Mark Johnson fixed sys/contrib/openzfs/module/zfs/dnode.c
>>> during this, and once I tested the patch I could no longer
>>> replicate the issue.
>>>
>>> Whichever file system it is that holds the bad bytes, some
>>> attempted testing for the repeatability of the problem could
>>> be of interest: some of it on aarch64 but not on RPi*'s, some
>>> of it not on aarch64 at all. But it might take information
>>> about the context to know better what and how to test. That
>>> could include information about both the host and the jail OS
>>> versions, if a jail is involved.
>>
>> Those last notes are likely too generic, in that normally
>> official buildworld/buildkernel activity is done on amd64
>> for all target platforms (last I knew). (Not that running
>> such builds on other platforms would be a bad problem-scope
>> isolation test.)
>>
>> Any notes that help delimit what sort of test context
>> would be a reasonable partial replication of the original
>> context could prove useful.
>>
>>> . . .
>
> If the file system is ZFS, I'll note that main [so: 15] already has
> a zpool feature that is not part of openzfs-2.2 and so not part of
> releng/14.0 or stable/14. So which zpool features are enabled could
> be relevant to problems that only happen in main, and might need to
> be taken into account in efforts to replicate the problem.
>
> But I've not evaluated whether redaction_list_spill is likely to
> be involved in this specific type of file corruption.

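As a concrete starting point for the kind of repeatability testing
suggested above, here is a minimal sketch (Python; the 64-byte run
threshold and the default scan root are arbitrary choices of mine, not
anything the official builds use) that walks a tree and reports files
containing long runs of NUL bytes, the symptom shown in the offset.inc
error:

#!/usr/bin/env python3
# Minimal sketch: walk a tree and report files containing long runs of
# NUL bytes, the symptom seen in the offset.inc error quoted above.
# The threshold and default scan root are arbitrary choices, and the
# byte-at-a-time loop is slow but simple.
import os
import sys

THRESHOLD = 64      # flag runs of at least this many consecutive NUL bytes
CHUNK = 1 << 20     # read 1 MiB at a time

def longest_nul_run(path):
    longest = run = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            for b in data:
                if b == 0:
                    run += 1
                    longest = max(longest, run)
                else:
                    run = 0
    return longest

def main(root):
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                run = longest_nul_run(path)
            except OSError:
                continue        # unreadable; skip
            if run >= THRESHOLD:
                print(f"{path}: longest NUL run = {run} bytes")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")

Running something like that over the build area right after a failure,
and again after a rebuild, would help show whether the same files go
bad each time or whether the corruption moves around.
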
I'll note that the upstream openzfs master commit for the data
corruption issue:

"Zpool can start allocating from metaslab before TRIMs have completed"

was committed on 2023-Oct-12, so not long ago. If the official builds
use ZFS with TRIM but are based on a system version that predates
FreeBSD picking up that commit, then there is a known ZFS data
corruption issue present in the official build environment.
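
If someone with access to a build host wants to check whether TRIM is
even in play for its pool, a rough sketch along these lines would
report the autotrim property and the per-vdev TRIM status (the pool
name "zroot" is just a placeholder; zpool get and zpool status -t are
standard OpenZFS commands):

#!/usr/bin/env python3
# Rough sketch: report whether TRIM is in play for a given zpool.
# "zroot" is only a placeholder pool name for the build host's pool.
import subprocess
import sys

POOL = sys.argv[1] if len(sys.argv) > 1 else "zroot"

def zpool(*args):
    return subprocess.run(["zpool", *args], capture_output=True,
                          text=True, check=True).stdout

# autotrim is a pool property; "on" means TRIMs are issued automatically.
autotrim = zpool("get", "-H", "-o", "value", "autotrim", POOL).strip()
print(f"{POOL} autotrim: {autotrim}")

# "zpool status -t" adds per-vdev TRIM status (untrimmed, in progress,
# or the completion time of the last trim) to the usual status output.
print(zpool("status", "-t", POOL))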

Since port->package builds are based on a Host/Jail combination such as:

Host OSVERSION: 1500000
Jail OSVERSION: 1500002

or:

Host OSVERSION: 1500000
Jail OSVERSION: 1400097

but the Host kernel is the one actually in use (with the specific Host
kernel commit not identified), that kernel could have such an issue.
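
For what it is worth, the two figures come from different places: as I
understand it, the Host OSVERSION reflects the running kernel, while
the Jail OSVERSION comes from the jail's own userland. A small sketch
of the distinction (the jail path is a placeholder, not the path the
official builders actually use):

#!/usr/bin/env python3
# Sketch of where the two OSVERSION figures come from: the Host value
# matches the running kernel's kern.osreldate, while the Jail value is
# __FreeBSD_version from the jail's own sys/param.h.  The jail path is
# a placeholder, not the path the official builders actually use.
import re
import subprocess

JAIL_ROOT = "/usr/local/poudriere/jails/main-arm64"   # placeholder path

host_osversion = subprocess.run(["sysctl", "-n", "kern.osreldate"],
                                capture_output=True, text=True,
                                check=True).stdout.strip()

jail_osversion = "unknown"
with open(f"{JAIL_ROOT}/usr/include/sys/param.h") as f:
    for line in f:
        m = re.match(r"#define\s+__FreeBSD_version\s+(\d+)", line)
        if m:
            jail_osversion = m.group(1)
            break

print(f"Host OSVERSION: {host_osversion}")
print(f"Jail OSVERSION: {jail_osversion}")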

(Because of such issues, I wish that the commit corresponding to the
Host OSVERSION were also reported for the package builds. Presuming
ZFS use, I also wish that the zpool features enabled were reported,
for similar reasons.)
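
The extra reporting wished for above could be as small as the host's
uname -v string (which for -CURRENT kernels normally embeds the git
hash of the kernel commit) plus the pool's feature@ properties. A
minimal sketch of gathering that, again with a placeholder pool name:

#!/usr/bin/env python3
# Minimal sketch of the extra reporting wished for above: the host
# kernel's uname -v string (which on -CURRENT normally includes the git
# hash of the kernel commit) and the zpool feature@ properties showing
# which features are disabled, enabled, or active.  "zroot" is only a
# placeholder pool name.
import subprocess

def run(*cmd):
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout

print("Host kernel:", run("uname", "-v").strip())

for line in run("zpool", "get", "all", "zroot").splitlines():
    if "feature@" in line:
        print(line)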


===
Mark Millard
marklmi at yahoo.com



