Date:      Thu, 25 Jun 2020 20:40:32 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: USB [USB3 and USB2] problems when using UEFi v1.16 to boot RPi4: Evidence of a read-time problem being involved
Message-ID:  <88B0E169-C42F-42D6-B2BA-957EAEC7DB8C@yahoo.com>
In-Reply-To: <F7BDD05D-C803-4ACB-9C48-6CBEC277F464@yahoo.com>
References:  <476DD0F0-2286-4B2C-8E44-4404AF17F5A8@yahoo.com> <B1FF8DD3-DFD1-4973-B0D2-6AC33BCAA59C@yahoo.com> <CF81584E-75CE-4BFC-8ACC-AB95E561B28D@yahoo.com> <F426CFE6-F619-4B3C-9260-07E72BC709AF@yahoo.com> <ED69F8C1-C042-43C6-941A-E154229E4623@googlemail.com> <F7BDD05D-C803-4ACB-9C48-6CBEC277F464@yahoo.com>

[Some new testing suggests that it is a read-time
failure.]

On 2020-Jun-25, at 17:52, Mark Millard <marklmi at yahoo.com> wrote:
>
> On 2020-Jun-25, at 15:40, Klaus Küchemann <maciphone2 at googlemail.com> wrote:
>
>> On 25.06.2020 at 21:29, Mark Millard via freebsd-arm <freebsd-arm@freebsd.org> wrote:
>>> …
>>> .
>>> The test still failed to produce an accurate file copy
>>> but the kernel did not report anything either. I'm
>>> unsure how to get evidence of the context for the bad 4K
>>> chunks.
>>>
>> No clue if it has any effect, but maybe: dd if=xxx of=xxx bs=4k ?
>
> Something interesting does result from dd testing,
> even though doing file copies that way still hits
> the problem. In fact a couple of interesting points
> show up.
>
> Using dd to copy large files still gets corrupted copies.
> (I use large files only because the corruptions are
> infrequent within a file, but a sufficiently large file
> seems to always have some corruption.)
>
> Interestingly, dd if=/dev/zero based large file
> generation has produced good files from what I
> can tell. (I generated separate files and diffed
> them after a reboot.)
>
> The problem was originally discovered copying
> from another machine to an RPi4. But the Ethernet
> use involved USB in providing the data (though not
> a local USB drive), while /dev/zero involves neither
> USB as a data source nor copies of data in memory
> via file content buffering. So the contrasting
> dd if=/dev/zero results may be indicating something.
>
> Another interesting point is that the following
> sequence seems to repeatably produce the property
> described in step (E) below:
>
> A) first do a couple of large dd if=/dev/zero file generations
> B) then do a (non-zero) large file copy (dd based or cp based)
> C) reboot
> D) diff the 2 files generated in (A): no differences
> E) diff the original large file and the temporary copy
>   from (B): there are differences and the temporary copy
>   has zero in every byte that is different.
>
> (E) suggests that the bad file copies via cp or
> via dd are sometimes picking up data from the wrong
> memory pages: (A) just made large numbers of pages
> zero, making it more likely that a zero page would
> be used if the wrong page was referenced.
>
> An example of checking for (E) was:
>
> # diff clang-cortexA53-installworld-poud.tar mmjnk.other
> Binary files clang-cortexA53-installworld-poud.tar and mmjnk.other differ
>
> # cmp -l clang-cortexA53-installworld-poud.tar mmjnk.other | grep -v " 0$" | more
> --More--(END)
>
>
> Note about my example "large file" sizes:
>
> -rw-r--r--   1 root  wheel  4011026432 Apr 25 21:04:42 2020 clang-cortexA53-installworld-poud.tar
>
> and I've been mostly using 4 GiByte for the resultant size
> of large files generated via dd.
>
> I have not tried to find a minimum size for reliably
> getting corrupted file copies.
>

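A minimal POSIX sh sketch of the step (E) check, in case it helps others reproduce it. The helper name all_diffs_zero is hypothetical (not part of FreeBSD or of cmp). Like the grep -v " 0$" filter above, it relies on cmp -l printing the two differing byte values in octal in its last two columns, so a right-hand value of 0 means a zero byte in the copy:

```sh
#!/bin/sh
# Hypothetical helper: all_diffs_zero ORIG COPY
# Exit status 0 if ORIG and COPY differ and every differing byte in
# COPY is zero (the pattern seen in step (E)); exit status 1 otherwise.
# cmp -l prints: 1-based byte number, ORIG byte (octal), COPY byte (octal).
all_diffs_zero() {
    cmp -l "$1" "$2" 2>/dev/null | awk '
        { n++; if ($3 != 0) nonzero++ }
        END { if (n > 0 && nonzero == 0) exit 0; exit 1 }'
}
```

Example: all_diffs_zero clang-cortexA53-installworld-poud.tar mmjnk.other && echo "only zero bytes in the copy differ"
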
I continued after the above with (no additional reboot):

# cpuset -l0 cp -aRx clang-cortexA53-installworld-poud.tar mmjnk.other2

# diff clang-cortexA53-installworld-poud.tar mmjnk.other2
Binary files clang-cortexA53-installworld-poud.tar and mmjnk.other2 differ

# cpuset -l2 diff clang-cortexA53-installworld-poud.tar mmjnk.other2
Binary files clang-cortexA53-installworld-poud.tar and mmjnk.other2 differ

# cpuset -l3 cp -aRx clang-cortexA53-installworld-poud.tar mmjnk.other3

# cpuset -l3 diff clang-cortexA53-installworld-poud.tar mmjnk.other3
Binary files clang-cortexA53-installworld-poud.tar and mmjnk.other3 differ

Note that mmjnk.other2 was copied via cpu 0 and its final diff was via cpu 2.
Note that mmjnk.other3 was both copied and diffed via cpu 3.
Note that the original mmjnk.other was made without limiting the cpu usage.

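The pinned copies above could be automated with a loop. This is a sketch assuming FreeBSD's cpuset -l to bind each command to a single CPU; the helper name copy_per_cpu is hypothetical, and it uses plain cp and cmp -s rather than the cp -aRx and diff used above:

```sh
#!/bin/sh
# Hypothetical helper: copy_per_cpu SRC CPU...
# For each CPU number given, copy SRC to SRC.cpuN with cp(1) bound to
# that CPU via FreeBSD's "cpuset -l N", then compare the copy with SRC
# using cmp(1) bound to the same CPU. Prints "cpuN ok" or "cpuN differs".
# The body runs in a subshell so its variables do not leak.
copy_per_cpu() (
    src=$1
    shift
    for cpu in "$@"; do
        dst="$src.cpu$cpu"
        cpuset -l "$cpu" cp "$src" "$dst" || exit 2
        if cpuset -l "$cpu" cmp -s "$src" "$dst"; then
            echo "cpu$cpu ok"
        else
            echo "cpu$cpu differs"
        fi
    done
)
```

Example (the RPi4 has CPUs 0 to 3): copy_per_cpu clang-cortexA53-installworld-poud.tar 0 1 2 3
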
Then I went back and re-compared the files that had not been
written since the reboot and that had shown zeros earlier above.
First I show some of the output of the prior zeros-producing compare:

# cmp -l clang-cortexA53-installworld-poud.tar mmjnk.other | more
1795768321 264   0
1795768322 167   0
1795768323 272   0
1795768324   6   0
1795768325   3   0
1795768326 370   0
1795768327  10   0
1795768328 112   0
. . .

(Yes, I did not lock down what cpu was to be used for the cmp -l
usage in this activity. In the future I probably should experiment
with that too.)
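For reading listings like the one above: cmp -l prints the 1-based byte number in decimal and the two differing byte values in octal, so 264 is 180 (0xb4) and 370 is 248 (0xf8). A hypothetical awk filter, cmp_l_hex, sketched below, rewrites the byte values in hex:

```sh
#!/bin/sh
# Hypothetical filter: cmp_l_hex
# Reads `cmp -l` output on stdin and rewrites each line as
#   <1-based byte number> 0x<left byte> 0x<right byte>
# converting the octal byte values that cmp -l prints into hex.
cmp_l_hex() {
    awk '
    # Convert an octal digit string such as "264" to a number.
    function oct(s,    v, i) {
        v = 0
        for (i = 1; i <= length(s); i++)
            v = v * 8 + substr(s, i, 1)
        return v
    }
    { printf "%s 0x%02x 0x%02x\n", $1, oct($2), oct($3) }'
}
```

Example: cmp -l clang-cortexA53-installworld-poud.tar mmjnk.other | cmp_l_hex | more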

The new comparison looked like:

# cmp -l clang-cortexA53-installworld-poud.tar  mmjnk.other | more
1442340865  15   0
1442340866 245   0
1442340867   1  30
1442340868   1 353
1442340869   0  11
1442340870 100  17
1442340871 226 271
1442340872  31 125
. . .

The right-hand values are no longer all zeros! And the offset
is not the same either (so the left-hand data differ too).
(Bytes that happen to match the left side do not produce a
line at all.)

So I looked at mmjnk.other2 (copied under cpuset -l0, last diffed under cpuset -l2):

# cmp -l clang-cortexA53-installworld-poud.tar  mmjnk.other2 | more
1442340865  15   0
1442340866 245   0
1442340867   1  30
1442340868   1 353
1442340869   0  11
1442340870 100  17
1442340871 226 271
1442340872  31 125
. . .

Same offset in this file and *same* values on the left and right.
(Not just those shown above.)

So I looked at the new copy made under cpuset -l3 :

# cmp -l clang-cortexA53-installworld-poud.tar  mmjnk.other3 | more
981008385  62   0
981008386 111   0
981008387 157  30
981008388  65 353
981008389 123  11
981008390 145  17
981008391 164 271
981008393 160   0
. . .

Different offset in this file but the *same* values on the right.
(Not just those shown above.) The left values are different,
matching up with the offset difference.

(Some bytes match the different data on the left and so do not
show a line, but the right-side values appear to match the prior
2 examples, even where lines disappear differently because of
the left-side content.)

So, apparently, the same page of content was used for the
right-side material, but at a different point in the comparison.
(Might not controlling the cpu used for cmp -l be contributing?)

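One way to test the "same page of content" idea directly, sketched under assumptions: pull the suspect 4 KiB block out of each copy with dd and compare the blocks. From the offsets above, byte 1442340865 is the first byte of 4 KiB block 352134 (counting blocks from 0) and byte 981008385 is the first byte of block 239504. The helper names page_of and same_page are hypothetical. Since the evidence points at a read-time problem, a fresh read with dd may not reproduce the bad bytes that cmp saw, so a mismatch here would not be conclusive:

```sh
#!/bin/sh
# Hypothetical helpers.
# page_of FILE N: write 4 KiB block number N (0-based) of FILE to stdout.
page_of() {
    dd if="$1" bs=4096 skip="$2" count=1 2>/dev/null
}

# same_page FILE_A N_A FILE_B N_B: exit 0 if block N_A of FILE_A is
# byte-for-byte identical to block N_B of FILE_B, 1 if not, 2 on error.
# The body runs in a subshell so its variables do not leak.
same_page() (
    tmpf=$(mktemp) || exit 2
    page_of "$1" "$2" > "$tmpf"
    page_of "$3" "$4" | cmp -s "$tmpf" -
    rc=$?
    rm -f "$tmpf"
    exit "$rc"
)
```

Example: same_page mmjnk.other2 352134 mmjnk.other3 239504 && echo "same 4 KiB block"
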
Note: 1795768321 % 4096 == 1
Note: 1442340865 % 4096 == 1
Note:  981008385 % 4096 == 1

cmp -l numbers bytes starting at 1, so the above all align
at 4096-byte boundaries.

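The alignment arithmetic above can be applied to a whole cmp -l listing. This is a minimal awk sketch (the name cmp_pages is hypothetical) that converts each 1-based byte number to a 0-based offset and prints each distinct 4 KiB block containing differences, together with the offset of its first difference within that block:

```sh
#!/bin/sh
# Hypothetical filter: cmp_pages
# Reads `cmp -l` output (in increasing byte order) on stdin and prints
# one line per distinct 4 KiB block that has differences:
#   <block index> <offset of first difference within the block>
# Both numbers count from 0.
cmp_pages() {
    awk '{
        off = $1 - 1                   # cmp -l byte numbers start at 1
        page = int(off / 4096)
        if (NR == 1 || page != last) {
            printf "%d %d\n", page, off % 4096
            last = page
        }
    }'
}
```

Fed the three listings above, it should report blocks 438420, 352134 and 239504, each with in-block offset 0.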

Overall this indicates that an unmodified file can have
its content appear to change and that multiple files
got the same block of bad data showing up in their
respective comparisons, just not always at the same
offset in the files.

I've no clue if the roles of "left" and "right" could
swap. So far the right seems to be the one that gets
the bad data.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
