Date: Wed, 15 Jul 2020 03:35:41 -0700
From: Mark Millard <marklmi@yahoo.com>
To: freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: USB [USB3 and USB2] problems when using UEFi v1.16 to boot RPi4: Evidence of a read-time problem being involved (contexts that avoids the issue)
Message-ID: <19F98671-4B69-44A6-8254-B186F0ED995F@yahoo.com>
In-Reply-To: <88B0E169-C42F-42D6-B2BA-957EAEC7DB8C@yahoo.com>
References: <476DD0F0-2286-4B2C-8E44-4404AF17F5A8@yahoo.com> <B1FF8DD3-DFD1-4973-B0D2-6AC33BCAA59C@yahoo.com> <CF81584E-75CE-4BFC-8ACC-AB95E561B28D@yahoo.com> <F426CFE6-F619-4B3C-9260-07E72BC709AF@yahoo.com> <ED69F8C1-C042-43C6-941A-E154229E4623@googlemail.com> <F7BDD05D-C803-4ACB-9C48-6CBEC277F464@yahoo.com> <88B0E169-C42F-42D6-B2BA-957EAEC7DB8C@yahoo.com>
On 2020-Jun-25, at 20:40, Mark Millard <marklmi at yahoo.com> wrote:

> [Looks like it is a read-time failure in some
> new testing.]
>
> On 2020-Jun-25, at 17:52, Mark Millard <marklmi at yahoo.com> wrote:
>>
>> On 2020-Jun-25, at 15:40, Klaus Küchemann <maciphone2 at googlemail.com> wrote:
>>
>>> On 25.06.2020 at 21:29, Mark Millard via freebsd-arm <freebsd-arm@freebsd.org> wrote:
>>>> …
>>>> .
>>>> The test still failed to produce an accurate file copy
>>>> but the kernel did not report anything either. I'm
>>>> unsure how to get evidence of the context for the bad 4K
>>>> chunks.
>>>>
>>> No clue if it has effects but maybe : dd if=xxx of=xxx bs=4k ?
>>
>> Something interesting does result from dd testing,
>> even though doing file copies that way still gets
>> the problem. In fact a couple of interesting points
>> show up.
>>
>> Using dd to copy large files still gets corrupted copies.
>> (Large files are used only because the corruptions are not
>> frequent in the files but a sufficiently large file
>> seems to always have some corruption.)
>>
>> Interestingly, dd if=/dev/zero based large file
>> generation has produced good files from what I
>> can tell. (Generate separate files and diff them
>> after a reboot.)
>>
>> The problem was originally discovered copying
>> from another machine to a RPi4. But the Ethernet
>> use involved USB in providing data (but not a
>> local USB drive) --while /dev/zero does not
>> involve USB as a data source and copies of
>> data in memory via file content buffering. So
>> the contrasting dd if=/dev/zero results may be
>> indicating something.
>>
>> Another interesting point is that the following
>> sequence seems repeatable for step (E)'s resultant
>> property below:
>>
>> A) first do a couple of large dd if=/dev/zero file generations
>> B) then do a (non-zero) large file copy (dd based or cp based)
>> C) reboot
>> D) diff the 2 files generated in (A): no differences
>> E) diff the original large file and the temporary copy
>>    from (B): there are differences and the temporary copy
>>    has zero in every byte that is different.
>>
>> (E) suggests that the bad file copies via cp or
>> via dd are picking up data from the wrong memory
>> pages sometimes, (A) just made large numbers of
>> pages zero, making it more likely a zero page
>> would be used if the wrong page was referenced.
>>
>> An example of checking for (E) was:
>>
>> # diff clang-cortexA53-installworld-poud.tar mmjnk.other
>> Binary files clang-cortexA53-installworld-poud.tar and mmjnk.other differ
>>
>> # cmp -l clang-cortexA53-installworld-poud.tar mmjnk.other | grep -v " 0$" | more
>> --More--(END)
>>
>>
>> Note about my example "large file" sizes:
>>
>> -rw-r--r-- 1 root wheel 4011026432 Apr 25 21:04:42 2020 clang-cortexA53-installworld-poud.tar
>>
>> and I've been mostly using 4 GiByte for the resultant size
>> of large files generated via dd.
>>
>> I have not tried to find a minimum size for reliably
>> getting corrupted file copies.
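The sequence (A) through (E) above could be written down as a small script. This is only a sketch under assumptions: the function names before_reboot and after_reboot, the file names zero1, zero2 and copy, and the argument order are all made up for illustration, and the reboot in (C) still has to be done by hand between the two calls.

```shell
# Hypothetical helpers sketching steps (A)-(E); none of these names
# come from the original testing.
#   $1 = scratch directory on the filesystem under test
#   $2 = existing large non-zero file
#   $3 = size of each zero-filled file in MiB (before_reboot only)

before_reboot() {    # steps (A) and (B)
    work=$1 src=$2 size_mb=$3
    # (A) generate two zero-filled files of the same size
    dd if=/dev/zero of="$work/zero1" bs=1048576 count="$size_mb" 2>/dev/null &&
    dd if=/dev/zero of="$work/zero2" bs=1048576 count="$size_mb" 2>/dev/null &&
    # (B) copy the non-zero file
    cp "$src" "$work/copy"
}

after_reboot() {     # steps (D) and (E); run after the reboot in (C)
    work=$1 src=$2
    # (D) the two zero-filled files are expected to be identical
    if cmp -s "$work/zero1" "$work/zero2"; then
        echo "D: zero files match"
    else
        echo "D: zero files differ"
    fi
    # (E) cmp -l prints: byte number, octal byte in src, octal byte in copy.
    # Count all differing bytes and those where the copy is not zero.
    cmp -l "$src" "$work/copy" |
        awk '{ t++; if ($3 != 0) nz++ }
             END { printf "E: differing bytes: %d; nonzero in copy: %d\n", t, nz }'
}
```

Usage would be something like `before_reboot /some/dir big.tar 4096`, a reboot, then `after_reboot /some/dir big.tar` (paths are placeholders). A nonzero count in the last field would mean the bad data is not all zeros.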
>>
>
> I continued after the above with (no additional reboot):
>
> # cpuset -l0 cp -aRx clang-cortexA53-installworld-poud.tar mmjnk.other2
>
> # diff clang-cortexA53-installworld-poud.tar mmjnk.other2
> Binary files clang-cortexA53-installworld-poud.tar and mmjnk.other2 differ
>
> # cpuset -l2 diff clang-cortexA53-installworld-poud.tar mmjnk.other2
> Binary files clang-cortexA53-installworld-poud.tar and mmjnk.other2 differ
>
> # cpuset -l3 cp -aRx clang-cortexA53-installworld-poud.tar mmjnk.other3
>
> # cpuset -l3 diff clang-cortexA53-installworld-poud.tar mmjnk.other3
> Binary files clang-cortexA53-installworld-poud.tar and mmjnk.other3 differ
>
> Note that the final mmjnk.other2 was via cpu 2.
> Note that the mmjnk.other3 was via cpu 3.
> Note that the original mmjnk.other was without limiting the cpu usage.
>
> Then I went back and did a compare of files not written since
> the reboot and showing zeros earlier above. First I show some
> of the output of a prior zeros-producing compare:
>
> # cmp -l clang-cortexA53-installworld-poud.tar mmjnk.other | more
> 1795768321 264 0
> 1795768322 167 0
> 1795768323 272 0
> 1795768324 6 0
> 1795768325 3 0
> 1795768326 370 0
> 1795768327 10 0
> 1795768328 112 0
> . . .
>
> (Yes, I did not lock down what cpu was to be used for the cmp -l
> usage in this activity. In the future I probably should experiment
> with that too.)
>
> The new comparison looked like:
>
> # cmp -l clang-cortexA53-installworld-poud.tar mmjnk.other | more
> 1442340865 15 0
> 1442340866 245 0
> 1442340867 1 30
> 1442340868 1 353
> 1442340869 0 11
> 1442340870 100 17
> 1442340871 226 271
> 1442340872 31 125
> . . .
>
> Not all-zeros being presented on the right any more! And not
> the same offset either (so different left hand side data).
> (Some bytes are a match to the left side and so do not show a
> line overall.)
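To see which 4 KiB pages the differing bytes fall in, the cmp -l output can be grouped by page. A sketch, assuming 4096-byte pages and the 1-based byte numbers that cmp -l prints; bad_pages is a hypothetical helper name, not something used in the testing above.

```shell
# bad_pages (hypothetical helper): read `cmp -l` output on stdin and
# print one line per 4096-byte page that contains a differing byte:
#   <0-based page index> <number of differing bytes in that page>
bad_pages() {
    awk '{ page = int(($1 - 1) / 4096); n[page]++ }   # cmp -l byte numbers start at 1
         END { for (p in n) print p, n[p] }' | sort -n
}
```

For example, `cmp -l clang-cortexA53-installworld-poud.tar mmjnk.other | bad_pages | more`. Byte number 1795768321 is the first byte of page 438420, since 1795768320 = 438420 * 4096.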
>
> So I looked at the new copy made under cpuset -l2 :
>
> # cmp -l clang-cortexA53-installworld-poud.tar mmjnk.other2 | more
> 1442340865 15 0
> 1442340866 245 0
> 1442340867 1 30
> 1442340868 1 353
> 1442340869 0 11
> 1442340870 100 17
> 1442340871 226 271
> 1442340872 31 125
> . . .
>
> Same offset in this file and *same* values on the left and right.
> (Not just those shown above.)
>
> So I looked at the new copy made under cpuset -l3 :
>
> # cmp -l clang-cortexA53-installworld-poud.tar mmjnk.other3 | more
> 981008385 62 0
> 981008386 111 0
> 981008387 157 30
> 981008388 65 353
> 981008389 123 11
> 981008390 145 17
> 981008391 164 271
> 981008393 160 0
> . . .
>
> Different offset in this file but the *same* values on the right.
> (Not just those shown above.) The left values are different,
> matching up with the offset difference.
>
> (Some bytes are a match to the different data on the left and so
> do not show a line but the right side values appear to match the
> prior 2 examples even where lines disappear differently because
> of left-side content.)
>
> So, apparently, the same page of content was used for the right
> side material but at a different point in the diff. (Lack
> of controlling the cpu used for cmp -l might be contributing?)
>
> Note: 1795768321 % 4096 == 1
> Note: 1442340865 % 4096 == 1
> Note: 981008385 % 4096 == 1
>
> cmp -l numbers bytes starting at 1, so the above all align
> at 4096 boundaries.
>
>
> Overall this indicates that an unmodified file can have
> its content appear to change and that multiple files
> got the same block of bad data showing up in their
> respective comparisons, just not always at the same
> offset in the files.
>
> I've no clue if the roles of "left" and "right" could
> swap. So far the right seems to be the one that gets
> the bad data.
>

Turns out that the combination of enabling the
3 GiByte limitation in uefi and not having D25219
applied in the kernel avoids the problem.
I only used this combination in order to use
artifacts.ci.freebsd.org kernels (that do not have D25219)
in some other testing.

So, putting back my non-debug kernel that has D25219 in it
but leaving the 3 GiByte limit in place in uefi . . .

Turns out that also avoids the problem.

This suggests that maybe D25219 by itself is not keeping
everything in the memory range(s) that the uefi 3 GiByte
limitation enforces internally: With the limitation
enforced, the problem disappears.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
