Date:      Thu, 25 Jun 2020 12:29:28 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject:   Re: USB [USB3 and USB2] problems when using UEFi v1.16 to boot RPi4: Still produces inaccurate file copies
Message-ID:  <F426CFE6-F619-4B3C-9260-07E72BC709AF@yahoo.com>
In-Reply-To: <CF81584E-75CE-4BFC-8ACC-AB95E561B28D@yahoo.com>
References:  <476DD0F0-2286-4B2C-8E44-4404AF17F5A8@yahoo.com> <B1FF8DD3-DFD1-4973-B0D2-6AC33BCAA59C@yahoo.com> <CF81584E-75CE-4BFC-8ACC-AB95E561B28D@yahoo.com>

On 2020-Jun-22, at 01:08, Mark Millard <marklmi at yahoo.com> wrote:


> On 2020-Jun-21, at 13:04, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2020-Jun-21, at 09:02, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> This reports on intermediate results of some
>>> 4 GiByte RPi4 use via UEFI based booting.
>>>
>>> Extracted from my reply to a different message:
>>>
>>>> The following may be a function of the conditions/configuration
>>>> I'm experimenting with. For example over_voltage=6 and
>>>> arm_freq=2000 and it is the 1st time using two USB3 devices (SSD
>>>> and Ethernet): no powered hub involved (yet). I've not investigated
>>>> variations yet. I am using a 5.1V 3.5A power supply. While
>>>> I'm not generally where I can see/use it, an HDMI connection is
>>>> present but nothing is logged in there.
>>>>
>>>> It appears that I get occasional USB SSD data corruption
>>>> during writes: while building ports, a few later
>>>> extractions of packages from prior port builds get ". . .
>>>> from package: Lzma library error: Corrupted input data".
>>>> Out of 419 ports built so far I've
>>>> had 4 such failures (40 other ports skipped). The last port
>>>> (llvm10) is still building and probably has 4 or more hours
>>>> to go.
>>>>
>>>> Possibly going along with that is that, when I try to
>>>> copy a large tar file during the poudriere bulk, the copy
>>>> ends up corrupted (diff/cmp find differences). I've not
>>>> yet tried when the RPi4 was basically idle. Using cmp shows
>>>> that long sequences of bytes are different. Sometimes the
>>>> new copy has large blocks of binary zeros but not always.
>>>> It looks like the blocks might be 4096 bytes in size. (Some
>>>> bytes at the beginning or end of a 4096-byte block might
>>>> happen to match, so the size of the mismatch can be somewhat
>>>> less than 4096.) The alignment of the mismatched blocks also
>>>> stays inside 4096-byte alignment boundaries, not crossing.
>>>> (I've not seen back-to-back failed blocks yet.) The messed
>>>> up blocks are rare.
>>>>
>>>> The poudriere bulk is using 4 builders, each allowed
>>>> 4 processes. So much of the time there was/is a significant
>>>> load average involved (4+) and there was such when I was
>>>> testing copies.
>>>>
>>>> So far I've not seen variability in the read results of the
>>>> files that were created. It appears to be a write-time
>>>> variability.
>>>>
>>>> Of note:
>>>>
>>>> The USB SSD is the same media also used to boot and
>>>> operate a Rock64. I've not observed any problems in
>>>> that alternate usage context. But I should do more
>>>> explicit checking now.
>>>>
>>>> My testing NetBSD with the built-in Ethernet in use and
>>>> only a USB3 SSD has not suggested problems for the
>>>> over_voltage and arm_freq so far. But I need better
>>>> checking than I did. NetBSD was using the same type of
>>>> USB3 SSD on the same RPi4.
>>>
>>>
>>> Of the 4 port builds that failed for ". . . from package:
>>> Lzma library error: Corrupted input data", only 2 files are
>>> involved. 3 of the 4 failures are attempted extractions
>>> of the same package (llvm80-8.0.1_3) and the same file
>>> fails for each of the 3.
>>>
>>> But, more interesting is that, prior to the failures, llvm80
>>> was extracted 3 other times successfully after it was built.
>>> This may be nothing more than in-memory copies of content
>>> still being available at the time. (No USB-read required of
>>> what ended up being written?)
>>>
>>> mesa-libs-19.0.8 , mesa-dri-19.0.8 , and xorg-server-1.20.8_1,1
>>> had no failures. The later xf86-video-scfb-0.0.5_2 ,
>>> xf86-input-libinput-0.30.0 , and xf86-video-vesa-2.4.0_3 had
>>> failures while preparing to build.
>>
>> The llvm10 build finished.
>>
>> As for the bad large-file copy under a
>> head -r360311 based context. . .
>>
>> Having the RPi4 otherwise idle made no
>> difference.
>>
>> Having only the USB3 SSD as a USB device (on
>> USB3) made no difference. Nor did also
>> disconnecting HDMI.
>>
>> Changing the arm_freq in use made no difference.
>> Using the default arm_freq (no assignment)
>> and having no over_voltage assignment made no
>> difference.
>>
>> Using an external powered hub instead of a
>> direct plug-in for the USB3 SSD made no
>> difference.
>>
>> All the above at the same time made no
>> difference.
>>
>> Plugging in the USB SSD to a USB2 port instead
>> of a USB3 port and booting that way made no
>> difference.
>>
>> Booting the Rock64 with the media and doing
>> the experiment had no problems.
>>
>> It looks like the v1.16 UEFI based context
>> has a general problem that shows up in at
>> least USB "disk" I/O.
>>
>> The file copied during the tests is:
>>
>> # ls -ldT /usr/obj/clang-cortexA53-installworld-poud.tar
>> -rw-r--r--  1 root  wheel  4011026432 Apr 25 21:04:42 2020 /usr/obj/clang-cortexA53-installworld-poud.tar
>>
>> Note: diffing this file with the original on another
>> machine consistently shows no differences. The above
>> copy was established via copying to the Rock64. It is
>> only attempting to write new copies via the RPi4 that
>> end up with the new copies not fully matching this
>> file.
>>
>> Copies over the network (scp and nfs) made to the RPi4
>> from the machine holding the original file also end up
>> partially corrupted on the RPi4. In this context, the RPi4 is
>> using an external USB3 Ethernet device as the source
>> of the data.
>>
>> Copies made from the RPi4 to the other machine end
>> up with no differences (i.e., a good copy results).
>>
>> It looks like the problem is for writes to the USB
>> media, not reads of the media.
>>
>> For reference on the RPi4:
>>
>> USB3 boot context:
>>
>> ugen0.3: <OWC Envoy Pro mini> at usbus0
>> umass0 on uhub1
>> umass0: <OWC Envoy Pro mini, class 0/0, rev 3.00/1.00, addr 2> on usbus0
>> umass0:  SCSI over Bulk-Only; quirks = 0x0100
>> umass0:0:0: Attached to scbus0
>> . . . (Root mount waiting for: CAM notices) . . .
>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>> da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
>> da0: Serial Number #
>> da0: 400.000MB/s transfers
>> da0: 228936MB (468862128 512 byte sectors)
>> da0: quirks=0x2<NO_6_BYTE>
>>
>> USB2 boot context:
>>
>> ugen0.3: <OWC Envoy Pro mini> at usbus0
>> umass0 on uhub2
>> umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 2> on usbus0
>> umass0:  SCSI over Bulk-Only; quirks = 0x0100
>> umass0:0:0: Attached to scbus0
>> . . . (Root mount waiting for: CAM notices) . . .
>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>> da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
>> da0: Serial Number #
>> da0: 40.000MB/s transfers
>> da0: 228936MB (468862128 512 byte sectors)
>> da0: quirks=0x2<NO_6_BYTE>
>>
>
> I've checked NetBSD operation with large file
> copies for:
>
> # uname -ap
> NetBSD NBSDRPi4 9.99.64 NetBSD 9.99.64 (GENERIC64) #1: Sun May 31 01:41:16 UTC 2020  root@NBSDRPi4:/usr/obj/sys/arch/evbarm/compile/GENERIC64 evbarm aarch64
>
> There were no file differences found between
> file copies.
>
> The problem seems to be specific to the FreeBSD/RPi4
> combination in some way.


I built and installed a witness+diagnostics kernel and
tried the large file copy test again under a UEFI/ACPI
based boot, this time on an 8 GiByte RPi4.

The test still failed to produce an accurate file copy
but the kernel did not report anything either. I'm
unsure how to get evidence of the context for the bad 4K
chunks.
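
One way to at least record where the bad 4K chunks land
would be a block-by-block comparison of the known-good file
against a fresh copy. The following is only a minimal Python
sketch of that idea, not something that was run for this
report: the 4096-byte block size is an assumption taken from
the pattern described in the quoted material, and the name
mismatched_blocks is hypothetical. It reports the offset of
each differing block, how many bytes in it differ, and
whether the copy's block is entirely binary zeros.

```python
import sys

BLOCK = 4096  # assumed block size, from the 4 KiB pattern described above


def mismatched_blocks(path_a, path_b, block=BLOCK):
    """Yield (offset, differing_bytes, copy_is_all_zero) per differing block.

    path_a is treated as the known-good file and path_b as the copy.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(block)
            b = fb.read(block)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # Count differing byte positions; a length difference at
                # end-of-file counts as differing bytes too.
                ndiff = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
                zeroed = len(b) > 0 and b.count(0) == len(b)
                yield offset, ndiff, zeroed
            offset += block


if __name__ == "__main__" and len(sys.argv) == 3:
    for off, ndiff, zeroed in mismatched_blocks(sys.argv[1], sys.argv[2]):
        note = ", copy block is all zeros" if zeroed else ""
        print(f"offset {off:#x} (block {off // BLOCK}): {ndiff} bytes differ{note}")
```

Run with the original file as the first argument and the
suspect copy as the second. Comparing the reported offsets
across several copy attempts might show whether the bad
blocks move around between runs, which would fit the
write-time variability suggested above.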



I'll note that the head -r360311 based environment has
the patches from the below applied:

https://reviews.freebsd.org/D25201
(Use EFI memory map to determine attributes for AcpiOsMapMemory mappings on arm64)

https://reviews.freebsd.org/D25219
(ACPI: add support for (inherited) _DMA limits)

https://reviews.freebsd.org/D25203
(Add dwc_otg_acpi)

https://reviews.freebsd.org/D25251
(Add support for bcm54213PE in brgphy)


The following patches were not applied:

QUOTE
A fix for the XHCI firmware loading is here: https://reviews.freebsd.org/D25261
(It requires working PCI-E, which is in progress, here: https://reviews.freebsd.org/D25068)
END QUOTE

https://reviews.freebsd.org/D25121
(Clean up the pci host generic driver)

(It looks like that last one changed some ACPI code
and I should probably apply it and try again.
Also: it is checked in as -r362285 but I'm not
ready for a general update.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



