Date:      Sun, 21 Jun 2020 13:04:17 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject:   Re: Potential USB [USB3 and USB2] problems when using UEFi v1.16 to boot RPi4: notes as I explore
Message-ID:  <B1FF8DD3-DFD1-4973-B0D2-6AC33BCAA59C@yahoo.com>
In-Reply-To: <476DD0F0-2286-4B2C-8E44-4404AF17F5A8@yahoo.com>
References:  <476DD0F0-2286-4B2C-8E44-4404AF17F5A8@yahoo.com>

On 2020-Jun-21, at 09:02, Mark Millard <marklmi at yahoo.com> wrote:

> This reports on intermediate results of some
> 4 GiByte RPi4 use via UEFI based booting.
>
> Extracted from my reply to a different message:
>
>> The following may be a function of the conditions/configuration
>> I'm experimenting with. For example over_voltage=6 and
>> arm_freq=2000 and it is the 1st time using two USB3 devices (SSD
>> and Ethernet): no powered hub involved (yet). I've not investigated
>> variations yet. I am using a 5.1V 3.5A power supply. While
>> I'm not generally where I can see/use it, an HDMI connection is
>> present but nothing is logged in there.
>>
>> It appears that I get occasional USB SSD data corruption
>> during writes: while building ports, a few later extractions
>> of packages from prior port builds get ". . . from package:
>> Lzma library error: Corrupted input data". Out of 419 ports
>> built so far I've
>> had 4 such failures (40 other ports skipped). The last port
>> (llvm10) is still building and probably has 4 or more hours
>> to go.
>>
>> Possibly going along with that is that, when I try to
>> copy a large tar file during the poudriere bulk, the copy
>> ends up corrupted (diff/cmp find differences). I've not
>> yet tried when the RPi4 was basically idle. Using cmp shows
>> that long sequences of bytes are different. Sometimes the
>> new copy has large blocks of binary zeros but not always.
>> It looks like the blocks might be 4096 in size. (Some bytes
>> at the beginning or ends of 4096 might happen to match
>> so the size of the mismatch can be somewhat less than
>> 4096.) The alignment of the mismatched blocks also
>> stays inside 4096 alignment boundaries, not crossing.
>> (I've not seen back-to-back failed blocks yet.) The messed
>> up blocks are rare.
>>
>> The poudriere bulk is using 4 builders, each allowed
>> 4 processes. So much of the time there was/is a significant
>> load average involved (4+) and there was such when I was
>> testing copies.
>>
>> So far I've not seen variability in the read results of the
>> files that were created. It appears to be a write-time
>> variability.
>>
>> Of note:
>>
>> The USB SSD is the same media also used to boot and
>> operate a Rock64. I've not observed any problems in
>> that alternate usage context. But I should do more
>> explicit checking now.
>>
>> My testing of NetBSD with the built-in Ethernet in use and
>> only a USB3 SSD has not suggested problems for the
>> over_voltage and arm_freq so far. But I need better
>> checking than I did. NetBSD was using the same type of
>> USB3 SSD on the same RPi4.
>
>
> Of the 4 port builds that failed for ". . . from package:
> Lzma library error: Corrupted input data", only 2 files are
> involved. 3 of the 4 failures are attempted extractions
> of the same package (llvm80-8.0.1_3) and the same file
> fails for each of the 3.
>
> But, more interesting is that, prior to the failures, llvm80
> was extracted 3 other times successfully after it was built.
> This may be nothing more than in-memory copies of content
> still being available at the time. (No USB-read required of
> what ended up being written?)
>
> mesa-libs-19.0.8 , mesa-dri-19.0.8 , and xorg-server-1.20.8_1,1
> had no failures. The later xf86-video-scfb-0.0.5_2 ,
> xf86-input-libinput-0.30.0 , and xf86-video-vesa-2.4.0_3 had
> failures while preparing to build.

The llvm10 build finished.

As for the bad large-file copy under a
head -r360311 based context. . .

Having the RPi4 otherwise idle made no
difference.

Having only the USB3 SSD as a USB device (on
USB3) made no difference. Nor did disconnecting
HDMI.

Changing the arm_freq in use made no difference.
Using the default arm_freq (no assignment)
and having no over_voltage assignment made no
difference.

Using an external powered hub instead of a
direct plug-in for the USB3 SSD made no
difference.

All the above at the same time made no
difference.

Plugging in the USB SSD to a USB2 port instead
of a USB3 port and booting that way made no
difference.

Booting the Rock64 with the media and doing
the experiment had no problems.

It looks like the v1.16 UEFI based context
has a general problem that shows up in at
least USB "disk" I/O.

The file copied during the tests is:

# ls -ldT /usr/obj/clang-cortexA53-installworld-poud.tar
-rw-r--r--  1 root  wheel  4011026432 Apr 25 21:04:42 2020 /usr/obj/clang-cortexA53-installworld-poud.tar

Note: diffing this file with the original on another
machine consistently shows no differences. The above
copy was established via copying to the Rock64. It is
only attempting to write new copies via the RPi4 that
end up with the new copies not fully matching this
file.
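
The block-level pattern described earlier (mismatches that
stay inside 4096 alignment boundaries, sometimes as binary
zeros) could be checked with something like the following
Python sketch. The function name and the paths passed to it
are hypothetical, and treating 4096 as a size in bytes is an
assumption. This is not the procedure that was actually
used; cmp and diff were.

```python
BLOCK = 4096  # assumed block size in bytes, from the observed pattern


def mismatched_blocks(good_path, copy_path, block=BLOCK):
    """Compare two files one aligned block at a time.

    Returns a list of (byte_offset, copy_block_is_all_zeros)
    tuples, one per block that differs between the files.
    """
    bad = []
    with open(good_path, "rb") as good, open(copy_path, "rb") as copy:
        offset = 0
        while True:
            a = good.read(block)
            b = copy.read(block)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # True only when the copy's block is entirely zero bytes
                all_zero = len(b) > 0 and b.count(0) == len(b)
                bad.append((offset, all_zero))
            offset += block
    return bad
```

An empty list would mean the two files match. Each entry
otherwise gives the starting offset of a differing block, so
back-to-back failed blocks would show up as offsets exactly
4096 apart.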

Copies over the network (scp and nfs) made to the RPi4
from the machine holding the original file also end up
partially corrupted on the RPi4. In this context, the RPi4 is
using an external USB3 Ethernet device as the source
of the data.

Copies made from the RPi4 to the other machine end
up with no differences (i.e., a good copy results).

It looks like the problem is for writes to the USB
media, not reads of the media.
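
One way to separate the two cases is to hash the source,
write a copy, and then hash the copy more than once: a
digest that differs from the source but is the same on every
re-read points at the write. The following is only a sketch
under that assumption. The function names and the choice of
SHA-256 are illustrative, and re-reads may be satisfied from
in-memory copies rather than from the USB media, so a stable
result is suggestive rather than conclusive.

```python
import hashlib
import shutil


def sha256_of(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB pieces."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(chunk), b""):
            h.update(piece)
    return h.hexdigest()


def copy_and_check(src, dst, rereads=3):
    """Copy src to dst, then hash dst several times.

    Returns (write_ok, reads_stable):
      write_ok     - the first digest of dst matches the digest of src
      reads_stable - every re-read of dst produced the same digest
    """
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    digests = [sha256_of(dst) for _ in range(rereads)]
    reads_stable = len(set(digests)) == 1
    write_ok = digests[0] == expected
    return write_ok, reads_stable
```

A result of (False, True) would match what is reported here:
the new copy is wrong, but it reads back the same way each
time.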

For reference on the RPi4:

USB3 boot context:

ugen0.3: <OWC Envoy Pro mini> at usbus0
umass0 on uhub1
umass0: <OWC Envoy Pro mini, class 0/0, rev 3.00/1.00, addr 2> on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:0:0: Attached to scbus0
. . . (Root mount waiting for: CAM notices) . . .
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number #
da0: 400.000MB/s transfers
da0: 228936MB (468862128 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>

USB2 boot context:

ugen0.3: <OWC Envoy Pro mini> at usbus0
umass0 on uhub2
umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 2> on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:0:0: Attached to scbus0
. . . (Root mount waiting for: CAM notices) . . .
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number #
da0: 40.000MB/s transfers
da0: 228936MB (468862128 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



