Date: Wed, 19 Jan 2022 01:08:07 +0000 From: bugzilla-noreply@freebsd.org To: freebsd-arm@FreeBSD.org Subject: [Bug 261324] xhci1 resets on ROCKPro64 when reading from ZFS pool Message-ID: <bug-261324-7@https.bugs.freebsd.org/bugzilla/>
next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D261324 Bug ID: 261324 Summary: xhci1 resets on ROCKPro64 when reading from ZFS pool Product: Base System Version: 13.0-STABLE Hardware: arm64 OS: Any Status: New Severity: Affects Only Me Priority: --- Component: arm Assignee: freebsd-arm@FreeBSD.org Reporter: freebsd@toyingwithfate.com I have a ROCKPro64 single board computer with 4GB of RAM that I am attempti= ng to use as a small NAS. The storage consists of an encrypted (OpenZFS native, not GELI) RAID-Z pool using three 8TB SATA spinning disks and one 512GB SATA SSD (used as cache) in a simple 4-disk JBOD USB enclosure (the "OWC Mercury Elite Pro Quad"). My understanding is that it contains a USB 3 hub with four SATA-to-USB bridges in it. The issue I run into is that after reading ~40GB of data from the ZFS filesystem, disk I/O stops and I get the following messages in dmesg: xhci1: Resetting controller uhub4: at usbus5, port 1, addr 1 (disconnected) ugen5.7: <OWC Mercury Elite Pro Quad> at usbus5 (disconnected) uhub7: at uhub4, port 1, addr 6 (disconnected) uhub7: detached ugen5.2: <OWC Mercury Elite Pro Quad> at usbus5 (disconnected) uhub6: at uhub4, port 2, addr 1 (disconnected) ugen5.3: <OWC Mercury Elite Pro Quad A> at usbus5 (disconnected) (da0:umass-sim0:0:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5c f8 00 00 = 00 80 00 00=20 umass0: at uhub6, port 1, addr 2 (disconnected) (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain (da0:umass-sim0:0:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5c f8 00 00 = 00 80 00 00=20 (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain (da0:umass-sim0:0:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5c f8 00 00 = 00 80 00 00=20 (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 da0: <Mercury Elite Pro Quad A 0> s/n 123456789004 detached (da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5a 00 00 00 = 00 80 00 00=20 (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error (da1:umass-sim1:1:0:0): Retrying command, 3 more tries remain (da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5d 78 00 00 = 00 80 00 00=20 (da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error (da2:umass-sim2:2:0:0): Retrying command, 3 more tries remain (da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 61 31 00 01 00 00=20 (da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error (da3:umass-sim3:3:0:0): Retrying command, 3 more tries remain (da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5a 00 00 00 = 00 80 00 00=20 (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error (da1:umass-sim1:1:0:0): Retrying command, 2 more tries remain (da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5d 78 00 00 = 00 80 00 00=20 (da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error (da2:umass-sim2:2:0:0): Retrying command, 2 more tries remain (da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 61 31 00 01 00 00=20 (da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error (da3:umass-sim3:3:0:0): Retrying command, 2 more tries remain (da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5a 00 00 00 = 00 80 00 00=20 (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error (da1:umass-sim1:1:0:0): Retrying command, 1 more tries remain (da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5d 78 00 00 = 00 80 00 00=20 (da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error (da2:umass-sim2:2:0:0): Retrying command, 1 more tries remain (da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 61 31 00 01 00 00=20 (da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error (da3:umass-sim3:3:0:0): Retrying command, 1 more tries remain (da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5a 00 00 00 = 00 80 00 00=20 (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error (da1:umass-sim1:1:0:0): Retrying command, 0 more tries remain (da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5d 78 00 00 = 00 80 00 00=20 (da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error (da2:umass-sim2:2:0:0): Retrying command, 0 more tries remain (da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 61 31 00 01 00 00=20 (da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error (da3:umass-sim3:3:0:0): Retrying command, 0 more tries remain (da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5a 00 00 00 = 00 80 00 00=20 (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error (da1:umass-sim1:1:0:0): Error 5, Retries exhausted (da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5d 78 00 00 = 00 80 00 00=20 (da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error (da2:umass-sim2:2:0:0): Error 5, Retries exhausted (da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 61 31 00 01 00 00=20 (da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error (da3:umass-sim3:3:0:0): Error 5, Retries exhausted (da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5b 80 00 00 = 08 00 00 00=20 (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error (da1:umass-sim1:1:0:0): Retrying command, 3 more tries remain (da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5e f8 00 00 = 01 80 00 00=20 (da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error (da2:umass-sim2:2:0:0): Retrying command, 3 more tries remain (da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 63 31 00 01 00 00=20 (da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error (da3:umass-sim3:3:0:0): Retrying command, 3 more tries remain (da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5b 80 00 00 = 08 00 00 00=20 (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error (da1:umass-sim1:1:0:0): Retrying command, 2 more tries remain (da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5e f8 00 00 = 01 80 00 00=20 (da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error (da2:umass-sim2:2:0:0): Retrying command, 2 more tries remain (da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 63 31 00 01 00 00=20 (da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error (da3:umass-sim3:3:0:0): Retrying command, 2 more tries remain (da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5b 80 00 00 = 08 00 00 00=20 (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error (da1:umass-sim1:1:0:0): Retrying command, 1 more tries remain (da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5e f8 00 00 = 01 80 00 00=20 (da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error (da2:umass-sim2:2:0:0): Retrying command, 1 more tries remain (da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 63 31 00 01 00 00=20 (da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error (da3:umass-sim3:3:0:0): Retrying command, 1 more tries remain (da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5b 80 00 00 = 08 00 00 00=20 (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error (da1:umass-sim1:1:0:0): Retrying command, 0 more tries remain (da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5e f8 00 00 = 01 80 00 00=20 (da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error (da2:umass-sim2:2:0:0): Retrying command, 0 more tries remain (da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 63 31 00 01 00 00=20 (da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error (da3:umass-sim3:3:0:0): Retrying command, 0 more tries remain (da1:umass-sim1:1:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5b 80 00 00 = 08 00 00 00=20 (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error (da1:umass-sim1:1:0:0): Error 5, Retries exhausted Jan 1 05:47:13 generic ZFS[5395]: vdev probe failure, zpool=3Dzraid-v5 path=3D/dev/da1 Jan 1 05:47:13 generic ZFS[5399]: vdev probe failure, zpool=3Dzraid-v5 path=3D/dev/da1 (da2:umass-sim2:2:0:0): READ(16). CDB: 88 00 00 00 00 01 01 89 5e f8 00 00 = 01 80 00 00=20 (da2:umass-sim2:2:0:0): CAM status: CCB request completed with an error (da2:umass-sim2:2:0:0): Error 5, Retries exhausted Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Solaris: WARNING: Pool 'zraid-v5' has encountered an uncorrectable I/O fail= ure and has been suspended. Jan 1 05:47:14 generic ZFS[5423]: vdev probe failure, zpool=3Dzraid-v5 path=3D/dev/da2 Jan 1 05:47:14 generic ZFS[5447]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5495]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5499]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5503]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5507]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5511]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5515]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5519]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5523]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5527]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5531]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5535]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5539]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5543]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5547]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5551]: catastrophic pool I/O failure, zpool=3Dzraid-v5 Jan 1 05:47:15 generic ZFS[5555]: catastrophic pool I/O failure, zpool=3Dzraid-v5 (da3:umass-sim3:3:0:0): WRITE(10). CDB: 2a 00 2d 5d 63 31 00 01 00 00=20 (da3:umass-sim3:3:0:0): CAM status: CCB request completed with an error (da3:umass-sim3:3:0:0): Error 5, Retries exhausted ...and so on, including further complaints from Solaris, ZFS, and CAM. The system is still operable, and I can SSH in, but the ZFS pool is unresponsiv= e. I can tell FreeBSD to shut down, and although it will start to do so (disconnecting me from SSH), it will not complete powering off: SSH connect= ions are refused and it still responds to pings. If I unplug the SBC's power, disconnect the enclosure's USB cable (since the ROCKPro64 won't boot with it connected, another issue I'll report separately), then boot the SBC and reconnect the USB, things will start working again. (The ZFS pool will sometimes complain about errors that I need to clear, but if I scrub the po= ol it confirms that no data was corrupted.) Interestingly, I seem to be able to scrub the pool fine: despite the pool having several TB of data, the issue hasn't appeared then. Additionally, I = can use dd to read from the individual block devices for many hours without iss= ue, and can even do so from all the disks simultaneously (averaging ~100MB/s per disk, so ~400MB/s total) without seeing the issue. But if I mount the ZFS filesystem (I do so read-only during my tests to try to reduce risk to the data) and start making a tarball of said filesystem (writing said tar strea= m to /dev/null), it gets about 40GB in and then the issue appears. The exact amo= unt of data it's able to read (as measured by piping the stream through pv) var= ies a bit: my last three tests have failed at 43.3, 40.6, and 54.2 GiB, respectively. The issue is 100% repeatable: although the amount of data it's able to read before failing varies, it always fails. I've tried both the FreeBSD-13.0-RELEASE-arm64-aarch64-ROCKPRO64.img.xz and FreeBSD-13.0-STABLE-arm64-aarch64-ROCKPRO64-20211230-3684bb89d52-248759.img images, but they both have the issue. I've tried Linux on the same SBC with= the same ZFS pool, and it had no issue. I tried a Raspberry Pi 4 running FreeBSD 13.0 RELEASE with the same ZFS pool, and it had no issue. I should note that both the Linux and Raspberry Pi tests lacked hardware crypto support (Linux because Linux apparently doesn't support ARM crypto in ZFS yet, and Raspber= ry Pi because those have no crypto hardware), so they both relied on software crypto which constrained performance to tens of MB/s, rather than hundreds = as FreeBSD on the ROCKPro64 does in my problem case. To that end I also used t= he ZFS pool with an Intel system running FreeBSD 13, which has working hardware crypto and it was able to work with the disks even faster than the ROCKPro6= 4, with no issues. So my suspicion is the enclosure is not the culprit, since = it worked fine on RPi and Intel systems. I also suspect the SBC itself is not = the culprit since it worked fine with Linux, though it's possible the issue only shows up when reading from the disks at a particular speed and Linux's constrained crypto performance just didn't reach that speed. I'm also able = to work with other disks just fine with FreeBSD on the ROCKPro64: it's only th= is multi-disk enclosure that has trouble. I can create an encrypted ZFS pool o= n a USB 3 SSD connected directly to the SBC (no intermediary hub) and it works fine, no trouble at all. Initially I suspected that my SBC or enclosure might be at fault, but throu= gh this testing I've come to instead suspect there is some sort of issue in the xhci driver for the ROCKPro64, perhaps somehow related to the enclosure hav= ing a USB hub in front of the disks. But I am only speculating! Could someone h= elp me troubleshoot this issue further, to see why xhci1 is resetting unexpecte= dly? Thank you for your help! --=20 You are receiving this mail because: You are the assignee for the bug.=
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-261324-7>