Date: Sat, 19 Sep 2020 05:22:43 -0700 From: Mark Millard <marklmi@yahoo.com> To: Hans Petter Selasky <hps@selasky.org> Cc: freebsd-arm <freebsd-arm@freebsd.org> Subject: Re: Comment #135 for bugzilla 237666 : a USB3-handling problem with a investigatory fix for a cortex-a72 context Message-ID: <723E6915-94F5-417C-B4AF-EEEBFBDF6162@yahoo.com> In-Reply-To: <578faf26-6042-91ec-c639-2bca17105fae@selasky.org> References: <6E618C3D-12DF-429E-A249-5BAB90FC6B15.ref@yahoo.com> <6E618C3D-12DF-429E-A249-5BAB90FC6B15@yahoo.com> <866bd652-5a8b-7500-b1e0-7528c035048b@selasky.org> <F9E8A6A0-619B-4C6F-8485-032B8358B780@yahoo.com> <578faf26-6042-91ec-c639-2bca17105fae@selasky.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 2020-Sep-19, at 04:46, Hans Petter Selasky <hps at selasky.org> = wrote: > On 2020-09-19 11:31, Mark Millard wrote: >> On 2020-Sep-19, at 01:08, Hans Petter Selasky <hps at selasky.org> = wrote: >>> On 2020-09-19 05:41, Mark Millard via freebsd-arm wrote: >>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D237666 is about = USB3 >>>> problems that folks have been having for over a year. Bjoern A. = Zeeb >>>> comment #125 that looked like something I'd seen on a cortex-a72 = when >>>> I tried a kernel that had been built with -mcpu=3Dcortex-a72 : an >>>> indefinite looping that involved "Resetting controller" for xhci0. >>>> (This prevented booting from the USB3 device.) >>>> Well, I got the A72 to boot and comment #135 indicates how but I'll >>>> paste the information below. All that was involved was adding >>>> examples of "dsb st" and one "dsb ld". >>>> Comment 135's content: >>>> A cortex-a72 success! (In the form of investigatory code, not >>>> code to check-in as is.) >>>> Just by adding some "dsb st" commands and a "dsb ld" command in >>>> places in the xhci code (just for __arach64__), I've gotten the >>>> cortext-A72 to boot from the USB3 SSD via a -mcpu=3Dcortex-a72 >>>> based kernel build. No more indefinitely repeating "Resetting >>>> controller" problem. >>>> I do not claim the additions are the minimal ones. I know one place >>>> is required for sure: the last one added enabled the boot (given >>>> the others were already present). Prior to that I still had the >>>> indefinitely repeating "Resetting controller" problem. >>>> There could be more places that should have dsb st or dsb ld for >>>> overall correctness. >>>> This evidence may suggest somewhat analogous problems for other >>>> platforms than aarch64 that are seeing problems. >>>> For the aarch64 context, the evidence is (the changes are) >>>> as follows. The first "dsb st" textually is the last one I >>>> added. >>>> # svnlite diff /usr/src/sys/dev/usb/controller/xhci.c >>>> Index: /usr/src/sys/dev/usb/controller/xhci.c >>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>> --- /usr/src/sys/dev/usb/controller/xhci.c (revision 363590) >>>> +++ /usr/src/sys/dev/usb/controller/xhci.c (working copy) >>>> @@ -431,6 +431,9 @@ >>>> phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D htole64(addr); >>>> phwr->hwr_ring_seg[0].dwEvrsTableSize =3D = htole32(XHCI_MAX_EVENTS); >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb st" : : : "memory"); >>>> +#endif >>>> DPRINTF("ERDP(0)=3D0x%016llx\n", (unsigned long = long)addr); >>>> @@ -1098,6 +1101,9 @@ >>>> while (1) { >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb ld" : : : "memory"); >>>> +#endif >>>> temp =3D le32toh(phwr->hwr_events[i].dwTrb3); >>>> k =3D (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0; >>>> @@ -1107,6 +1113,9 @@ >>>> event =3D XHCI_TRB_3_TYPE_GET(temp); >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb ld" : : : "memory"); >>>> +#endif >>>> DPRINTFN(10, "event[%u] =3D %u (0x%016llx 0x%08lx = 0x%08lx)\n", >>>> i, event, (long = long)le64toh(phwr->hwr_events[i].qwTrb0), >>>> (long)le32toh(phwr->hwr_events[i].dwTrb2), >>>> @@ -1239,6 +1248,9 @@ >>>> sc->sc_command_idx =3D i; >>>> sc->sc_command_ccs =3D j; >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb st" : : : "memory"); >>>> +#endif >>>> XWRITE4(sc, door, XHCI_DOORBELL(0), 0); >>>> err =3D cv_timedwait(&sc->sc_cmd_cv, = &sc->sc_bus.bus_mtx, >>>> @@ -2855,6 +2867,9 @@ >>>> index =3D xfer->xroot->udev->controller_slot_id; >>>> if (xfer->xroot->udev->flags.self_suspended =3D=3D 0) { >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb st" : : : "memory"); >>>> +#endif >>>> XWRITE4(sc, door, XHCI_DOORBELL(index), >>>> epno | XHCI_DB_SID_SET(xfer->stream_id)); >>>> } >>>> @@ -4180,6 +4195,9 @@ >>>> for (n =3D 1; n !=3D XHCI_MAX_ENDPOINTS; n++) { >>>> for (p =3D 0; p !=3D XHCI_MAX_STREAMS; p++) { >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb st" : : : "memory"); >>>> +#endif >>>> XWRITE4(sc, door, XHCI_DOORBELL(index), >>>> n | XHCI_DB_SID_SET(p)); >>>> } >>>> I'm clearly just "evidence hacking" here. I've no clue what a >>>> good design for allowing aarch64 build to supply having any of >>>> those "dsb st"s or the "dsb ld". >>>> Hopefully someone that knows what they are doing can figure out >>>> if any of the other examples on other platforms are analogous >>>> --and ,if they are, provide some hook for platform to contribute >>>> such code to the xhci code. >>>=20 >>> Hi Mark, >>>=20 >>> Your finding indicate a problem in usb_pc_cpu_flush() and >>>=20 >>> bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE); >>>=20 >>> Try to put the dsb only after dmamap_sync. >>>=20 >>> void >>> usb_pc_cpu_flush(struct usb_page_cache *pc) >>> { >>> if (pc->page_offset_end =3D=3D pc->page_offset_buf) { >>> /* nothing has been loaded into this page cache! */ >>> return; >>> } >>> bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE); >>> } >>>=20 >>> The PCI I/O memory should be coherent and should not need any DSB's. >> [I'll note that historically I've gotten one "Resetting Controller" >> notice from the -mcpu=3Dcortex-a53 kernel during the sequence leading >> to the root file system mount. I get none from the -mcpu=3Dcortex-a72 >> kernel after the changes that I reported.] >> After adding a couple of more DSB ST instructions and rebuilding >> (cross build), installing, and booting, I started a self-hosted, >> from-scratch, buildworld buildlkernel and intend to installkernel >> installworld and reboot if it appears to go well. (Better than >> just a boot test for if things seem at least sufficient, even if >> overkill.) But it will be hours before buildworld build kernel is >> done (-j3). (I'll sleep during much of that time.) I woke up so I figured I'd look at this before trying to sleep again. >> I'll get to experiments you suggest after that. I assume that >> you want the two (not one) dsb ld instructions removed as well. >> I'm happy to do your experiments, despite the following note >> about what I expect the result will be for the first one. >> I do not expect your experiment to work in one place because >> the addition that started things working does not involve a >> usb_pc_cpu_flush for the event ring initialization that the >> specific change was for: >> phwr =3D buf_res.buffer; >> addr =3D buf_res.physaddr; >> addr +=3D (uintptr_t)&((struct xhci_hw_root = *)0)->hwr_events[0]; >> /* reset hardware root structure */ >> memset(phwr, 0, sizeof(*phwr)); >> phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D = htole64(addr); >> phwr->hwr_ring_seg[0].dwEvrsTableSize =3D = htole32(XHCI_MAX_EVENTS); >> DPRINTF("ERDP(0)=3D0x%016llx\n", (unsigned long = long)addr); >> #ifdef __aarch64__ >> __asm __volatile("dsb st" : : : "memory"); >> #endif >=20 > Hi Mark, >=20 > The values in ERDP_LO/HI are not used until the RUN bit is set. I just picked between the qwEvrsTablePtr and dwEvrsTableSize writes and the following several XWRITE4's, figuring it would be sufficient placement if it helped at all. (It did help.) >> XWRITE4(sc, runt, XHCI_ERDP_LO(0), (uint32_t)addr); >> XWRITE4(sc, runt, XHCI_ERDP_HI(0), (uint32_t)(addr >> 32)); >> addr =3D buf_res.physaddr; >> DPRINTF("ERSTBA(0)=3D0x%016llx\n", (unsigned long = long)addr); >> XWRITE4(sc, runt, XHCI_ERSTBA_LO(0), (uint32_t)addr); >> XWRITE4(sc, runt, XHCI_ERSTBA_HI(0), (uint32_t)(addr >> 32)); >> My understanding is that those last two lines start the event ring >> and without the DSB ST above the: >=20 > No they don't. The text associated with "The following steps describe the xHC Event = Ring Enqueue Pointer (EREP) Advancement algorithm (left side of Figure = 4-12)" in extensible-host-controler-interface-usb-xhci.pdf says: QUOTE (partial) 1. When the ERST Base Address (ERSTBA) register is initially = written the Event Ring State Machine enters the Start state. 2. The xHC initializes its internal PCS flag to =E2=80=981=E2=80=99= . 3. The xHC sets its internal ERST Count to =E2=80=980=E2=80=99. 4. The xHC then fetches the entry in the Event Ring Segment = Table referenced by the ERST Count (ERSTE =3D ERST[ERST Count]) and = initializes its Enqueue Pointer (EREP) with the value of the Ring = Segment Base Address field (ERSTE.BaseAddr), and the TRB Count with the = value of the Segment Size field (ERSTE.Size). 5. If the USBCMD Run/Stop (R/S) flag =3D =E2=80=980=E2=80=99 the = Event Ring State Machine shall wait for Run/Stop (R/S) to return to = =E2=80=981=E2=80=99. When Run/Stop (R/S) flag=3D =E2=80=981=E2=80=99 the = xHC shall proceeds to check if an event is posted (step 6., otherwise it = proceeds immediately to step 6. END QUOTE (Steps 6-8 are omitted above.) May be I read too much into (4) and the = first reference to R/S being in step (5). >> phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D htole64(addr); >> phwr->hwr_ring_seg[0].dwEvrsTableSize =3D = htole32(XHCI_MAX_EVENTS); >> result was not observed by xhci but needs to be observed at that ring >> start time. >> Without the DSB ST above, xhci_interrupt_poll's code (with one >> of my additions shown): >> #ifdef __aarch64__ >> __asm __volatile("dsb ld" : : : "memory"); >> #endif >=20 > Probably same bug in the USB invalidate function. >=20 > Try to patch those generic flush/invalidate functions instead. I've = been very careful about those. Sure. >> temp =3D le32toh(phwr->hwr_events[i].dwTrb3); >> k =3D (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0; >> if (j !=3D k) >> break; >> always took the break because temp was always zero, k was always = zero, >> and j was 1. I expect this is because the wrong qwEvrsTablePtr and >> dwEvrsTableSize content was in use by the xhci (fixed by the DSB ST). >> (I had a temporarily version that printed the values. With and = without >> the DSB LD made no difference without the DSB ST that was later = added.) >> =46rom what I understand, the cortex-a72 has far more structure that >> involves the "DSB completes when" behavior being needed compared >> to the cortex-a53 and code tuned to the A72 is more likely to end >> up needing such. (Speculation and speculative-driven cache contents, >> out of order execution, and probably more.) >> QUOTE >> =E2=80=A2 all explicit memory accesses that are observed by Pe = before the DSB is executed, are of the required access types, and are = from observers in the same required shareability domain as Pe, are = complete for the set of observers in the required shareability domain >> =E2=80=A2 all cache, branch predictor, and TLB maintenance = operations issued by Pe before the DSB are complete for the required = shareability domain. >> END QUOTE >=20 > It smells like a generic problem in the busdma synchronize code. >=20 >> (Pe means "executing processor".) >> I expect the removal of that specific DSB ST to result in >> xhci_interrupt_poll's code again taking the break every time, >> just based on event ring behavior alone. >=20 > Again, try to patch both > usb_pc_cpu_flush() > and > usb_pc_cpu_invalidate() Sure. >> I'll note that the RPi4B SBC has a not-physically-exposed PCIe >> that the USB3 is off of, and the UEFI/ACPI that I have in use for >> the RPi4B does not expose the PICe to the loader or OS at all. >> ACPI does expose the USB3 related material. FYI: The buildworld buildkernel is continuing. =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?723E6915-94F5-417C-B4AF-EEEBFBDF6162>