From owner-freebsd-arm@freebsd.org Sat Sep 19 12:22:54 2020 Return-Path: Delivered-To: freebsd-arm@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 1440D3DE4F6 for ; Sat, 19 Sep 2020 12:22:54 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic310-20.consmr.mail.gq1.yahoo.com (sonic310-20.consmr.mail.gq1.yahoo.com [98.137.69.146]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4Btqbc6nQSz3Wl4 for ; Sat, 19 Sep 2020 12:22:52 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: PctSJrQVM1kdXnM0S3Ymdht_kpRSrej3rBQ.tZbBXrqD.OzMkEaZnlYtXBrOA5V GFney2A84ZprexPu6wA9Qnv8.XUWc7.o9zTC0L_WL3.2pmDsl.3BAgtvXhM56cFwA9z9R3v5p9Gf aPFyA7.eNZ3hktfibx0h_wV0n.NPdsFmpkWnErbvAQSnYGTMbELvS.YRZrIYeGk1WoHHJqNLdrts fenDN8PckwOki53q6MoFk_9ON3yPtzW1LGE4Kfod3ZrOu3QEtbF3zZ8ZKjX4Z690vQLGt8zks8g0 I1BnifC7SY20BUf.WVF5AjLrq2Onas1_lIAcLeWK3I53CpDlv6LV2sBF3OyIMZHtozQt_hDMQo5h 5E2GjFrODyc1lwtHqJwo1AfYxYkhu_CODixmcWOdXAyVFl.cTz920z5lBs_9OkflpRv.T7vQUjxP 5UhenPjc6nYGfryMXauuyJU4FRz7eO6FKC8Vb1zpk8mNh1W1XUlU2FD4sp6dfMTuWFrBuj9D6vNw y_1EA4gVTmKGIGB213J.PkN.dpIqP2LfIi9nIorNle2BNFDQu4uQAQ9VkjmkrvZDk3yCxIvCLY0W 4FC0Sb9Iqs5cN9vgUupfrSvyUmpHrdE1w7fMfW3c3d4CcdGBnF994eRniJhiLie6YZLtfdleTjjJ Cd3GFFOF_YBIFw5zdFwmGfW_DqKJm_elOSb1n7CWitQr_rO10OkisGipArhHgz65z4a.1jfIcjsG rptF8YZGzE4A1HJLS4VQDc5VCokgNlIzfmddjshGQXwpxMH2BdrEnGtDaTUc8r2kTkPV1KWzSizQ .Iyfa5NZ28tiPutiyUkPjYJrQCxV2bLv83odc4Uk_OIe3HuEj48hxHDx2nGdfyjEVeKXzLsCjhAs jBdkn68oLDrPKUmgzXC_mlfriAK0xm3pfy.6LL.R1WmlIIar8uVTBIcIxhyPqCZxgx3qd6_sCeG0 Cr40..F0.GHJF3uLmVg31BbfAELA6Wiv.gQeX4._5xa6z1lw0ZL6U1puIIKkFeSS5ajYOPyg2g3. _pPx6u_.IqgYG1o3xowjGGfsvRVMxy9itN9_hQI5wlH2wj8Mwg90.FJfXxOIwh0CjEsaMxsuEP.a iW8ccOW9fYLBLgXUx8rni8HlXiGtsPabCdaAdUn570b7Oa0qRUVQrSenbUPLxdHK7qWTyi_Fv9tW O6ebU5NakRdmFuPeou76QBDbN0WHBkBV7KDDcOvlADlBMr0nZ6lAz3FXDLrqHlXXW1tfcVh_C.BN DV7TtVtaUFmjweQqFu6VamuPxXYkbjk9cPQmqMffHCuU7DPJmp6hx45D.8MAZMAVtlWI34Y7c61t SXvbu.c.b3r5jiK6A7wg7zLpmQAOm9M0wiuXCsQP2.PQSbapXc4Kmv4bO8wO7TItKdpuEOPPb1LN spG6RopCmlYfHo01RTCH5kWfyVf.O8MyTUQPC.7EdZ0BWGFV590.Mih.Nj2oRn6Sm.3BZHkVEkzw RROx49qW735pfNujMGczFGwk1YtmFWou53Db0XfU0XGQ2rTgX6kr66uNqeoN6kUNUKywtuDI- Received: from sonic.gate.mail.ne1.yahoo.com by sonic310.consmr.mail.gq1.yahoo.com with HTTP; Sat, 19 Sep 2020 12:22:50 +0000 Received: by smtp402.mail.bf1.yahoo.com (VZM Hermes SMTP Server) with ESMTPA ID 28a426e723e61ad79d2968e5de8cb4d7; Sat, 19 Sep 2020 12:22:45 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.1\)) Subject: Re: Comment #135 for bugzilla 237666 : a USB3-handling problem with a investigatory fix for a cortex-a72 context From: Mark Millard In-Reply-To: <578faf26-6042-91ec-c639-2bca17105fae@selasky.org> Date: Sat, 19 Sep 2020 05:22:43 -0700 Cc: freebsd-arm Content-Transfer-Encoding: quoted-printable Message-Id: <723E6915-94F5-417C-B4AF-EEEBFBDF6162@yahoo.com> References: <6E618C3D-12DF-429E-A249-5BAB90FC6B15.ref@yahoo.com> <6E618C3D-12DF-429E-A249-5BAB90FC6B15@yahoo.com> <866bd652-5a8b-7500-b1e0-7528c035048b@selasky.org> <578faf26-6042-91ec-c639-2bca17105fae@selasky.org> To: Hans Petter Selasky X-Mailer: Apple Mail (2.3608.120.23.2.1) X-Rspamd-Queue-Id: 4Btqbc6nQSz3Wl4 X-Spamd-Bar: --- X-Spamd-Result: default: False [-3.21 / 15.00]; FREEMAIL_FROM(0.00)[yahoo.com]; MV_CASE(0.50)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; NEURAL_HAM_SHORT(-0.69)[-0.688]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/20, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com:dkim]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.01)[-1.005]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; NEURAL_HAM_LONG(-1.02)[-1.020]; MIME_GOOD(-0.10)[text/plain]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[98.137.69.146:from]; RWL_MAILSPIKE_POSSIBLE(0.00)[98.137.69.146:from]; RCVD_COUNT_TWO(0.00)[2]; MAILMAN_DEST(0.00)[freebsd-arm] X-BeenThere: freebsd-arm@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: "Porting FreeBSD to ARM processors." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Sep 2020 12:22:54 -0000 On 2020-Sep-19, at 04:46, Hans Petter Selasky = wrote: > On 2020-09-19 11:31, Mark Millard wrote: >> On 2020-Sep-19, at 01:08, Hans Petter Selasky = wrote: >>> On 2020-09-19 05:41, Mark Millard via freebsd-arm wrote: >>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D237666 is about = USB3 >>>> problems that folks have been having for over a year. Bjoern A. = Zeeb >>>> comment #125 that looked like something I'd seen on a cortex-a72 = when >>>> I tried a kernel that had been built with -mcpu=3Dcortex-a72 : an >>>> indefinite looping that involved "Resetting controller" for xhci0. >>>> (This prevented booting from the USB3 device.) >>>> Well, I got the A72 to boot and comment #135 indicates how but I'll >>>> paste the information below. All that was involved was adding >>>> examples of "dsb st" and one "dsb ld". >>>> Comment 135's content: >>>> A cortex-a72 success! (In the form of investigatory code, not >>>> code to check-in as is.) >>>> Just by adding some "dsb st" commands and a "dsb ld" command in >>>> places in the xhci code (just for __arach64__), I've gotten the >>>> cortext-A72 to boot from the USB3 SSD via a -mcpu=3Dcortex-a72 >>>> based kernel build. No more indefinitely repeating "Resetting >>>> controller" problem. >>>> I do not claim the additions are the minimal ones. I know one place >>>> is required for sure: the last one added enabled the boot (given >>>> the others were already present). Prior to that I still had the >>>> indefinitely repeating "Resetting controller" problem. >>>> There could be more places that should have dsb st or dsb ld for >>>> overall correctness. >>>> This evidence may suggest somewhat analogous problems for other >>>> platforms than aarch64 that are seeing problems. >>>> For the aarch64 context, the evidence is (the changes are) >>>> as follows. The first "dsb st" textually is the last one I >>>> added. >>>> # svnlite diff /usr/src/sys/dev/usb/controller/xhci.c >>>> Index: /usr/src/sys/dev/usb/controller/xhci.c >>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>> --- /usr/src/sys/dev/usb/controller/xhci.c (revision 363590) >>>> +++ /usr/src/sys/dev/usb/controller/xhci.c (working copy) >>>> @@ -431,6 +431,9 @@ >>>> phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D htole64(addr); >>>> phwr->hwr_ring_seg[0].dwEvrsTableSize =3D = htole32(XHCI_MAX_EVENTS); >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb st" : : : "memory"); >>>> +#endif >>>> DPRINTF("ERDP(0)=3D0x%016llx\n", (unsigned long = long)addr); >>>> @@ -1098,6 +1101,9 @@ >>>> while (1) { >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb ld" : : : "memory"); >>>> +#endif >>>> temp =3D le32toh(phwr->hwr_events[i].dwTrb3); >>>> k =3D (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0; >>>> @@ -1107,6 +1113,9 @@ >>>> event =3D XHCI_TRB_3_TYPE_GET(temp); >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb ld" : : : "memory"); >>>> +#endif >>>> DPRINTFN(10, "event[%u] =3D %u (0x%016llx 0x%08lx = 0x%08lx)\n", >>>> i, event, (long = long)le64toh(phwr->hwr_events[i].qwTrb0), >>>> (long)le32toh(phwr->hwr_events[i].dwTrb2), >>>> @@ -1239,6 +1248,9 @@ >>>> sc->sc_command_idx =3D i; >>>> sc->sc_command_ccs =3D j; >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb st" : : : "memory"); >>>> +#endif >>>> XWRITE4(sc, door, XHCI_DOORBELL(0), 0); >>>> err =3D cv_timedwait(&sc->sc_cmd_cv, = &sc->sc_bus.bus_mtx, >>>> @@ -2855,6 +2867,9 @@ >>>> index =3D xfer->xroot->udev->controller_slot_id; >>>> if (xfer->xroot->udev->flags.self_suspended =3D=3D 0) { >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb st" : : : "memory"); >>>> +#endif >>>> XWRITE4(sc, door, XHCI_DOORBELL(index), >>>> epno | XHCI_DB_SID_SET(xfer->stream_id)); >>>> } >>>> @@ -4180,6 +4195,9 @@ >>>> for (n =3D 1; n !=3D XHCI_MAX_ENDPOINTS; n++) { >>>> for (p =3D 0; p !=3D XHCI_MAX_STREAMS; p++) { >>>> +#ifdef __aarch64__ >>>> +__asm __volatile("dsb st" : : : "memory"); >>>> +#endif >>>> XWRITE4(sc, door, XHCI_DOORBELL(index), >>>> n | XHCI_DB_SID_SET(p)); >>>> } >>>> I'm clearly just "evidence hacking" here. I've no clue what a >>>> good design for allowing aarch64 build to supply having any of >>>> those "dsb st"s or the "dsb ld". >>>> Hopefully someone that knows what they are doing can figure out >>>> if any of the other examples on other platforms are analogous >>>> --and ,if they are, provide some hook for platform to contribute >>>> such code to the xhci code. >>>=20 >>> Hi Mark, >>>=20 >>> Your finding indicate a problem in usb_pc_cpu_flush() and >>>=20 >>> bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE); >>>=20 >>> Try to put the dsb only after dmamap_sync. >>>=20 >>> void >>> usb_pc_cpu_flush(struct usb_page_cache *pc) >>> { >>> if (pc->page_offset_end =3D=3D pc->page_offset_buf) { >>> /* nothing has been loaded into this page cache! */ >>> return; >>> } >>> bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE); >>> } >>>=20 >>> The PCI I/O memory should be coherent and should not need any DSB's. >> [I'll note that historically I've gotten one "Resetting Controller" >> notice from the -mcpu=3Dcortex-a53 kernel during the sequence leading >> to the root file system mount. I get none from the -mcpu=3Dcortex-a72 >> kernel after the changes that I reported.] >> After adding a couple of more DSB ST instructions and rebuilding >> (cross build), installing, and booting, I started a self-hosted, >> from-scratch, buildworld buildlkernel and intend to installkernel >> installworld and reboot if it appears to go well. (Better than >> just a boot test for if things seem at least sufficient, even if >> overkill.) But it will be hours before buildworld build kernel is >> done (-j3). (I'll sleep during much of that time.) I woke up so I figured I'd look at this before trying to sleep again. >> I'll get to experiments you suggest after that. I assume that >> you want the two (not one) dsb ld instructions removed as well. >> I'm happy to do your experiments, despite the following note >> about what I expect the result will be for the first one. >> I do not expect your experiment to work in one place because >> the addition that started things working does not involve a >> usb_pc_cpu_flush for the event ring initialization that the >> specific change was for: >> phwr =3D buf_res.buffer; >> addr =3D buf_res.physaddr; >> addr +=3D (uintptr_t)&((struct xhci_hw_root = *)0)->hwr_events[0]; >> /* reset hardware root structure */ >> memset(phwr, 0, sizeof(*phwr)); >> phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D = htole64(addr); >> phwr->hwr_ring_seg[0].dwEvrsTableSize =3D = htole32(XHCI_MAX_EVENTS); >> DPRINTF("ERDP(0)=3D0x%016llx\n", (unsigned long = long)addr); >> #ifdef __aarch64__ >> __asm __volatile("dsb st" : : : "memory"); >> #endif >=20 > Hi Mark, >=20 > The values in ERDP_LO/HI are not used until the RUN bit is set. I just picked between the qwEvrsTablePtr and dwEvrsTableSize writes and the following several XWRITE4's, figuring it would be sufficient placement if it helped at all. (It did help.) >> XWRITE4(sc, runt, XHCI_ERDP_LO(0), (uint32_t)addr); >> XWRITE4(sc, runt, XHCI_ERDP_HI(0), (uint32_t)(addr >> 32)); >> addr =3D buf_res.physaddr; >> DPRINTF("ERSTBA(0)=3D0x%016llx\n", (unsigned long = long)addr); >> XWRITE4(sc, runt, XHCI_ERSTBA_LO(0), (uint32_t)addr); >> XWRITE4(sc, runt, XHCI_ERSTBA_HI(0), (uint32_t)(addr >> 32)); >> My understanding is that those last two lines start the event ring >> and without the DSB ST above the: >=20 > No they don't. The text associated with "The following steps describe the xHC Event = Ring Enqueue Pointer (EREP) Advancement algorithm (left side of Figure = 4-12)" in extensible-host-controler-interface-usb-xhci.pdf says: QUOTE (partial) 1. When the ERST Base Address (ERSTBA) register is initially = written the Event Ring State Machine enters the Start state. 2. The xHC initializes its internal PCS flag to =E2=80=981=E2=80=99= . 3. The xHC sets its internal ERST Count to =E2=80=980=E2=80=99. 4. The xHC then fetches the entry in the Event Ring Segment = Table referenced by the ERST Count (ERSTE =3D ERST[ERST Count]) and = initializes its Enqueue Pointer (EREP) with the value of the Ring = Segment Base Address field (ERSTE.BaseAddr), and the TRB Count with the = value of the Segment Size field (ERSTE.Size). 5. If the USBCMD Run/Stop (R/S) flag =3D =E2=80=980=E2=80=99 the = Event Ring State Machine shall wait for Run/Stop (R/S) to return to = =E2=80=981=E2=80=99. When Run/Stop (R/S) flag=3D =E2=80=981=E2=80=99 the = xHC shall proceeds to check if an event is posted (step 6., otherwise it = proceeds immediately to step 6. END QUOTE (Steps 6-8 are omitted above.) May be I read too much into (4) and the = first reference to R/S being in step (5). >> phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D htole64(addr); >> phwr->hwr_ring_seg[0].dwEvrsTableSize =3D = htole32(XHCI_MAX_EVENTS); >> result was not observed by xhci but needs to be observed at that ring >> start time. >> Without the DSB ST above, xhci_interrupt_poll's code (with one >> of my additions shown): >> #ifdef __aarch64__ >> __asm __volatile("dsb ld" : : : "memory"); >> #endif >=20 > Probably same bug in the USB invalidate function. >=20 > Try to patch those generic flush/invalidate functions instead. I've = been very careful about those. Sure. >> temp =3D le32toh(phwr->hwr_events[i].dwTrb3); >> k =3D (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0; >> if (j !=3D k) >> break; >> always took the break because temp was always zero, k was always = zero, >> and j was 1. I expect this is because the wrong qwEvrsTablePtr and >> dwEvrsTableSize content was in use by the xhci (fixed by the DSB ST). >> (I had a temporarily version that printed the values. With and = without >> the DSB LD made no difference without the DSB ST that was later = added.) >> =46rom what I understand, the cortex-a72 has far more structure that >> involves the "DSB completes when" behavior being needed compared >> to the cortex-a53 and code tuned to the A72 is more likely to end >> up needing such. (Speculation and speculative-driven cache contents, >> out of order execution, and probably more.) >> QUOTE >> =E2=80=A2 all explicit memory accesses that are observed by Pe = before the DSB is executed, are of the required access types, and are = from observers in the same required shareability domain as Pe, are = complete for the set of observers in the required shareability domain >> =E2=80=A2 all cache, branch predictor, and TLB maintenance = operations issued by Pe before the DSB are complete for the required = shareability domain. >> END QUOTE >=20 > It smells like a generic problem in the busdma synchronize code. >=20 >> (Pe means "executing processor".) >> I expect the removal of that specific DSB ST to result in >> xhci_interrupt_poll's code again taking the break every time, >> just based on event ring behavior alone. >=20 > Again, try to patch both > usb_pc_cpu_flush() > and > usb_pc_cpu_invalidate() Sure. >> I'll note that the RPi4B SBC has a not-physically-exposed PCIe >> that the USB3 is off of, and the UEFI/ACPI that I have in use for >> the RPi4B does not expose the PICe to the loader or OS at all. >> ACPI does expose the USB3 related material. FYI: The buildworld buildkernel is continuing. =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)