From owner-freebsd-arm@freebsd.org Sat Sep 19 09:31:12 2020 Return-Path: Delivered-To: freebsd-arm@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 35D6C3D8292 for ; Sat, 19 Sep 2020 09:31:12 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic317-22.consmr.mail.gq1.yahoo.com (sonic317-22.consmr.mail.gq1.yahoo.com [98.137.66.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4BtlnV4CnFz4nRd for ; Sat, 19 Sep 2020 09:31:10 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: XLEFc4gVM1mkv_I80AqM95NAO2rJaOliu4tzgHrobYsj7Qg0x6K99wJKaMW.t.x xRhKWTrAOURJCGGxUIcbe4DZ1hXd9Yy3RDkmeHJ_NptuxZtQkUZe2V4sQ2Jin.niq5fSxvL8ice_ Jt82rLbK9sYcrNWbrLHJEPuHMPYaIZ4KDFDUODH_QY6.yYe3DDHj8iLfoOXAat7gUZUwxqFrc6Tx nT.XShvgfd73x52iGjU.G1uNI6cL2pamjoD_5oaH99WflWqTVSXyed.3zpeBuIARUI33NLwm.1L0 oWwjluCyDtf3xfEKQRGK3OMmX0hNYcW.xiZRoZiwhy0.O3FM_xhMKEFNUeWJCvfsN3XUV.mKHyTw Q4oFMiCEV7U4Ve0iFfkzSKhY4Gi17BYdPcPYPcYm0HEhd0FTPSpbBpTmxLosXYnV5VwYTad9Yz0e mH1SK9Xmq.5A2bH9WvuJelQsUB_AGZhoDUR0oO1F2LP8IcrrP3fBaeYeXY4TyugityqbqN7kqyY2 iBdWVrR92O0D2iuXXpy.XfKIruuSQHaxcNO7KR_FtJyKQ1oKzO1Xh0halo4yfUhFDEVgxls5yn7U .TOk8FR77jxDQbhNiR5n7E3mcrbgCpcdsfbUl05q_ogRN2oP6vZEBcCreWkaEJPL5By4j1MR14pz 6tzQYhXjEpAmeznxIbTjBp2_MlXyZrg4RA1W.By0s3KF5V1J.6LptZ1xik15OEcw6aeV2RkrE6Xf rWPHftvlrSDinNKccPT7EMBSKjmaJPVqRfKTy2sMZtrIIamVq5PD6objJtwNm19Z0Ax0QmdGWAyY ZKXAtmMQoekr_awSbVoAt5d1D3whvU74TDjg1FUbbx0QU3KiUUUTkawLQQ6FuBOhNrfDk0L2XIkz WQQiRBbd9yeu9N3OZsrYTW_h7ZpLEucuWRpRMv0rlod4zY7fgKheNSAylp3nEnI0XRc1eRRM2uQm EznzWlMV6fo._qzSChJgtgDdrhFkKGjLVwTNAWO4ntq5v9VgugRgrjdSIl6YqiTgQa2YNJc7fYQ6 ANe2BaCDyZ0hZQZv1Q0F94_.dDLv.kwiVql3GONJ3PvX.PduBQfGKlvJXGmHfhIk71u0N2eTHk20 uhKkntY11ZppWzP_j2FVlFHbUFJmKkiNrELhU_KVqM.R2xBtNw4VZZ45rXnaJCoiagjEKjoomPoj B1Qdyg_7yn85R3a.mNamfAYQhsbOd7LmoAezHAGIArFNLq16JqOC7lW8Re7gPR_uzlUk_sTAaOyM NZKNapSFyQ2cH1EAM4RW_QvucR6.hnq7RcE4FkxPgpJ1nqmFzKUl1pVqNGD_djCNyXVLWZ8n8FMZ kOp5fxtQeAzHcPtpid.JIyEk1TYJiZGGecjkNcn8Qorv_HCqo0CS3H_UyEIP4Yo5_1UQVza.ZvIy WJ2MKCn6GuaVkQqjk48lMu1Bi4pZA01ovb4kxTUrCFZ0AQhz12anNZysHQnwq40O5rq2KE309unY bQpP7UCM7UIj0sHodCKMKKv2s_QgiqcotYsXdirOjB5Cg_WbbfNbkQlsw5_pTr2dNUfvtDuU- Received: from sonic.gate.mail.ne1.yahoo.com by sonic317.consmr.mail.gq1.yahoo.com with HTTP; Sat, 19 Sep 2020 09:31:08 +0000 Received: by smtp421.mail.ne1.yahoo.com (VZM Hermes SMTP Server) with ESMTPA ID 20edb5877937c6833140279aa39ac9db; Sat, 19 Sep 2020 09:31:04 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.1\)) Subject: Re: Comment #135 for bugzilla 237666 : a USB3-handling problem with a investigatory fix for a cortex-a72 context From: Mark Millard In-Reply-To: <866bd652-5a8b-7500-b1e0-7528c035048b@selasky.org> Date: Sat, 19 Sep 2020 02:31:03 -0700 Cc: freebsd-arm Content-Transfer-Encoding: quoted-printable Message-Id: References: <6E618C3D-12DF-429E-A249-5BAB90FC6B15.ref@yahoo.com> <6E618C3D-12DF-429E-A249-5BAB90FC6B15@yahoo.com> <866bd652-5a8b-7500-b1e0-7528c035048b@selasky.org> To: Hans Petter Selasky X-Mailer: Apple Mail (2.3608.120.23.2.1) X-Rspamd-Queue-Id: 4BtlnV4CnFz4nRd X-Spamd-Bar: --- X-Spamd-Result: default: False [-3.79 / 15.00]; FREEMAIL_FROM(0.00)[yahoo.com]; MV_CASE(0.50)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; NEURAL_HAM_SHORT(-1.27)[-1.269]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/20, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com:dkim]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.005]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; NEURAL_HAM_LONG(-1.02)[-1.021]; MIME_GOOD(-0.10)[text/plain]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[98.137.66.148:from]; RWL_MAILSPIKE_POSSIBLE(0.00)[98.137.66.148:from]; RCVD_COUNT_TWO(0.00)[2]; MAILMAN_DEST(0.00)[freebsd-arm] X-BeenThere: freebsd-arm@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: "Porting FreeBSD to ARM processors." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Sep 2020 09:31:12 -0000 On 2020-Sep-19, at 01:08, Hans Petter Selasky = wrote: > On 2020-09-19 05:41, Mark Millard via freebsd-arm wrote: >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D237666 is about = USB3 >> problems that folks have been having for over a year. Bjoern A. Zeeb >> comment #125 that looked like something I'd seen on a cortex-a72 when >> I tried a kernel that had been built with -mcpu=3Dcortex-a72 : an >> indefinite looping that involved "Resetting controller" for xhci0. >> (This prevented booting from the USB3 device.) >> Well, I got the A72 to boot and comment #135 indicates how but I'll >> paste the information below. All that was involved was adding >> examples of "dsb st" and one "dsb ld". >> Comment 135's content: >> A cortex-a72 success! (In the form of investigatory code, not >> code to check-in as is.) >> Just by adding some "dsb st" commands and a "dsb ld" command in >> places in the xhci code (just for __arach64__), I've gotten the >> cortext-A72 to boot from the USB3 SSD via a -mcpu=3Dcortex-a72 >> based kernel build. No more indefinitely repeating "Resetting >> controller" problem. >> I do not claim the additions are the minimal ones. I know one place >> is required for sure: the last one added enabled the boot (given >> the others were already present). Prior to that I still had the >> indefinitely repeating "Resetting controller" problem. >> There could be more places that should have dsb st or dsb ld for >> overall correctness. >> This evidence may suggest somewhat analogous problems for other >> platforms than aarch64 that are seeing problems. >> For the aarch64 context, the evidence is (the changes are) >> as follows. The first "dsb st" textually is the last one I >> added. >> # svnlite diff /usr/src/sys/dev/usb/controller/xhci.c >> Index: /usr/src/sys/dev/usb/controller/xhci.c >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- /usr/src/sys/dev/usb/controller/xhci.c (revision 363590) >> +++ /usr/src/sys/dev/usb/controller/xhci.c (working copy) >> @@ -431,6 +431,9 @@ >> phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D htole64(addr); >> phwr->hwr_ring_seg[0].dwEvrsTableSize =3D = htole32(XHCI_MAX_EVENTS); >> +#ifdef __aarch64__ >> +__asm __volatile("dsb st" : : : "memory"); >> +#endif >> DPRINTF("ERDP(0)=3D0x%016llx\n", (unsigned long long)addr); >> @@ -1098,6 +1101,9 @@ >> while (1) { >> +#ifdef __aarch64__ >> +__asm __volatile("dsb ld" : : : "memory"); >> +#endif >> temp =3D le32toh(phwr->hwr_events[i].dwTrb3); >> k =3D (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0; >> @@ -1107,6 +1113,9 @@ >> event =3D XHCI_TRB_3_TYPE_GET(temp); >> +#ifdef __aarch64__ >> +__asm __volatile("dsb ld" : : : "memory"); >> +#endif >> DPRINTFN(10, "event[%u] =3D %u (0x%016llx 0x%08lx = 0x%08lx)\n", >> i, event, (long = long)le64toh(phwr->hwr_events[i].qwTrb0), >> (long)le32toh(phwr->hwr_events[i].dwTrb2), >> @@ -1239,6 +1248,9 @@ >> sc->sc_command_idx =3D i; >> sc->sc_command_ccs =3D j; >> +#ifdef __aarch64__ >> +__asm __volatile("dsb st" : : : "memory"); >> +#endif >> XWRITE4(sc, door, XHCI_DOORBELL(0), 0); >> err =3D cv_timedwait(&sc->sc_cmd_cv, &sc->sc_bus.bus_mtx, >> @@ -2855,6 +2867,9 @@ >> index =3D xfer->xroot->udev->controller_slot_id; >> if (xfer->xroot->udev->flags.self_suspended =3D=3D 0) { >> +#ifdef __aarch64__ >> +__asm __volatile("dsb st" : : : "memory"); >> +#endif >> XWRITE4(sc, door, XHCI_DOORBELL(index), >> epno | XHCI_DB_SID_SET(xfer->stream_id)); >> } >> @@ -4180,6 +4195,9 @@ >> for (n =3D 1; n !=3D XHCI_MAX_ENDPOINTS; n++) { >> for (p =3D 0; p !=3D XHCI_MAX_STREAMS; p++) { >> +#ifdef __aarch64__ >> +__asm __volatile("dsb st" : : : "memory"); >> +#endif >> XWRITE4(sc, door, XHCI_DOORBELL(index), >> n | XHCI_DB_SID_SET(p)); >> } >> I'm clearly just "evidence hacking" here. I've no clue what a >> good design for allowing aarch64 build to supply having any of >> those "dsb st"s or the "dsb ld". >> Hopefully someone that knows what they are doing can figure out >> if any of the other examples on other platforms are analogous >> --and ,if they are, provide some hook for platform to contribute >> such code to the xhci code. >=20 > Hi Mark, >=20 > Your finding indicate a problem in usb_pc_cpu_flush() and >=20 > bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE); >=20 > Try to put the dsb only after dmamap_sync. >=20 > void > usb_pc_cpu_flush(struct usb_page_cache *pc) > { > if (pc->page_offset_end =3D=3D pc->page_offset_buf) { > /* nothing has been loaded into this page cache! */ > return; > } > bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE); > } >=20 > The PCI I/O memory should be coherent and should not need any DSB's. [I'll note that historically I've gotten one "Resetting Controller" notice from the -mcpu=3Dcortex-a53 kernel during the sequence leading to the root file system mount. I get none from the -mcpu=3Dcortex-a72 kernel after the changes that I reported.] After adding a couple of more DSB ST instructions and rebuilding (cross build), installing, and booting, I started a self-hosted, from-scratch, buildworld buildlkernel and intend to installkernel installworld and reboot if it appears to go well. (Better than just a boot test for if things seem at least sufficient, even if overkill.) But it will be hours before buildworld build kernel is done (-j3). (I'll sleep during much of that time.) I'll get to experiments you suggest after that. I assume that you want the two (not one) dsb ld instructions removed as well. I'm happy to do your experiments, despite the following note about what I expect the result will be for the first one. I do not expect your experiment to work in one place because the addition that started things working does not involve a usb_pc_cpu_flush for the event ring initialization that the specific change was for: phwr =3D buf_res.buffer; addr =3D buf_res.physaddr; addr +=3D (uintptr_t)&((struct xhci_hw_root *)0)->hwr_events[0]; =20 /* reset hardware root structure */ memset(phwr, 0, sizeof(*phwr)); =20 phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D htole64(addr); phwr->hwr_ring_seg[0].dwEvrsTableSize =3D = htole32(XHCI_MAX_EVENTS); =20 DPRINTF("ERDP(0)=3D0x%016llx\n", (unsigned long long)addr); =20 #ifdef __aarch64__ __asm __volatile("dsb st" : : : "memory"); #endif XWRITE4(sc, runt, XHCI_ERDP_LO(0), (uint32_t)addr); XWRITE4(sc, runt, XHCI_ERDP_HI(0), (uint32_t)(addr >> 32)); =20 addr =3D buf_res.physaddr; =20 DPRINTF("ERSTBA(0)=3D0x%016llx\n", (unsigned long long)addr); XWRITE4(sc, runt, XHCI_ERSTBA_LO(0), (uint32_t)addr); XWRITE4(sc, runt, XHCI_ERSTBA_HI(0), (uint32_t)(addr >> 32)); My understanding is that those last two lines start the event ring and without the DSB ST above the: phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D htole64(addr); phwr->hwr_ring_seg[0].dwEvrsTableSize =3D = htole32(XHCI_MAX_EVENTS); result was not observed by xhci but needs to be observed at that ring start time. Without the DSB ST above, xhci_interrupt_poll's code (with one of my additions shown): #ifdef __aarch64__ __asm __volatile("dsb ld" : : : "memory"); #endif temp =3D le32toh(phwr->hwr_events[i].dwTrb3); =20 k =3D (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0; =20 if (j !=3D k) break; always took the break because temp was always zero, k was always zero, and j was 1. I expect this is because the wrong qwEvrsTablePtr and dwEvrsTableSize content was in use by the xhci (fixed by the DSB ST). (I had a temporarily version that printed the values. With and without the DSB LD made no difference without the DSB ST that was later added.) =46rom what I understand, the cortex-a72 has far more structure that involves the "DSB completes when" behavior being needed compared to the cortex-a53 and code tuned to the A72 is more likely to end up needing such. (Speculation and speculative-driven cache contents, out of order execution, and probably more.) QUOTE =E2=80=A2 all explicit memory accesses that are observed by Pe = before the DSB is executed, are of the required access types, and are = from observers in the same required shareability domain as Pe, are = complete for the set of observers in the required shareability domain =E2=80=A2 all cache, branch predictor, and TLB maintenance = operations issued by Pe before the DSB are complete for the required = shareability domain. END QUOTE (Pe means "executing processor".) I expect the removal of that specific DSB ST to result in xhci_interrupt_poll's code again taking the break every time, just based on event ring behavior alone. I'll note that the RPi4B SBC has a not-physically-exposed PCIe that the USB3 is off of, and the UEFI/ACPI that I have in use for the RPi4B does not expose the PICe to the loader or OS at all. ACPI does expose the USB3 related material. =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)