Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 19 Sep 2020 12:18:50 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: Comment #135 for bugzilla 237666 : a USB3-handling problem with a investigatory fix for a cortex-a72 context
Message-ID:  <565258A0-BEE1-48F8-9851-E6C7CF7ADAE7@yahoo.com>
In-Reply-To: <723E6915-94F5-417C-B4AF-EEEBFBDF6162@yahoo.com>
References:  <6E618C3D-12DF-429E-A249-5BAB90FC6B15.ref@yahoo.com> <6E618C3D-12DF-429E-A249-5BAB90FC6B15@yahoo.com> <866bd652-5a8b-7500-b1e0-7528c035048b@selasky.org> <F9E8A6A0-619B-4C6F-8485-032B8358B780@yahoo.com> <578faf26-6042-91ec-c639-2bca17105fae@selasky.org> <723E6915-94F5-417C-B4AF-EEEBFBDF6162@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
In the below I've cut out text without always indicating
where so that code looks complete that is complete. I've
also included some text from another reply of yours that
fills in some specifics of what you were after.

On 2020-Sep-19, at 04:46, Hans Petter Selasky <hps at selasky.org> =
wrote:
>=20
>>>> . . .
>>>>=20
>>>> Your finding indicate a problem in usb_pc_cpu_flush() and
>>>>=20
>>>> bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);
>>>>=20
>>>> Try to put the dsb only after dmamap_sync.
>>>>=20
>>>> void
>>>> usb_pc_cpu_flush(struct usb_page_cache *pc)
>>>> {
>>>>       if (pc->page_offset_end =3D=3D pc->page_offset_buf) {
>>>>               /* nothing has been loaded into this page cache! */
>>>>               return;
>>>>       }
>>>>       bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);
>>>> }
>>>> . . .
>>> After adding a couple of more DSB ST instructions and rebuilding
>>> (cross build), installing, and booting, I started a self-hosted,
>>> from-scratch, buildworld buildlkernel and intend to installkernel
>>> installworld and reboot if it appears to go well. (Better than
>>> just a boot test for if things seem at least sufficient, even if
>>> overkill.) But it will be hours before buildworld build kernel is
>>> done (-j3). (I'll sleep during much of that time.)

The buildworld buildkernel, installkernel installworld, reboot
sequence seems to have gone fine.

>>> . . .
>>>        phwr =3D buf_res.buffer;
>>>        addr =3D buf_res.physaddr;
>>>        addr +=3D (uintptr_t)&((struct xhci_hw_root =
*)0)->hwr_events[0];
>>>                 /* reset hardware root structure */
>>>        memset(phwr, 0, sizeof(*phwr));
>>>                 phwr->hwr_ring_seg[0].qwEvrsTablePtr =3D =
htole64(addr);
>>>        phwr->hwr_ring_seg[0].dwEvrsTableSize =3D =
htole32(XHCI_MAX_EVENTS);
>>>                 DPRINTF("ERDP(0)=3D0x%016llx\n", (unsigned long =
long)addr);
>>> #ifdef __aarch64__
>>> __asm __volatile("dsb st" : : : "memory");
>>> #endif
>>>        XWRITE4(sc, runt, XHCI_ERDP_LO(0), (uint32_t)addr);
>>>        XWRITE4(sc, runt, XHCI_ERDP_HI(0), (uint32_t)(addr >> 32));
>>>                 addr =3D buf_res.physaddr;
>>>                 DPRINTF("ERSTBA(0)=3D0x%016llx\n", (unsigned long =
long)addr);
>>>        XWRITE4(sc, runt, XHCI_ERSTBA_LO(0), (uint32_t)addr);
>>>        XWRITE4(sc, runt, XHCI_ERSTBA_HI(0), (uint32_t)(addr >> 32));
>> . . .
>> Probably same bug in the USB invalidate function.

>> On 2020-09-19 11:31, Mark Millard wrote:
>>> #ifdef __aarch64__
>>> __asm __volatile("dsb ld" : : : "memory");
>>> #endif
>>>                 temp =3D le32toh(phwr->hwr_events[i].dwTrb3);
>>>        =20
>>=20
>>=20
>> If you look a few lines up you'll find the:
>>=20
>>        usb_pc_cpu_invalidate(&sc->sc_hw.root_pc);
>>=20
>> That's where this instruction belongs.

>> Try to patch those generic flush/invalidate functions instead. I've =
been very careful about those.
>=20
> Sure.
>=20
>>>                temp =3D le32toh(phwr->hwr_events[i].dwTrb3);
>>>                         k =3D (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0;
>>>                         if (j !=3D k)
>>>                        break;
>>> always took the break because temp was always zero, k was always =
zero,
>>> and j was 1. I expect this is because the wrong qwEvrsTablePtr and
>>> dwEvrsTableSize content was in use by the xhci (fixed by the DSB =
ST).
>>> . . .
>>=20
>> It smells like a generic problem in the busdma synchronize code.
>>=20
>>> (Pe means "executing processor".)
>>> I expect the removal of that specific DSB ST to result in
>>> xhci_interrupt_poll's code again taking the break every time,
>>> just based on event ring behavior alone.
>>=20
>> Again, try to patch both
>> usb_pc_cpu_flush()
>> and
>> usb_pc_cpu_invalidate()
>=20
> Sure.
>=20

Unfortunately, it fails the same old way as far as
what is seen on the console.

So the steps used were:

I reverted /usr/src/sys/dev/usb/controller/xhci.c .

Then changed things so that:

# svnlite diff /usr/src/sys/dev/usb/usb_busdma.c
Index: /usr/src/sys/dev/usb/usb_busdma.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- /usr/src/sys/dev/usb/usb_busdma.c	(revision 363590)
+++ /usr/src/sys/dev/usb/usb_busdma.c	(working copy)
@@ -737,6 +737,9 @@
 	 */
 	bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_POSTREAD);
 	bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREREAD);
+#ifdef __aarch64__
+__asm __volatile("dsb ld" : : : "memory");
+#endif
 }
=20
 =
/*------------------------------------------------------------------------=
*
@@ -750,6 +753,9 @@
 		return;
 	}
 	bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);
+#ifdef __aarch64__
+__asm __volatile("dsb st" : : : "memory");
+#endif
 }
=20
 =
/*------------------------------------------------------------------------=
*


As I expected, the resulting build fails to boot the same old
way: a sequence messages that repeats indefinitely and
involves "xhci0: Resetting controller" and no root file system
mount happens.

So I put back the just last change that had previously lead to
the successful booting (given the prior DSB ST / DSB LD
additions). I did so exactly as I originally had it:

# svnlite diff /usr/src/sys/dev/usb/controller/xhci.c
Index: /usr/src/sys/dev/usb/controller/xhci.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- /usr/src/sys/dev/usb/controller/xhci.c	(revision 363590)
+++ /usr/src/sys/dev/usb/controller/xhci.c	(working copy)
@@ -406,6 +406,9 @@
=20
 	addr =3D buf_res.physaddr;
=20
+#ifdef __aarch64__
+__asm __volatile("dsb st" : : : "memory");
+#endif
 	XWRITE4(sc, oper, XHCI_DCBAAP_LO, (uint32_t)addr);
 	XWRITE4(sc, oper, XHCI_DCBAAP_HI, (uint32_t)(addr >> 32));
 	XWRITE4(sc, oper, XHCI_DCBAAP_LO, (uint32_t)addr);

The result still fails. So more is required.

I'll do some exploring to try to find a combination that has
fewer DSB ST / DSB LD changes in usb/controller/xhci.c than my
original A72 success, leaving the usb/usb_busdma.c patch in
place.

It may take a while for me to have a successful result to
report.




Side note:

The from-scratch buildworld buildkernel reported:

World built in 37469 seconds, ncpu: 4, make -j3
Kernel(s)  GENERIC-NODBG built in 2474 seconds, ncpu: 4, make -j3

which totals to somewhat over 1hr 50min faster than the
last time I did such a -j3 from-scratch rebuild on the
same RPi4B:

World built in 44034 seconds, ncpu: 4, make -j3
Kernel(s)  GENERIC-NODBG built in 2895 seconds, ncpu: 4, make -j3

Both were based on a head -r363590 variation rebuilding itself
but the newer build is based on everything being -mcpu=3Dcortex-a72
based already in the running system. (Same over_voltage=3D6
and arm_freq=3D2000 setting in use both times.)

(I can cross build, which does not take long.)

=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?565258A0-BEE1-48F8-9851-E6C7CF7ADAE7>