Date:      Wed, 30 Sep 2020 21:15:30 +0000
From:      Robert Crowston <crowston@protonmail.com>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: RPi4B's DMA11 (DMA4 engine example) vs. xHCI/pcie
Message-ID:  <tNJ_d5vRy5yTyYQw2MoZvybqy_7lqaHUfmXjedMUax0-LUolwajbPIPJLpQZqV6e9ymgkUogKFKRv0E0LrfDmLMiE99QraRHPamDyMDPVm4=@protonmail.com>
In-Reply-To: <903FE769-ED46-4FBC-A272-4D2C89A9CD7A@yahoo.com>
References:  <8C6DE44F-6CE2-4C74-8748-3BBFB54AE183@yahoo.com> <0FE382AB-8DE3-4467-9CB0-E8582AC70EA2@yahoo.com> <85FEDC51-B5B0-4ED4-A5ED-62B63EF9D5A8@yahoo.com> <B440C8D8-AA02-49E4-A0D6-3EA9B7FFD13A@yahoo.com> <903FE769-ED46-4FBC-A272-4D2C89A9CD7A@yahoo.com>

Very interesting analysis. Certainly uncovered a few things I wasn't aware of.

By default sc->sc_bus.dma_bits in xhci_init is 64 bits; I toggle it back to 32 bits in the xhci shim I wrote for the Pi 4. You can see that output in a verbose dmesg.
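
To make the toggle concrete, here is a minimal stand-alone sketch of the selection expression from the xhci_init() excerpt quoted below. The function name pick_dma_bits and its ac64 parameter are hypothetical stand-ins (ac64 models the result of XHCI_HCS0_AC64(temp)); this is a model for reasoning about the outcome, not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the dma_bits selection quoted from xhci_init().
 *   ac64      - controller advertises 64-bit addressing (assumed input)
 *   xhcidma32 - driver-wide "force 32-bit" variable named in the excerpt
 *   dma32     - per-attachment request passed to xhci_init()
 */
static uint8_t
pick_dma_bits(bool ac64, int xhcidma32, uint8_t dma32)
{
	/* 64 only when the hardware allows it and nobody asked for 32. */
	return ((ac64 && xhcidma32 == 0 && dma32 == 0) ? 64 : 32);
}
```

With dma32 nonzero, which is what generic_xhci_attach() passes via IS_DMA_32B in the excerpt, the result is 32 regardless of what the controller advertises.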

    -- RHC.

------- Original Message -------
On Wednesday, 30 September 2020 19:13, Mark Millard <marklmi@yahoo.com> wrote:

>
>
> On 2020-Sep-29, at 10:35, Mark Millard <marklmi at yahoo.com> wrote:
>
> > On 2020-Sep-28, at 21:45, Mark Millard <marklmi at yahoo.com> wrote:
> >
> > > On 2020-Sep-28, at 19:04, Mark Millard <marklmi at yahoo.com> wrote:
> > >
> > > > On 2020-Sep-28, at 18:29, Mark Millard <marklmi at yahoo.com> wrote:
> > > >
> > > > > > > [Be warned that the material is not familiar so I may need
> > > > > > > educating. This is based on the example context that I
> > > > > > > happen to have around.]
> > > > > > > In the u-boot fdt print / output there are 2 distinct sets of dma channel
> > > > > > > information, 1 for soc and 1 for scb, where the dma_tag values for the two
> > > > > > > sets should be distinct as far as I can tell:
> > > > > > U-Boot> fdt address 0x7ef1000
> > > > > > U-Boot> fdt print /
> > > > > > / {
> > > > > > . . .
> > > > > > soc {
> > > > > > dma@7e007000 {
> > > > > > > compatible = "brcm,bcm2835-dma";
> > > > > > > reg = <0x7e007000 0x00000b00>;
> > > > > > > interrupts = * 0x0000000007ef645c [0x00000084];
> > > > > > > interrupt-names = "dma0", "dma1", "dma2", "dma3", "dma4", "dma5", "dma6", "dma7", "dma8", "dma9", "dma10";
> > > > > > > #dma-cells = <0x00000001>;
> > > > > > > brcm,dma-channel-mask = <0x000001f5>;
> > > > > > > phandle = <0x0000000b>;
> > > > > > };
> > > > > >
> > > > > >     scb {
> > > > > >
> > > > > >
> > > > > > . . .
> > > > > > dma@7e007b00 {
> > > > > > > compatible = "brcm,bcm2711-dma";
> > > > > > > reg = <0x00000000 0x7e007b00 0x00000000 0x00000400>;
> > > > > > > interrupts = <0x00000000 0x00000059 0x00000004 0x00000000 0x0000005a 0x00000004 0x00000000 0x0000005b 0x00000004 0x00000000 0x0000005c 0x00000004>;
> > > > > > > interrupt-names = "dma11", "dma12", "dma13", "dma14";
> > > > > > > #dma-cells = <0x00000001>;
> > > > > > > brcm,dma-channel-mask = <0x00007000>;
> > > > > > > phandle = <0x0000003d>;
> > > > > > };
> > > > > > . . .
> >
> > I had presumed that the dma@7e007b00 would be processed. But
> > I finally happened to search for "bcm2711-dma" in FreeBSD and
> > it does not occur.
> > That appears to mean that BCM_DMA_CH_MAX being 12 is depending
> > on dma@7e007000's brcm,dma-channel-mask to avoid referencing
> > number 11 that does not exist in that bcm2835-dma context.
> > I think this makes what I wrote about DMA4 engines (the most
> > capable ones) somewhat incoherent in the details but the basic
> > not-supported-in-the-code and not-used status appears to be
> > true.
> > As for DMA0-DMA10 (bcm2835-dma), some DMA (0-6) vs. DMA LITE
> > (7-10) distinctions not being handled (for example 65536
> > maxsegsz for DMA LITE) still looks to be true to me.
>
> Looks like FreeBSD is limited to 32-bit here: usb/controller/generic_xhci.c
> has nothing explicit for anything other than 32 address lines (and overall the
> only alternative is 64 address lines):
>
> #define IS_DMA_32B 1
>
> int
> generic_xhci_attach(device_t dev)
> {
> . . .
> err = xhci_init(sc, dev, IS_DMA_32B);
> if (err != 0) {
> device_printf(dev, "Failed to init XHCI, with error %d\n", err);
> generic_xhci_detach(dev);
> return (ENXIO);
> }
> . . .
> /*
>  * The following structure describes the parent USB DMA tag.
>  */
> #if USB_HAVE_BUSDMA
> struct usb_dma_parent_tag {
> . . .
>     uint8_t dma_bits; /* number of DMA address lines */
> . . .
> };
> #else
> struct usb_dma_parent_tag {}; /* empty struct */
> #endif
> . . .
> usb_error_t
> xhci_init(struct xhci_softc *sc, device_t self, uint8_t dma32)
> {
> . . .
>     /* get DMA bits */
>     sc->sc_bus.dma_bits = (XHCI_HCS0_AC64(temp) &&
>         xhcidma32 == 0 && dma32 == 0) ? 64 : 32;
> . . .
>
> Overall it looks like a bunch of places would need changes to
> support the RPi4B's 3 GiByte capability. (Probably more than
> I've discovered, ignoring things like DMA4 engine use to get
> write bursts and the like.)
>
> I will note that I found code in NetBSD that classifies "normal"
> DMA engines vs. DMA LITE engines (via testing a debug register)
> for bcm2835-dma and only requests normal DMA engines be used,
> skipping DMA LITE. (This is for DTB/fdt contexts I think. I've
> not done as well figuring out even such narrow aspects of ACPI
> handling of things.) This tends to confirm my worries over
> FreeBSD's bcm2835-dma handling of the DMA LITE engines existing
> but being less capable.
>
> > > > > > > So, 0 through 10 need the soc criteria (mix of DMA and DMA LITE engine criteria)
> > > > > > > and 11 through 14 need the scb criteria (DMA4 engine criteria). (I'm ignoring
> > > > > > > the dma-channel-mask values at this point.)
> > > > > > I'll here note the code has:
> > > > > > #define BCM_DMA_CH_MAX 12
> > > > > > for use in code like:
> > > > > >
> > > > > >     /* setup initial settings */
> > > > > > >     for (i = 0; i < BCM_DMA_CH_MAX; i++) {
> > > > > > >             ch = &sc->sc_dma_ch[i];
> > > > > > >
> > > > > > >             bzero(ch, sizeof(struct bcm_dma_ch));
> > > > > > >             ch->ch = i;
> > > > > > >             ch->flags = BCM_DMA_CH_UNMAP;
> > > > > > >
> > > > > > >             if ((bcm_dma_channel_mask & (1 << i)) == 0)
> > > > > >                     continue;
> > > > > >
> > > > > >
> > > > > > . . .
> > > > > > > It looks to me like "dma11" is the only scb/DMA4 engine covered
> > > > > > > by such loops and that the "brcm,dma-channel-mask = <0x00007000>"
> > > > > > > means that dma11 will not be used.
> > > > > > So: No scb/DMA4 engine will be used??? (That could explain the
> > > > > > 1 GiByte limit?)
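
The bit arithmetic behind that reading can be checked with a small stand-alone sketch. The helper names mask_has_channel and channels_visited are hypothetical; only BCM_DMA_CH_MAX and the two mask values come from the quoted source and fdt output. It models the `(mask & (1 << i)) == 0` test from the loop, nothing more.

```c
#include <stdint.h>

#define BCM_DMA_CH_MAX 12	/* value quoted from the FreeBSD source */

/* Nonzero if channel `ch` is enabled in a brcm,dma-channel-mask value. */
static int
mask_has_channel(uint32_t mask, int ch)
{
	return ((int)((mask >> ch) & 1u));
}

/* Count the channels in [0, BCM_DMA_CH_MAX) that the loop would not skip. */
static int
channels_visited(uint32_t mask)
{
	int i, n = 0;

	for (i = 0; i < BCM_DMA_CH_MAX; i++) {
		if (mask_has_channel(mask, i))
			n++;
	}
	return (n);
}
```

Under these assumptions 0x00007000 enables bits 12 through 14 only, so bit 11 is clear and none of the enabled bits fall below BCM_DMA_CH_MAX, while 0x000001f5 enables channels 0, 2, 4, 5, 6, 7 and 8.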
> > > > > > > rpi_DATA_2711_1p0.pdf reports that soc/0-10 have 2 types (0-6 vs. 7-10
> > > > > > > as it turns out) as well as the scb/DMA4-engines (11-14):
> > > > > > QUOTE (with omitted marked by ". . .")
> > > > > > . . .
> > > > > > > The BCM2711 DMA Controller provides a total of 16 DMA channels. Four of these are DMA Lite channels (with reduced performance and features), and four of them are DMA4 channels (with increased performance and a wider address range).
> > > > > > > . . .
> > > > > > > 4.5. DMA LITE Engines
> > > > > > > Several of the DMA engines are of the LITE design. This is a reduced specification engine designed to save space. The engine behaves in the same way as a normal DMA engine except for the following differences:
> > > > > > > . . .
> > > > > > > • The DMA length register is now 16 bits, limiting the maximum transferable length to 65536 bytes.
> > > > > > > . . .
> > > > > > > 4.6. DMA4 Engines
> > > > > > > Several of the DMA engines are of the DMA4 design. These have higher performance due to their uncoupled read/write design and can access up to 40 address bits. Unlike the other DMA engines they are also capable of performing write bursts. Note that they directly access the full 35-bit address bus of the BCM2711 and so bypass the paging registers of the DMA and DMA Lite engines.
> > > > > > > DMA channel 11 is additionally able to access the PCIe interface.
> > > > > > END QUOTE
> > > > > > The register map indicates (with some extra notes added):
> > > > > > 0-6: DMA
> > > > > > 7-10: DMA LITE (65536 bytes limit, for example)
> > > > > > 11-14: DMA4 (11 is special relative to "PCIe interface")
> > > > > > ("DMA Channel 15 is exclusively used by the VPU.")
> > > > > > Yet what I see in the head -r365932 code is:
> > > > > > #define BCM_DMA_CH_MAX 12
> > > > > > . . .
> > > > > > struct bcm_dma_softc {
> > > > > > device_t sc_dev;
> > > > > > struct mtx sc_mtx;
> > > > > > struct resource * sc_mem;
> > > > > > struct resource * sc_irq[BCM_DMA_CH_MAX];
> > > > > > void * sc_intrhand[BCM_DMA_CH_MAX];
> > > > > > struct bcm_dma_ch sc_dma_ch[BCM_DMA_CH_MAX];
> > > > > > bus_dma_tag_t sc_dma_tag;
> > > > > > };
> > > > > > . . .
> > > > > > > err = bus_dma_tag_create(bus_get_dma_tag(dev),
> > > > > > 1, 0, BUS_SPACE_MAXADDR_32BIT,
> > > > > > BUS_SPACE_MAXADDR, NULL, NULL,
> > > > > > sizeof(struct bcm_dma_cb), 1,
> > > > > > sizeof(struct bcm_dma_cb),
> > > > > > BUS_DMA_ALLOCNOW, NULL, NULL,
> > > > > > &sc->sc_dma_tag);
> > > > > > > As an example: does that deal with the likes of DMA LITE (so 7-10) "limiting
> > > > > > > the maximum transferable length to 65536 bytes"?
> > > > > > > As another example: Does it deal with the DMA4 (11-14) distinctions (if
> > > > > > > such were in use anyway)?
> > > > > > For reference from the fdt print / :
> > > > > > / {
> > > > > > . . .
> > > > > > > #address-cells = <0x00000002>;
> > > > > > > #size-cells = <0x00000001>;
> > > > > > > . . .
> > > > > > > soc {
> > > > > > > compatible = "simple-bus";
> > > > > > > #address-cells = <0x00000001>;
> > > > > > > #size-cells = <0x00000001>;
> > > > > > > . . .
> > > > > > > dma-ranges = <0xc0000000 0x00000000 0x00000000 0x40000000>;
> > > > > > > . . .
> > > > > > > firmware {
> > > > > > > compatible = "raspberrypi,bcm2835-firmware", "simple-bus";
> > > > > > > mboxes = <0x0000001c>;
> > > > > > > dma-ranges;
> > > > > > > . . .
> > > > > > > emmc2bus {
> > > > > > > compatible = "simple-bus";
> > > > > > > #address-cells = <0x00000002>;
> > > > > > > #size-cells = <0x00000001>;
> > > > > > > . . .
> > > > > > > dma-ranges = <0x00000000 0xc0000000 0x00000000 0x00000000 0x40000000>;
> > > > > > > . . .
> > > > > > > scb {
> > > > > > > compatible = "simple-bus";
> > > > > > > #address-cells = <0x00000002>;
> > > > > > > #size-cells = <0x00000002>;
> > > > > > > . . .
> > > > > > > dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xfc000000 0x00000001 0x00000000 0x00000001 0x00000000 0x00000001 0x00000000>;
> > > > > > > . . .
> > > > > > > pcie@7d500000 {
> > > > > > > compatible = "brcm,bcm2711-pcie";
> > > > > > > . . .
> > > > > > > #address-cells = <0x00000003>;
> > > > > > > . . .
> > > > > > > #size-cells = <0x00000002>;
> > > > > > > . . .
> > > > > > > dma-ranges = <0x02000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xc0000000>;
> > > > > > > . . .
> > > > > > > v3dbus {
> > > > > > > compatible = "simple-bus";
> > > > > > > #address-cells = <0x00000001>;
> > > > > > > #size-cells = <0x00000002>;
> > > > > > > . . .
> > > > > > > dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000004 0x00000000>;
> > > > > > . . .
> > > > >
> > > > > rpi_DATA_2711_1p0.pdf reports:
> > > > > (I ignore 2D DMA transfer mode here.)
> > > > > For DMA engines 0-6: XLENGTH has bits 29:0
> > > > > bits 31:30 are write as 0, read as do not care.
> > > > > > That would put maxsegsz as 2**30 == 1,073,741,824
> > > > > > which matches a 1 GiByte space.
> > > > > > For DMA LITE engines 7-10: XLENGTH has bits 15:0
> > > > > > bits 31:16 are write as 0, read as do not care.
> > > > > > That would put maxsegsz as 2**16 == 65,536.
> > > > > > For DMA4 engines 11-14: XLENGTH has bits 29:0
> > > > > > bits 31:30 are write as 0, read as do not care.
> > > > > > That would put maxsegsz as 2**30 == 1,073,741,824
> > > > > which is smaller than the 3 GiByte space associated
> > > > > with xHCI.
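
The arithmetic above can be reproduced with a short sketch. The helper name xlength_limit is hypothetical, and it follows the 2**bits reading used in the quoted text; strictly, the largest value an n-bit field can hold is one less than that.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the length limit implied by an XLENGTH field that is
 * `bits` wide, using the 2**bits reading from the quoted text.
 * Valid for bits < 64.
 */
static uint64_t
xlength_limit(unsigned bits)
{
	return ((uint64_t)1 << bits);
}
```

For the field widths quoted from the datasheet, xlength_limit(30) gives 1,073,741,824 for the DMA and DMA4 engines and xlength_limit(16) gives 65,536 for the DMA LITE engines, both well under a 3 GiByte span.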
> > >
> > > rpi_DATA_2711_1p0.pdf reports the following specifically for
> > > DMA11-DMA14 (so the DMA4 engines) for what goes in the CB and
> > > NEXT_CB ADDR fields:
> > > QUOTE
> > > > The address must be 256-bit aligned and so the bottom 5 bits of the byte address are discarded, i.e. write cb_byte_address[39:0]>>5 into the CB
> > > END QUOTE
> > > This is not true for DMA0-DMA10 (DMA and DMA LITE).
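
As a rough illustration of what the datasheet sentence asks for, the sketch below checks the 256-bit (32-byte) alignment and computes cb_byte_address[39:0]>>5. Both function names are hypothetical and nothing like them appears in the FreeBSD code quoted here; the result is kept in 64 bits because a 40-bit address shifted right by 5 can need up to 35 bits.

```c
#include <stdint.h>

#define DMA4_CB_ALIGN	32u			/* 256 bits == 32 bytes */
#define DMA4_ADDR_MASK	0xFFFFFFFFFFULL		/* address bits [39:0] */

/* Nonzero if the control block byte address is 256-bit aligned. */
static int
dma4_cb_aligned(uint64_t cb_byte_address)
{
	return ((cb_byte_address & (DMA4_CB_ALIGN - 1)) == 0);
}

/* Value described by the datasheet text: cb_byte_address[39:0] >> 5. */
static uint64_t
dma4_cb_field(uint64_t cb_byte_address)
{
	return ((cb_byte_address & DMA4_ADDR_MASK) >> 5);
}
```

By contrast, the bcm_dma_start() excerpt quoted below writes vc_cb to BCM_DMA_CBADDR(ch) unshifted, which fits the DMA and DMA LITE engines but not this DMA4 rule.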
> > > The following is extracted from various places to
> > > bring them together. I do not see evidence of handling
> > > the cb_byte_address[39:0]>>5 involved for DMA11-DMA14:
> > > #define ARMC_TO_VCBUS(pa) bcm283x_armc_to_vcbus(pa)
> > > vm_paddr_t
> > > bcm283x_armc_to_vcbus(vm_paddr_t pa)
> > > {
> > > struct bcm283x_memory_soc_cfg *cfg;
> > > struct bcm283x_memory_mapping *map, *ment;
> > >
> > >       /* Guaranteed not NULL if we haven't panicked yet. */
> > > >       cfg = bcm283x_get_current_memcfg();
> > > >       map = cfg->memmap;
> > > >       for (ment = map; !BCM283X_MEMMAP_ISTERM(ment); ++ment) {
> > > >               if (pa >= ment->armc_start &&
> > > >                   pa < ment->armc_start + ment->armc_size) {
> > > >                       return (pa - ment->armc_start) + ment->vcbus_start;
> > >               }
> > >       }
> > >
> > >       /*
> > > >        * Assume 1:1 mapping for anything else, but complain about it on
> > > >        * verbose boots.
> > > >        */
> > > >       if (bootverbose)
> > > >               printf("bcm283x_vcbus: No armc -> vcbus mapping found: %jx\n",
> > >                   (uintmax_t)pa);
> > >       return (pa);
> > >
> > >
> > > }
> > > static void
> > > bcm_dmamap_cb(void *arg, bus_dma_segment_t *segs,
> > > int nseg, int err)
> > > {
> > > bus_addr_t *addr;
> > >
> > >       if (err)
> > >               return;
> > >
> > > >       addr = (bus_addr_t*)arg;
> > > >       *addr = ARMC_TO_VCBUS(segs[0].ds_addr);
> > >
> > >
> > > }
> > > Note ds_addr assignments in:
> > > static bus_size_t
> > > > _bus_dmamap_addseg(bus_dma_tag_t dmat, bus_dmamap_t map, bus_addr_t curaddr,
> > > bus_size_t sgsize, bus_dma_segment_t *segs, int *segp)
> > > {
> > > bus_addr_t baddr, bmask;
> > > int seg;
> > >
> > >       /*
> > >        * Make sure we don't cross any boundaries.
> > >        */
> > > >       bmask = ~(dmat->common.boundary - 1);
> > > >       if (dmat->common.boundary > 0) {
> > > >               baddr = (curaddr + dmat->common.boundary) & bmask;
> > >               if (sgsize > (baddr - curaddr))
> > >                       sgsize =3D (baddr - curaddr);
> > >       }
> > >
> > >       /*
> > >        * Insert chunk into a segment, coalescing with
> > >        * previous segment if possible.
> > >        */
> > > >       seg = *segp;
> > > >       if (seg == -1) {
> > > >               seg = 0;
> > > >               segs[seg].ds_addr = curaddr;
> > > >               segs[seg].ds_len = sgsize;
> > > >       } else {
> > > >               if (curaddr == segs[seg].ds_addr + segs[seg].ds_len &&
> > > >                   (segs[seg].ds_len + sgsize) <= dmat->common.maxsegsz &&
> > > >                   (dmat->common.boundary == 0 ||
> > > >                    (segs[seg].ds_addr & bmask) == (curaddr & bmask)))
> > > >                       segs[seg].ds_len += sgsize;
> > > >               else {
> > > >                       if (++seg >= dmat->common.nsegments)
> > > >                               return (0);
> > > >                       segs[seg].ds_addr = curaddr;
> > > >                       segs[seg].ds_len = sgsize;
> > > >               }
> > > >       }
> > > >       *segp = seg;
> > >       return (sgsize);
> > >
> > >
> > > }
> > > Note cb_phys and ch->vc_cb in:
> > > static int
> > > bcm_dma_init(device_t dev)
> > > {
> > > . . .
> > > /* setup initial settings */
> > > > for (i = 0; i < BCM_DMA_CH_MAX; i++) {
> > > > . . .
> > > > err = bus_dmamap_load(sc->sc_dma_tag, ch->dma_map, cb_virt,
> > > sizeof(struct bcm_dma_cb), bcm_dmamap_cb, &cb_phys,
> > > BUS_DMA_WAITOK);
> > > if (err) {
> > > device_printf(dev, "cannot load DMA memory\n");
> > > break;
> > > }
> > >
> > > >               ch->cb = cb_virt;
> > > >               ch->vc_cb = cb_phys;
> > >
> > >
> > > . . .
> > > int
> > > bcm_dma_start(int ch, vm_paddr_t src, vm_paddr_t dst, int len)
> > > {
> > > > struct bcm_dma_softc *sc = bcm_dma_sc;
> > > > struct bcm_dma_cb *cb;
> > > >
> > > >       if (ch < 0 || ch >= BCM_DMA_CH_MAX)
> > > >               return (-1);
> > > >
> > > >       if (!(sc->sc_dma_ch[ch].flags & BCM_DMA_CH_USED))
> > > >               return (-1);
> > > >
> > > >       cb = sc->sc_dma_ch[ch].cb;
> > > >       cb->src = ARMC_TO_VCBUS(src);
> > > >       cb->dst = ARMC_TO_VCBUS(dst);
> > > >
> > > >       cb->len = len;
> > >
> > >       bus_dmamap_sync(sc->sc_dma_tag,
> > >           sc->sc_dma_ch[ch].dma_map, BUS_DMASYNC_PREWRITE);
> > >
> > >       bus_write_4(sc->sc_mem, BCM_DMA_CBADDR(ch),
> > >           sc->sc_dma_ch[ch].vc_cb);
> > >       bus_write_4(sc->sc_mem, BCM_DMA_CS(ch), CS_ACTIVE);
> > >
> > >
> > > #ifdef DEBUG
> > > bcm_dma_cb_dump(sc->sc_dma_ch[ch].cb);
> > > bcm_dma_reg_dump(ch);
> > > #endif
> > >
> > >       return (0);
> > >
> > >
> > > }
> > > It looks to me like FreeBSD is not set up to use the DMA4
> > > engines (DMA11-DMA14) and happens to not use them for the
> > > DTB that I get from u-boot.bin in my context.
> > > Of course, I may just have missed something in looking
> > > around at the unfamiliar material.
>
> ==
>
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)




