Date:      Tue, 29 Sep 2020 10:35:28 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Robert Crowston <crowston@protonmail.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: RPi4B's DMA11 (DMA4 engine example) vs. xHCI/pcie
Message-ID:  <B440C8D8-AA02-49E4-A0D6-3EA9B7FFD13A@yahoo.com>
In-Reply-To: <85FEDC51-B5B0-4ED4-A5ED-62B63EF9D5A8@yahoo.com>
References:  <8C6DE44F-6CE2-4C74-8748-3BBFB54AE183@yahoo.com> <0FE382AB-8DE3-4467-9CB0-E8582AC70EA2@yahoo.com> <85FEDC51-B5B0-4ED4-A5ED-62B63EF9D5A8@yahoo.com>



On 2020-Sep-28, at 21:45, Mark Millard <marklmi at yahoo.com> wrote:

> On 2020-Sep-28, at 19:04, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2020-Sep-28, at 18:29, Mark Millard <marklmi at yahoo.com> wrote:
>>>
>>>> [Be warned that the material is not familiar to me, so I may need
>>>> educating. This is based on the example context that I
>>>> happen to have around.]
>>>>
>>>> In the u-boot fdt print / output there are 2 distinct sets of dma channel
>>>> information, 1 for soc and 1 for scb, where the dma_tag values for the two
>>>> sets should be distinct as far as I can tell:
>>>>
>>>> U-Boot> fdt address 0x7ef1000
>>>> U-Boot> fdt print /
>>>> / {
>>>> . . .
>>>>      soc {
>>>>              dma@7e007000 {
>>>>                      compatible = "brcm,bcm2835-dma";
>>>>                      reg = <0x7e007000 0x00000b00>;
>>>>                      interrupts = * 0x0000000007ef645c [0x00000084];
>>>>                      interrupt-names = "dma0", "dma1", "dma2", "dma3", "dma4", "dma5", "dma6", "dma7", "dma8", "dma9", "dma10";
>>>>                      #dma-cells = <0x00000001>;
>>>>                      brcm,dma-channel-mask = <0x000001f5>;
>>>>                      phandle = <0x0000000b>;
>>>>              };
>>>>
>>>>      scb {
>>>> . . .
>>>>              dma@7e007b00 {
>>>>                      compatible = "brcm,bcm2711-dma";
>>>>                      reg = <0x00000000 0x7e007b00 0x00000000 0x00000400>;
>>>>                      interrupts = <0x00000000 0x00000059 0x00000004 0x00000000 0x0000005a 0x00000004 0x00000000 0x0000005b 0x00000004 0x00000000 0x0000005c 0x00000004>;
>>>>                      interrupt-names = "dma11", "dma12", "dma13", "dma14";
>>>>                      #dma-cells = <0x00000001>;
>>>>                      brcm,dma-channel-mask = <0x00007000>;
>>>>                      phandle = <0x0000003d>;
>>>>              };
>>>> . . .

I had presumed that the dma@7e007b00 node would be processed. But
I finally happened to search for "bcm2711-dma" in the FreeBSD
sources and it does not occur.

That appears to mean that BCM_DMA_CH_MAX being 12 depends on
dma@7e007000's brcm,dma-channel-mask to avoid referencing channel
11, which does not exist in that bcm2835-dma context.

I think this makes what I wrote about the DMA4 engines (the most
capable ones) somewhat incoherent in the details, but the basic
status (not supported in the code and not used) appears to be
true.

As for DMA0-DMA10 (bcm2835-dma), the lack of handling for some
DMA (0-6) vs. DMA LITE (7-10) distinctions (for example, a 65536
maxsegsz for DMA LITE) still looks real to me.

>>>> So, 0 through 10 need the soc criteria (a mix of DMA and DMA LITE engine criteria)
>>>> and 11 through 14 need the scb criteria (DMA4 engine criteria). (I'm ignoring
>>>> the dma-channel-mask values at this point.)
>>>>
>>>>
>>>> I'll note here that the code has:
>>>>
>>>> #define	BCM_DMA_CH_MAX		12
>>>>
>>>> for use in code like:
>>>>
>>>>      /* setup initial settings */
>>>>      for (i = 0; i < BCM_DMA_CH_MAX; i++) {
>>>>              ch = &sc->sc_dma_ch[i];
>>>>
>>>>              bzero(ch, sizeof(struct bcm_dma_ch));
>>>>              ch->ch = i;
>>>>              ch->flags = BCM_DMA_CH_UNMAP;
>>>>
>>>>              if ((bcm_dma_channel_mask & (1 << i)) == 0)
>>>>                      continue;
>>>> . . .
>>>>
>>>> It looks to me like the only scb/DMA4 engine covered by such loops
>>>> is "dma11", and that "brcm,dma-channel-mask = <0x00007000>"
>>>> means that dma11 will not be used.
>>>>
>>>> So: No scb/DMA4 engine will be used??? (That could explain the
>>>> 1 GiByte limit?)
>>>>
>>>>
>>>> rpi_DATA_2711_1p0.pdf reports that soc/0-10 have 2 types (0-6 vs. 7-10
>>>> as it turns out) as well as the scb/DMA4 engines (11-14):
>>>>
>>>> QUOTE (with omitted material marked by ". . .")
>>>> . . .
>>>> The BCM2711 DMA Controller provides a total of 16 DMA channels. Four of these are DMA Lite channels (with reduced performance and features), and four of them are DMA4 channels (with increased performance and a wider address range).
>>>> . . .
>>>> 4.5. DMA LITE Engines
>>>>
>>>> Several of the DMA engines are of the LITE design. This is a reduced specification engine designed to save space. The engine behaves in the same way as a normal DMA engine except for the following differences:
>>>> . . .
>>>> 	• The DMA length register is now 16 bits, limiting the maximum transferable length to 65536 bytes.
>>>> . . .
>>>> 4.6. DMA4 Engines
>>>>
>>>> Several of the DMA engines are of the DMA4 design. These have higher performance due to their uncoupled read/write design and can access up to 40 address bits. Unlike the other DMA engines they are also capable of performing write bursts. Note that they directly access the full 35-bit address bus of the BCM2711 and so bypass the paging registers of the DMA and DMA Lite engines.
>>>>
>>>> DMA channel 11 is additionally able to access the PCIe interface.
>>>> END QUOTE
>>>>
>>>> The register map indicates (with some extra notes added):
>>>>
>>>> 0-6:   DMA
>>>> 7-10:  DMA LITE (65536 bytes limit, for example)
>>>> 11-14: DMA4 (11 is special relative to "PCIe interface")
>>>> ("DMA Channel 15 is exclusively used by the VPU.")
>>>>
>>>> Yet what I see in the head -r365932 code is:
>>>>
>>>> #define	BCM_DMA_CH_MAX		12
>>>> . . .
>>>> struct bcm_dma_softc {
>>>>      device_t                sc_dev;
>>>>      struct mtx              sc_mtx;
>>>>      struct resource *       sc_mem;
>>>>      struct resource *       sc_irq[BCM_DMA_CH_MAX];
>>>>      void *                  sc_intrhand[BCM_DMA_CH_MAX];
>>>>      struct bcm_dma_ch       sc_dma_ch[BCM_DMA_CH_MAX];
>>>>      bus_dma_tag_t           sc_dma_tag;
>>>> };
>>>> . . .
>>>>      err = bus_dma_tag_create(bus_get_dma_tag(dev),
>>>>          1, 0, BUS_SPACE_MAXADDR_32BIT,
>>>>          BUS_SPACE_MAXADDR, NULL, NULL,
>>>>          sizeof(struct bcm_dma_cb), 1,
>>>>          sizeof(struct bcm_dma_cb),
>>>>          BUS_DMA_ALLOCNOW, NULL, NULL,
>>>>          &sc->sc_dma_tag);
>>>>
>>>> As an example: does that deal with the likes of DMA LITE (so 7-10) "limiting
>>>> the maximum transferable length to 65536 bytes"?
>>>>
>>>> As another example: Does it deal with the DMA4 (11-14) distinctions (if
>>>> such were in use anyway)?
>>>>
>>>> For reference from the fdt print / :
>>>>
>>>> / {
>>>> . . .
>>>>      #address-cells = <0x00000002>;
>>>>      #size-cells = <0x00000001>;
>>>> . . .
>>>>      soc {
>>>>              compatible = "simple-bus";
>>>>              #address-cells = <0x00000001>;
>>>>              #size-cells = <0x00000001>;
>>>> . . .
>>>>              dma-ranges = <0xc0000000 0x00000000 0x00000000 0x40000000>;
>>>> . . .
>>>>              firmware {
>>>>                      compatible = "raspberrypi,bcm2835-firmware", "simple-bus";
>>>>                      mboxes = <0x0000001c>;
>>>>                      dma-ranges;
>>>> . . .
>>>>      emmc2bus {
>>>>              compatible = "simple-bus";
>>>>              #address-cells = <0x00000002>;
>>>>              #size-cells = <0x00000001>;
>>>> . . .
>>>>              dma-ranges = <0x00000000 0xc0000000 0x00000000 0x00000000 0x40000000>;
>>>> . . .
>>>>      scb {
>>>>              compatible = "simple-bus";
>>>>              #address-cells = <0x00000002>;
>>>>              #size-cells = <0x00000002>;
>>>> . . .
>>>>              dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xfc000000 0x00000001 0x00000000 0x00000001 0x00000000 0x00000001 0x00000000>;
>>>> . . .
>>>>              pcie@7d500000 {
>>>>                      compatible = "brcm,bcm2711-pcie";
>>>> . . .
>>>>                      #address-cells = <0x00000003>;
>>>> . . .
>>>>                      #size-cells = <0x00000002>;
>>>> . . .
>>>>                      dma-ranges = <0x02000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xc0000000>;
>>>> . . .
>>>>      v3dbus {
>>>>              compatible = "simple-bus";
>>>>              #address-cells = <0x00000001>;
>>>>              #size-cells = <0x00000002>;
>>>> . . .
>>>>              dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000004 0x00000000>;
>>>> . . .
>>>
>>> rpi_DATA_2711_1p0.pdf reports:
>>> (I ignore 2D DMA transfer mode here.)
>>>
>>> For DMA engines 0-6: XLENGTH has bits 29:0;
>>> bits 31:30 are write as 0, read as do not care.
>>> That would put maxsegsz at 2**30 == 1,073,741,824,
>>> which matches a 1 GiByte space.
>>>
>>> For DMA LITE engines 7-10: XLENGTH has bits 15:0;
>>> bits 31:16 are write as 0, read as do not care.
>>> That would put maxsegsz at 2**16 == 65,536.
>>>
>>> For DMA4 engines 11-14: XLENGTH has bits 29:0;
>>> bits 31:30 are write as 0, read as do not care.
>>> That would put maxsegsz at 2**30 == 1,073,741,824,
>>> which is smaller than the 3 GiByte space associated
>>> with xHCI.
>
> rpi_DATA_2711_1p0.pdf reports the following specifically for
> DMA11-DMA14 (so the DMA4 engines) for what goes in the CB and
> NEXT_CB ADDR fields:
>
> QUOTE
> The address must be 256-bit aligned and so the bottom 5 bits of the byte address are discarded, i.e. write cb_byte_address[39:0]>>5 into the CB
> END QUOTE
>
> This is not true for DMA0-DMA10 (DMA and DMA LITE).
>=20
> The following is extracted from various places to
> bring them together. I do not see evidence of handling
> the cb_byte_address[39:0]>>5 involved for DMA11-DMA14:
>
> #define ARMC_TO_VCBUS(pa)       bcm283x_armc_to_vcbus(pa)
>
> vm_paddr_t
> bcm283x_armc_to_vcbus(vm_paddr_t pa)
> {
>        struct bcm283x_memory_soc_cfg *cfg;
>        struct bcm283x_memory_mapping *map, *ment;
>
>        /* Guaranteed not NULL if we haven't panicked yet. */
>        cfg = bcm283x_get_current_memcfg();
>        map = cfg->memmap;
>        for (ment = map; !BCM283X_MEMMAP_ISTERM(ment); ++ment) {
>                if (pa >= ment->armc_start &&
>                    pa < ment->armc_start + ment->armc_size) {
>                        return (pa - ment->armc_start) + ment->vcbus_start;
>                }
>        }
>
>        /*
>         * Assume 1:1 mapping for anything else, but complain about it on
>         * verbose boots.
>         */
>        if (bootverbose)
>                printf("bcm283x_vcbus: No armc -> vcbus mapping found: %jx\n",
>                    (uintmax_t)pa);
>        return (pa);
> }
>
> static void
> bcm_dmamap_cb(void *arg, bus_dma_segment_t *segs,
>        int nseg, int err)
> {
>        bus_addr_t *addr;
>
>        if (err)
>                return;
>
>        addr = (bus_addr_t*)arg;
>        *addr = ARMC_TO_VCBUS(segs[0].ds_addr);
> }
>
> Note ds_addr assignments in:
>
> static bus_size_t
> _bus_dmamap_addseg(bus_dma_tag_t dmat, bus_dmamap_t map, bus_addr_t curaddr,
>    bus_size_t sgsize, bus_dma_segment_t *segs, int *segp)
> {
>        bus_addr_t baddr, bmask;
>        int seg;
>
>        /*
>         * Make sure we don't cross any boundaries.
>         */
>        bmask = ~(dmat->common.boundary - 1);
>        if (dmat->common.boundary > 0) {
>                baddr = (curaddr + dmat->common.boundary) & bmask;
>                if (sgsize > (baddr - curaddr))
>                        sgsize = (baddr - curaddr);
>        }
>
>        /*
>         * Insert chunk into a segment, coalescing with
>         * previous segment if possible.
>         */
>        seg = *segp;
>        if (seg == -1) {
>                seg = 0;
>                segs[seg].ds_addr = curaddr;
>                segs[seg].ds_len = sgsize;
>        } else {
>                if (curaddr == segs[seg].ds_addr + segs[seg].ds_len &&
>                    (segs[seg].ds_len + sgsize) <= dmat->common.maxsegsz &&
>                    (dmat->common.boundary == 0 ||
>                     (segs[seg].ds_addr & bmask) == (curaddr & bmask)))
>                        segs[seg].ds_len += sgsize;
>                else {
>                        if (++seg >= dmat->common.nsegments)
>                                return (0);
>                        segs[seg].ds_addr = curaddr;
>                        segs[seg].ds_len = sgsize;
>                }
>        }
>        *segp = seg;
>        return (sgsize);
> }
>
>
> Note cb_phys and ch->vc_cb in:
>
> static int
> bcm_dma_init(device_t dev)
> {
> . . .
>        /* setup initial settings */
>        for (i = 0; i < BCM_DMA_CH_MAX; i++) {
> . . .
>                err = bus_dmamap_load(sc->sc_dma_tag, ch->dma_map, cb_virt,
>                    sizeof(struct bcm_dma_cb), bcm_dmamap_cb, &cb_phys,
>                    BUS_DMA_WAITOK);
>                if (err) {
>                        device_printf(dev, "cannot load DMA memory\n");
>                        break;
>                }
>
>                ch->cb = cb_virt;
>                ch->vc_cb = cb_phys;
> . . .
>
> int
> bcm_dma_start(int ch, vm_paddr_t src, vm_paddr_t dst, int len)
> {
>        struct bcm_dma_softc *sc = bcm_dma_sc;
>        struct bcm_dma_cb *cb;
>
>        if (ch < 0 || ch >= BCM_DMA_CH_MAX)
>                return (-1);
>
>        if (!(sc->sc_dma_ch[ch].flags & BCM_DMA_CH_USED))
>                return (-1);
>
>        cb = sc->sc_dma_ch[ch].cb;
>        cb->src = ARMC_TO_VCBUS(src);
>        cb->dst = ARMC_TO_VCBUS(dst);
>
>        cb->len = len;
>
>        bus_dmamap_sync(sc->sc_dma_tag,
>            sc->sc_dma_ch[ch].dma_map, BUS_DMASYNC_PREWRITE);
>
>        bus_write_4(sc->sc_mem, BCM_DMA_CBADDR(ch),
>            sc->sc_dma_ch[ch].vc_cb);
>        bus_write_4(sc->sc_mem, BCM_DMA_CS(ch), CS_ACTIVE);
>
> #ifdef DEBUG
>        bcm_dma_cb_dump(sc->sc_dma_ch[ch].cb);
>        bcm_dma_reg_dump(ch);
> #endif
>
>        return (0);
> }
>
> It looks to me like FreeBSD is not set up to use the DMA4
> engines (DMA11-DMA14) and happens not to use them for the
> DTB that I get from u-boot.bin in my context.
>
> Of course, I may just have missed something in looking
> around at the unfamiliar material.




===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



