Date:      Mon, 28 Sep 2020 21:45:27 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Robert Crowston <crowston@protonmail.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: RPi4B's DMA11 (DMA4 engine example) vs. xHCI/pcie
Message-ID:  <85FEDC51-B5B0-4ED4-A5ED-62B63EF9D5A8@yahoo.com>
In-Reply-To: <0FE382AB-8DE3-4467-9CB0-E8582AC70EA2@yahoo.com>
References:  <8C6DE44F-6CE2-4C74-8748-3BBFB54AE183@yahoo.com> <0FE382AB-8DE3-4467-9CB0-E8582AC70EA2@yahoo.com>

On 2020-Sep-28, at 19:04, Mark Millard <marklmi at yahoo.com> wrote:

> On 2020-Sep-28, at 18:29, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> [Be warned that the material is not familiar so I may need
>>> educating. This is based on the example context that I
>>> happen to have around.]
>>>
>>> In the u-boot fdt print / output there are 2 distinct sets of dma channel
>>> information, 1 for soc and 1 for scb, where the dma_tag values for the two
>>> sets should be distinct as far as I can tell:
>>>
>>> U-Boot> fdt address 0x7ef1000
>>> U-Boot> fdt print /
>>> / {
>>> . . .
>>>       soc {
>>>               dma@7e007000 {
>>>                       compatible = "brcm,bcm2835-dma";
>>>                       reg = <0x7e007000 0x00000b00>;
>>>                       interrupts = * 0x0000000007ef645c [0x00000084];
>>>                       interrupt-names = "dma0", "dma1", "dma2", "dma3", "dma4", "dma5", "dma6", "dma7", "dma8", "dma9", "dma10";
>>>                       #dma-cells = <0x00000001>;
>>>                       brcm,dma-channel-mask = <0x000001f5>;
>>>                       phandle = <0x0000000b>;
>>>               };
>>>
>>>       scb {
>>> . . .
>>>               dma@7e007b00 {
>>>                       compatible = "brcm,bcm2711-dma";
>>>                       reg = <0x00000000 0x7e007b00 0x00000000 0x00000400>;
>>>                       interrupts = <0x00000000 0x00000059 0x00000004 0x00000000 0x0000005a 0x00000004 0x00000000 0x0000005b 0x00000004 0x00000000 0x0000005c 0x00000004>;
>>>                       interrupt-names = "dma11", "dma12", "dma13", "dma14";
>>>                       #dma-cells = <0x00000001>;
>>>                       brcm,dma-channel-mask = <0x00007000>;
>>>                       phandle = <0x0000003d>;
>>>               };
>>> . . .
>>>
>>> So, 0 through 10 need the soc criteria (mix of DMA and DMA LITE engine criteria)
>>> and 11 through 14 need the scb criteria (DMA4 engine criteria). (I'm ignoring
>>> the dma-channel-mask values at this point.)
>>>
>>>
>>> I'll note here that the code has:
>>>
>>> #define	BCM_DMA_CH_MAX		12
>>>
>>> for use in code like:
>>>
>>>       /* setup initial settings */
>>>       for (i = 0; i < BCM_DMA_CH_MAX; i++) {
>>>               ch = &sc->sc_dma_ch[i];
>>>
>>>               bzero(ch, sizeof(struct bcm_dma_ch));
>>>               ch->ch = i;
>>>               ch->flags = BCM_DMA_CH_UNMAP;
>>>
>>>               if ((bcm_dma_channel_mask & (1 << i)) == 0)
>>>                       continue;
>>> . . .
>>>
>>> It looks to me like "dma11" is the only scb/DMA4 engine covered
>>> by such loops, and "brcm,dma-channel-mask = <0x00007000>" (bits
>>> 12 through 14) means that dma11 will not be used.
>>>
>>> So: No scb/DMA4 engine will be used??? (That could explain the
>>> 1 GiByte limit?)
>>>
>>>
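
To make the mask arithmetic above concrete, here is a minimal sketch
(not driver code) of which channel numbers a loop bounded by
BCM_DMA_CH_MAX == 12 would set up for a given brcm,dma-channel-mask.
The helper names chan_in_mask and chan_set_up are hypothetical; only
the loop bound and the two mask values come from the material above,
and which mask the driver actually reads is not established here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BCM_DMA_CH_MAX 12 /* loop bound quoted from the driver */

/* Hypothetical helper: is channel ch enabled in a brcm,dma-channel-mask? */
static bool
chan_in_mask(uint32_t mask, unsigned ch)
{
	assert(ch < 32);
	return ((mask & (UINT32_C(1) << ch)) != 0);
}

/*
 * Hypothetical helper: would a loop over 0..BCM_DMA_CH_MAX-1 that skips
 * masked-out channels set up channel ch?
 */
static bool
chan_set_up(uint32_t mask, unsigned ch)
{
	return (ch < BCM_DMA_CH_MAX && chan_in_mask(mask, ch));
}
```

With the soc mask 0x000001f5 this accepts channels 0, 2, 4, 5, 6, 7
and 8. With the scb mask 0x00007000 it accepts nothing: channel 11 is
inside the loop range but not in the mask, and channels 12 through 14
are in the mask but outside the loop range.
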
>>> rpi_DATA_2711_1p0.pdf reports that soc/0-10 have 2 types (0-6 vs. 7-10
>>> as it turns out) as well as the scb/DMA4-engines (11-14):
>>>
>>> QUOTE (with omissions marked by ". . .")
>>> . . .
>>> The BCM2711 DMA Controller provides a total of 16 DMA channels. Four of these are DMA Lite channels (with reduced performance and features), and four of them are DMA4 channels (with increased performance and a wider address range).
>>> . . .
>>> 4.5. DMA LITE Engines
>>>
>>> Several of the DMA engines are of the LITE design. This is a reduced specification engine designed to save space. The engine behaves in the same way as a normal DMA engine except for the following differences:
>>> . . .
>>> 	• The DMA length register is now 16 bits, limiting the maximum transferable length to 65536 bytes.
>>> . . .
>>> 4.6. DMA4 Engines
>>>
>>> Several of the DMA engines are of the DMA4 design. These have higher performance due to their uncoupled read/write design and can access up to 40 address bits. Unlike the other DMA engines they are also capable of performing write bursts. Note that they directly access the full 35-bit address bus of the BCM2711 and so bypass the paging registers of the DMA and DMA Lite engines.
>>>
>>> DMA channel 11 is additionally able to access the PCIe interface.
>>> END QUOTE
>>>
>>> The register map indicates (with some extra notes added):
>>>
>>> 0-6:   DMA
>>> 7-10:  DMA LITE (65536 bytes limit, for example)
>>> 11-14: DMA4 (11 is special relative to "PCIe interface")
>>> ("DMA Channel 15 is exclusively used by the VPU.")
>>>
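
The channel-to-engine split listed above could be captured as a small
lookup. This is only a sketch under the assumption that the register
map summary above is accurate; the enum and the function engine_kind
are hypothetical names, not anything from the FreeBSD driver.

```c
#include <assert.h>

/* Hypothetical classification, following the register map summary above. */
enum dma_engine_kind {
	ENGINE_DMA,      /* channels 0-6 */
	ENGINE_DMA_LITE, /* channels 7-10 */
	ENGINE_DMA4,     /* channels 11-14 */
	ENGINE_VPU_ONLY  /* channel 15, reserved for the VPU */
};

static enum dma_engine_kind
engine_kind(unsigned ch)
{
	assert(ch <= 15); /* the datasheet quote gives 16 channels in total */
	if (ch <= 6)
		return (ENGINE_DMA);
	if (ch <= 10)
		return (ENGINE_DMA_LITE);
	if (ch <= 14)
		return (ENGINE_DMA4);
	return (ENGINE_VPU_ONLY);
}
```

A driver that wanted per-engine limits could switch on this value when
choosing tag parameters or register encodings, instead of treating all
channels below BCM_DMA_CH_MAX alike.
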
>>> Yet what I see in the head -r365932 code is:
>>>
>>> #define	BCM_DMA_CH_MAX		12
>>> . . .
>>> struct bcm_dma_softc {
>>>       device_t                sc_dev;
>>>       struct mtx              sc_mtx;
>>>       struct resource *       sc_mem;
>>>       struct resource *       sc_irq[BCM_DMA_CH_MAX];
>>>       void *                  sc_intrhand[BCM_DMA_CH_MAX];
>>>       struct bcm_dma_ch       sc_dma_ch[BCM_DMA_CH_MAX];
>>>       bus_dma_tag_t           sc_dma_tag;
>>> };
>>> . . .
>>>       err = bus_dma_tag_create(bus_get_dma_tag(dev),
>>>           1, 0, BUS_SPACE_MAXADDR_32BIT,
>>>           BUS_SPACE_MAXADDR, NULL, NULL,
>>>           sizeof(struct bcm_dma_cb), 1,
>>>           sizeof(struct bcm_dma_cb),
>>>           BUS_DMA_ALLOCNOW, NULL, NULL,
>>>           &sc->sc_dma_tag);
>>>
>>> As an example: does that deal with the likes of DMA LITE (so 7-10) "limiting
>>> the maximum transferable length to 65536 bytes"?
>>>
>>> As another example: Does it deal with the DMA4 (11-14) distinctions (if
>>> such were in use anyway)?
>>>
>>> For reference from the fdt print / :
>>>
>>> / {
>>> . . .
>>>       #address-cells = <0x00000002>;
>>>       #size-cells = <0x00000001>;
>>> . . .
>>>       soc {
>>>               compatible = "simple-bus";
>>>               #address-cells = <0x00000001>;
>>>               #size-cells = <0x00000001>;
>>> . . .
>>>               dma-ranges = <0xc0000000 0x00000000 0x00000000 0x40000000>;
>>> . . .
>>>               firmware {
>>>                       compatible = "raspberrypi,bcm2835-firmware", "simple-bus";
>>>                       mboxes = <0x0000001c>;
>>>                       dma-ranges;
>>> . . .
>>>       emmc2bus {
>>>               compatible = "simple-bus";
>>>               #address-cells = <0x00000002>;
>>>               #size-cells = <0x00000001>;
>>> . . .
>>>               dma-ranges = <0x00000000 0xc0000000 0x00000000 0x00000000 0x40000000>;
>>> . . .
>>>       scb {
>>>               compatible = "simple-bus";
>>>               #address-cells = <0x00000002>;
>>>               #size-cells = <0x00000002>;
>>> . . .
>>>               dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xfc000000 0x00000001 0x00000000 0x00000001 0x00000000 0x00000001 0x00000000>;
>>> . . .
>>>               pcie@7d500000 {
>>>                       compatible = "brcm,bcm2711-pcie";
>>> . . .
>>>                       #address-cells = <0x00000003>;
>>> . . .
>>>                       #size-cells = <0x00000002>;
>>> . . .
>>>                       dma-ranges = <0x02000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xc0000000>;
>>> . . .
>>>       v3dbus {
>>>               compatible = "simple-bus";
>>>               #address-cells = <0x00000001>;
>>>               #size-cells = <0x00000002>;
>>> . . .
>>>               dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000004 0x00000000>;
>>> . . .
>>
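
The dma-ranges cells above are easier to compare once decoded. Each
entry is (child address, parent address, length), with the cell counts
taken from the node's own #address-cells and #size-cells and from the
parent's #address-cells. Below is a minimal sketch of that decoding
for 1- or 2-cell fields; struct dma_range and both functions are
hypothetical names for illustration, and the 3-cell pcie child address
is deliberately not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of one dma-ranges entry. */
struct dma_range {
	uint64_t child;  /* address as seen on the child bus */
	uint64_t parent; /* address as seen on the parent bus */
	uint64_t size;   /* length of the window in bytes */
};

/* Combine 1 or 2 32-bit cells (most significant first) into one value. */
static uint64_t
cells_to_u64(const uint32_t *cells, unsigned ncells)
{
	uint64_t v = 0;
	unsigned i;

	assert(ncells >= 1 && ncells <= 2);
	for (i = 0; i < ncells; i++)
		v = (v << 32) | cells[i];
	return (v);
}

/*
 * Decode one entry. child_ac and child_sc are the #address-cells and
 * #size-cells of the node holding dma-ranges; parent_ac is the
 * #address-cells of its parent.
 */
static struct dma_range
decode_dma_range(const uint32_t *cells, unsigned child_ac,
    unsigned parent_ac, unsigned child_sc)
{
	struct dma_range r;

	r.child = cells_to_u64(cells, child_ac);
	r.parent = cells_to_u64(cells + child_ac, parent_ac);
	r.size = cells_to_u64(cells + child_ac + parent_ac, child_sc);
	return (r);
}
```

Applied to the soc entry (1, 2 and 1 cells) this gives child
0xc0000000, parent 0x0, size 0x40000000, i.e. a 1 GiByte window.
Applied to the two scb entries (2 cells each) it gives an identity
mapping of size 0xfc000000 starting at 0x0, and an identity mapping of
size 0x100000000 starting at 0x100000000.
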
>> rpi_DATA_2711_1p0.pdf reports:
>> (I ignore 2D DMA transfer mode here.)
>>
>> For DMA engines 0-6: XLENGTH has bits 29:0
>> bits 31:30 are write as 0, read as do not care.
>> That would put maxsegsz as 2**30 == 1,073,741,824
>> which matches a 1 GiByte space.
>>
>> For DMA LITE engines 7-10: XLENGTH has bits 15:0
>> bits 31:16 are write as 0, read as do not care.
>> That would put maxsegsz as 2**16 == 65,536.
>>
>> For DMA4 engines 11-14: XLENGTH has bits 29:0
>> bits 31:30 are write as 0, read as do not care.
>> That would put maxsegsz as 2**30 == 1,073,741,824
>> which is smaller than the 3 GiByte space associated
>> with xHCI.
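
The arithmetic above can be written down as a short sketch. The
XLENGTH widths are the ones reported above from the datasheet; the
macro and function names are hypothetical. Note that 2**n is used
above as the bound, while the largest value an n-bit field can
actually hold is 2**n - 1, so both are shown.

```c
#include <assert.h>
#include <stdint.h>

/* XLENGTH field widths as reported above from rpi_DATA_2711_1p0.pdf. */
#define XLENGTH_BITS_DMA      30 /* channels 0-6 */
#define XLENGTH_BITS_DMA_LITE 16 /* channels 7-10 */
#define XLENGTH_BITS_DMA4     30 /* channels 11-14 */

/* Hypothetical helper: the 2**bits bound used above for maxsegsz. */
static uint64_t
xlength_bound(unsigned bits)
{
	assert(bits < 64);
	return (UINT64_C(1) << bits);
}

/* Hypothetical helper: largest value a field of that width can hold. */
static uint64_t
xlength_field_max(unsigned bits)
{
	return (xlength_bound(bits) - 1);
}
```

For 30 bits the bound is 1,073,741,824 and for 16 bits it is 65,536,
matching the figures above; either way the DMA4 value stays well below
3 GiByte.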

rpi_DATA_2711_1p0.pdf reports the following specifically for
DMA11-DMA14 (so the DMA4 engines) for what goes in the CB and
NEXT_CB ADDR fields:

QUOTE
The address must be 256-bit aligned and so the bottom 5 bits of the byte address are discarded, i.e. write cb_byte_address[39:0]>>5 into the CB
END QUOTE

This is not true for DMA0-DMA10 (DMA and DMA LITE).
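
As an illustration of the quoted rule, here is a minimal sketch of the
value a DMA4 engine would expect for a control block address. The
function name dma4_cb_field and the macro are hypothetical; the only
facts used are the 256-bit (32-byte) alignment and the >>5 from the
quote. The result is kept in 64 bits because the register width is
not stated in the quote.

```c
#include <assert.h>
#include <stdint.h>

#define DMA4_CB_ALIGN_SHIFT 5 /* 256 bits == 32 bytes == 1 << 5 */

/*
 * Hypothetical helper: value to write for a DMA4 CB or NEXT_CB address,
 * following "write cb_byte_address[39:0]>>5 into the CB".
 */
static uint64_t
dma4_cb_field(uint64_t cb_byte_address)
{
	/* Only bits 39:0 are meaningful. */
	assert((cb_byte_address >> 40) == 0);
	/* The bottom 5 bits are discarded, so they must already be zero. */
	assert((cb_byte_address &
	    ((UINT64_C(1) << DMA4_CB_ALIGN_SHIFT) - 1)) == 0);
	return (cb_byte_address >> DMA4_CB_ALIGN_SHIFT);
}
```

For example, a control block at byte address 0x100000000 would be
written as 0x8000000. Code that writes the unshifted byte address
would therefore hand a DMA4 engine an address 32 times too large.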

The following is extracted from various places to
bring them together. I do not see evidence of handling
the cb_byte_address[39:0]>>5 involved for DMA11-DMA14:

#define ARMC_TO_VCBUS(pa)       bcm283x_armc_to_vcbus(pa)

vm_paddr_t
bcm283x_armc_to_vcbus(vm_paddr_t pa)
{
        struct bcm283x_memory_soc_cfg *cfg;
        struct bcm283x_memory_mapping *map, *ment;

        /* Guaranteed not NULL if we haven't panicked yet. */
        cfg = bcm283x_get_current_memcfg();
        map = cfg->memmap;
        for (ment = map; !BCM283X_MEMMAP_ISTERM(ment); ++ment) {
                if (pa >= ment->armc_start &&
                    pa < ment->armc_start + ment->armc_size) {
                        return (pa - ment->armc_start) + ment->vcbus_start;
                }
        }

        /*
         * Assume 1:1 mapping for anything else, but complain about it on
         * verbose boots.
         */
        if (bootverbose)
                printf("bcm283x_vcbus: No armc -> vcbus mapping found: %jx\n",
                    (uintmax_t)pa);
        return (pa);
}

static void
bcm_dmamap_cb(void *arg, bus_dma_segment_t *segs,
        int nseg, int err)
{
        bus_addr_t *addr;

        if (err)
                return;

        addr = (bus_addr_t*)arg;
        *addr = ARMC_TO_VCBUS(segs[0].ds_addr);
}

Note ds_addr assignments in:

static bus_size_t
_bus_dmamap_addseg(bus_dma_tag_t dmat, bus_dmamap_t map, bus_addr_t curaddr,
    bus_size_t sgsize, bus_dma_segment_t *segs, int *segp)
{
        bus_addr_t baddr, bmask;
        int seg;

        /*
         * Make sure we don't cross any boundaries.
         */
        bmask = ~(dmat->common.boundary - 1);
        if (dmat->common.boundary > 0) {
                baddr = (curaddr + dmat->common.boundary) & bmask;
                if (sgsize > (baddr - curaddr))
                        sgsize = (baddr - curaddr);
        }

        /*
         * Insert chunk into a segment, coalescing with
         * previous segment if possible.
         */
        seg = *segp;
        if (seg == -1) {
                seg = 0;
                segs[seg].ds_addr = curaddr;
                segs[seg].ds_len = sgsize;
        } else {
                if (curaddr == segs[seg].ds_addr + segs[seg].ds_len &&
                    (segs[seg].ds_len + sgsize) <= dmat->common.maxsegsz &&
                    (dmat->common.boundary == 0 ||
                     (segs[seg].ds_addr & bmask) == (curaddr & bmask)))
                        segs[seg].ds_len += sgsize;
                else {
                        if (++seg >= dmat->common.nsegments)
                                return (0);
                        segs[seg].ds_addr = curaddr;
                        segs[seg].ds_len = sgsize;
                }
        }
        *segp = seg;
        return (sgsize);
}


Note cb_phys and ch->vc_cb in:

static int
bcm_dma_init(device_t dev)
{
. . .
        /* setup initial settings */
        for (i = 0; i < BCM_DMA_CH_MAX; i++) {
. . .
                err = bus_dmamap_load(sc->sc_dma_tag, ch->dma_map, cb_virt,
                    sizeof(struct bcm_dma_cb), bcm_dmamap_cb, &cb_phys,
                    BUS_DMA_WAITOK);
                if (err) {
                        device_printf(dev, "cannot load DMA memory\n");
                        break;
                }

                ch->cb = cb_virt;
                ch->vc_cb = cb_phys;
. . .

int
bcm_dma_start(int ch, vm_paddr_t src, vm_paddr_t dst, int len)
{
        struct bcm_dma_softc *sc = bcm_dma_sc;
        struct bcm_dma_cb *cb;

        if (ch < 0 || ch >= BCM_DMA_CH_MAX)
                return (-1);

        if (!(sc->sc_dma_ch[ch].flags & BCM_DMA_CH_USED))
                return (-1);

        cb = sc->sc_dma_ch[ch].cb;
        cb->src = ARMC_TO_VCBUS(src);
        cb->dst = ARMC_TO_VCBUS(dst);

        cb->len = len;

        bus_dmamap_sync(sc->sc_dma_tag,
            sc->sc_dma_ch[ch].dma_map, BUS_DMASYNC_PREWRITE);

        bus_write_4(sc->sc_mem, BCM_DMA_CBADDR(ch),
            sc->sc_dma_ch[ch].vc_cb);
        bus_write_4(sc->sc_mem, BCM_DMA_CS(ch), CS_ACTIVE);

#ifdef DEBUG
        bcm_dma_cb_dump(sc->sc_dma_ch[ch].cb);
        bcm_dma_reg_dump(ch);
#endif

        return (0);
}

It looks to me like FreeBSD is not set up to use the DMA4
engines (DMA11-DMA14) and happens to not use them for the
DTB that I get from u-boot.bin in my context.

Of course, I may just have missed something in looking
around at the unfamiliar material.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



