Date:      Wed, 30 Sep 2020 11:13:06 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Robert Crowston <crowston@protonmail.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: RPi4B's DMA11 (DMA4 engine example) vs. xHCI/pcie
Message-ID:  <903FE769-ED46-4FBC-A272-4D2C89A9CD7A@yahoo.com>
In-Reply-To: <B440C8D8-AA02-49E4-A0D6-3EA9B7FFD13A@yahoo.com>
References:  <8C6DE44F-6CE2-4C74-8748-3BBFB54AE183@yahoo.com> <0FE382AB-8DE3-4467-9CB0-E8582AC70EA2@yahoo.com> <85FEDC51-B5B0-4ED4-A5ED-62B63EF9D5A8@yahoo.com> <B440C8D8-AA02-49E4-A0D6-3EA9B7FFD13A@yahoo.com>


On 2020-Sep-29, at 10:35, Mark Millard <marklmi at yahoo.com> wrote:

> On 2020-Sep-28, at 21:45, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2020-Sep-28, at 19:04, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> On 2020-Sep-28, at 18:29, Mark Millard <marklmi at yahoo.com> wrote:
>>>>
>>>>> [Be warned that the material is not familiar so I may need
>>>>> educating. This is based on the example context that I
>>>>> happen to have around.]
>>>>>
>>>>> In the u-boot fdt print / output there are 2 distinct sets of dma channel
>>>>> information, 1 for soc and 1 for scb, where the dma_tag values for the two
>>>>> sets should be distinct as far as I can tell:
>>>>>
>>>>> U-Boot> fdt address 0x7ef1000
>>>>> U-Boot> fdt print /
>>>>> / {
>>>>> . . .
>>>>>     soc {
>>>>>             dma@7e007000 {
>>>>>                     compatible = "brcm,bcm2835-dma";
>>>>>                     reg = <0x7e007000 0x00000b00>;
>>>>>                     interrupts = * 0x0000000007ef645c [0x00000084];
>>>>>                     interrupt-names = "dma0", "dma1", "dma2", "dma3", "dma4", "dma5", "dma6", "dma7", "dma8", "dma9", "dma10";
>>>>>                     #dma-cells = <0x00000001>;
>>>>>                     brcm,dma-channel-mask = <0x000001f5>;
>>>>>                     phandle = <0x0000000b>;
>>>>>             };
>>>>>
>>>>>     scb {
>>>>> . . .
>>>>>             dma@7e007b00 {
>>>>>                     compatible = "brcm,bcm2711-dma";
>>>>>                     reg = <0x00000000 0x7e007b00 0x00000000 0x00000400>;
>>>>>                     interrupts = <0x00000000 0x00000059 0x00000004 0x00000000 0x0000005a 0x00000004 0x00000000 0x0000005b 0x00000004 0x00000000 0x0000005c 0x00000004>;
>>>>>                     interrupt-names = "dma11", "dma12", "dma13", "dma14";
>>>>>                     #dma-cells = <0x00000001>;
>>>>>                     brcm,dma-channel-mask = <0x00007000>;
>>>>>                     phandle = <0x0000003d>;
>>>>>             };
>>>>> . . .
>
> I had presumed that the dma@7e007b00 would be processed. But
> I finally happened to search for "bcm2711-dma" in FreeBSD and
> it does not occur.
>
> That appears to mean that BCM_DMA_CH_MAX being 12 is depending
> on dma@7e007000's brcm,dma-channel-mask to avoid referencing
> number 11 that does not exist in that bcm2835-dma context.
>
> I think this makes what I wrote about DMA4 engines (the most
> capable ones) somewhat incoherent in the details but the basic
> not-supported-in-the-code and not-used status appears to be
> true.
>
> As for DMA0-DMA10 (bcm2835-dma), some DMA (0-6) vs. DMA LITE
> (7-10) distinctions not being handled (for example 65536
> maxsegsz for DMA LITE) still looks to be true to me.
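
To make the mask reasoning above concrete, here is a small sketch of my
own (not FreeBSD code; the helper names are hypothetical). It tests
channel bits the same way the quoted bcm_dma_init() loop does, using the
two brcm,dma-channel-mask values from the fdt print:

```c
#include <stdint.h>

/* Mask values copied from the fdt print quoted in this thread. */
#define SOC_DMA_CHANNEL_MASK	0x000001f5u	/* dma@7e007000, bcm2835-dma */
#define SCB_DMA_CHANNEL_MASK	0x00007000u	/* dma@7e007b00, bcm2711-dma */

/* Hypothetical helper: nonzero when channel ch is enabled in mask. */
static int
channel_enabled(uint32_t mask, unsigned ch)
{
	return (ch < 32 && (mask & (UINT32_C(1) << ch)) != 0);
}

/* Hypothetical helper: count enabled channels in the range 0..ch_max-1. */
static unsigned
channels_usable(uint32_t mask, unsigned ch_max)
{
	unsigned ch, n = 0;

	for (ch = 0; ch < ch_max; ch++)
		n += channel_enabled(mask, ch);
	return (n);
}
```

With a limit of 12 (the BCM_DMA_CH_MAX value), the soc mask enables
channels 0, 2, 4, 5, 6, 7 and 8, and neither mask has bit 11 set.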

It looks like FreeBSD is limited to 32-bit DMA addressing for xHCI here:
usb/controller/generic_xhci.c has nothing explicit for anything other
than 32 address lines (and overall the only alternative is 64 address
lines):

#define IS_DMA_32B      1

int
generic_xhci_attach(device_t dev)
{
. . .
        err = xhci_init(sc, dev, IS_DMA_32B);
        if (err != 0) {
                device_printf(dev, "Failed to init XHCI, with error %d\n", err);
                generic_xhci_detach(dev);
                return (ENXIO);
        }
. . .
/*
 * The following structure describes the parent USB DMA tag.
 */
#if USB_HAVE_BUSDMA
struct usb_dma_parent_tag {
. . .
        uint8_t dma_bits;               /* number of DMA address lines */
. . .
};
#else
struct usb_dma_parent_tag {};           /* empty struct */
#endif
. . .
usb_error_t
xhci_init(struct xhci_softc *sc, device_t self, uint8_t dma32)
{
. . .
        /* get DMA bits */
        sc->sc_bus.dma_bits = (XHCI_HCS0_AC64(temp) &&
            xhcidma32 == 0 && dma32 == 0) ? 64 : 32;
. . .
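
As a standalone model of that dma_bits expression (my own sketch, not
driver code; model_dma_bits and its parameter names are hypothetical, and
ac64 stands in for the result of XHCI_HCS0_AC64(temp)):

```c
#include <stdint.h>

/*
 * Model of the dma_bits selection quoted from xhci_init().
 * ac64      - nonzero when the controller reports 64-bit addressing
 * xhcidma32 - nonzero when the driver's xhcidma32 variable is set
 * dma32     - nonzero when the attachment asks for 32-bit DMA only
 * Returns 64 only when all three conditions allow it, otherwise 32.
 */
static uint8_t
model_dma_bits(int ac64, int xhcidma32, uint8_t dma32)
{
	return ((ac64 && xhcidma32 == 0 && dma32 == 0) ? 64 : 32);
}
```

Since generic_xhci_attach() passes IS_DMA_32B (1) as dma32, this model
yields 32 whatever the controller reports.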

Overall it looks like a bunch of places would need changes to
support the RPi4B's 3 GiByte capability. (Probably more than
I've discovered, even ignoring things like using the DMA4 engines
to get write bursts and the like.)


I will note that I found code in NetBSD that classifies "normal"
DMA engines vs. DMA LITE engines (via testing a debug register)
for bcm2835-dma and only requests normal DMA engines be used,
skipping DMA LITE. (This is for DTB/fdt contexts I think. I've
not done as well figuring out even such narrow aspects of ACPI
handling of things.) This tends to confirm my worries over
FreeBSD's bcm2835-dma handling of the DMA LITE engines existing
but being less capable.
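
A rough sketch of that kind of classification, in my own words rather
than NetBSD's code. It ASSUMES the LITE flag is bit 28 of each channel's
DEBUG register, which is my reading of the BCM2835 peripherals
documentation and should be checked before relying on it. The function
and macro names are hypothetical:

```c
#include <stdint.h>

/* ASSUMPTION: bit 28 of a channel's DEBUG register marks a LITE engine. */
#define DMA_DEBUG_LITE	(UINT32_C(1) << 28)

/*
 * Given a channel mask and one DEBUG register value per channel,
 * return the subset of the mask that refers to normal (non-LITE) engines.
 */
static uint32_t
filter_normal_channels(uint32_t mask, const uint32_t *debug_regs, unsigned nch)
{
	uint32_t out = 0;
	unsigned ch;

	for (ch = 0; ch < nch && ch < 32; ch++) {
		if ((mask & (UINT32_C(1) << ch)) == 0)
			continue;	/* not offered by the DTB */
		if ((debug_regs[ch] & DMA_DEBUG_LITE) != 0)
			continue;	/* skip reduced-capability engine */
		out |= UINT32_C(1) << ch;
	}
	return (out);
}
```

If channels 7-10 report the LITE flag, the soc mask 0x000001f5 would be
reduced to channels 0, 2, 4, 5 and 6.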

>>>>> So, 0 through 10 need the soc criteria (mix of DMA and DMA LITE engine criteria)
>>>>> and 11 through 14 need the scb criteria (DMA4 engine criteria). (I'm ignoring
>>>>> the dma-channel-mask values at this point.)
>>>>>
>>>>>
>>>>> I'll note here that the code has:
>>>>>
>>>>> #define	BCM_DMA_CH_MAX		12
>>>>>
>>>>> for use in code like:
>>>>>
>>>>>     /* setup initial settings */
>>>>>     for (i = 0; i < BCM_DMA_CH_MAX; i++) {
>>>>>             ch = &sc->sc_dma_ch[i];
>>>>>
>>>>>             bzero(ch, sizeof(struct bcm_dma_ch));
>>>>>             ch->ch = i;
>>>>>             ch->flags = BCM_DMA_CH_UNMAP;
>>>>>
>>>>>             if ((bcm_dma_channel_mask & (1 << i)) == 0)
>>>>>                     continue;
>>>>> . . .
>>>>>
>>>>> It looks to me like "dma11" is the only scb/DMA4 engine covered
>>>>> by such loops and that the "brcm,dma-channel-mask = <0x00007000>"
>>>>> means that dma11 will not be used.
>>>>>
>>>>> So: No scb/DMA4 engine will be used??? (That could explain the
>>>>> 1 GiByte limit?)
>>>>>
>>>>>
>>>>> rpi_DATA_2711_1p0.pdf reports that soc/0-10 have 2 types (0-6 vs. 7-10
>>>>> as it turns out) as well as the scb/DMA4-engines (11-14):
>>>>>
>>>>> QUOTE (with omissions marked by ". . .")
>>>>> . . .
>>>>> The BCM2711 DMA Controller provides a total of 16 DMA channels. Four of these are DMA Lite channels (with reduced performance and features), and four of them are DMA4 channels (with increased performance and a wider address range).
>>>>> . . .
>>>>> 4.5. DMA LITE Engines
>>>>>
>>>>> Several of the DMA engines are of the LITE design. This is a reduced specification engine designed to save space. The engine behaves in the same way as a normal DMA engine except for the following differences:
>>>>> . . .
>>>>> 	• The DMA length register is now 16 bits, limiting the maximum transferable length to 65536 bytes.
>>>>> . . .
>>>>> 4.6. DMA4 Engines
>>>>>
>>>>> Several of the DMA engines are of the DMA4 design. These have higher performance due to their uncoupled read/write design and can access up to 40 address bits. Unlike the other DMA engines they are also capable of performing write bursts. Note that they directly access the full 35-bit address bus of the BCM2711 and so bypass the paging registers of the DMA and DMA Lite engines.
>>>>>
>>>>> DMA channel 11 is additionally able to access the PCIe interface.
>>>>> END QUOTE
>>>>>
>>>>> The register map indicates (with some extra notes added):
>>>>>
>>>>> 0-6:   DMA
>>>>> 7-10:  DMA LITE (65536 bytes limit, for example)
>>>>> 11-14: DMA4 (11 is special relative to "PCIe interface")
>>>>> ("DMA Channel 15 is exclusively used by the VPU.")
>>>>>
>>>>> Yet what I see in the head -r365932 code is:
>>>>>
>>>>> #define	BCM_DMA_CH_MAX		12
>>>>> . . .
>>>>> struct bcm_dma_softc {
>>>>>     device_t                sc_dev;
>>>>>     struct mtx              sc_mtx;
>>>>>     struct resource *       sc_mem;
>>>>>     struct resource *       sc_irq[BCM_DMA_CH_MAX];
>>>>>     void *                  sc_intrhand[BCM_DMA_CH_MAX];
>>>>>     struct bcm_dma_ch       sc_dma_ch[BCM_DMA_CH_MAX];
>>>>>     bus_dma_tag_t           sc_dma_tag;
>>>>> };
>>>>> . . .
>>>>>     err = bus_dma_tag_create(bus_get_dma_tag(dev),
>>>>>         1, 0, BUS_SPACE_MAXADDR_32BIT,
>>>>>         BUS_SPACE_MAXADDR, NULL, NULL,
>>>>>         sizeof(struct bcm_dma_cb), 1,
>>>>>         sizeof(struct bcm_dma_cb),
>>>>>         BUS_DMA_ALLOCNOW, NULL, NULL,
>>>>>         &sc->sc_dma_tag);
>>>>>
>>>>> As an example: does that deal with the likes of DMA LITE (so 7-10) "limiting
>>>>> the maximum transferable length to 65536 bytes"?
>>>>>
>>>>> As another example: Does it deal with the DMA4 (11-14) distinctions (if
>>>>> such were in use anyway)?
>>>>>
>>>>> For reference from the fdt print / :
>>>>>
>>>>> / {
>>>>> . . .
>>>>>     #address-cells = <0x00000002>;
>>>>>     #size-cells = <0x00000001>;
>>>>> . . .
>>>>>     soc {
>>>>>             compatible = "simple-bus";
>>>>>             #address-cells = <0x00000001>;
>>>>>             #size-cells = <0x00000001>;
>>>>> . . .
>>>>>             dma-ranges = <0xc0000000 0x00000000 0x00000000 0x40000000>;
>>>>> . . .
>>>>>             firmware {
>>>>>                     compatible = "raspberrypi,bcm2835-firmware", "simple-bus";
>>>>>                     mboxes = <0x0000001c>;
>>>>>                     dma-ranges;
>>>>> . . .
>>>>>     emmc2bus {
>>>>>             compatible = "simple-bus";
>>>>>             #address-cells = <0x00000002>;
>>>>>             #size-cells = <0x00000001>;
>>>>> . . .
>>>>>             dma-ranges = <0x00000000 0xc0000000 0x00000000 0x00000000 0x40000000>;
>>>>> . . .
>>>>>     scb {
>>>>>             compatible = "simple-bus";
>>>>>             #address-cells = <0x00000002>;
>>>>>             #size-cells = <0x00000002>;
>>>>> . . .
>>>>>             dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xfc000000 0x00000001 0x00000000 0x00000001 0x00000000 0x00000001 0x00000000>;
>>>>> . . .
>>>>>             pcie@7d500000 {
>>>>>                     compatible = "brcm,bcm2711-pcie";
>>>>> . . .
>>>>>                     #address-cells = <0x00000003>;
>>>>> . . .
>>>>>                     #size-cells = <0x00000002>;
>>>>> . . .
>>>>>                     dma-ranges = <0x02000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xc0000000>;
>>>>> . . .
>>>>>     v3dbus {
>>>>>             compatible = "simple-bus";
>>>>>             #address-cells = <0x00000001>;
>>>>>             #size-cells = <0x00000002>;
>>>>> . . .
>>>>>             dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000004 0x00000000>;
>>>>> . . .
>>>>
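For reading the dma-ranges values above, here is a sketch of my own (not
FreeBSD code; all names are hypothetical). It follows the usual
devicetree convention: each entry is a child address (the node's
#address-cells), then a parent address (the parent node's
#address-cells), then a size (the node's #size-cells), with 32-bit cells
and the most significant cell first:

```c
#include <stddef.h>
#include <stdint.h>

struct dma_range {
	uint64_t child;		/* address as seen on the child bus */
	uint64_t parent;	/* matching address on the parent bus */
	uint64_t size;		/* length of the window in bytes */
};

/*
 * Combine n cells, most significant first, keeping the low 64 bits.
 * For a 3-cell PCI address this drops the leading flags cell.
 */
static uint64_t
cells_to_u64(const uint32_t *cells, unsigned n)
{
	uint64_t v = 0;
	unsigned i;

	for (i = 0; i < n; i++)
		v = (v << 32) | cells[i];
	return (v);
}

/* Decode up to max_out entries; returns the number of entries stored. */
static size_t
decode_dma_ranges(const uint32_t *cells, size_t ncells, unsigned child_ac,
    unsigned parent_ac, unsigned sc, struct dma_range *out, size_t max_out)
{
	size_t stride = (size_t)child_ac + parent_ac + sc;
	size_t n = 0;

	if (stride == 0)
		return (0);
	while (ncells >= stride && n < max_out) {
		out[n].child = cells_to_u64(cells, child_ac);
		out[n].parent = cells_to_u64(cells + child_ac, parent_ac);
		out[n].size = cells_to_u64(cells + child_ac + parent_ac, sc);
		cells += stride;
		ncells -= stride;
		n++;
	}
	return (n);
}
```

Read this way, the scb property (2/2/2 cells) gives two windows: 0x0 with
size 0xfc000000 and 0x100000000 with size 0x100000000. The pcie property
(3/2/2 cells) gives one window of size 0xc0000000, which is the 3 GiByte
figure.
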
>>>> rpi_DATA_2711_1p0.pdf reports:
>>>> (I ignore 2D DMA transfer mode here.)
>>>>
>>>> For DMA engines 0-6: XLENGTH has bits 29:0
>>>> bits 31:30 are write as 0, read as do not care.
>>>> That would put maxsegsz as 2**30 == 1,073,741,824
>>>> which matches a 1 GiByte space.
>>>>
>>>> For DMA LITE engines 7-10: XLENGTH has bits 15:0
>>>> bits 31:16 are write as 0, read as do not care.
>>>> That would put maxsegsz as 2**16 == 65,536.
>>>>
>>>> For DMA4 engines 11-14: XLENGTH has bits 29:0
>>>> bits 31:30 are write as 0, read as do not care.
>>>> That would put maxsegsz as 2**30 == 1,073,741,824
>>>> which is smaller than the 3 GiByte space associated
>>>> with xHCI.
>>
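The arithmetic in those XLENGTH notes can be sketched as follows (my own
illustration with hypothetical names). It uses the 2**bits figures from
the notes; strictly, the largest value an N-bit field can hold is
2**N - 1:

```c
#include <stdint.h>

#define GIB	(UINT64_C(1) << 30)

/* Span implied by an XLENGTH field that is "bits" wide, as 2**bits. */
static uint64_t
xlength_span(unsigned bits)
{
	return (bits < 64 ? UINT64_C(1) << bits : 0);
}

/* Number of segments of at most maxsegsz bytes needed to cover total bytes. */
static uint64_t
segments_needed(uint64_t total, uint64_t maxsegsz)
{
	if (maxsegsz == 0)
		return (0);
	return (total / maxsegsz + (total % maxsegsz != 0));
}
```

On those figures a 3 GiByte range needs 3 segments at the 30-bit limit
and 49152 segments at the 16-bit DMA LITE limit.
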
>> rpi_DATA_2711_1p0.pdf reports the following specifically for
>> DMA11-DMA14 (so the DMA4 engines) for what goes in the CB and
>> NEXT_CB ADDR fields:
>>
>> QUOTE
>> The address must be 256-bit aligned and so the bottom 5 bits of the byte address are discarded, i.e. write cb_byte_address[39:0]>>5 into the CB
>> END QUOTE
>>
>> This is not true for DMA0-DMA10 (DMA and DMA LITE).
>>
>> The following is extracted from various places to
>> bring them together. I do not see evidence of handling
>> the cb_byte_address[39:0]>>5 involved for DMA11-DMA14:
>>
>> #define ARMC_TO_VCBUS(pa)       bcm283x_armc_to_vcbus(pa)
>>
>> vm_paddr_t
>> bcm283x_armc_to_vcbus(vm_paddr_t pa)
>> {
>>       struct bcm283x_memory_soc_cfg *cfg;
>>       struct bcm283x_memory_mapping *map, *ment;
>>
>>       /* Guaranteed not NULL if we haven't panicked yet. */
>>       cfg = bcm283x_get_current_memcfg();
>>       map = cfg->memmap;
>>       for (ment = map; !BCM283X_MEMMAP_ISTERM(ment); ++ment) {
>>               if (pa >= ment->armc_start &&
>>                   pa < ment->armc_start + ment->armc_size) {
>>                       return (pa - ment->armc_start) + ment->vcbus_start;
>>               }
>>       }
>>
>>       /*
>>        * Assume 1:1 mapping for anything else, but complain about it on
>>        * verbose boots.
>>        */
>>       if (bootverbose)
>>               printf("bcm283x_vcbus: No armc -> vcbus mapping found: %jx\n",
>>                   (uintmax_t)pa);
>>       return (pa);
>> }
>>
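A standalone model of that lookup, as a sketch only. The struct, the
function name and the table are hypothetical. The single table entry is
derived from the soc dma-ranges value quoted earlier (ARM physical 0x0
appearing at bus address 0xc0000000 for 0x40000000 bytes) and is NOT
copied from FreeBSD's own mapping tables:

```c
#include <stdint.h>

struct mem_mapping {
	uint64_t armc_start;	/* start of range in ARM physical space */
	uint64_t armc_size;	/* size of range; 0 terminates the table */
	uint64_t vcbus_start;	/* start of range as seen on the VC bus */
};

/* Hypothetical table built from the soc dma-ranges in the fdt print. */
static const struct mem_mapping example_map[] = {
	{ 0x00000000, 0x40000000, 0xc0000000 },
	{ 0, 0, 0 },
};

/* Translate pa via the table; fall back to 1:1 like the quoted function. */
static uint64_t
model_armc_to_vcbus(const struct mem_mapping *map, uint64_t pa)
{
	const struct mem_mapping *m;

	for (m = map; m->armc_size != 0; m++) {
		if (pa >= m->armc_start && pa - m->armc_start < m->armc_size)
			return ((pa - m->armc_start) + m->vcbus_start);
	}
	return (pa);
}
```

With such a table, addresses below 1 GiByte are shifted into the
0xc0000000 alias while anything at or above 0x40000000 falls through
unchanged.
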
>> static void
>> bcm_dmamap_cb(void *arg, bus_dma_segment_t *segs,
>>       int nseg, int err)
>> {
>>       bus_addr_t *addr;
>>
>>       if (err)
>>               return;
>>
>>       addr = (bus_addr_t*)arg;
>>       *addr = ARMC_TO_VCBUS(segs[0].ds_addr);
>> }
>>
>> Note ds_addr assignments in:
>>=20
>> static bus_size_t
>> _bus_dmamap_addseg(bus_dma_tag_t dmat, bus_dmamap_t map, bus_addr_t curaddr,
>>   bus_size_t sgsize, bus_dma_segment_t *segs, int *segp)
>> {
>>       bus_addr_t baddr, bmask;
>>       int seg;
>>
>>       /*
>>        * Make sure we don't cross any boundaries.
>>        */
>>       bmask = ~(dmat->common.boundary - 1);
>>       if (dmat->common.boundary > 0) {
>>               baddr = (curaddr + dmat->common.boundary) & bmask;
>>               if (sgsize > (baddr - curaddr))
>>                       sgsize = (baddr - curaddr);
>>       }
>>
>>       /*
>>        * Insert chunk into a segment, coalescing with
>>        * previous segment if possible.
>>        */
>>       seg = *segp;
>>       if (seg == -1) {
>>               seg = 0;
>>               segs[seg].ds_addr = curaddr;
>>               segs[seg].ds_len = sgsize;
>>       } else {
>>               if (curaddr == segs[seg].ds_addr + segs[seg].ds_len &&
>>                   (segs[seg].ds_len + sgsize) <= dmat->common.maxsegsz &&
>>                   (dmat->common.boundary == 0 ||
>>                    (segs[seg].ds_addr & bmask) == (curaddr & bmask)))
>>                       segs[seg].ds_len += sgsize;
>>               else {
>>                       if (++seg >= dmat->common.nsegments)
>>                               return (0);
>>                       segs[seg].ds_addr = curaddr;
>>                       segs[seg].ds_len = sgsize;
>>               }
>>       }
>>       *segp = seg;
>>       return (sgsize);
>> }
>>
>>
>> Note cb_phys and ch->vc_cb in:
>>
>> static int
>> bcm_dma_init(device_t dev)
>> {
>> . . .
>>       /* setup initial settings */
>>       for (i = 0; i < BCM_DMA_CH_MAX; i++) {
>> . . .
>>               err = bus_dmamap_load(sc->sc_dma_tag, ch->dma_map, cb_virt,
>>                   sizeof(struct bcm_dma_cb), bcm_dmamap_cb, &cb_phys,
>>                   BUS_DMA_WAITOK);
>>               if (err) {
>>                       device_printf(dev, "cannot load DMA memory\n");
>>                       break;
>>               }
>>
>>               ch->cb = cb_virt;
>>               ch->vc_cb = cb_phys;
>> . . .
>>
>> int
>> bcm_dma_start(int ch, vm_paddr_t src, vm_paddr_t dst, int len)
>> {
>>       struct bcm_dma_softc *sc = bcm_dma_sc;
>>       struct bcm_dma_cb *cb;
>>
>>       if (ch < 0 || ch >= BCM_DMA_CH_MAX)
>>               return (-1);
>>
>>       if (!(sc->sc_dma_ch[ch].flags & BCM_DMA_CH_USED))
>>               return (-1);
>>
>>       cb = sc->sc_dma_ch[ch].cb;
>>       cb->src = ARMC_TO_VCBUS(src);
>>       cb->dst = ARMC_TO_VCBUS(dst);
>>
>>       cb->len = len;
>>
>>       bus_dmamap_sync(sc->sc_dma_tag,
>>           sc->sc_dma_ch[ch].dma_map, BUS_DMASYNC_PREWRITE);
>>
>>       bus_write_4(sc->sc_mem, BCM_DMA_CBADDR(ch),
>>           sc->sc_dma_ch[ch].vc_cb);
>>       bus_write_4(sc->sc_mem, BCM_DMA_CS(ch), CS_ACTIVE);
>>
>> #ifdef DEBUG
>>       bcm_dma_cb_dump(sc->sc_dma_ch[ch].cb);
>>       bcm_dma_reg_dump(ch);
>> #endif
>>
>>       return (0);
>> }
>>
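For contrast with the unshifted bus_write_4() of vc_cb in the quoted
bcm_dma_start(), here is a sketch of the DMA4-style encoding the
datasheet quote describes: a 256-bit aligned byte address, shifted right
by 5. This is my own illustration, not FreeBSD code, and the names are
hypothetical. The check that the shifted value fits a 32-bit register
write is my assumption:

```c
#include <stdint.h>

#define DMA4_CB_ALIGN	32u	/* 256 bits expressed in bytes */
#define DMA4_ADDR_BITS	40u	/* cb_byte_address[39:0] */

/*
 * Encode a control block byte address for a DMA4 engine.
 * Returns 0 and stores the register value on success, or -1 when the
 * address is misaligned, wider than 40 bits, or does not fit in 32 bits
 * after the shift (the last check is an assumption of this sketch).
 */
static int
dma4_encode_cb_addr(uint64_t cb_byte_address, uint32_t *regval)
{
	uint64_t shifted;

	if ((cb_byte_address & (DMA4_CB_ALIGN - 1)) != 0)
		return (-1);
	if ((cb_byte_address >> DMA4_ADDR_BITS) != 0)
		return (-1);
	shifted = cb_byte_address >> 5;
	if (shifted > UINT32_MAX)
		return (-1);
	*regval = (uint32_t)shifted;
	return (0);
}
```

For example, a control block at byte address 0x100000020 (above 4 GiByte)
would be written as 0x08000001, a value the unshifted 32-bit write has no
way to express.
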
>> It looks to me like FreeBSD is not set up to use the DMA4
>> engines (DMA11-DMA14) and happens to not use them for the
>> DTB that I get from u-boot.bin in my context.
>>
>> Of course, I may just have missed something in looking
>> around at the unfamiliar material.
>
>




===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



