Date: Tue, 29 Sep 2020 10:35:28 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Robert Crowston <crowston@protonmail.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: RPi4B's DMA11 (DMA4 engine example) vs. xHCI/pcie
Message-ID: <B440C8D8-AA02-49E4-A0D6-3EA9B7FFD13A@yahoo.com>
In-Reply-To: <85FEDC51-B5B0-4ED4-A5ED-62B63EF9D5A8@yahoo.com>
References: <8C6DE44F-6CE2-4C74-8748-3BBFB54AE183@yahoo.com>
    <0FE382AB-8DE3-4467-9CB0-E8582AC70EA2@yahoo.com>
    <85FEDC51-B5B0-4ED4-A5ED-62B63EF9D5A8@yahoo.com>
On 2020-Sep-28, at 21:45, Mark Millard <marklmi at yahoo.com> wrote:

> On 2020-Sep-28, at 19:04, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2020-Sep-28, at 18:29, Mark Millard <marklmi at yahoo.com> wrote:
>>>
>>>> [Be warned that the material is not familiar so I may need
>>>> educating. This is based on the example context that I
>>>> happen to have around.]
>>>>
>>>> In the u-boot fdt print / output there are 2 distinct sets of dma channel
>>>> information, 1 for soc and 1 for scb, where the dma_tag values for the two
>>>> sets should be distinct as far as I can tell:
>>>>
>>>> U-Boot> fdt address 0x7ef1000
>>>> U-Boot> fdt print /
>>>> / {
>>>> . . .
>>>>     soc {
>>>>         dma@7e007000 {
>>>>             compatible = "brcm,bcm2835-dma";
>>>>             reg = <0x7e007000 0x00000b00>;
>>>>             interrupts = * 0x0000000007ef645c [0x00000084];
>>>>             interrupt-names = "dma0", "dma1", "dma2", "dma3", "dma4", "dma5", "dma6", "dma7", "dma8", "dma9", "dma10";
>>>>             #dma-cells = <0x00000001>;
>>>>             brcm,dma-channel-mask = <0x000001f5>;
>>>>             phandle = <0x0000000b>;
>>>>         };
>>>>
>>>>     scb {
>>>> . . .
>>>>         dma@7e007b00 {
>>>>             compatible = "brcm,bcm2711-dma";
>>>>             reg = <0x00000000 0x7e007b00 0x00000000 0x00000400>;
>>>>             interrupts = <0x00000000 0x00000059 0x00000004 0x00000000 0x0000005a 0x00000004 0x00000000 0x0000005b 0x00000004 0x00000000 0x0000005c 0x00000004>;
>>>>             interrupt-names = "dma11", "dma12", "dma13", "dma14";
>>>>             #dma-cells = <0x00000001>;
>>>>             brcm,dma-channel-mask = <0x00007000>;
>>>>             phandle = <0x0000003d>;
>>>>         };
>>>> . . .

I had presumed that the dma@7e007b00 node would be processed.
But I finally happened to search for "bcm2711-dma" in FreeBSD
and it does not occur. That appears to mean that, with
BCM_DMA_CH_MAX being 12, the code depends on dma@7e007000's
brcm,dma-channel-mask to avoid referencing channel 11, which
does not exist in that bcm2835-dma context.
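
To make the channel-mask reasoning concrete, here is a minimal sketch in C that decodes the two brcm,dma-channel-mask values printed above. The helper names channel_enabled() and usable_channels() are hypothetical and are not part of the FreeBSD driver; only BCM_DMA_CH_MAX and the two mask values come from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BCM_DMA_CH_MAX 12 /* value quoted from the FreeBSD driver */

/* Masks as printed by "fdt print /" in this message. */
#define SOC_DMA_CHANNEL_MASK UINT32_C(0x000001f5) /* dma@7e007000 */
#define SCB_DMA_CHANNEL_MASK UINT32_C(0x00007000) /* dma@7e007b00 */

/* Hypothetical helper: true if bit ch is set in mask. */
static bool
channel_enabled(uint32_t mask, int ch)
{
	assert(ch >= 0 && ch < 32);
	return ((mask & (UINT32_C(1) << ch)) != 0);
}

/*
 * Hypothetical helper: how many channels a loop bounded by
 * BCM_DMA_CH_MAX would actually set up for a given mask.
 */
static int
usable_channels(uint32_t mask)
{
	int ch, n = 0;

	for (ch = 0; ch < BCM_DMA_CH_MAX; ch++)
		if (channel_enabled(mask, ch))
			n++;
	return (n);
}
```

With these values, 0x000001f5 enables channels 0, 2, 4, 5, 6, 7 and 8, while 0x00007000 enables only 12, 13 and 14. Channel 11 is set in neither mask, so a loop over 0..11 skips it either way.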

I think this makes what I wrote about DMA4 engines (the most
capable ones) somewhat incoherent in the details, but the basic
not-supported-in-the-code and not-used status appears to be
true. As for DMA0-DMA10 (bcm2835-dma), it still looks to me
like some DMA (0-6) vs. DMA LITE (7-10) distinctions are not
being handled (for example, a 65536 maxsegsz for DMA LITE).

>>>> So, 0 through 10 need the soc criteria (mix of DMA and DMA LITE engine criteria)
>>>> and 11 through 14 need the scb criteria (DMA4 engine criteria). (I'm ignoring
>>>> dma-channel-mask's at this point.)
>>>>
>>>>
>>>> I'll here note the code has:
>>>>
>>>> #define BCM_DMA_CH_MAX 12
>>>>
>>>> for use in code like:
>>>>
>>>>     /* setup initial settings */
>>>>     for (i = 0; i < BCM_DMA_CH_MAX; i++) {
>>>>         ch = &sc->sc_dma_ch[i];
>>>>
>>>>         bzero(ch, sizeof(struct bcm_dma_ch));
>>>>         ch->ch = i;
>>>>         ch->flags = BCM_DMA_CH_UNMAP;
>>>>
>>>>         if ((bcm_dma_channel_mask & (1 << i)) == 0)
>>>>             continue;
>>>> . . .
>>>>
>>>> It looks to me like the only scb/DMA4-engine "dma11" is covered
>>>> by such loops and that the "brcm,dma-channel-mask = <0x00007000>"
>>>> means that dma11 will not be used.
>>>>
>>>> So: No scb/DMA4 engine will be used??? (That could explain the
>>>> 1 GiByte limit?)
>>>>
>>>>
>>>> rpi_DATA_2711_1p0.pdf reports that soc/0-10 have 2 types (0-6 vs. 7-10
>>>> as it turns out) as well as the scb/DMA4-engines (11-14):
>>>>
>>>> QUOTE (with omitted marked by ". . .")
>>>> . . .
>>>> The BCM2711 DMA Controller provides a total of 16 DMA channels.
>>>> Four of these are DMA Lite channels (with reduced performance and
>>>> features), and four of them are DMA4 channels (with increased
>>>> performance and a wider address range).
>>>> . . .
>>>> 4.5. DMA LITE Engines
>>>>
>>>> Several of the DMA engines are of the LITE design. This is a
>>>> reduced specification engine designed to save space.
>>>> The engine behaves in the same way as a normal DMA engine except
>>>> for the following differences:
>>>> . . .
>>>> • The DMA length register is now 16 bits, limiting the maximum
>>>>   transferable length to 65536 bytes.
>>>> . . .
>>>> 4.6. DMA4 Engines
>>>>
>>>> Several of the DMA engines are of the DMA4 design. These have
>>>> higher performance due to their uncoupled read/write design and can
>>>> access up to 40 address bits. Unlike the other DMA engines they are also
>>>> capable of performing write bursts. Note that they directly access the
>>>> full 35-bit address bus of the BCM2711 and so bypass the paging
>>>> registers of the DMA and DMA Lite engines.
>>>>
>>>> DMA channel 11 is additionally able to access the PCIe interface.
>>>> END QUOTE
>>>>
>>>> The register map indicates (with some extra notes added):
>>>>
>>>> 0-6: DMA
>>>> 7-10: DMA LITE (65536 bytes limit, for example)
>>>> 11-14: DMA4 (11 is special relative to "PCIe interface")
>>>> ("DMA Channel 15 is exclusively used by the VPU.")
>>>>
>>>> Yet what I see in the head -r365932 code is:
>>>>
>>>> #define BCM_DMA_CH_MAX 12
>>>> . . .
>>>> struct bcm_dma_softc {
>>>>     device_t sc_dev;
>>>>     struct mtx sc_mtx;
>>>>     struct resource * sc_mem;
>>>>     struct resource * sc_irq[BCM_DMA_CH_MAX];
>>>>     void * sc_intrhand[BCM_DMA_CH_MAX];
>>>>     struct bcm_dma_ch sc_dma_ch[BCM_DMA_CH_MAX];
>>>>     bus_dma_tag_t sc_dma_tag;
>>>> };
>>>> . . .
>>>>     err = bus_dma_tag_create(bus_get_dma_tag(dev),
>>>>         1, 0, BUS_SPACE_MAXADDR_32BIT,
>>>>         BUS_SPACE_MAXADDR, NULL, NULL,
>>>>         sizeof(struct bcm_dma_cb), 1,
>>>>         sizeof(struct bcm_dma_cb),
>>>>         BUS_DMA_ALLOCNOW, NULL, NULL,
>>>>         &sc->sc_dma_tag);
>>>>
>>>> As an example: does that deal with the likes of DMA LITE (so 7-10) "limiting
>>>> the maximum transferable length to 65536 bytes"?
>>>>
>>>> As another example: Does it deal with the DMA4 (11-14) distinctions (if
>>>> such were in use anyway)?
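
As an illustration of the per-engine distinctions being asked about, the following C sketch classifies a channel number by engine type and returns the length limit this thread derives from the datasheet. The enum and the helpers engine_for_channel() and max_transfer_len() are hypothetical names invented for this example; the driver has no such table. The 2**30 figures are taken from the XLENGTH register widths discussed later in this message, and the 65536 figure is the datasheet wording quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Engine types per the register map summarized in this message. */
enum dma_engine {
	ENGINE_DMA,      /* channels 0-6 */
	ENGINE_DMA_LITE, /* channels 7-10 */
	ENGINE_DMA4,     /* channels 11-14 */
	ENGINE_VPU_ONLY  /* channel 15, used exclusively by the VPU */
};

/* Hypothetical helper: map a channel number to its engine type. */
static enum dma_engine
engine_for_channel(int ch)
{
	assert(ch >= 0 && ch <= 15);
	if (ch <= 6)
		return (ENGINE_DMA);
	if (ch <= 10)
		return (ENGINE_DMA_LITE);
	if (ch <= 14)
		return (ENGINE_DMA4);
	return (ENGINE_VPU_ONLY);
}

/*
 * Hypothetical helper: the maxsegsz-style limit this thread infers
 * for each engine type (0 means "not available to the ARM side").
 */
static uint32_t
max_transfer_len(enum dma_engine e)
{
	switch (e) {
	case ENGINE_DMA_LITE:
		return (UINT32_C(1) << 16); /* 16-bit length register */
	case ENGINE_DMA:
	case ENGINE_DMA4:
		return (UINT32_C(1) << 30); /* XLENGTH bits 29:0 */
	default:
		return (0);
	}
}
```

A single bus_dma tag shared by all channels, as in the quoted bus_dma_tag_create() call, cannot express both limits at once, which is the concern raised in the two questions above.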
>>>>
>>>> For reference from the fdt print / :
>>>>
>>>> / {
>>>> . . .
>>>>     #address-cells = <0x00000002>;
>>>>     #size-cells = <0x00000001>;
>>>> . . .
>>>>     soc {
>>>>         compatible = "simple-bus";
>>>>         #address-cells = <0x00000001>;
>>>>         #size-cells = <0x00000001>;
>>>> . . .
>>>>         dma-ranges = <0xc0000000 0x00000000 0x00000000 0x40000000>;
>>>> . . .
>>>>         firmware {
>>>>             compatible = "raspberrypi,bcm2835-firmware", "simple-bus";
>>>>             mboxes = <0x0000001c>;
>>>>             dma-ranges;
>>>> . . .
>>>>     emmc2bus {
>>>>         compatible = "simple-bus";
>>>>         #address-cells = <0x00000002>;
>>>>         #size-cells = <0x00000001>;
>>>> . . .
>>>>         dma-ranges = <0x00000000 0xc0000000 0x00000000 0x00000000 0x40000000>;
>>>> . . .
>>>>     scb {
>>>>         compatible = "simple-bus";
>>>>         #address-cells = <0x00000002>;
>>>>         #size-cells = <0x00000002>;
>>>> . . .
>>>>         dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xfc000000 0x00000001 0x00000000 0x00000001 0x00000000 0x00000001 0x00000000>;
>>>> . . .
>>>>         pcie@7d500000 {
>>>>             compatible = "brcm,bcm2711-pcie";
>>>> . . .
>>>>             #address-cells = <0x00000003>;
>>>> . . .
>>>>             #size-cells = <0x00000002>;
>>>> . . .
>>>>             dma-ranges = <0x02000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xc0000000>;
>>>> . . .
>>>>     v3dbus {
>>>>         compatible = "simple-bus";
>>>>         #address-cells = <0x00000001>;
>>>>         #size-cells = <0x00000002>;
>>>> . . .
>>>>         dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000004 0x00000000>;
>>>> . . .
>>>
>>> rpi_DATA_2711_1p0.pdf reports:
>>> (I ignore 2D DMA transfer mode here.)
>>>
>>> For DMA engines 0-6: XLENGTH has bits 29:0
>>> bits 31:30 are write as 0, read as do not care.
>>> That would put maxsegsz as 2**30 == 1,073,741,824
>>> which matches a 1 GiByte space.
>>>
>>> For DMA LITE engines 7-10: XLENGTH has bits 15:0
>>> bits 31:16 are write as 0, read as do not care.
>>> That would put maxsegsz as 2**16 == 65,536.
>>>
>>> For DMA4 engines 11-14: XLENGTH has bits 29:0
>>> bits 31:30 are write as 0, read as do not care.
>>> That would put maxsegsz as 2**30 == 1,073,741,824
>>> which is smaller than the 3 GiByte space associated
>>> with xHCI.
>
> rpi_DATA_2711_1p0.pdf reports the following specifically for
> DMA11-DMA14 (so the DMA4 engines) for what goes in the CB and
> NEXT_CB ADDR fields:
>
> QUOTE
> The address must be 256-bit aligned and so the bottom 5 bits of the byte address are discarded, i.e. write cb_byte_address[39:0]>>5 into the CB
> END QUOTE
>
> This is not true for DMA0-DMA10 (DMA and DMA LITE).
>
> The following is extracted from various places to
> bring them together. I do not see evidence of handling
> the cb_byte_address[39:0]>>5 involved for DMA11-DMA14:
>
> #define ARMC_TO_VCBUS(pa) bcm283x_armc_to_vcbus(pa)
>
> vm_paddr_t
> bcm283x_armc_to_vcbus(vm_paddr_t pa)
> {
>     struct bcm283x_memory_soc_cfg *cfg;
>     struct bcm283x_memory_mapping *map, *ment;
>
>     /* Guaranteed not NULL if we haven't panicked yet. */
>     cfg = bcm283x_get_current_memcfg();
>     map = cfg->memmap;
>     for (ment = map; !BCM283X_MEMMAP_ISTERM(ment); ++ment) {
>         if (pa >= ment->armc_start &&
>             pa < ment->armc_start + ment->armc_size) {
>             return (pa - ment->armc_start) + ment->vcbus_start;
>         }
>     }
>
>     /*
>      * Assume 1:1 mapping for anything else, but complain about it on
>      * verbose boots.
>      */
>     if (bootverbose)
>         printf("bcm283x_vcbus: No armc -> vcbus mapping found: %jx\n",
>             (uintmax_t)pa);
>     return (pa);
> }
>
> static void
> bcm_dmamap_cb(void *arg, bus_dma_segment_t *segs,
>     int nseg, int err)
> {
>     bus_addr_t *addr;
>
>     if (err)
>         return;
>
>     addr = (bus_addr_t*)arg;
>     *addr = ARMC_TO_VCBUS(segs[0].ds_addr);
> }
>
> Note ds_addr assignments in:
>
> static bus_size_t
> _bus_dmamap_addseg(bus_dma_tag_t dmat, bus_dmamap_t map, bus_addr_t curaddr,
>     bus_size_t sgsize, bus_dma_segment_t *segs, int *segp)
> {
>     bus_addr_t baddr, bmask;
>     int seg;
>
>     /*
>      * Make sure we don't cross any boundaries.
>      */
>     bmask = ~(dmat->common.boundary - 1);
>     if (dmat->common.boundary > 0) {
>         baddr = (curaddr + dmat->common.boundary) & bmask;
>         if (sgsize > (baddr - curaddr))
>             sgsize = (baddr - curaddr);
>     }
>
>     /*
>      * Insert chunk into a segment, coalescing with
>      * previous segment if possible.
>      */
>     seg = *segp;
>     if (seg == -1) {
>         seg = 0;
>         segs[seg].ds_addr = curaddr;
>         segs[seg].ds_len = sgsize;
>     } else {
>         if (curaddr == segs[seg].ds_addr + segs[seg].ds_len &&
>             (segs[seg].ds_len + sgsize) <= dmat->common.maxsegsz &&
>             (dmat->common.boundary == 0 ||
>              (segs[seg].ds_addr & bmask) == (curaddr & bmask)))
>             segs[seg].ds_len += sgsize;
>         else {
>             if (++seg >= dmat->common.nsegments)
>                 return (0);
>             segs[seg].ds_addr = curaddr;
>             segs[seg].ds_len = sgsize;
>         }
>     }
>     *segp = seg;
>     return (sgsize);
> }
>
>
> Note cb_phys and ch->vc_cb in:
>
> static int
> bcm_dma_init(device_t dev)
> {
> . . .
>     /* setup initial settings */
>     for (i = 0; i < BCM_DMA_CH_MAX; i++) {
> . . .
>         err = bus_dmamap_load(sc->sc_dma_tag, ch->dma_map, cb_virt,
>             sizeof(struct bcm_dma_cb), bcm_dmamap_cb, &cb_phys,
>             BUS_DMA_WAITOK);
>         if (err) {
>             device_printf(dev, "cannot load DMA memory\n");
>             break;
>         }
>
>         ch->cb = cb_virt;
>         ch->vc_cb = cb_phys;
> . . .
>
> int
> bcm_dma_start(int ch, vm_paddr_t src, vm_paddr_t dst, int len)
> {
>     struct bcm_dma_softc *sc = bcm_dma_sc;
>     struct bcm_dma_cb *cb;
>
>     if (ch < 0 || ch >= BCM_DMA_CH_MAX)
>         return (-1);
>
>     if (!(sc->sc_dma_ch[ch].flags & BCM_DMA_CH_USED))
>         return (-1);
>
>     cb = sc->sc_dma_ch[ch].cb;
>     cb->src = ARMC_TO_VCBUS(src);
>     cb->dst = ARMC_TO_VCBUS(dst);
>
>     cb->len = len;
>
>     bus_dmamap_sync(sc->sc_dma_tag,
>         sc->sc_dma_ch[ch].dma_map, BUS_DMASYNC_PREWRITE);
>
>     bus_write_4(sc->sc_mem, BCM_DMA_CBADDR(ch),
>         sc->sc_dma_ch[ch].vc_cb);
>     bus_write_4(sc->sc_mem, BCM_DMA_CS(ch), CS_ACTIVE);
>
> #ifdef DEBUG
>     bcm_dma_cb_dump(sc->sc_dma_ch[ch].cb);
>     bcm_dma_reg_dump(ch);
> #endif
>
>     return (0);
> }
>
> It looks to me like FreeBSD is not set up to use the DMA4
> engines (DMA11-DMA14) and happens to not use them for the
> DTB that I get from u-boot.bin in my context.
>
> Of course, I may just have missed something in looking
> around at the unfamiliar material.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
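
For comparison with the quoted bcm_dma_start(), which writes ch->vc_cb to the control-block address register unshifted, here is a minimal C sketch of the DMA4 rule quoted from the datasheet (256-bit alignment, then cb_byte_address[39:0]>>5). The macro and function names are hypothetical and exist only for this illustration; nothing like them appears in the FreeBSD source quoted in this thread, and the sketch does not model the width of the hardware register itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DMA4_CB_ALIGN  UINT64_C(32)               /* 256 bits = 32 bytes */
#define DMA4_CB_SHIFT  5                          /* log2(32) */
#define DMA4_ADDR_MASK ((UINT64_C(1) << 40) - 1)  /* address bits 39:0 */

/*
 * Hypothetical check: a control-block byte address is usable by a
 * DMA4 engine only if it is 32-byte aligned and fits in 40 bits.
 */
static bool
dma4_cb_addr_ok(uint64_t cb_byte_address)
{
	return ((cb_byte_address & (DMA4_CB_ALIGN - 1)) == 0 &&
	    (cb_byte_address & ~DMA4_ADDR_MASK) == 0);
}

/*
 * Hypothetical encoder: the value that the datasheet quote says to
 * write, i.e. cb_byte_address[39:0] >> 5.
 */
static uint64_t
dma4_cb_field(uint64_t cb_byte_address)
{
	assert(dma4_cb_addr_ok(cb_byte_address));
	return (cb_byte_address >> DMA4_CB_SHIFT);
}
```

For example, a control block at byte address 0x1000 would be written as 0x80. A driver supporting DMA11-DMA14 would need a step like this (plus a suitably aligned allocation) on the path that currently stores cb_phys into ch->vc_cb.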