Date: Wed, 30 Sep 2020 11:13:06 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Robert Crowston <crowston@protonmail.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: RPi4B's DMA11 (DMA4 engine example) vs. xHCI/pcie
Message-ID: <903FE769-ED46-4FBC-A272-4D2C89A9CD7A@yahoo.com>
In-Reply-To: <B440C8D8-AA02-49E4-A0D6-3EA9B7FFD13A@yahoo.com>
References: <8C6DE44F-6CE2-4C74-8748-3BBFB54AE183@yahoo.com> <0FE382AB-8DE3-4467-9CB0-E8582AC70EA2@yahoo.com> <85FEDC51-B5B0-4ED4-A5ED-62B63EF9D5A8@yahoo.com> <B440C8D8-AA02-49E4-A0D6-3EA9B7FFD13A@yahoo.com>
On 2020-Sep-29, at 10:35, Mark Millard <marklmi at yahoo.com> wrote:

> On 2020-Sep-28, at 21:45, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2020-Sep-28, at 19:04, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> On 2020-Sep-28, at 18:29, Mark Millard <marklmi at yahoo.com> wrote:
>>>>
>>>>> [Be warned that the material is not familiar so I may need
>>>>> educating. This is based on the example context that I
>>>>> happen to have around.]
>>>>>
>>>>> In the u-boot fdt print / output there are 2 distinct sets of dma channel
>>>>> information, 1 for soc and 1 for scb, where the dma_tag values for the two
>>>>> sets should be distinct as far as I can tell:
>>>>>
>>>>> U-Boot> fdt address 0x7ef1000
>>>>> U-Boot> fdt print /
>>>>> / {
>>>>> . . .
>>>>>         soc {
>>>>>                 dma@7e007000 {
>>>>>                         compatible = "brcm,bcm2835-dma";
>>>>>                         reg = <0x7e007000 0x00000b00>;
>>>>>                         interrupts = * 0x0000000007ef645c [0x00000084];
>>>>>                         interrupt-names = "dma0", "dma1", "dma2", "dma3", "dma4", "dma5", "dma6", "dma7", "dma8", "dma9", "dma10";
>>>>>                         #dma-cells = <0x00000001>;
>>>>>                         brcm,dma-channel-mask = <0x000001f5>;
>>>>>                         phandle = <0x0000000b>;
>>>>>                 };
>>>>>
>>>>>         scb {
>>>>> . . .
>>>>>                 dma@7e007b00 {
>>>>>                         compatible = "brcm,bcm2711-dma";
>>>>>                         reg = <0x00000000 0x7e007b00 0x00000000 0x00000400>;
>>>>>                         interrupts = <0x00000000 0x00000059 0x00000004 0x00000000 0x0000005a 0x00000004 0x00000000 0x0000005b 0x00000004 0x00000000 0x0000005c 0x00000004>;
>>>>>                         interrupt-names = "dma11", "dma12", "dma13", "dma14";
>>>>>                         #dma-cells = <0x00000001>;
>>>>>                         brcm,dma-channel-mask = <0x00007000>;
>>>>>                         phandle = <0x0000003d>;
>>>>>                 };
>>>>> . . .
>
> I had presumed that the dma@7e007b00 would be processed. But
> I finally happened to search for "bcm2711-dma" in FreeBSD and
> it does not occur.
>
> That appears to mean that BCM_DMA_CH_MAX being 12 is depending
> on dma@7e007000's brcm,dma-channel-mask to avoid referencing
> number 11 that does not exist in that bcm2835-dma context.
>
> I think this makes what I wrote about DMA4 engines (the most
> capable ones) somewhat incoherent in the details but the basic
> not-supported-in-the-code and not-used status appears to be
> true.
>
> As for DMA0-DMA10 (bcm2835-dma), some DMA (0-6) vs. DMA LITE
> (7-10) distinctions not being handled (for example 65536
> maxsegsz for DMA LITE) still looks to be true to me.

It looks like FreeBSD is limited to 32-bit DMA addressing here:
usb/controller/generic_xhci.c has nothing explicit for anything
other than 32 address lines (and overall the only alternative
is 64 address lines):

#define IS_DMA_32B      1

int
generic_xhci_attach(device_t dev)
{
. . .
        err = xhci_init(sc, dev, IS_DMA_32B);
        if (err != 0) {
                device_printf(dev, "Failed to init XHCI, with error %d\n", err);
                generic_xhci_detach(dev);
                return (ENXIO);
        }
. . .

/*
 * The following structure describes the parent USB DMA tag.
 */
#if USB_HAVE_BUSDMA
struct usb_dma_parent_tag {
. . .
        uint8_t dma_bits;               /* number of DMA address lines */
. . .
};
#else
struct usb_dma_parent_tag {};           /* empty struct */
#endif
. . .
usb_error_t
xhci_init(struct xhci_softc *sc, device_t self, uint8_t dma32)
{
. . .
        /* get DMA bits */
        sc->sc_bus.dma_bits = (XHCI_HCS0_AC64(temp) &&
            xhcidma32 == 0 && dma32 == 0) ? 64 : 32;
. . .

Overall it looks like a bunch of places would need changes
to support the RPi4B's 3 GiByte capability. (Probably more
than I've discovered, ignoring things like DMA4 engine use
to get write bursts and the like.)

I will note that I found code in NetBSD that classifies
"normal" DMA engines vs. DMA LITE engines (via testing
a debug register) for bcm2835-dma and only requests normal
DMA engines be used, skipping DMA LITE. (This is for
DTB/fdt contexts I think.
I've not done as well figuring out
even such narrow aspects of ACPI handling of things.)
This tends to confirm my worries that FreeBSD's bcm2835-dma
code uses the DMA LITE engines without accounting for their
being less capable.

>>>>> So, 0 through 10 need the soc criteria (mix of DMA and DMA LITE engine criteria)
>>>>> and 11 through 14 need the scb criteria (DMA4 engine criteria). (I'm ignoring
>>>>> dma-channel-mask's at this point.)
>>>>>
>>>>>
>>>>> I'll here note the code has:
>>>>>
>>>>> #define BCM_DMA_CH_MAX          12
>>>>>
>>>>> for use in code like:
>>>>>
>>>>>         /* setup initial settings */
>>>>>         for (i = 0; i < BCM_DMA_CH_MAX; i++) {
>>>>>                 ch = &sc->sc_dma_ch[i];
>>>>>
>>>>>                 bzero(ch, sizeof(struct bcm_dma_ch));
>>>>>                 ch->ch = i;
>>>>>                 ch->flags = BCM_DMA_CH_UNMAP;
>>>>>
>>>>>                 if ((bcm_dma_channel_mask & (1 << i)) == 0)
>>>>>                         continue;
>>>>> . . .
>>>>>
>>>>> It looks to me like the only scb/DMA4-engine "dma11" is covered
>>>>> by such loops and that the "brcm,dma-channel-mask = <0x00007000>"
>>>>> means that dma11 will not be used.
>>>>>
>>>>> So: No scb/DMA4 engine will be used??? (That could explain the
>>>>> 1 GiByte limit?)
>>>>>
>>>>>
>>>>> rpi_DATA_2711_1p0.pdf reports that soc/0-10 have 2 types (0-6 vs. 7-10
>>>>> as it turns out) as well as the scb/DMA4-engines (11-14):
>>>>>
>>>>> QUOTE (with omitted marked by ". . .")
>>>>> . . .
>>>>> The BCM2711 DMA Controller provides a total of 16 DMA channels. Four of these are DMA Lite channels (with reduced performance and features), and four of them are DMA4 channels (with increased performance and a wider address range).
>>>>> . . .
>>>>> 4.5. DMA LITE Engines
>>>>>
>>>>> Several of the DMA engines are of the LITE design. This is a reduced specification engine designed to save space. The engine behaves in the same way as a normal DMA engine except for the following differences:
>>>>> . . .
>>>>> • The DMA length register is now 16 bits, limiting the maximum transferable length to 65536 bytes.
>>>>> . . .
>>>>> 4.6. DMA4 Engines
>>>>>
>>>>> Several of the DMA engines are of the DMA4 design. These have higher performance due to their uncoupled read/write design and can access up to 40 address bits. Unlike the other DMA engines they are also capable of performing write bursts. Note that they directly access the full 35-bit address bus of the BCM2711 and so bypass the paging registers of the DMA and DMA Lite engines.
>>>>>
>>>>> DMA channel 11 is additionally able to access the PCIe interface.
>>>>> END QUOTE
>>>>>
>>>>> The register map indicates (with some extra notes added):
>>>>>
>>>>> 0-6:   DMA
>>>>> 7-10:  DMA LITE (65536 bytes limit, for example)
>>>>> 11-14: DMA4 (11 is special relative to "PCIe interface")
>>>>> ("DMA Channel 15 is exclusively used by the VPU.")
>>>>>
>>>>> Yet what I see in the head -r365932 code is:
>>>>>
>>>>> #define BCM_DMA_CH_MAX          12
>>>>> . . .
>>>>> struct bcm_dma_softc {
>>>>>         device_t                sc_dev;
>>>>>         struct mtx              sc_mtx;
>>>>>         struct resource *       sc_mem;
>>>>>         struct resource *       sc_irq[BCM_DMA_CH_MAX];
>>>>>         void *                  sc_intrhand[BCM_DMA_CH_MAX];
>>>>>         struct bcm_dma_ch       sc_dma_ch[BCM_DMA_CH_MAX];
>>>>>         bus_dma_tag_t           sc_dma_tag;
>>>>> };
>>>>> . . .
>>>>>         err = bus_dma_tag_create(bus_get_dma_tag(dev),
>>>>>             1, 0, BUS_SPACE_MAXADDR_32BIT,
>>>>>             BUS_SPACE_MAXADDR, NULL, NULL,
>>>>>             sizeof(struct bcm_dma_cb), 1,
>>>>>             sizeof(struct bcm_dma_cb),
>>>>>             BUS_DMA_ALLOCNOW, NULL, NULL,
>>>>>             &sc->sc_dma_tag);
>>>>>
>>>>> As an example: does that deal with the likes of DMA LITE (so 7-10) "limiting
>>>>> the maximum transferable length to 65536 bytes"?
>>>>>
>>>>> As another example: Does it deal with the DMA4 (11-14) distinctions (if
>>>>> such were in use anyway)?
>>>>>
>>>>> For reference from the fdt print / :
>>>>>
>>>>> / {
>>>>> . . .
>>>>>         #address-cells = <0x00000002>;
>>>>>         #size-cells = <0x00000001>;
>>>>> . . .
>>>>>         soc {
>>>>>                 compatible = "simple-bus";
>>>>>                 #address-cells = <0x00000001>;
>>>>>                 #size-cells = <0x00000001>;
>>>>> . . .
>>>>>                 dma-ranges = <0xc0000000 0x00000000 0x00000000 0x40000000>;
>>>>> . . .
>>>>>                 firmware {
>>>>>                         compatible = "raspberrypi,bcm2835-firmware", "simple-bus";
>>>>>                         mboxes = <0x0000001c>;
>>>>>                         dma-ranges;
>>>>> . . .
>>>>>         emmc2bus {
>>>>>                 compatible = "simple-bus";
>>>>>                 #address-cells = <0x00000002>;
>>>>>                 #size-cells = <0x00000001>;
>>>>> . . .
>>>>>                 dma-ranges = <0x00000000 0xc0000000 0x00000000 0x00000000 0x40000000>;
>>>>> . . .
>>>>>         scb {
>>>>>                 compatible = "simple-bus";
>>>>>                 #address-cells = <0x00000002>;
>>>>>                 #size-cells = <0x00000002>;
>>>>> . . .
>>>>>                 dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xfc000000 0x00000001 0x00000000 0x00000001 0x00000000 0x00000001 0x00000000>;
>>>>> . . .
>>>>>                 pcie@7d500000 {
>>>>>                         compatible = "brcm,bcm2711-pcie";
>>>>> . . .
>>>>>                         #address-cells = <0x00000003>;
>>>>> . . .
>>>>>                         #size-cells = <0x00000002>;
>>>>> . . .
>>>>>                         dma-ranges = <0x02000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xc0000000>;
>>>>> . . .
>>>>>         v3dbus {
>>>>>                 compatible = "simple-bus";
>>>>>                 #address-cells = <0x00000001>;
>>>>>                 #size-cells = <0x00000002>;
>>>>> . . .
>>>>>                 dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000004 0x00000000>;
>>>>> . . .
>>>>
>>>> rpi_DATA_2711_1p0.pdf reports:
>>>> (I ignore 2D DMA transfer mode here.)
>>>>
>>>> For DMA engines 0-6: XLENGTH has bits 29:0
>>>> bits 31:30 are write as 0, read as do not care.
>>>> That would put maxsegsz as 2**30 == 1,073,741,824
>>>> which matches a 1 GiByte space.
>>>>
>>>> For DMA LITE engines 7-10: XLENGTH has bit 15:0
>>>> bits 31:16 are write as 0, read as do not care.
>>>> That would put maxsegsz as 2**16 == 65,536.
>>>>
>>>> For DMA4 engines 11-14: XLENGTH has bits 29:0
>>>> bits 31:30 are write as 0, read as do not care.
>>>> That would put maxsegsz as 2**30 == 1,073,741,824
>>>> which is smaller than the 3 GiByte space associated
>>>> with xHCI.
>>
>> rpi_DATA_2711_1p0.pdf reports the following specifically for
>> DMA11-DMA14 (so the DMA4 engines) for what goes in the CB and
>> NEXT_CB ADDR fields:
>>
>> QUOTE
>> The address must be 256-bit aligned and so the bottom 5 bits of the byte address are discarded, i.e. write cb_byte_address[39:0]>>5 into the CB
>> END QUOTE
>>
>> This is not true for DMA0-DMA10 (DMA and DMA LITE).
>>
>> The following is extracted from various places to
>> bring them together. I do not see evidence of handling
>> the cb_byte_address[39:0]>>5 involved for DMA11-DMA14:
>>
>> #define ARMC_TO_VCBUS(pa)       bcm283x_armc_to_vcbus(pa)
>>
>> vm_paddr_t
>> bcm283x_armc_to_vcbus(vm_paddr_t pa)
>> {
>>         struct bcm283x_memory_soc_cfg *cfg;
>>         struct bcm283x_memory_mapping *map, *ment;
>>
>>         /* Guaranteed not NULL if we haven't panicked yet. */
>>         cfg = bcm283x_get_current_memcfg();
>>         map = cfg->memmap;
>>         for (ment = map; !BCM283X_MEMMAP_ISTERM(ment); ++ment) {
>>                 if (pa >= ment->armc_start &&
>>                     pa < ment->armc_start + ment->armc_size) {
>>                         return (pa - ment->armc_start) + ment->vcbus_start;
>>                 }
>>         }
>>
>>         /*
>>          * Assume 1:1 mapping for anything else, but complain about it on
>>          * verbose boots.
>>          */
>>         if (bootverbose)
>>                 printf("bcm283x_vcbus: No armc -> vcbus mapping found: %jx\n",
>>                     (uintmax_t)pa);
>>         return (pa);
>> }
>>
>> static void
>> bcm_dmamap_cb(void *arg, bus_dma_segment_t *segs,
>>         int nseg, int err)
>> {
>>         bus_addr_t *addr;
>>
>>         if (err)
>>                 return;
>>
>>         addr = (bus_addr_t*)arg;
>>         *addr = ARMC_TO_VCBUS(segs[0].ds_addr);
>> }
>>
>> Note ds_addr assignments in:
>>
>> static bus_size_t
>> _bus_dmamap_addseg(bus_dma_tag_t dmat, bus_dmamap_t map, bus_addr_t curaddr,
>>     bus_size_t sgsize, bus_dma_segment_t *segs, int *segp)
>> {
>>         bus_addr_t baddr, bmask;
>>         int seg;
>>
>>         /*
>>          * Make sure we don't cross any boundaries.
>>          */
>>         bmask = ~(dmat->common.boundary - 1);
>>         if (dmat->common.boundary > 0) {
>>                 baddr = (curaddr + dmat->common.boundary) & bmask;
>>                 if (sgsize > (baddr - curaddr))
>>                         sgsize = (baddr - curaddr);
>>         }
>>
>>         /*
>>          * Insert chunk into a segment, coalescing with
>>          * previous segment if possible.
>>          */
>>         seg = *segp;
>>         if (seg == -1) {
>>                 seg = 0;
>>                 segs[seg].ds_addr = curaddr;
>>                 segs[seg].ds_len = sgsize;
>>         } else {
>>                 if (curaddr == segs[seg].ds_addr + segs[seg].ds_len &&
>>                     (segs[seg].ds_len + sgsize) <= dmat->common.maxsegsz &&
>>                     (dmat->common.boundary == 0 ||
>>                      (segs[seg].ds_addr & bmask) == (curaddr & bmask)))
>>                         segs[seg].ds_len += sgsize;
>>                 else {
>>                         if (++seg >= dmat->common.nsegments)
>>                                 return (0);
>>                         segs[seg].ds_addr = curaddr;
>>                         segs[seg].ds_len = sgsize;
>>                 }
>>         }
>>         *segp = seg;
>>         return (sgsize);
>> }
>>
>>
>> Note cb_phys and ch->vc_cb in:
>>
>> static int
>> bcm_dma_init(device_t dev)
>> {
>> . . .
>>         /* setup initial settings */
>>         for (i = 0; i < BCM_DMA_CH_MAX; i++) {
>> . . .
>>                 err = bus_dmamap_load(sc->sc_dma_tag, ch->dma_map, cb_virt,
>>                     sizeof(struct bcm_dma_cb), bcm_dmamap_cb, &cb_phys,
>>                     BUS_DMA_WAITOK);
>>                 if (err) {
>>                         device_printf(dev, "cannot load DMA memory\n");
>>                         break;
>>                 }
>>
>>                 ch->cb = cb_virt;
>>                 ch->vc_cb = cb_phys;
>> . . .
>>
>> int
>> bcm_dma_start(int ch, vm_paddr_t src, vm_paddr_t dst, int len)
>> {
>>         struct bcm_dma_softc *sc = bcm_dma_sc;
>>         struct bcm_dma_cb *cb;
>>
>>         if (ch < 0 || ch >= BCM_DMA_CH_MAX)
>>                 return (-1);
>>
>>         if (!(sc->sc_dma_ch[ch].flags & BCM_DMA_CH_USED))
>>                 return (-1);
>>
>>         cb = sc->sc_dma_ch[ch].cb;
>>         cb->src = ARMC_TO_VCBUS(src);
>>         cb->dst = ARMC_TO_VCBUS(dst);
>>
>>         cb->len = len;
>>
>>         bus_dmamap_sync(sc->sc_dma_tag,
>>             sc->sc_dma_ch[ch].dma_map, BUS_DMASYNC_PREWRITE);
>>
>>         bus_write_4(sc->sc_mem, BCM_DMA_CBADDR(ch),
>>             sc->sc_dma_ch[ch].vc_cb);
>>         bus_write_4(sc->sc_mem, BCM_DMA_CS(ch), CS_ACTIVE);
>>
>> #ifdef DEBUG
>>         bcm_dma_cb_dump(sc->sc_dma_ch[ch].cb);
>>         bcm_dma_reg_dump(ch);
>> #endif
>>
>>         return (0);
>> }
>>
>> It looks to me like FreeBSD is not set up to use the DMA4
>> engines (DMA11-DMA14) and happens to not use them for the
>> DTB that I get from u-boot.bin in my context.
>>
>> Of course, I may just have missed something in looking
>> around at the unfamiliar material.
>
>

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
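
For illustration only, here is a minimal user-space sketch of the per-engine distinctions quoted in this message: which channel numbers belong to which engine class, the maxsegsz each XLENGTH width would imply (using the same 2**bits convention as above), and the cb_byte_address[39:0]>>5 encoding that the data sheet gives for DMA11-DMA14. It is not FreeBSD code. The names engine_for_channel, xlength_bits, max_seg_size and dma4_encode_cb_addr are hypothetical, and how the encoded value would actually be written to the CB register is an assumption left outside the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Engine classes, following the register-map summary quoted above. */
enum dma_engine { ENG_DMA, ENG_DMA_LITE, ENG_DMA4, ENG_RESERVED };

/* Hypothetical helper: map a channel number to its engine class. */
static enum dma_engine
engine_for_channel(int ch)
{
	if (ch >= 0 && ch <= 6)
		return (ENG_DMA);
	if (ch >= 7 && ch <= 10)
		return (ENG_DMA_LITE);
	if (ch >= 11 && ch <= 14)
		return (ENG_DMA4);
	return (ENG_RESERVED);	/* 15 is VPU-only; anything else is invalid */
}

/* XLENGTH width in bits: 15:0 for DMA LITE, 29:0 for DMA and DMA4. */
static unsigned
xlength_bits(enum dma_engine e)
{
	switch (e) {
	case ENG_DMA_LITE:
		return (16);
	case ENG_DMA:
	case ENG_DMA4:
		return (30);
	default:
		return (0);
	}
}

/* maxsegsz as 2**bits, the convention used in the message (0 if unusable). */
static uint64_t
max_seg_size(enum dma_engine e)
{
	unsigned bits = xlength_bits(e);

	return (bits == 0 ? 0 : (uint64_t)1 << bits);
}

/*
 * DMA4 control-block address encoding: the byte address must be
 * 32-byte (256-bit) aligned and fit in 40 bits; the low 5 bits are
 * dropped. Returns 0 on success, -1 if the address is unusable.
 */
static int
dma4_encode_cb_addr(uint64_t cb_byte_address, uint64_t *encoded)
{
	if ((cb_byte_address & 0x1f) != 0)
		return (-1);	/* not 256-bit aligned */
	if ((cb_byte_address >> 40) != 0)
		return (-1);	/* beyond 40 address bits */
	*encoded = cb_byte_address >> 5;
	return (0);
}

/* Small self-check tying the helpers back to the numbers in the message. */
static void
dma_sketch_selfcheck(void)
{
	assert(max_seg_size(engine_for_channel(8)) == 65536);
	assert(max_seg_size(engine_for_channel(11)) == 1073741824ULL);
}
```

A driver that wanted to honor these differences would presumably need one bus_dma tag per engine class rather than the single sc_dma_tag shown above, with the class chosen from the channel number; that is a design assumption, not something the existing code does.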