From: Robert Crowston
Date: Wed, 30 Sep 2020 21:15:30 +0000
To: Mark Millard
Cc: freebsd-arm
Reply-To: Robert Crowston
Subject: Re: RPi4B's DMA11 (DMA4 engine example) vs. xHCI/pcie
In-Reply-To: <903FE769-ED46-4FBC-A272-4D2C89A9CD7A@yahoo.com>

Very interesting analysis.
Certainly uncovered a few things I wasn't aware of.

By default sc->sc_bus.dma_bits in xhci_init is 64 bits; I toggle it back to
32 bits in the xhci shim I wrote for the Pi 4. You can see that output in a
verbose dmesg.

-- RHC.

------- Original Message -------
On Wednesday, 30 September 2020 19:13, Mark Millard wrote:

> On 2020-Sep-29, at 10:35, Mark Millard wrote:
> > On 2020-Sep-28, at 21:45, Mark Millard wrote:
> > > On 2020-Sep-28, at 19:04, Mark Millard wrote:
> > > > On 2020-Sep-28, at 18:29, Mark Millard wrote:
> > > > > > [Be warned that the material is not familiar so I may need
> > > > > > educating. This is based on the example context that I
> > > > > > happen to have around.]
> > > > > >
> > > > > > In the u-boot fdt print / output there are 2 distinct sets of dma channel
> > > > > > information, 1 for soc and 1 for scb, where the dma_tag values for the two
> > > > > > sets should be distinct as far as I can tell:
> > > > > >
> > > > > > U-Boot> fdt address 0x7ef1000
> > > > > > U-Boot> fdt print /
> > > > > > / {
> > > > > > . . .
> > > > > > soc {
> > > > > > dma@7e007000 {
> > > > > > compatible = "brcm,bcm2835-dma";
> > > > > > reg = <0x7e007000 0x00000b00>;
> > > > > > interrupts = * 0x0000000007ef645c [0x00000084];
> > > > > > interrupt-names = "dma0", "dma1", "dma2", "dma3", "dma4", "dma5", "dma6", "dma7", "dma8", "dma9", "dma10";
> > > > > > #dma-cells = <0x00000001>;
> > > > > > brcm,dma-channel-mask = <0x000001f5>;
> > > > > > phandle = <0x0000000b>;
> > > > > > };
> > > > > >
> > > > > > scb {
> > > > > >
> > > > > > . . .
> > > > > > dma@7e007b00 {
> > > > > > compatible = "brcm,bcm2711-dma";
> > > > > > reg = <0x00000000 0x7e007b00 0x00000000 0x00000400>;
> > > > > > interrupts = <0x00000000 0x00000059 0x00000004 0x00000000 0x0000005a 0x00000004 0x00000000 0x0000005b 0x00000004 0x00000000 0x0000005c 0x00000004>;
> > > > > > interrupt-names = "dma11", "dma12", "dma13", "dma14";
> > > > > > #dma-cells = <0x00000001>;
> > > > > > brcm,dma-channel-mask = <0x00007000>;
> > > > > > phandle = <0x0000003d>;
> > > > > > };
> > > > > > . . .
> >
> > I had presumed that the dma@7e007b00 would be processed. But
> > I finally happened to search for "bcm2711-dma" in FreeBSD and
> > it does not occur.
> >
> > That appears to mean that BCM_DMA_CH_MAX being 12 is depending
> > on dma@7e007000's brcm,dma-channel-mask to avoid referencing
> > number 11 that does not exist in that bcm2835-dma context.
> >
> > I think this makes what I wrote about DMA4 engines (the most
> > capable ones) somewhat incoherent in the details but the basic
> > not-supported-in-the-code and not-used status appears to be
> > true.
> >
> > As for DMA0-DMA10 (bcm2835-dma), some DMA (0-6) vs. DMA LITE
> > (7-10) distinctions not being handled (for example 65536
> > maxsegsz for DMA LITE) still looks to be true to me.
>
> Looks like FreeBSD is limited to 32-bit: usb/controller/generic_xhci.c
> has nothing explicit for anything other than 32 address lines (and
> overall the only alternative is 64 address lines):
>
> #define IS_DMA_32B 1
>
> int
> generic_xhci_attach(device_t dev)
> {
> . . .
> 	err = xhci_init(sc, dev, IS_DMA_32B);
> 	if (err != 0) {
> 		device_printf(dev, "Failed to init XHCI, with error %d\n", err);
> 		generic_xhci_detach(dev);
> 		return (ENXIO);
> 	}
> . . .
> /*
>  * The following structure describes the parent USB DMA tag.
>  */
> #if USB_HAVE_BUSDMA
> struct usb_dma_parent_tag {
> . . .
> 	uint8_t dma_bits; /* number of DMA address lines */
> . . .
> };
> #else
> struct usb_dma_parent_tag {}; /* empty struct */
> #endif
> . . .
> usb_error_t
> xhci_init(struct xhci_softc *sc, device_t self, uint8_t dma32)
> {
> . . .
> 	/* get DMA bits */
> 	sc->sc_bus.dma_bits = (XHCI_HCS0_AC64(temp) &&
> 	    xhcidma32 == 0 && dma32 == 0) ? 64 : 32;
> . . .
>
> Overall it looks like a bunch of places would need changes to
> support the RPi4B's 3 GiByte capability. (Probably more than
> I've discovered, ignoring things like DMA4 engine use to get
> write bursts and the like.)
>
> I will note that I found code in NetBSD that classifies "normal"
> DMA engines vs. DMA LITE engines (via testing a debug register)
> for bcm2835-dma and only requests normal DMA engines be used,
> skipping DMA LITE. (This is for DTB/fdt contexts I think. I've
> not done as well figuring out even such narrow aspects of ACPI
> handling of things.) This tends to confirm my worries over
> FreeBSD's bcm2835-dma handling of the DMA LITE engines existing
> but being less capable.
>
> > > > > > So, 0 through 10 need the soc criteria (mix of DMA and DMA LITE engine criteria)
> > > > > > and 11 through 14 need the scb criteria (DMA4 engine criteria). (I'm ignoring
> > > > > > dma-channel-mask's at this point.)
> > > > > >
> > > > > > I'll here note the code has:
> > > > > >
> > > > > > #define BCM_DMA_CH_MAX 12
> > > > > >
> > > > > > for use in code like:
> > > > > >
> > > > > > 	/* setup initial settings */
> > > > > > 	for (i = 0; i < BCM_DMA_CH_MAX; i++) {
> > > > > > 		ch = &sc->sc_dma_ch[i];
> > > > > >
> > > > > > 		bzero(ch, sizeof(struct bcm_dma_ch));
> > > > > > 		ch->ch = i;
> > > > > > 		ch->flags = BCM_DMA_CH_UNMAP;
> > > > > >
> > > > > > 		if ((bcm_dma_channel_mask & (1 << i)) == 0)
> > > > > > 			continue;
> > > > > >
> > > > > > . . .
> > > > > >
> > > > > > It looks to me like the only scb/DMA4-engine "dma11" is covered
> > > > > > by such loops and that the "brcm,dma-channel-mask = <0x00007000>"
> > > > > > means that dma11 will not be used.
> > > > > > So: No scb/DMA4 engine will be used??? (That could explain the
> > > > > > 1 GiByte limit?)
> > > > > >
> > > > > > rpi_DATA_2711_1p0.pdf reports that soc/0-10 have 2 types (0-6 vs. 7-10
> > > > > > as it turns out) as well as the scb/DMA4-engines (11-14):
> > > > > >
> > > > > > QUOTE (with omitted marked by ". . .")
> > > > > > . . .
> > > > > > The BCM2711 DMA Controller provides a total of 16 DMA channels. Four of these are DMA Lite channels (with reduced performance and features), and four of them are DMA4 channels (with increased performance and a wider address range).
> > > > > > . . .
> > > > > > 4.5. DMA LITE Engines
> > > > > > Several of the DMA engines are of the LITE design. This is a reduced specification engine designed to save space. The engine behaves in the same way as a normal DMA engine except for the following differences:
> > > > > > . . .
> > > > > > • The DMA length register is now 16 bits, limiting the maximum transferable length to 65536 bytes.
> > > > > > . . .
> > > > > > 4.6. DMA4 Engines
> > > > > > Several of the DMA engines are of the DMA4 design. These have higher performance due to their uncoupled read/write design and can access up to 40 address bits. Unlike the other DMA engines they are also capable of performing write bursts. Note that they directly access the full 35-bit address bus of the BCM2711 and so bypass the paging registers of the DMA and DMA Lite engines.
> > > > > > DMA channel 11 is additionally able to access the PCIe interface.
> > > > > > END QUOTE
> > > > > >
> > > > > > The register map indicates (with some extra notes added):
> > > > > >
> > > > > > 0-6: DMA
> > > > > > 7-10: DMA LITE (65536 bytes limit, for example)
> > > > > > 11-14: DMA4 (11 is special relative to "PCIe interface")
> > > > > > ("DMA Channel 15 is exclusively used by the VPU.")
> > > > > >
> > > > > > Yet what I see in the head -r365932 code is:
> > > > > >
> > > > > > #define BCM_DMA_CH_MAX 12
> > > > > > . . .
> > > > > > struct bcm_dma_softc {
> > > > > > 	device_t sc_dev;
> > > > > > 	struct mtx sc_mtx;
> > > > > > 	struct resource * sc_mem;
> > > > > > 	struct resource * sc_irq[BCM_DMA_CH_MAX];
> > > > > > 	void * sc_intrhand[BCM_DMA_CH_MAX];
> > > > > > 	struct bcm_dma_ch sc_dma_ch[BCM_DMA_CH_MAX];
> > > > > > 	bus_dma_tag_t sc_dma_tag;
> > > > > > };
> > > > > > . . .
> > > > > > 	err = bus_dma_tag_create(bus_get_dma_tag(dev),
> > > > > > 	    1, 0, BUS_SPACE_MAXADDR_32BIT,
> > > > > > 	    BUS_SPACE_MAXADDR, NULL, NULL,
> > > > > > 	    sizeof(struct bcm_dma_cb), 1,
> > > > > > 	    sizeof(struct bcm_dma_cb),
> > > > > > 	    BUS_DMA_ALLOCNOW, NULL, NULL,
> > > > > > 	    &sc->sc_dma_tag);
> > > > > >
> > > > > > As an example: does that deal with the likes of DMA LITE (so 7-10) "limiting
> > > > > > the maximum transferable length to 65536 bytes"?
> > > > > >
> > > > > > As another example: Does it deal with the DMA4 (11-14) distinctions (if
> > > > > > such were in use anyway)?
> > > > > >
> > > > > > For reference from the fdt print / :
> > > > > >
> > > > > > / {
> > > > > > . . .
> > > > > > #address-cells = <0x00000002>;
> > > > > > #size-cells = <0x00000001>;
> > > > > > . . .
> > > > > > soc {
> > > > > > compatible = "simple-bus";
> > > > > > #address-cells = <0x00000001>;
> > > > > > #size-cells = <0x00000001>;
> > > > > > . . .
> > > > > > dma-ranges = <0xc0000000 0x00000000 0x00000000 0x40000000>;
> > > > > > . . .
> > > > > > firmware {
> > > > > > compatible = "raspberrypi,bcm2835-firmware", "simple-bus";
> > > > > > mboxes = <0x0000001c>;
> > > > > > dma-ranges;
> > > > > > . . .
> > > > > > emmc2bus {
> > > > > > compatible = "simple-bus";
> > > > > > #address-cells = <0x00000002>;
> > > > > > #size-cells = <0x00000001>;
> > > > > > . . .
> > > > > > dma-ranges = <0x00000000 0xc0000000 0x00000000 0x00000000 0x40000000>;
> > > > > > . . .
> > > > > > scb {
> > > > > > compatible = "simple-bus";
> > > > > > #address-cells = <0x00000002>;
> > > > > > #size-cells = <0x00000002>;
> > > > > > . . .
> > > > > > dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xfc000000 0x00000001 0x00000000 0x00000001 0x00000000 0x00000001 0x00000000>;
> > > > > > . . .
> > > > > > pcie@7d500000 {
> > > > > > compatible = "brcm,bcm2711-pcie";
> > > > > > . . .
> > > > > > #address-cells = <0x00000003>;
> > > > > > . . .
> > > > > > #size-cells = <0x00000002>;
> > > > > > . . .
> > > > > > dma-ranges = <0x02000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xc0000000>;
> > > > > > . . .
> > > > > > v3dbus {
> > > > > > compatible = "simple-bus";
> > > > > > #address-cells = <0x00000001>;
> > > > > > #size-cells = <0x00000002>;
> > > > > > . . .
> > > > > > dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000004 0x00000000>;
> > > > > > . . .
> > > > >
> > > > > rpi_DATA_2711_1p0.pdf reports:
> > > > > (I ignore 2D DMA transfer mode here.)
> > > > >
> > > > > For DMA engines 0-6: XLENGTH has bits 29:0
> > > > > bits 31:30 are write as 0, read as do not care.
> > > > > That would put maxsegsz as 2**30 == 1,073,741,824
> > > > > which matches a 1 GiByte space.
> > > > >
> > > > > For DMA LITE engines 7-10: XLENGTH has bits 15:0
> > > > > bits 31:16 are write as 0, read as do not care.
> > > > > That would put maxsegsz as 2**16 == 65,536.
> > > > >
> > > > > For DMA4 engines 11-14: XLENGTH has bits 29:0
> > > > > bits 31:30 are write as 0, read as do not care.
> > > > > That would put maxsegsz as 2**30 == 1,073,741,824
> > > > > which is smaller than the 3 GiByte space associated
> > > > > with xHCI.
> > >
> > > rpi_DATA_2711_1p0.pdf reports the following specifically for
> > > DMA11-DMA14 (so the DMA4 engines) for what goes in the CB and
> > > NEXT_CB ADDR fields:
> > >
> > > QUOTE
> > > The address must be 256-bit aligned and so the bottom 5 bits of the byte address are discarded, i.e. write cb_byte_address[39:0]>>5 into the CB
> > > END QUOTE
> > >
> > > This is not true for DMA0-DMA10 (DMA and DMA LITE).
> > > The following is extracted from various places to
> > > bring them together. I do not see evidence of handling
> > > the cb_byte_address[39:0]>>5 involved for DMA11-DMA14:
> > >
> > > #define ARMC_TO_VCBUS(pa) bcm283x_armc_to_vcbus(pa)
> > >
> > > vm_paddr_t
> > > bcm283x_armc_to_vcbus(vm_paddr_t pa)
> > > {
> > > 	struct bcm283x_memory_soc_cfg *cfg;
> > > 	struct bcm283x_memory_mapping *map, *ment;
> > >
> > > 	/* Guaranteed not NULL if we haven't panicked yet. */
> > > 	cfg = bcm283x_get_current_memcfg();
> > > 	map = cfg->memmap;
> > > 	for (ment = map; !BCM283X_MEMMAP_ISTERM(ment); ++ment) {
> > > 		if (pa >= ment->armc_start &&
> > > 		    pa < ment->armc_start + ment->armc_size) {
> > > 			return (pa - ment->armc_start) + ment->vcbus_start;
> > > 		}
> > > 	}
> > >
> > > 	/*
> > > 	 * Assume 1:1 mapping for anything else, but complain about it on
> > > 	 * verbose boots.
> > > 	 */
> > > 	if (bootverbose)
> > > 		printf("bcm283x_vcbus: No armc -> vcbus mapping found: %jx\n",
> > > 		    (uintmax_t)pa);
> > > 	return (pa);
> > > }
> > >
> > > static void
> > > bcm_dmamap_cb(void *arg, bus_dma_segment_t *segs,
> > >     int nseg, int err)
> > > {
> > > 	bus_addr_t *addr;
> > >
> > > 	if (err)
> > > 		return;
> > >
> > > 	addr = (bus_addr_t*)arg;
> > > 	*addr = ARMC_TO_VCBUS(segs[0].ds_addr);
> > > }
> > >
> > > Note ds_addr assignments in:
> > >
> > > static bus_size_t
> > > _bus_dmamap_addseg(bus_dma_tag_t dmat, bus_dmamap_t map, bus_addr_t curaddr,
> > >     bus_size_t sgsize, bus_dma_segment_t *segs, int *segp)
> > > {
> > > 	bus_addr_t baddr, bmask;
> > > 	int seg;
> > >
> > > 	/*
> > > 	 * Make sure we don't cross any boundaries.
> > > 	 */
> > > 	bmask = ~(dmat->common.boundary - 1);
> > > 	if (dmat->common.boundary > 0) {
> > > 		baddr = (curaddr + dmat->common.boundary) & bmask;
> > > 		if (sgsize > (baddr - curaddr))
> > > 			sgsize = (baddr - curaddr);
> > > 	}
> > >
> > > 	/*
> > > 	 * Insert chunk into a segment, coalescing with
> > > 	 * previous segment if possible.
> > > 	 */
> > > 	seg = *segp;
> > > 	if (seg == -1) {
> > > 		seg = 0;
> > > 		segs[seg].ds_addr = curaddr;
> > > 		segs[seg].ds_len = sgsize;
> > > 	} else {
> > > 		if (curaddr == segs[seg].ds_addr + segs[seg].ds_len &&
> > > 		    (segs[seg].ds_len + sgsize) <= dmat->common.maxsegsz &&
> > > 		    (dmat->common.boundary == 0 ||
> > > 		     (segs[seg].ds_addr & bmask) == (curaddr & bmask)))
> > > 			segs[seg].ds_len += sgsize;
> > > 		else {
> > > 			if (++seg >= dmat->common.nsegments)
> > > 				return (0);
> > > 			segs[seg].ds_addr = curaddr;
> > > 			segs[seg].ds_len = sgsize;
> > > 		}
> > > 	}
> > > 	*segp = seg;
> > > 	return (sgsize);
> > > }
> > >
> > > Note cb_phys and ch->vc_cb in:
> > >
> > > static int
> > > bcm_dma_init(device_t dev)
> > > {
> > > . . .
> > > 	/* setup initial settings */
> > > 	for (i = 0; i < BCM_DMA_CH_MAX; i++) {
> > > . . .
> > > 		err = bus_dmamap_load(sc->sc_dma_tag, ch->dma_map, cb_virt,
> > > 		    sizeof(struct bcm_dma_cb), bcm_dmamap_cb, &cb_phys,
> > > 		    BUS_DMA_WAITOK);
> > > 		if (err) {
> > > 			device_printf(dev, "cannot load DMA memory\n");
> > > 			break;
> > > 		}
> > >
> > > 		ch->cb = cb_virt;
> > > 		ch->vc_cb = cb_phys;
> > > . . .
> > > int
> > > bcm_dma_start(int ch, vm_paddr_t src, vm_paddr_t dst, int len)
> > > {
> > > 	struct bcm_dma_softc *sc = bcm_dma_sc;
> > > 	struct bcm_dma_cb *cb;
> > >
> > > 	if (ch < 0 || ch >= BCM_DMA_CH_MAX)
> > > 		return (-1);
> > >
> > > 	if (!(sc->sc_dma_ch[ch].flags & BCM_DMA_CH_USED))
> > > 		return (-1);
> > >
> > > 	cb = sc->sc_dma_ch[ch].cb;
> > > 	cb->src = ARMC_TO_VCBUS(src);
> > > 	cb->dst = ARMC_TO_VCBUS(dst);
> > >
> > > 	cb->len = len;
> > >
> > > 	bus_dmamap_sync(sc->sc_dma_tag,
> > > 	    sc->sc_dma_ch[ch].dma_map, BUS_DMASYNC_PREWRITE);
> > >
> > > 	bus_write_4(sc->sc_mem, BCM_DMA_CBADDR(ch),
> > > 	    sc->sc_dma_ch[ch].vc_cb);
> > > 	bus_write_4(sc->sc_mem, BCM_DMA_CS(ch), CS_ACTIVE);
> > >
> > > #ifdef DEBUG
> > > 	bcm_dma_cb_dump(sc->sc_dma_ch[ch].cb);
> > > 	bcm_dma_reg_dump(ch);
> > > #endif
> > >
> > > 	return (0);
> > > }
> > >
> > > It looks to me like FreeBSD is not set up to use the DMA4
> > > engines (DMA11-DMA14) and happens to not use them for the
> > > DTB that I get from u-boot.bin in my context.
> > >
> > > Of course, I may just have missed something in looking
> > > around at the unfamiliar material.
>
> ==
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
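
The per-channel distinctions discussed in the quoted thread (normal DMA on 0-6, DMA LITE on 7-10, DMA4 on 11-14, and the DMA4 rule of writing cb_byte_address[39:0]>>5 into the CB field) can be sketched in C as below. This is only an illustration under stated assumptions, not FreeBSD code: every name in it (dma_engine_type, dma_engine_type_for_channel, dma_engine_maxsegsz, dma4_cb_addr_field) is hypothetical, and the maxsegsz values are the thread's reading of the XLENGTH field widths, not something checked against hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical engine classes, following the register map quoted above. */
enum dma_engine_type {
	DMA_ENGINE_NORMAL,	/* channels 0-6 */
	DMA_ENGINE_LITE,	/* channels 7-10 */
	DMA_ENGINE_DMA4,	/* channels 11-14 */
	DMA_ENGINE_NONE		/* channel 15 (VPU only) or out of range */
};

/* Classify a channel number by engine type. */
static enum dma_engine_type
dma_engine_type_for_channel(int ch)
{
	if (ch >= 0 && ch <= 6)
		return (DMA_ENGINE_NORMAL);
	if (ch >= 7 && ch <= 10)
		return (DMA_ENGINE_LITE);
	if (ch >= 11 && ch <= 14)
		return (DMA_ENGINE_DMA4);
	return (DMA_ENGINE_NONE);
}

/*
 * Largest segment size per engine type, as derived in the thread:
 * 2**30 for XLENGTH bits 29:0, 2**16 for XLENGTH bits 15:0.
 * (Assumption: the thread's values are taken at face value.)
 */
static uint64_t
dma_engine_maxsegsz(enum dma_engine_type type)
{
	switch (type) {
	case DMA_ENGINE_NORMAL:
	case DMA_ENGINE_DMA4:
		return (UINT64_C(1) << 30);
	case DMA_ENGINE_LITE:
		return (UINT64_C(1) << 16);
	default:
		return (0);
	}
}

/*
 * DMA4 only: the control block must be 256-bit (32-byte) aligned and
 * the value written is the 40-bit byte address shifted right by 5.
 */
static uint64_t
dma4_cb_addr_field(uint64_t cb_byte_address)
{
	assert((cb_byte_address & 0x1f) == 0);
	assert(cb_byte_address < (UINT64_C(1) << 40));
	return (cb_byte_address >> 5);
}
```

For example, a 32-byte-aligned control block at byte address 0x100000020 would be written as 0x8000001 under this rule, whereas the bcm_dma_start() code quoted above writes the translated byte address unshifted, which matches only the DMA0-DMA10 behaviour described in the thread.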