From: Mark Millard
To: Robert Crowston, freebsd-arm
Subject: Re: RPi4B's DMA11 (DMA4 engine example) vs. xHCI/pcie
Date: Mon, 28 Sep 2020 21:45:27 -0700
Message-Id: <85FEDC51-B5B0-4ED4-A5ED-62B63EF9D5A8@yahoo.com>
In-Reply-To: <0FE382AB-8DE3-4467-9CB0-E8582AC70EA2@yahoo.com>
References: <8C6DE44F-6CE2-4C74-8748-3BBFB54AE183@yahoo.com> <0FE382AB-8DE3-4467-9CB0-E8582AC70EA2@yahoo.com>

On 2020-Sep-28, at 19:04, Mark Millard wrote:

> On 2020-Sep-28, at 18:29, Mark Millard wrote:
>>
>>> [Be warned that the material is not familiar so I may need
>>> educating. This is based on the example context that I
>>> happen to have around.]
>>>
>>> In the u-boot fdt print / output there are 2 distinct sets of dma channel
>>> information, 1 for soc and 1 for scb, where the dma_tag values for the two
>>> sets should be distinct as far as I can tell:
>>>
>>> U-Boot> fdt address 0x7ef1000
>>> U-Boot> fdt print /
>>> / {
>>> . . .
>>>     soc {
>>>         dma@7e007000 {
>>>             compatible = "brcm,bcm2835-dma";
>>>             reg = <0x7e007000 0x00000b00>;
>>>             interrupts = * 0x0000000007ef645c [0x00000084];
>>>             interrupt-names = "dma0", "dma1", "dma2", "dma3", "dma4", "dma5", "dma6", "dma7", "dma8", "dma9", "dma10";
>>>             #dma-cells = <0x00000001>;
>>>             brcm,dma-channel-mask = <0x000001f5>;
>>>             phandle = <0x0000000b>;
>>>         };
>>>
>>>     scb {
>>> . . .
>>>         dma@7e007b00 {
>>>             compatible = "brcm,bcm2711-dma";
>>>             reg = <0x00000000 0x7e007b00 0x00000000 0x00000400>;
>>>             interrupts = <0x00000000 0x00000059 0x00000004 0x00000000 0x0000005a 0x00000004 0x00000000 0x0000005b 0x00000004 0x00000000 0x0000005c 0x00000004>;
>>>             interrupt-names = "dma11", "dma12", "dma13", "dma14";
>>>             #dma-cells = <0x00000001>;
>>>             brcm,dma-channel-mask = <0x00007000>;
>>>             phandle = <0x0000003d>;
>>>         };
>>> . . .
>>>
>>> So, 0 through 10 need the soc criteria (mix of DMA and DMA LITE engine criteria)
>>> and 11 through 14 need the scb criteria (DMA4 engine criteria). (I'm ignoring
>>> the dma-channel-masks at this point.)
>>>
>>>
>>> I'll here note the code has:
>>>
>>> #define BCM_DMA_CH_MAX 12
>>>
>>> for use in code like:
>>>
>>>     /* setup initial settings */
>>>     for (i = 0; i < BCM_DMA_CH_MAX; i++) {
>>>         ch = &sc->sc_dma_ch[i];
>>>
>>>         bzero(ch, sizeof(struct bcm_dma_ch));
>>>         ch->ch = i;
>>>         ch->flags = BCM_DMA_CH_UNMAP;
>>>
>>>         if ((bcm_dma_channel_mask & (1 << i)) == 0)
>>>             continue;
>>> . . .
>>>
>>> It looks to me like "dma11" is the only scb/DMA4 engine covered
>>> by such loops and that the "brcm,dma-channel-mask = <0x00007000>"
>>> means that dma11 will not be used.
>>>
>>> So: No scb/DMA4 engine will be used??? (That could explain the
>>> 1 GiByte limit?)
>>>
>>>
>>> rpi_DATA_2711_1p0.pdf reports that soc/0-10 have 2 types (0-6 vs. 7-10
>>> as it turns out) as well as the scb/DMA4-engines (11-14):
>>>
>>> QUOTE (with omitted material marked by ". . .")
>>> . . .
>>> The BCM2711 DMA Controller provides a total of 16 DMA channels. Four of these are DMA Lite channels (with reduced performance and features), and four of them are DMA4 channels (with increased performance and a wider address range).
>>> . . .
>>> 4.5. DMA LITE Engines
>>>
>>> Several of the DMA engines are of the LITE design. This is a reduced specification engine designed to save space. The engine behaves in the same way as a normal DMA engine except for the following differences:
>>> . . .
>>> • The DMA length register is now 16 bits, limiting the maximum transferable length to 65536 bytes.
>>> . . .
>>> 4.6. DMA4 Engines
>>>
>>> Several of the DMA engines are of the DMA4 design. These have higher performance due to their uncoupled read/write design and can access up to 40 address bits.
>>> Unlike the other DMA engines they are also capable of performing write bursts. Note that they directly access the full 35-bit address bus of the BCM2711 and so bypass the paging registers of the DMA and DMA Lite engines.
>>>
>>> DMA channel 11 is additionally able to access the PCIe interface.
>>> END QUOTE
>>>
>>> The register map indicates (with some extra notes added):
>>>
>>> 0-6: DMA
>>> 7-10: DMA LITE (65536 bytes limit, for example)
>>> 11-14: DMA4 (11 is special relative to "PCIe interface")
>>> ("DMA Channel 15 is exclusively used by the VPU.")
>>>
>>> Yet what I see in the head -r365932 code is:
>>>
>>> #define BCM_DMA_CH_MAX 12
>>> . . .
>>> struct bcm_dma_softc {
>>>     device_t sc_dev;
>>>     struct mtx sc_mtx;
>>>     struct resource * sc_mem;
>>>     struct resource * sc_irq[BCM_DMA_CH_MAX];
>>>     void * sc_intrhand[BCM_DMA_CH_MAX];
>>>     struct bcm_dma_ch sc_dma_ch[BCM_DMA_CH_MAX];
>>>     bus_dma_tag_t sc_dma_tag;
>>> };
>>> . . .
>>>     err = bus_dma_tag_create(bus_get_dma_tag(dev),
>>>         1, 0, BUS_SPACE_MAXADDR_32BIT,
>>>         BUS_SPACE_MAXADDR, NULL, NULL,
>>>         sizeof(struct bcm_dma_cb), 1,
>>>         sizeof(struct bcm_dma_cb),
>>>         BUS_DMA_ALLOCNOW, NULL, NULL,
>>>         &sc->sc_dma_tag);
>>>
>>> As an example: does that deal with the likes of DMA LITE (so 7-10) "limiting
>>> the maximum transferable length to 65536 bytes"?
>>>
>>> As another example: Does it deal with the DMA4 (11-14) distinctions (if
>>> such were in use anyway)?
>>>
>>> For reference from the fdt print / :
>>>
>>> / {
>>> . . .
>>>     #address-cells = <0x00000002>;
>>>     #size-cells = <0x00000001>;
>>> . . .
>>>     soc {
>>>         compatible = "simple-bus";
>>>         #address-cells = <0x00000001>;
>>>         #size-cells = <0x00000001>;
>>> . . .
>>>         dma-ranges = <0xc0000000 0x00000000 0x00000000 0x40000000>;
>>> . . .
>>>         firmware {
>>>             compatible = "raspberrypi,bcm2835-firmware", "simple-bus";
>>>             mboxes = <0x0000001c>;
>>>             dma-ranges;
>>> . . .
>>>     emmc2bus {
>>>         compatible = "simple-bus";
>>>         #address-cells = <0x00000002>;
>>>         #size-cells = <0x00000001>;
>>> . . .
>>>         dma-ranges = <0x00000000 0xc0000000 0x00000000 0x00000000 0x40000000>;
>>> . . .
>>>     scb {
>>>         compatible = "simple-bus";
>>>         #address-cells = <0x00000002>;
>>>         #size-cells = <0x00000002>;
>>> . . .
>>>         dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xfc000000 0x00000001 0x00000000 0x00000001 0x00000000 0x00000001 0x00000000>;
>>> . . .
>>>         pcie@7d500000 {
>>>             compatible = "brcm,bcm2711-pcie";
>>> . . .
>>>             #address-cells = <0x00000003>;
>>> . . .
>>>             #size-cells = <0x00000002>;
>>> . . .
>>>             dma-ranges = <0x02000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xc0000000>;
>>> . . .
>>>     v3dbus {
>>>         compatible = "simple-bus";
>>>         #address-cells = <0x00000001>;
>>>         #size-cells = <0x00000002>;
>>> . . .
>>>         dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000004 0x00000000>;
>>> . . .
>>
>> rpi_DATA_2711_1p0.pdf reports:
>> (I ignore 2D DMA transfer mode here.)
>>
>> For DMA engines 0-6: XLENGTH has bits 29:0
>> bits 31:30 are write as 0, read as do not care.
>> That would put maxsegsz as 2**30 == 1,073,741,824
>> which matches a 1 GiByte space.
>>
>> For DMA LITE engines 7-10: XLENGTH has bits 15:0
>> bits 31:16 are write as 0, read as do not care.
>> That would put maxsegsz as 2**16 == 65,536.
>>
>> For DMA4 engines 11-14: XLENGTH has bits 29:0
>> bits 31:30 are write as 0, read as do not care.
>> That would put maxsegsz as 2**30 == 1,073,741,824
>> which is smaller than the 3 GiByte space associated
>> with xHCI.

rpi_DATA_2711_1p0.pdf reports the following specifically
for DMA11-DMA14 (so the DMA4 engines) for what goes in the
CB and NEXT_CB ADDR fields:

QUOTE
The address must be 256-bit aligned and so the bottom 5 bits of the byte address are discarded, i.e.
write cb_byte_address[39:0]>>5 into the CB
END QUOTE

This is not true for DMA0-DMA10 (DMA and DMA LITE).

The following is extracted from various places to
bring them together. I do not see evidence of handling the
cb_byte_address[39:0]>>5 involved for DMA11-DMA14:

#define ARMC_TO_VCBUS(pa) bcm283x_armc_to_vcbus(pa)

vm_paddr_t
bcm283x_armc_to_vcbus(vm_paddr_t pa)
{
    struct bcm283x_memory_soc_cfg *cfg;
    struct bcm283x_memory_mapping *map, *ment;

    /* Guaranteed not NULL if we haven't panicked yet. */
    cfg = bcm283x_get_current_memcfg();
    map = cfg->memmap;
    for (ment = map; !BCM283X_MEMMAP_ISTERM(ment); ++ment) {
        if (pa >= ment->armc_start &&
            pa < ment->armc_start + ment->armc_size) {
            return (pa - ment->armc_start) + ment->vcbus_start;
        }
    }

    /*
     * Assume 1:1 mapping for anything else, but complain about it on
     * verbose boots.
     */
    if (bootverbose)
        printf("bcm283x_vcbus: No armc -> vcbus mapping found: %jx\n",
            (uintmax_t)pa);
    return (pa);
}

static void
bcm_dmamap_cb(void *arg, bus_dma_segment_t *segs,
    int nseg, int err)
{
    bus_addr_t *addr;

    if (err)
        return;

    addr = (bus_addr_t*)arg;
    *addr = ARMC_TO_VCBUS(segs[0].ds_addr);
}

Note ds_addr assignments in:

static bus_size_t
_bus_dmamap_addseg(bus_dma_tag_t dmat, bus_dmamap_t map, bus_addr_t curaddr,
    bus_size_t sgsize, bus_dma_segment_t *segs, int *segp)
{
    bus_addr_t baddr, bmask;
    int seg;

    /*
     * Make sure we don't cross any boundaries.
     */
    bmask = ~(dmat->common.boundary - 1);
    if (dmat->common.boundary > 0) {
        baddr = (curaddr + dmat->common.boundary) & bmask;
        if (sgsize > (baddr - curaddr))
            sgsize = (baddr - curaddr);
    }

    /*
     * Insert chunk into a segment, coalescing with
     * previous segment if possible.
     */
    seg = *segp;
    if (seg == -1) {
        seg = 0;
        segs[seg].ds_addr = curaddr;
        segs[seg].ds_len = sgsize;
    } else {
        if (curaddr == segs[seg].ds_addr + segs[seg].ds_len &&
            (segs[seg].ds_len + sgsize) <= dmat->common.maxsegsz &&
            (dmat->common.boundary == 0 ||
             (segs[seg].ds_addr & bmask) == (curaddr & bmask)))
            segs[seg].ds_len += sgsize;
        else {
            if (++seg >= dmat->common.nsegments)
                return (0);
            segs[seg].ds_addr = curaddr;
            segs[seg].ds_len = sgsize;
        }
    }
    *segp = seg;
    return (sgsize);
}

Note cb_phys and ch->vc_cb in:

static int
bcm_dma_init(device_t dev)
{
. . .
    /* setup initial settings */
    for (i = 0; i < BCM_DMA_CH_MAX; i++) {
. . .
        err = bus_dmamap_load(sc->sc_dma_tag, ch->dma_map, cb_virt,
            sizeof(struct bcm_dma_cb), bcm_dmamap_cb, &cb_phys,
            BUS_DMA_WAITOK);
        if (err) {
            device_printf(dev, "cannot load DMA memory\n");
            break;
        }
        ch->cb = cb_virt;
        ch->vc_cb = cb_phys;
. . .
int
bcm_dma_start(int ch, vm_paddr_t src, vm_paddr_t dst, int len)
{
    struct bcm_dma_softc *sc = bcm_dma_sc;
    struct bcm_dma_cb *cb;

    if (ch < 0 || ch >= BCM_DMA_CH_MAX)
        return (-1);

    if (!(sc->sc_dma_ch[ch].flags & BCM_DMA_CH_USED))
        return (-1);

    cb = sc->sc_dma_ch[ch].cb;
    cb->src = ARMC_TO_VCBUS(src);
    cb->dst = ARMC_TO_VCBUS(dst);

    cb->len = len;

    bus_dmamap_sync(sc->sc_dma_tag,
        sc->sc_dma_ch[ch].dma_map, BUS_DMASYNC_PREWRITE);

    bus_write_4(sc->sc_mem, BCM_DMA_CBADDR(ch),
        sc->sc_dma_ch[ch].vc_cb);
    bus_write_4(sc->sc_mem, BCM_DMA_CS(ch), CS_ACTIVE);

#ifdef DEBUG
    bcm_dma_cb_dump(sc->sc_dma_ch[ch].cb);
    bcm_dma_reg_dump(ch);
#endif

    return (0);
}

It looks to me like FreeBSD is not set up to use the
DMA4 engines (DMA11-DMA14) and happens to not use
them for the DTB that I get from u-boot.bin in my
context.

Of course, I may just have missed something in
looking around at the unfamiliar material.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
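
For illustration only, the per-engine distinctions discussed above can be
written down as a small standalone C sketch. Nothing here is from the
FreeBSD tree: engine_for_channel, max_transfer_bound, channel_in_mask and
dma4_cb_field are hypothetical names. The 2**N bounds simply restate the
XLENGTH widths quoted above, and treating the CB register as 32 bits wide
is an assumption based on the bus_write_4() in bcm_dma_start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Engine classes, following the channel ranges quoted from the datasheet. */
enum engine_kind { ENGINE_DMA, ENGINE_DMA_LITE, ENGINE_DMA4, ENGINE_NONE };

/* Hypothetical helper: map a channel number to its engine class. */
static enum engine_kind
engine_for_channel(int ch)
{
    if (ch >= 0 && ch <= 6)
        return (ENGINE_DMA);
    if (ch >= 7 && ch <= 10)
        return (ENGINE_DMA_LITE);
    if (ch >= 11 && ch <= 14)
        return (ENGINE_DMA4);
    return (ENGINE_NONE);   /* 15 is VPU-only; anything else is invalid */
}

/* 2**N bound implied by an N-bit XLENGTH field (2D mode ignored). */
static uint64_t
max_transfer_bound(int ch)
{
    unsigned bits;

    switch (engine_for_channel(ch)) {
    case ENGINE_DMA:
    case ENGINE_DMA4:
        bits = 30;      /* XLENGTH bits 29:0 */
        break;
    case ENGINE_DMA_LITE:
        bits = 16;      /* XLENGTH bits 15:0 */
        break;
    default:
        return (0);     /* not usable from the ARM side */
    }
    assert(bits < 64);
    return (UINT64_C(1) << bits);
}

/* True if the channel's bit is set in a brcm,dma-channel-mask value. */
static bool
channel_in_mask(uint32_t mask, int ch)
{
    if (ch < 0 || ch > 31)
        return (false);
    return ((mask & (UINT32_C(1) << ch)) != 0);
}

/*
 * DMA4 only: turn a control block byte address into the value written to
 * the CB register, i.e. cb_byte_address >> 5. Fails if the address is not
 * 32-byte (256-bit) aligned, or if the shifted value does not fit in the
 * assumed 32-bit register.
 */
static bool
dma4_cb_field(uint64_t cb_byte_address, uint32_t *field)
{
    if ((cb_byte_address & 0x1f) != 0)
        return (false);
    if ((cb_byte_address >> 5) > UINT32_MAX)
        return (false);
    *field = (uint32_t)(cb_byte_address >> 5);
    return (true);
}
```

With the masks from the fdt output, channel_in_mask(0x00007000, 11) is
false, which is the observation that the scb mask does not enable dma11.
For DMA0-DMA10 no such shift applies, so a driver handling both kinds
would need to pick the CB encoding per channel.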