Date: Wed, 30 Sep 2020 11:13:06 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Robert Crowston <crowston@protonmail.com>, freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: RPi4B's DMA11 (DMA4 engine example) vs. xHCI/pcie
Message-ID: <903FE769-ED46-4FBC-A272-4D2C89A9CD7A@yahoo.com>
In-Reply-To: <B440C8D8-AA02-49E4-A0D6-3EA9B7FFD13A@yahoo.com>
References: <8C6DE44F-6CE2-4C74-8748-3BBFB54AE183@yahoo.com> <0FE382AB-8DE3-4467-9CB0-E8582AC70EA2@yahoo.com> <85FEDC51-B5B0-4ED4-A5ED-62B63EF9D5A8@yahoo.com> <B440C8D8-AA02-49E4-A0D6-3EA9B7FFD13A@yahoo.com>
On 2020-Sep-29, at 10:35, Mark Millard <marklmi at yahoo.com> wrote:
> On 2020-Sep-28, at 21:45, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2020-Sep-28, at 19:04, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> On 2020-Sep-28, at 18:29, Mark Millard <marklmi at yahoo.com> wrote:
>>>>
>>>>> [Be warned that the material is not familiar so I may need
>>>>> educating. This is based on the example context that I
>>>>> happen to have around.]
>>>>>
>>>>> In the u-boot fdt print / output there are 2 distinct sets of dma channel
>>>>> information, 1 for soc and 1 for scb, where the dma_tag values for the two
>>>>> sets should be distinct as far as I can tell:
>>>>>
>>>>> U-Boot> fdt address 0x7ef1000
>>>>> U-Boot> fdt print /
>>>>> / {
>>>>> . . .
>>>>> soc {
>>>>> dma@7e007000 {
>>>>> compatible = "brcm,bcm2835-dma";
>>>>> reg = <0x7e007000 0x00000b00>;
>>>>> interrupts = * 0x0000000007ef645c [0x00000084];
>>>>> interrupt-names = "dma0", "dma1", "dma2", "dma3", "dma4", "dma5", "dma6", "dma7", "dma8", "dma9", "dma10";
>>>>> #dma-cells = <0x00000001>;
>>>>> brcm,dma-channel-mask = <0x000001f5>;
>>>>> phandle = <0x0000000b>;
>>>>> };
>>>>>
>>>>> scb {
>>>>> . . .
>>>>> dma@7e007b00 {
>>>>> compatible = "brcm,bcm2711-dma";
>>>>> reg = <0x00000000 0x7e007b00 0x00000000 0x00000400>;
>>>>> interrupts = <0x00000000 0x00000059 0x00000004 0x00000000 0x0000005a 0x00000004 0x00000000 0x0000005b 0x00000004 0x00000000 0x0000005c 0x00000004>;
>>>>> interrupt-names = "dma11", "dma12", "dma13", "dma14";
>>>>> #dma-cells = <0x00000001>;
>>>>> brcm,dma-channel-mask = <0x00007000>;
>>>>> phandle = <0x0000003d>;
>>>>> };
>>>>> . . .
>
> I had presumed that the dma@7e007b00 would be processed. But
> I finally happened to search for "bcm2711-dma" in FreeBSD and
> it does not occur.
>
> That appears to mean that BCM_DMA_CH_MAX being 12 is depending
> on dma@7e007000's brcm,dma-channel-mask to avoid referencing
> number 11 that does not exist in that bcm2835-dma context.
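
A minimal sketch of that mask reasoning, using the two brcm,dma-channel-mask values from the fdt output quoted above and the driver's BCM_DMA_CH_MAX of 12. The helper names channel_enabled() and usable_channels() are made up for illustration and are not part of the FreeBSD driver:

```c
#include <assert.h>
#include <stdint.h>

#define BCM_DMA_CH_MAX 12	/* value quoted from the FreeBSD driver */

/* Mask values copied from the fdt output quoted in this thread. */
#define SOC_DMA_CHANNEL_MASK 0x000001f5u	/* dma@7e007000, bcm2835-dma */
#define SCB_DMA_CHANNEL_MASK 0x00007000u	/* dma@7e007b00, bcm2711-dma */

/* Hypothetical helper: nonzero if bit ch is set in mask. */
static int
channel_enabled(uint32_t mask, unsigned ch)
{
	assert(ch < 32);
	return (((mask >> ch) & 1u) != 0);
}

/* Hypothetical helper: count enabled channels below BCM_DMA_CH_MAX. */
static unsigned
usable_channels(uint32_t mask)
{
	unsigned ch, n = 0;

	for (ch = 0; ch < BCM_DMA_CH_MAX; ch++) {
		if (channel_enabled(mask, ch))
			n++;
	}
	return (n);
}
```

With the soc mask, the loop bound of 12 is harmless because bit 11 is clear; with the scb mask, no channel below 12 is enabled at all.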
>
> I think this makes what I wrote about DMA4 engines (the most
> capable ones) somewhat incoherent in the details but the basic
> not-supported-in-the-code and not-used status appears to be
> true.
>
> As for DMA0-DMA10 (bcm2835-dma), some DMA (0-6) vs. DMA LITE
> (7-10) distinctions not being handled (for example 65536
> maxsegsz for DMA LITE) still looks to be true to me.
It looks like FreeBSD is limited to 32-bit addressing here: usb/controller/generic_xhci.c
has nothing explicit for anything other than 32 address lines (and overall the
only alternative is 64 address lines):

#define IS_DMA_32B 1

int
generic_xhci_attach(device_t dev)
{
. . .
    err = xhci_init(sc, dev, IS_DMA_32B);
    if (err != 0) {
        device_printf(dev, "Failed to init XHCI, with error %d\n", err);
        generic_xhci_detach(dev);
        return (ENXIO);
    }
. . .

/*
 * The following structure describes the parent USB DMA tag.
 */
#if USB_HAVE_BUSDMA
struct usb_dma_parent_tag {
. . .
    uint8_t dma_bits; /* number of DMA address lines */
. . .
};
#else
struct usb_dma_parent_tag {}; /* empty struct */
#endif
. . .

usb_error_t
xhci_init(struct xhci_softc *sc, device_t self, uint8_t dma32)
{
. . .
    /* get DMA bits */
    sc->sc_bus.dma_bits = (XHCI_HCS0_AC64(temp) &&
        xhcidma32 == 0 && dma32 == 0) ? 64 : 32;
. . .

Overall it looks like a bunch of places would need changes to
support the RPi4B's 3 GiByte capability. (Probably more than
I've discovered, ignoring things like DMA4 engine use to get
write bursts and the like.)
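
To make the 32-vs-64 selection concrete, here is a small standalone sketch that mirrors the conditional quoted from xhci_init(). The function and parameter names are invented for illustration; only the logic (64 bits solely when the controller reports 64-bit capability and neither the tunable nor the attach argument forces 32) comes from the quoted code:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the dma_bits expression in xhci_init():
 *   hc_reports_ac64  ~ XHCI_HCS0_AC64(temp) != 0
 *   tunable_force_32 ~ xhcidma32 != 0
 *   attach_force_32  ~ dma32 != 0 (generic_xhci passes IS_DMA_32B == 1)
 */
static uint8_t
select_dma_bits(int hc_reports_ac64, int tunable_force_32,
    int attach_force_32)
{
	return ((hc_reports_ac64 && !tunable_force_32 && !attach_force_32) ?
	    64 : 32);
}
```

Since generic_xhci_attach() always passes IS_DMA_32B, the third argument is always nonzero on that path and the result is 32 regardless of what the controller reports.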
I will note that I found code in NetBSD that classifies "normal"
DMA engines vs. DMA LITE engines (via testing a debug register)
for bcm2835-dma and only requests normal DMA engines be used,
skipping DMA LITE. (This is for DTB/fdt contexts I think. I've
not done as well figuring out even such narrow aspects of ACPI
handling of things.) This tends to confirm my worries over
FreeBSD's bcm2835-dma handling of the DMA LITE engines existing
but being less capable.
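
A rough sketch of that kind of classification follows. It assumes the LITE flag is bit 28 of the per-channel DEBUG register, which is my reading of the Broadcom peripherals documentation and should be checked against the datasheet; the macro, enum and function names are hypothetical and are not NetBSD's or FreeBSD's:

```c
#include <stdint.h>

/* Assumption: bit 28 of a channel's DEBUG register flags a LITE engine. */
#define DMA_DEBUG_LITE (1u << 28)

enum dma_engine_type {
	DMA_ENGINE_NORMAL,
	DMA_ENGINE_LITE
};

/* Hypothetical helper: classify an engine from its DEBUG register value. */
static enum dma_engine_type
classify_engine(uint32_t debug_reg)
{
	return ((debug_reg & DMA_DEBUG_LITE) ?
	    DMA_ENGINE_LITE : DMA_ENGINE_NORMAL);
}
```

A driver could run such a check once per channel at attach time and then either skip LITE channels or apply a smaller maximum segment size to them.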
>>>>> So, 0 through 10 need the soc criteria (mix of DMA and DMA LITE engine criteria)
>>>>> and 11 through 14 need the scb criteria (DMA4 engine criteria). (I'm ignoring
>>>>> the dma-channel-masks at this point.)
>>>>>
>>>>>
>>>>> I'll here note the code has:
>>>>>
>>>>> #define BCM_DMA_CH_MAX 12
>>>>>
>>>>> for use in code like:
>>>>>
>>>>>     /* setup initial settings */
>>>>>     for (i = 0; i < BCM_DMA_CH_MAX; i++) {
>>>>>         ch = &sc->sc_dma_ch[i];
>>>>>
>>>>>         bzero(ch, sizeof(struct bcm_dma_ch));
>>>>>         ch->ch = i;
>>>>>         ch->flags = BCM_DMA_CH_UNMAP;
>>>>>
>>>>>         if ((bcm_dma_channel_mask & (1 << i)) == 0)
>>>>>             continue;
>>>>> . . .
>>>>>
>>>>> It looks to me like "dma11" is the only scb/DMA4 engine covered
>>>>> by such loops and that the "brcm,dma-channel-mask = <0x00007000>"
>>>>> means that dma11 will not be used.
>>>>>
>>>>> So: No scb/DMA4 engine will be used??? (That could explain the
>>>>> 1 GiByte limit?)
>>>>>
>>>>>
>>>>> rpi_DATA_2711_1p0.pdf reports that soc/0-10 have 2 types (0-6 vs. 7-10
>>>>> as it turns out) as well as the scb/DMA4-engines (11-14):
>>>>>
>>>>> QUOTE (with omissions marked by ". . .")
>>>>> . . .
>>>>> The BCM2711 DMA Controller provides a total of 16 DMA channels. Four of these are DMA Lite channels (with reduced performance and features), and four of them are DMA4 channels (with increased performance and a wider address range).
>>>>> . . .
>>>>> 4.5. DMA LITE Engines
>>>>>
>>>>> Several of the DMA engines are of the LITE design. This is a reduced specification engine designed to save space. The engine behaves in the same way as a normal DMA engine except for the following differences:
>>>>> . . .
>>>>> • The DMA length register is now 16 bits, limiting the maximum transferable length to 65536 bytes.
>>>>> . . .
>>>>> 4.6. DMA4 Engines
>>>>>
>>>>> Several of the DMA engines are of the DMA4 design. These have higher performance due to their uncoupled read/write design and can access up to 40 address bits. Unlike the other DMA engines they are also capable of performing write bursts. Note that they directly access the full 35-bit address bus of the BCM2711 and so bypass the paging registers of the DMA and DMA Lite engines.
>>>>>
>>>>> DMA channel 11 is additionally able to access the PCIe interface.
>>>>> END QUOTE
>>>>>
>>>>> The register map indicates (with some extra notes added):
>>>>>
>>>>> 0-6: DMA
>>>>> 7-10: DMA LITE (65536 bytes limit, for example)
>>>>> 11-14: DMA4 (11 is special relative to "PCIe interface")
>>>>> ("DMA Channel 15 is exclusively used by the VPU.")
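
The register-map summary just above can be written down as a small lookup. This is only an illustration of the 0-6 / 7-10 / 11-14 / 15 split as listed; the enum and function names are hypothetical and nothing like this exists in the FreeBSD driver as far as I have seen:

```c
/* Engine kinds per the BCM2711 register map as summarized in this thread. */
enum bcm2711_dma_kind {
	DMA_KIND_NORMAL,	/* channels 0-6 */
	DMA_KIND_LITE,		/* channels 7-10 */
	DMA_KIND_DMA4,		/* channels 11-14 */
	DMA_KIND_VPU_ONLY,	/* channel 15 */
	DMA_KIND_INVALID	/* anything else */
};

/* Hypothetical helper: map a channel number to its engine kind. */
static enum bcm2711_dma_kind
bcm2711_dma_kind_of(unsigned ch)
{
	if (ch <= 6)
		return (DMA_KIND_NORMAL);
	if (ch <= 10)
		return (DMA_KIND_LITE);
	if (ch <= 14)
		return (DMA_KIND_DMA4);
	if (ch == 15)
		return (DMA_KIND_VPU_ONLY);
	return (DMA_KIND_INVALID);
}
```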
>>>>>
>>>>> Yet what I see in the head -r365932 code is:
>>>>>
>>>>> #define BCM_DMA_CH_MAX 12
>>>>> . . .
>>>>> struct bcm_dma_softc {
>>>>>     device_t sc_dev;
>>>>>     struct mtx sc_mtx;
>>>>>     struct resource * sc_mem;
>>>>>     struct resource * sc_irq[BCM_DMA_CH_MAX];
>>>>>     void * sc_intrhand[BCM_DMA_CH_MAX];
>>>>>     struct bcm_dma_ch sc_dma_ch[BCM_DMA_CH_MAX];
>>>>>     bus_dma_tag_t sc_dma_tag;
>>>>> };
>>>>> . . .
>>>>>     err = bus_dma_tag_create(bus_get_dma_tag(dev),
>>>>>         1, 0, BUS_SPACE_MAXADDR_32BIT,
>>>>>         BUS_SPACE_MAXADDR, NULL, NULL,
>>>>>         sizeof(struct bcm_dma_cb), 1,
>>>>>         sizeof(struct bcm_dma_cb),
>>>>>         BUS_DMA_ALLOCNOW, NULL, NULL,
>>>>>         &sc->sc_dma_tag);
>>>>>
>>>>> As an example: does that deal with the likes of DMA LITE (so 7-10) "limiting
>>>>> the maximum transferable length to 65536 bytes"?
>>>>>
>>>>> As another example: Does it deal with the DMA4 (11-14) distinctions (if
>>>>> such were in use anyway)?
>>>>>
>>>>> For reference from the fdt print / :
>>>>>
>>>>> / {
>>>>> . . .
>>>>> #address-cells = <0x00000002>;
>>>>> #size-cells = <0x00000001>;
>>>>> . . .
>>>>> soc {
>>>>> compatible = "simple-bus";
>>>>> #address-cells = <0x00000001>;
>>>>> #size-cells = <0x00000001>;
>>>>> . . .
>>>>> dma-ranges = <0xc0000000 0x00000000 0x00000000 0x40000000>;
>>>>> . . .
>>>>> firmware {
>>>>> compatible = "raspberrypi,bcm2835-firmware", "simple-bus";
>>>>> mboxes = <0x0000001c>;
>>>>> dma-ranges;
>>>>> . . .
>>>>> emmc2bus {
>>>>> compatible = "simple-bus";
>>>>> #address-cells = <0x00000002>;
>>>>> #size-cells = <0x00000001>;
>>>>> . . .
>>>>> dma-ranges = <0x00000000 0xc0000000 0x00000000 0x00000000 0x40000000>;
>>>>> . . .
>>>>> scb {
>>>>> compatible = "simple-bus";
>>>>> #address-cells = <0x00000002>;
>>>>> #size-cells = <0x00000002>;
>>>>> . . .
>>>>> dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xfc000000 0x00000001 0x00000000 0x00000001 0x00000000 0x00000001 0x00000000>;
>>>>> . . .
>>>>> pcie@7d500000 {
>>>>> compatible = "brcm,bcm2711-pcie";
>>>>> . . .
>>>>> #address-cells = <0x00000003>;
>>>>> . . .
>>>>> #size-cells = <0x00000002>;
>>>>> . . .
>>>>> dma-ranges = <0x02000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0xc0000000>;
>>>>> . . .
>>>>> v3dbus {
>>>>> compatible = "simple-bus";
>>>>> #address-cells = <0x00000001>;
>>>>> #size-cells = <0x00000002>;
>>>>> . . .
>>>>> dma-ranges = <0x00000000 0x00000000 0x00000000 0x00000004 0x00000000>;
>>>>> . . .
>>>>
>>>> rpi_DATA_2711_1p0.pdf reports:
>>>> (I ignore 2D DMA transfer mode here.)
>>>>
>>>> For DMA engines 0-6: XLENGTH has bits 29:0
>>>> bits 31:30 are write as 0, read as do not care.
>>>> That would put maxsegsz as 2**30 == 1,073,741,824
>>>> which matches a 1 GiByte space.
>>>>
>>>> For DMA LITE engines 7-10: XLENGTH has bits 15:0
>>>> bits 31:16 are write as 0, read as do not care.
>>>> That would put maxsegsz as 2**16 == 65,536.
>>>>
>>>> For DMA4 engines 11-14: XLENGTH has bits 29:0
>>>> bits 31:30 are write as 0, read as do not care.
>>>> That would put maxsegsz as 2**30 == 1,073,741,824
>>>> which is smaller than the 3 GiByte space associated
>>>> with xHCI.
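
As a quick check of the arithmetic above, the sketch below computes the largest value an n-bit XLENGTH field can hold. Note that this is one less than the round 2**n figures quoted: whether maxsegsz should be 2**n or 2**n - 1 depends on how the hardware treats the field, which I have not verified. The function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest value representable in an n-bit field. */
static uint64_t
xlength_field_max(unsigned bits)
{
	assert(bits > 0 && bits < 64);
	return ((((uint64_t)1) << bits) - 1);
}
```

For the 30-bit field of DMA and DMA4 engines this gives 1,073,741,823, and for the 16-bit field of DMA LITE engines it gives 65,535.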
>>
>> rpi_DATA_2711_1p0.pdf reports the following specifically for
>> DMA11-DMA14 (so the DMA4 engines) for what goes in the CB and
>> NEXT_CB ADDR fields:
>>
>> QUOTE
>> The address must be 256-bit aligned and so the bottom 5 bits of the byte address are discarded, i.e. write cb_byte_address[39:0]>>5 into the CB
>> END QUOTE
>>
>> This is not true for DMA0-DMA10 (DMA and DMA LITE).
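
A minimal sketch of what the quoted rule would require for a DMA4 engine: the control block must sit on a 32-byte (256-bit) boundary and the register receives the byte address shifted right by 5. The function name is hypothetical, and the check that the shifted value fits a 32-bit register write is my own assumption about how a driver would use it:

```c
#include <assert.h>
#include <stdint.h>

#define DMA4_CB_ALIGN 32u	/* 256 bits == 32 bytes, so 5 low bits are zero */

/* Hypothetical helper: convert a CB byte address to a DMA4 register value. */
static uint32_t
dma4_cb_reg_value(uint64_t cb_byte_address)
{
	uint64_t shifted;

	/* The datasheet quote requires 256-bit alignment. */
	assert((cb_byte_address & (DMA4_CB_ALIGN - 1)) == 0);

	shifted = cb_byte_address >> 5;

	/* Assumption: the value is written with a single 32-bit store. */
	assert(shifted <= UINT32_MAX);
	return ((uint32_t)shifted);
}
```

By contrast, the bcm_dma_start() code quoted in this thread writes vc_cb to BCM_DMA_CBADDR(ch) unshifted, which is what DMA0-DMA10 expect.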
>>
>> The following is extracted from various places to
>> bring them together. I do not see evidence of handling
>> the cb_byte_address[39:0]>>5 involved for DMA11-DMA14:
>>
>> #define ARMC_TO_VCBUS(pa) bcm283x_armc_to_vcbus(pa)
>>
>> vm_paddr_t
>> bcm283x_armc_to_vcbus(vm_paddr_t pa)
>> {
>>     struct bcm283x_memory_soc_cfg *cfg;
>>     struct bcm283x_memory_mapping *map, *ment;
>>
>>     /* Guaranteed not NULL if we haven't panicked yet. */
>>     cfg = bcm283x_get_current_memcfg();
>>     map = cfg->memmap;
>>     for (ment = map; !BCM283X_MEMMAP_ISTERM(ment); ++ment) {
>>         if (pa >= ment->armc_start &&
>>             pa < ment->armc_start + ment->armc_size) {
>>             return (pa - ment->armc_start) + ment->vcbus_start;
>>         }
>>     }
>>
>>     /*
>>      * Assume 1:1 mapping for anything else, but complain about it on
>>      * verbose boots.
>>      */
>>     if (bootverbose)
>>         printf("bcm283x_vcbus: No armc -> vcbus mapping found: %jx\n",
>>             (uintmax_t)pa);
>>     return (pa);
>> }
>>
>> static void
>> bcm_dmamap_cb(void *arg, bus_dma_segment_t *segs,
>>     int nseg, int err)
>> {
>>     bus_addr_t *addr;
>>
>>     if (err)
>>         return;
>>
>>     addr = (bus_addr_t*)arg;
>>     *addr = ARMC_TO_VCBUS(segs[0].ds_addr);
>> }
>>
>> Note ds_addr assignments in:
>>
>> static bus_size_t
>> _bus_dmamap_addseg(bus_dma_tag_t dmat, bus_dmamap_t map, bus_addr_t curaddr,
>>     bus_size_t sgsize, bus_dma_segment_t *segs, int *segp)
>> {
>>     bus_addr_t baddr, bmask;
>>     int seg;
>>
>>     /*
>>      * Make sure we don't cross any boundaries.
>>      */
>>     bmask = ~(dmat->common.boundary - 1);
>>     if (dmat->common.boundary > 0) {
>>         baddr = (curaddr + dmat->common.boundary) & bmask;
>>         if (sgsize > (baddr - curaddr))
>>             sgsize = (baddr - curaddr);
>>     }
>>
>>     /*
>>      * Insert chunk into a segment, coalescing with
>>      * previous segment if possible.
>>      */
>>     seg = *segp;
>>     if (seg == -1) {
>>         seg = 0;
>>         segs[seg].ds_addr = curaddr;
>>         segs[seg].ds_len = sgsize;
>>     } else {
>>         if (curaddr == segs[seg].ds_addr + segs[seg].ds_len &&
>>             (segs[seg].ds_len + sgsize) <= dmat->common.maxsegsz &&
>>             (dmat->common.boundary == 0 ||
>>             (segs[seg].ds_addr & bmask) == (curaddr & bmask)))
>>             segs[seg].ds_len += sgsize;
>>         else {
>>             if (++seg >= dmat->common.nsegments)
>>                 return (0);
>>             segs[seg].ds_addr = curaddr;
>>             segs[seg].ds_len = sgsize;
>>         }
>>     }
>>     *segp = seg;
>>     return (sgsize);
>> }
>>
>>
>> Note cb_phys and ch->vc_cb in:
>>
>> static int
>> bcm_dma_init(device_t dev)
>> {
>> . . .
>>     /* setup initial settings */
>>     for (i = 0; i < BCM_DMA_CH_MAX; i++) {
>> . . .
>>         err = bus_dmamap_load(sc->sc_dma_tag, ch->dma_map, cb_virt,
>>             sizeof(struct bcm_dma_cb), bcm_dmamap_cb, &cb_phys,
>>             BUS_DMA_WAITOK);
>>         if (err) {
>>             device_printf(dev, "cannot load DMA memory\n");
>>             break;
>>         }
>>
>>         ch->cb = cb_virt;
>>         ch->vc_cb = cb_phys;
>> . . .
>>
>> int
>> bcm_dma_start(int ch, vm_paddr_t src, vm_paddr_t dst, int len)
>> {
>>     struct bcm_dma_softc *sc = bcm_dma_sc;
>>     struct bcm_dma_cb *cb;
>>
>>     if (ch < 0 || ch >= BCM_DMA_CH_MAX)
>>         return (-1);
>>
>>     if (!(sc->sc_dma_ch[ch].flags & BCM_DMA_CH_USED))
>>         return (-1);
>>
>>     cb = sc->sc_dma_ch[ch].cb;
>>     cb->src = ARMC_TO_VCBUS(src);
>>     cb->dst = ARMC_TO_VCBUS(dst);
>>
>>     cb->len = len;
>>
>>     bus_dmamap_sync(sc->sc_dma_tag,
>>         sc->sc_dma_ch[ch].dma_map, BUS_DMASYNC_PREWRITE);
>>
>>     bus_write_4(sc->sc_mem, BCM_DMA_CBADDR(ch),
>>         sc->sc_dma_ch[ch].vc_cb);
>>     bus_write_4(sc->sc_mem, BCM_DMA_CS(ch), CS_ACTIVE);
>>
>> #ifdef DEBUG
>>     bcm_dma_cb_dump(sc->sc_dma_ch[ch].cb);
>>     bcm_dma_reg_dump(ch);
>> #endif
>>
>>     return (0);
>> }
>>
>> It looks to me like FreeBSD is not set up to use the DMA4
>> engines (DMA11-DMA14) and happens to not use them for the
>> DTB that I get from u-boot.bin in my context.
>>
>> Of course, I may just have missed something in looking
>> around at the unfamiliar material.
>
>
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
