Date: Wed, 14 Jan 2015 08:03:52 -0700 From: Ian Lepore <ian@freebsd.org> To: Hans Petter Selasky <hps@selasky.org> Cc: kott <kmatpral@yahoo.com>, freebsd-usb@freebsd.org Subject: Re: usb_pc_cpu_flush Message-ID: <1421247832.14601.258.camel@freebsd.org> In-Reply-To: <54B5FAE8.7020705@selasky.org> References: <1419359192795-5975583.post@n5.nabble.com> <5499E734.1070507@selasky.org> <1419392511197-5975691.post@n5.nabble.com> <549A811D.3060204@selasky.org> <1419416870924-5975752.post@n5.nabble.com> <1419423740820-5975763.post@n5.nabble.com> <549AB711.8070005@selasky.org> <1419431704871-5975773.post@n5.nabble.com> <549BF430.8000207@selasky.org> <1419877515606-5976832.post@n5.nabble.com> <1421133295061-5980199.post@n5.nabble.com> <1421160576.14601.175.camel@freebsd.org> <54B53956.4090708@selasky.org> <1421163656.14601.184.camel@freebsd.org> <54B54073.6000409@selasky.org> <1421166591.14601.195.camel@freebsd.org> <54B54D4D.3010805@selasky.org> <1421180700.14601.209.camel@freebsd.org> <54B5FAE8.7020705@selasky.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 2015-01-14 at 06:13 +0100, Hans Petter Selasky wrote: > On 01/13/15 21:25, Ian Lepore wrote: > > On Tue, 2015-01-13 at 17:52 +0100, Hans Petter Selasky wrote: > >> On 01/13/15 17:29, Ian Lepore wrote: > >>> On Tue, 2015-01-13 at 16:57 +0100, Hans Petter Selasky wrote: [...] > >> > >> Hi, > >> > >> Can you give an example of a NIC driver which you consider a good > >> example which use DMA both for data (not only mbufs) and the control > >> path and use busdma as you consider to be correct? > >> > >> --HPS > > > > dev/ffec/if_ffec.c. I'm not happy with the fact that it takes two calls > > (a PRE and a POST) to accomplish a single action, but that's the right > > way to do things in the current busdma world, PRE and POST operations > > need to be paired. > > > > I think we need new busdma support for shared concurrent access > > descriptors, because it's a type of dma that just isn't supported well > > by the existing API. There should be a new flag for bus_dmamem_alloc() > > that indicates the memory is going to be used for such shared access > > (because some platforms may be able to provide memory that's mapped > > optimally for such situations), and there should be new sync ops that > > don't require a pair of calls to accomplish a single action. > > > > All of this is in the context of shared descriptor memory. Regular IO > > buffers should just use the proper sequence of PRE and POST syncs (and > > most especially should *never* do POSTREAD before PREREAD like the > > current usb_pc_cpu_invalidate() does, because with bounce buffers that > > will just not work right). > > > > Hi, > > I understand and that can be done for IO-buffers like in the FFEC > driver. For other buffers it is a bigger task, however: > > I see that some memory is allocated using "BUS_DMA_COHERENT" in > if_ffec.c and in those cases no busdma sync operations are performed! Is > that correct? For example "rxdesc_ring" is allocated coherently and no > cache sync ops are done before and after access. > No, it does do sync operations on the descriptors rings, because one of the documented rules of coherent memory is that you still have to do all the normal sync ops. Traditionally coherent has meant "completely disable caching" on arm and you could get away with not doing the sync ops, but that's not true anymore. Arm memory subsystems are more complex these days and sync ops are needed in some cases even for uncached memory. In the current ffec code, sync ops on descriptor memory are done on line 655/657 (these could really be back to back without the TDAR write between them). At this point new descriptors have been written to the tx ring and the PREWRITE makes the changes visible to the hardware, the write into the TDAR register informs the hardware that descriptors were changed and it should rescan the list. The POSTWRITE is a no-op but included for technical correctness. Another desc-ring related set of sync ops is on lines 683-684. In this case the hardware has updated some of the descriptors and the PREREAD/POSTREAD sync ops ensure that the cpu will see the changes. The same is true on lines 850-851. This is the place where having two calls is really a performance hit, because both PRE and POSTREAD sync will do an invalidate, and it really only needs to be done once, but the busdma conventions are that PRE and POST ops need to come in properly-sequenced pairs. And finally on 899-900 is the rx code similar to the tx code on 655/657, where the driver has updated descriptors in the ring and done a PREWRITE to make the changes visible to the hardware. > The buffer that Mr. Kott hit a crash on should also be allocated > coherently. Looking at the ARM busdma code in -current I see a possible bug: > > > int > > _bus_dmamap_load_buffer(bus_dma_tag_t dmat, bus_dmamap_t map, void *buf, > > bus_size_t buflen, struct pmap *pmap, int flags, bus_dma_segment_t *segs, > > int *segp) > > { > > bus_size_t sgsize; > > bus_addr_t curaddr; > > struct sync_list *sl; > > vm_offset_t vaddr = (vm_offset_t)buf; > > int error = 0; > > > > if (segs == NULL) > > segs = dmat->segments; > > if ((flags & BUS_DMA_LOAD_MBUF) != 0) > > map->flags |= DMAMAP_CACHE_ALIGNED; > > > > if ((dmat->flags & BUS_DMA_COULD_BOUNCE) != 0) { > > _bus_dmamap_count_pages(dmat, map, pmap, buf, buflen, flags); > > if (map->pagesneeded != 0) { > > error = _bus_dmamap_reserve_pages(dmat, map, flags); > > if (error) > > return (error); > > } > > } > > CTR3(KTR_BUSDMA, "lowaddr= %d boundary= %d, " > > "alignment= %d", dmat->lowaddr, dmat->boundary, dmat->alignment); > > > > while (buflen > 0) { > > /* > > * Get the physical address for this segment. > > */ > > if (__predict_true(pmap == kernel_pmap)) { > > curaddr = pmap_kextract(vaddr); > > } else { > > curaddr = pmap_extract(pmap, vaddr); > > map->flags &= ~DMAMAP_COHERENT; > > } > > If the memory was allocated coherently and we clear the DMAMAP_COHERENT > flag when we are loading the memory, the memory will get freed to the > wrong bucket when we later free it - right? > > > if (map->flags & DMAMAP_COHERENT) > > ba = coherent_allocator; > > else > > ba = standard_allocator; > > > > Kott: Can you try to comment out: > > map->flags &= ~DMAMAP_COHERENT; > > In "sys/arm/arm/busdma_machdep.c" and see if it makes any difference? > > Ian: Can you explain what the check "pmap == kernel_pmap" means when it > is true? Does it mean that the memory is mapped into virtual kernel > space? And if false? > Firstly, that's armv4 code that isn't in play on an armv6/7 platform. Secondly, remember how I said recently that handling IO directly in/out of userland buffers wasn't likely to work correctly on arm? (I haven't forgotten that you asked a followup question, I just haven't found time to reply in detail yet.) What you've spotted above is a little bit of the old attempts to try to support that on armv4. It never worked right. The bottom line is that pmap will always equal kernel_pmap. The else of that code really should be changed into a panic(). > To optimise BUSDMA allocations the USB stack will allocate a PAGE_SIZE > object and then load smaller parts of the PAGE_SIZE allocation using > BUSDMA. Is this supported by BUSDMA? If you are allocating the memory using bus_dmamem_alloc() then no, absolutely not. It's working for you by accident because of the USB_HOST_ALIGN hacks which I think date back to freebsd 8 or so. When I talked to you a couple years ago about doing it the right way, you were outraged (perhaps rightly so) about the inefficiency involved, because of things like allocating a 16 byte buffer causing the allocation of a full page just so that the caching attributes could be changed. So I fixed that with the busdma zone allocator stuff that makes it very efficient to allocate many tiny dma buffers (coherent or not), and when you use buffers allocated that way, the sync operations code can make some safe assumptions and avoid doing bounces due to cacheline alignments. If buffers were allocated right-sized now that it's efficient to do so, the USB_HOST_ALIGN hack could go away. Hmm, actually it might be necessary to start using the busdma zone allocator in mips as well before undoing the hack. -- Ian
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1421247832.14601.258.camel>