Date: Fri, 20 Mar 2015 08:27:27 -0600 From: Ian Lepore <ian@freebsd.org> To: John Wehle <john@feith.com> Cc: freebsd-arm@freebsd.org Subject: Re: current meaning of BUS_DMA_COHERENT Message-ID: <1426861647.24655.12.camel@freebsd.org> In-Reply-To: <201503200535.t2K5ZQdo011380@jwlab.FEITH.COM> References: <201503200535.t2K5ZQdo011380@jwlab.FEITH.COM>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 2015-03-20 at 01:35 -0400, John Wehle wrote: > The bus_dmamap_create manual page says: > > Attempt to map the memory loaded with this map such that cache sync > operations are as cheap as possible. > ... > Use of this flag does not remove the requirement of using bus_dmamap_sync() > > However busdma_machdep-v6.c for ARM has the comment: > > * Create a cache of buffers in uncacheable memory, to implement the > * BUS_DMA_COHERENT (and potentially BUS_DMA_NOCACHE) flag. > > Is the manual page out of date? Does FreeBSD now guarentee that memory > allocated by bus_dmamem_alloc using BUS_DMA_COHERENT doesn't require > bus_dmamap_sync operations? > > Part of my concern is that there appears to be existing FreeBSD device > driver code that assumes BUS_DMA_COHERENT means uncacheable memory > (i.e. bus_dmamap_sync is unnecessary) which I thought was indicated > by BUS_DMA_NOCACHE, not BUS_DMA_COHERENT. > No, the manpage is right and far too much driver code is wrong. The main problem is shared concurrent-access memory, such as buffer descriptor rings which are accessed simultaneously by the cpu and network hardware. (The same thing happens with other hardware, but NICs are the prime example of it.) We don't have any sensible sync operations that correspond to what you're logically doing when you access such shared memory. About the closest you can come is to do both pre and post sync ops back to back: bus_dmamap_sync(tag, map, BUS_DMASYNC_PREREAD); bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTREAD); It might be nice to use PREREAD|POSTREAD, except for some reason the manpage specifically says to not do that. (We could fix the manpage, but only after examining the implementation for every arch to make sure it doesn't cause problems.) The requirement that a sync is done on an entire map is part of the problem with this stuff. A descriptor ring may be several pages of data containing a thousand descriptors, and needing to sync it before and after every access by the CPU causes a whole lot of slow expensive (and mostly unecessary) cache operations. What we really need is a new type of busdma memory (BUS_DMA_DESCRIPTOR) and a special sync call to use in conjunction with it that takes an offset and length, and the sync is a single operation, no pre/post stuff. Then you could sync each descriptor immediately before reading and writing it, which would translate to a single cacheline flush instead of a loop that does all the lines in the whole ring. For an example of a driver that has all the right calls, see dev/ffec. I know it does it right, because I was experimenting with changing COHERENT on arm to something other than strongly-ordered and suddenly the network stopped working until I put in all the right sync ops. -- Ian
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1426861647.24655.12.camel>