Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 20 Mar 2015 08:27:27 -0600
From:      Ian Lepore <ian@freebsd.org>
To:        John Wehle <john@feith.com>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: current meaning of BUS_DMA_COHERENT
Message-ID:  <1426861647.24655.12.camel@freebsd.org>
In-Reply-To: <201503200535.t2K5ZQdo011380@jwlab.FEITH.COM>
References:  <201503200535.t2K5ZQdo011380@jwlab.FEITH.COM>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 2015-03-20 at 01:35 -0400, John Wehle wrote:
> The bus_dmamap_create manual page says:
> 
>   Attempt to map the memory loaded with this map such that cache sync
>   operations are as cheap as possible.
>   ...
>   Use of this flag does not remove the requirement of using bus_dmamap_sync()
> 
> However busdma_machdep-v6.c for ARM has the comment:
> 
>  * Create a cache of buffers in uncacheable memory, to implement the
>  * BUS_DMA_COHERENT (and potentially BUS_DMA_NOCACHE) flag.
> 
> Is the manual page out of date?  Does FreeBSD now guarentee that memory
> allocated by bus_dmamem_alloc using BUS_DMA_COHERENT doesn't require
> bus_dmamap_sync operations?
> 
> Part of my concern is that there appears to be existing FreeBSD device
> driver code that assumes BUS_DMA_COHERENT means uncacheable memory
> (i.e. bus_dmamap_sync is unnecessary) which I thought was indicated
> by BUS_DMA_NOCACHE, not BUS_DMA_COHERENT.
> 

No, the manpage is right and far too much driver code is wrong.

The main problem is shared concurrent-access memory, such as buffer
descriptor rings which are accessed simultaneously by the cpu and
network hardware.  (The same thing happens with other hardware, but NICs
are the prime example of it.)  We don't have any sensible sync
operations that correspond to what you're logically doing when you
access such shared memory.  About the closest you can come is to do both
pre and post sync ops back to back:

	bus_dmamap_sync(tag, map, BUS_DMASYNC_PREREAD);
	bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTREAD);

It might be nice to use PREREAD|POSTREAD, except for some reason the
manpage specifically says to not do that.  (We could fix the manpage,
but only after examining the implementation for every arch to make sure
it doesn't cause problems.)

The requirement that a sync is done on an entire map is part of the
problem with this stuff.  A descriptor ring may be several pages of data
containing a thousand descriptors, and needing to sync it before and
after every access by the CPU causes a whole lot of slow expensive (and
mostly unecessary) cache operations.

What we really need is a new type of busdma memory (BUS_DMA_DESCRIPTOR)
and a special sync call to use in conjunction with it that takes an
offset and length, and the sync is a single operation, no pre/post
stuff.  Then you could sync each descriptor immediately before reading
and writing it, which would translate to a single cacheline flush
instead of a loop that does all the lines in the whole ring.

For an example of a driver that has all the right calls, see dev/ffec.
I know it does it right, because I was experimenting with changing
COHERENT on arm to something other than strongly-ordered and suddenly
the network stopped working until I put in all the right sync ops.

-- Ian





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1426861647.24655.12.camel>