Date: Tue, 22 May 2012 07:15:29 -0400
From: Alexander Kabaev <kabaev@gmail.com>
To: Hans Petter Selasky <hselasky@c2i.net>
Cc: freebsd-hackers@freebsd.org, hackers@freebsd.org, Svatopluk Kraus <onwahe@gmail.com>
Subject: Re: ARM + CACHE_LINE_SIZE + DMA
Message-ID: <20120522071529.7024604c@kan.dyndns.org>
In-Reply-To: <201205220756.43031.hselasky@c2i.net>
References: <CAFHCsPUdZXGKFvmVGgaEUsfhwd28mNVGaY84ExcJp=ogQxzPJQ@mail.gmail.com> <CAP%2BM-_GbAnAZzJaa=diGACGuYGeSo6zqD-CBbiOL61vw%2B1eJEg@mail.gmail.com> <20120521193548.0b03a39a@kan.dyndns.org> <201205220756.43031.hselasky@c2i.net>

On Tue, 22 May 2012 07:56:42 +0200
Hans Petter Selasky <hselasky@c2i.net> wrote:

> On Tuesday 22 May 2012 01:35:48 Alexander Kabaev wrote:
> > On Thu, 17 May 2012 11:01:34 -0500
> > Mark Tinguely <marktinguely@gmail.com> wrote:
> > > On Thu, May 17, 2012 at 8:20 AM, Svatopluk Kraus
> > > <onwahe@gmail.com> wrote:
> > > > Hi,
> > > >
> > > > I'm working on a DMA bus implementation for the ARM11mpcore
> > > > platform. I've looked at the implementation in the ARM tree,
> > > > but IMHO it only works under some assumptions. There is a
> > > > problem with DMA on a memory block which is not aligned on
> > > > CACHE_LINE_SIZE (start and end) if the memory is not coherent.
> > > >
> > > > Let's have a buffer for DMA which is not aligned on
> > > > CACHE_LINE_SIZE. Then the first cache line associated with the
> > > > buffer can be divided into two parts, A and B, where A is
> > > > memory we know nothing about and B is buffer memory. The same
> > > > holds for the last cache line associated with the buffer. We
> > > > have no problem if the memory is coherent. Otherwise it depends
> > > > on the memory attributes.
> > > >
> > > > 1. [no cache] attribute
> > > > No problem, as the memory is coherent.
> > > >
> > > > 2. [write through] attribute
> > > > Part A can be invalidated without loss of any data. It's not
> > > > a problem either.
> > > >
> > > > 3. [write back] attribute
> > > > In general, there is no way to keep both parts consistent.
> > > > At the start of a DMA transaction, the cache line is written
> > > > back and invalidated. However, as we know nothing about the
> > > > memory associated with part A of the cache line, the cache line
> > > > can be filled again at any time and mess up the DMA transaction
> > > > if flushed.
> > > > Even if the cache line is only filled but not flushed
> > > > during the DMA transaction, we must make it coherent with
> > > > memory after that. There is a trick in the current ARM (MIPS)
> > > > implementation: save part A of the line into a temporary
> > > > buffer, invalidate the line, and restore part A. However, if
> > > > somebody is writing to the memory associated with part A of the
> > > > line during this trick, part A will be messed up. Moreover,
> > > > part A can be part of another DMA transaction.
> > > >
> > > > To safely use DMA with non-coherent memory, memory with the
> > > > [no cache] or [write through] attributes can be used without
> > > > problem. Memory with the [write back] attribute must be aligned
> > > > on CACHE_LINE_SIZE.
> > > >
> > > > However, as with mbufs for example, a buffer for DMA can be
> > > > part of a structure which is aligned on CACHE_LINE_SIZE even
> > > > though the buffer itself is not. We can know that nobody will
> > > > write to the structure during the DMA transaction, so it's safe
> > > > to use the buffer even if it's not aligned on CACHE_LINE_SIZE.
> > > >
> > > > So, in practice, if a DMA buffer is not aligned on
> > > > CACHE_LINE_SIZE and we want to avoid bounce page overhead, we
> > > > must supply additional information to the DMA transaction. It
> > > > should be easy to supply this information for a driver's own
> > > > data buffers. However, what about OS data buffers like the
> > > > mbufs mentioned above?
> > > >
> > > > The question is the following. Is it, or can it be, guaranteed
> > > > for all or at least the well-known OS data buffers which can be
> > > > part of a DMA access that buffers not aligned on
> > > > CACHE_LINE_SIZE are surrounded by data which belongs to the
> > > > same object as the buffer, and that this data is not written by
> > > > the OS while it is given to a driver?
> > > >
> > > > Any answer is appreciated. However, 'bounce pages' is not an
> > > > answer.
> > > >
> > > > Thanks, Svata
> > >
> > > Sigh.
> > > Several ideas by several people, but a good answer has not
> > > been created yet. SMP will make this worse.
> > >
> > > To make things worse, there are drivers that use memory from the
> > > stack as DMA buffers.
> > >
> > > I was hoping that mbufs are pretty well self-contained, unless you
> > > expect to modify them while DMA is happening.
> > >
> > > This is on my to-do list.
> > >
> > > --Mark.
> >
> > Drivers that do DMA from memory that was not allocated by proper
> > busdma methods, or that load buffers for DMA using improperly
> > constrained busdma tags, are broken drivers. We did not have
> > busdma tag inheritance from parent bus to child devices before, but
> > now we should take advantage of that and just make cache line
> > alignment a requirement for the platform. USB is firmly in that
> > 'broken' category btw and is currently being worked around by the
> > USB_HOST_ALIGN hack on MIPS, which suffers from the very same cache
> > coherency issues you describe.
>
> Hi,
>
> Drivers do not always use the same buffer format. That means two
> entities exchanging data using different buffer allocations must
> either:
>
> 1) Copy the data
> 2) Negotiate parameters for zero copy
>
> Many USB protocols have headers which are designed without any
> thought about ARMs and cache alignment. That means byte access via
> DMA must be supported, else you end up having to copy the data
> en masse.
>
> The USB_HOST_ALIGN is not a hack. It is coherently implemented across
> the EHCI, OHCI, UHCI and XHCI drivers, which are currently the only
> USB drivers using DMA.
>
> BUSDMA must instruct use of bounce buffers for case 1) on those CPUs
> where the loading address does not satisfy the cache alignment
> restrictions for DMA.
>
> Simply copying the data into a correctly aligned buffer can sometimes
> be much quicker than trying to handle the cache correctly, even
> though the data will be copied one extra time.
> This of course depends on how much data is moved at a time.
>
> --HPS

There is a difference between dealing with data of arbitrary origin,
such as a zero-copy userland buffer, which might force the use of
bounce buffers, and allocating a DMA-able region in the middle of a
data structure which itself is allocated by plain malloc, with no
regard to the DMA restrictions that the device's parent bus might be
trying to enforce on its children. The former is a necessity, the
latter is self-inflicted pain.

-- 
Alexander Kabaev
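
A minimal sketch of the part A / part B split described in the quoted
message, written as plain userland C rather than kernel code. It
assumes a 32-byte cache line; the real value is platform-specific. The
helper names head_partial, tail_partial and shares_cache_line are
hypothetical and are not part of busdma. The sketch only computes how
many foreign bytes share the first and last cache lines with a buffer,
which is the condition under which a write-back cached buffer needs
either bouncing or extra knowledge about its neighbours.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for illustration only; must be a power of two. */
#define CACHE_LINE_SIZE 32u

/*
 * Bytes of "part A" at the head: data that does not belong to the
 * buffer but lives in the same cache line as the buffer's first byte.
 */
static size_t
head_partial(uintptr_t addr)
{
	return ((size_t)(addr & (CACHE_LINE_SIZE - 1)));
}

/*
 * Bytes that do not belong to the buffer but live in the same cache
 * line as the buffer's last byte.
 */
static size_t
tail_partial(uintptr_t addr, size_t len)
{
	size_t over;

	assert(len > 0);
	over = (size_t)((addr + len) & (CACHE_LINE_SIZE - 1));
	return (over == 0 ? 0 : CACHE_LINE_SIZE - over);
}

/*
 * Nonzero if the buffer shares a cache line with foreign data at
 * either end, i.e. the case that is unsafe for write-back memory
 * unless the neighbouring data is known not to be written during DMA.
 */
static int
shares_cache_line(uintptr_t addr, size_t len)
{
	return (head_partial(addr) != 0 || tail_partial(addr, len) != 0);
}
```

For example, under the 32-byte assumption a 64-byte buffer at address
0x1000 shares no line with foreign data, while the same buffer at
0x1004 has 4 foreign bytes in its first line and 28 in its last.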