From: Hans Petter Selasky
To: freebsd-hackers@freebsd.org
Cc: hackers@freebsd.org, Svatopluk Kraus
Date: Tue, 22 May 2012 07:56:42 +0200
Subject: Re: ARM + CACHE_LINE_SIZE + DMA

On Tuesday 22 May 2012 01:35:48 Alexander Kabaev wrote:
> On Thu, 17 May 2012 11:01:34 -0500
> Mark Tinguely wrote:
> > On Thu, May 17, 2012 at 8:20 AM, Svatopluk Kraus wrote:
> > > Hi,
> > >
> > > I'm working on DMA bus implementation for ARM11mpcore platform.
> > > I've looked at the implementation in the ARM tree, but IMHO it
> > > only works under some assumptions. There is a problem with DMA on
> > > a memory block which is not aligned on CACHE_LINE_SIZE (start and
> > > end) if the memory is not coherent.
> > >
> > > Let's have a buffer for DMA which is not aligned on
> > > CACHE_LINE_SIZE. Then the first cache line associated with the
> > > buffer can be divided into two parts, A and B, where A is memory
> > > we know nothing about and B is buffer memory. The same holds for
> > > the last cache line associated with the buffer. We have no problem
> > > if the memory is coherent. Otherwise it depends on the memory
> > > attributes.
> > >
> > > 1. [no cache] attribute
> > > No problem, as the memory is coherent.
> > >
> > > 2. [write through] attribute
> > > Part A can be invalidated without loss of any data. It's not a
> > > problem either.
> > >
> > > 3. [write back] attribute
> > > In general, there is no way to keep both parts consistent. At
> > > the start of a DMA transaction, the cache line is written back and
> > > invalidated. However, as we know nothing about the memory
> > > associated with part A of the cache line, the cache line can be
> > > filled again at any time, messing up the DMA transaction if it is
> > > flushed. Even if the cache line is only filled but not flushed
> > > during the DMA transaction, we must make it coherent with memory
> > > afterwards. The current ARM (MIPS) implementation uses a trick:
> > > it saves part A of the line into a temporary buffer, invalidates
> > > the line, and restores part A. However, if somebody is writing to
> > > the memory associated with part A of the line during this trick,
> > > part A will be messed up. Moreover, part A can be part of another
> > > DMA transaction.
> > >
> > > To safely use DMA with non-coherent memory, memory with the
> > > [no cache] or [write through] attribute can be used without
> > > problem. Memory with the [write back] attribute must be aligned on
> > > CACHE_LINE_SIZE.
> > >
> > > However, as with mbufs for example, a buffer for DMA can be part
> > > of a structure which is aligned on CACHE_LINE_SIZE even though the
> > > buffer itself is not. We can know that nobody will write to the
> > > structure during the DMA transaction, so it's safe to use the
> > > buffer even if it's not aligned on CACHE_LINE_SIZE.
> > >
> > > So, in practice, if a DMA buffer is not aligned on CACHE_LINE_SIZE
> > > and we want to avoid bounce pages overhead, we must support
> > > additional information to the DMA transaction. It should be easy
> > > to support the information about driver data buffers. However,
> > > what about OS data buffers like the mentioned mbufs?
> > >
> > > The question is the following. Is it, or can it be, guaranteed
> > > for all or at least the well-known OS data buffers which can be
> > > part of DMA access that buffers not aligned on CACHE_LINE_SIZE are
> > > surrounded by data which belongs to the same object as the buffer,
> > > and that this data is not written by the OS once given to a
> > > driver?
> > >
> > > Any answer is appreciated. However, 'bounce pages' is not an
> > > answer.
> > >
> > > Thanks, Svata
> >
> > Sigh. Several ideas by several people, but a good answer has not
> > been created yet. SMP will make this worse.
> >
> > To make things worse, there are drivers that use memory from the
> > stack as DMA buffers.
> >
> > I was hoping that mbufs are pretty well self-contained, unless you
> > expect to modify them while DMA is happening.
> >
> > This is on my to-do list.
> >
> > --Mark.
>
> Drivers that do DMA from memory that was not allocated by proper busdma
> methods or load buffers for DMA using not properly constrained busdma
> tags are broken drivers. We did not have a busdma tag inheritance from
> parent bus to child devices before, but now we should just take
> advantage of that and just make cache line alignment a requirement for
> the platform.
> USB is firmly in that 'broken' category btw and is
> currently being worked around by the USB_HOST_ALIGN hack on MIPS, which
> suffers from the very same cache coherency issues you describe.

Hi,

Drivers do not always use the same buffer format. That means two
entities exchanging data using different buffer allocations must either:

1) Copy the data
2) Negotiate parameters for zero copy

Many USB protocols have headers which were designed without any thought
about ARM and cache alignment. That means byte access via DMA must be
supported, else you end up having to copy the data en masse.

The USB_HOST_ALIGN is not a hack. It is coherently implemented across
the EHCI, OHCI, UHCI and XHCI drivers, which are currently the only USB
drivers using DMA.

For case 1), BUSDMA must instruct the use of bounce buffers on CPUs
where the loading address does not satisfy the cache alignment
restrictions for DMA.

Simply copying the data into a correctly aligned buffer can sometimes be
much quicker than trying to handle the cache correctly, even though the
data will be copied one extra time. This of course depends on how much
data is moved at a time.

--HPS
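
P.S. To make the alignment condition discussed in this thread concrete,
here is a minimal sketch in C. It is only an illustration under stated
assumptions, not the busdma implementation: the names EX_CACHE_LINE_SIZE,
ex_head_shared(), ex_tail_shared(), ex_is_line_aligned() and
ex_needs_bounce() are all hypothetical, and the 32-byte line size is an
assumed value. It computes how many foreign bytes ("part A") share the
first and last cache line with a buffer, and treats a buffer as needing a
bounce copy when the memory is not coherent and either end is shared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for illustration; the real value is platform-specific. */
#define EX_CACHE_LINE_SIZE 32u

static_assert((EX_CACHE_LINE_SIZE & (EX_CACHE_LINE_SIZE - 1)) == 0,
    "cache line size must be a power of two");

/* Foreign bytes ("part A") sharing the first cache line with the buffer. */
static inline size_t
ex_head_shared(uintptr_t addr)
{
	return ((size_t)(addr & (EX_CACHE_LINE_SIZE - 1)));
}

/* Foreign bytes sharing the last cache line with the buffer. */
static inline size_t
ex_tail_shared(uintptr_t addr, size_t len)
{
	size_t rem = (size_t)((addr + len) & (EX_CACHE_LINE_SIZE - 1));

	return (rem == 0 ? 0 : EX_CACHE_LINE_SIZE - rem);
}

/* True if both the start and the end fall on cache line boundaries. */
static inline bool
ex_is_line_aligned(uintptr_t addr, size_t len)
{
	return (ex_head_shared(addr) == 0 && ex_tail_shared(addr, len) == 0);
}

/*
 * Hypothetical policy: bounce only when the memory is not coherent and
 * the buffer shares a cache line with data outside the buffer.
 */
static inline bool
ex_needs_bounce(uintptr_t addr, size_t len, bool coherent)
{
	return (!coherent && !ex_is_line_aligned(addr, len));
}
```

With these assumptions, a 64-byte buffer at address 0x1004 shares 4 bytes
of its first line and 28 bytes of its last line with other data, so the
sketch would bounce it on non-coherent memory, while the same buffer at
0x1000 would be used in place. A real policy would also need the extra
knowledge asked about in the thread, namely whether the surrounding bytes
belong to the same object and stay unwritten during the transfer.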