Date: Thu, 23 Aug 2012 21:56:21 -0600
From: Warner Losh <imp@bsdimp.com>
To: Ian Lepore <freebsd@damnhippie.dyndns.org>
Cc: freebsd-arm@freebsd.org, freebsd-mips@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: Partial cacheline flush problems on ARM and MIPS
Message-ID: <D1D31F7F-5A70-4139-85DC-D5A931BE0233@bsdimp.com>
In-Reply-To: <1345765503.27688.602.camel@revolution.hippie.lan>
References: <1345757300.27688.535.camel@revolution.hippie.lan>
 <3A08EB08-2BBF-4B0F-97F2-A3264754C4B7@bsdimp.com>
 <1345763393.27688.578.camel@revolution.hippie.lan>
 <FD8DC82C-AD3B-4EBC-A625-62A37B9ECBF1@bsdimp.com>
 <1345765503.27688.602.camel@revolution.hippie.lan>

On Aug 23, 2012, at 5:45 PM, Ian Lepore wrote:
> On Thu, 2012-08-23 at 17:26 -0600, Warner Losh wrote:
>>> On Aug 23, 2012, at 3:28 PM, Ian Lepore wrote:
>>> Now we have a new type of constraint, I think of it as "granularity".
>>> In effect, we have a DMA system that can only do DMA in cacheline sized
>>> chunks. Even when the IO size -- and thus the number of "bits on the
>>> wire" -- is less than the cacheline size, at the end of the DMA
>>> operation (which includes the software-assisted coherency operations)
>>> the number of bytes in memory that may be modified is the size of a
>>> cacheline. This is because "the DMA system" is not just the engine that
>>> moves bytes around, it's the combination of hardware and software that
>>> work together to maintain cache coherency.
>> But this isn't new. It is an alignment requirement, which carries
>> with it an implicit size requirement. If you enforce the alignment,
>> and force all 'sub buffers' to have this alignment, you don't need the
>> new thing.
>
> So do you think it's safe to assume that any given dma tag that has an
> alignment constraint also implicitly has a buffer size constraint that
> the size must be a multiple of the alignment?

Yes. If something must be aligned to N bits, chances are it doesn't
decode the lower N bits which implies a size constraint.

> What if we have a platform with a 32-byte cacheline / DMA granularity,
> and then we have a builtin device on that SoC which can only do DMA on a
> 64K alignment (which its tag would reflect), but the hardware can move
> as little as 1 byte at a time? Children of that bridge device come
> along and allocate little 16-byte buffers that eat 16 pages each. It
> doesn't seem all that far-fetched to me.

This would be a very odd hardware. DMA aligned to 64k that can only
move one byte seems far fetched. How useful would such a design be?
How would you do scatter gather on such a design?

But this isn't what I'm saying. If the cache line size is 32, then for
DMA we only ever give out chunks of 32 or larger. In that case, the
split cache line situation you gave as an example can't happen.
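
To make the last point concrete, here is a minimal sketch in C of handing
out DMA memory only in whole cache lines. It is not the FreeBSD busdma
code. The names CACHE_LINE_SIZE, dma_round_size, dma_buffer_owns_its_lines,
struct dma_pool and dma_pool_alloc are all made up for illustration, and
the 32-byte line size is taken from the example in the thread. A real
implementation would get the line size from the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size (from the thread's example); must be a power of 2. */
#define CACHE_LINE_SIZE 32u

/* Round a requested size up to a whole number of cache lines. */
static size_t
dma_round_size(size_t size)
{
	return (size + CACHE_LINE_SIZE - 1) & ~(size_t)(CACHE_LINE_SIZE - 1);
}

/*
 * Nonzero if [addr, addr + size) starts and ends on cache line boundaries,
 * i.e. no other data can share a line with this buffer.
 */
static int
dma_buffer_owns_its_lines(uintptr_t addr, size_t size)
{
	return (addr % CACHE_LINE_SIZE) == 0 && (size % CACHE_LINE_SIZE) == 0;
}

/* Hypothetical bump allocator over a cache-line-aligned region. */
struct dma_pool {
	uintptr_t base;	/* start of region, cache line aligned */
	size_t len;	/* total bytes in region */
	size_t used;	/* bytes handed out so far */
};

/* Hand out a chunk of at least `size` bytes; returns 0 on failure. */
static uintptr_t
dma_pool_alloc(struct dma_pool *p, size_t size)
{
	size_t need;
	uintptr_t addr;

	assert(p->base % CACHE_LINE_SIZE == 0);
	if (size == 0 || size > p->len)
		return 0;
	need = dma_round_size(size);	/* never less than one full line */
	if (need > p->len - p->used)
		return 0;
	addr = p->base + p->used;
	p->used += need;
	return addr;
}
```

With this scheme two 16-byte requests land on separate lines rather than
sharing one, so a flush or invalidate of one buffer never touches the
other. The cost is the padding: each 16-byte request consumes 32 bytes.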