Date:      Wed, 05 Sep 2012 09:09:59 -0600
From:      Ian Lepore <freebsd@damnhippie.dyndns.org>
To:        Warner Losh <imp@bsdimp.com>
Cc:        freebsd-arm@freebsd.org, freebsd-mips@freebsd.org, freebsd-arch@freebsd.org
Subject:   Re: Some busdma stats
Message-ID:  <1346857799.59094.66.camel@revolution.hippie.lan>
In-Reply-To: <3AFC763F-011C-46B4-B500-FE21B704259F@bsdimp.com>
References:  <1346689154.1140.601.camel@revolution.hippie.lan> <3AFC763F-011C-46B4-B500-FE21B704259F@bsdimp.com>

On Wed, 2012-09-05 at 08:30 -0600, Warner Losh wrote:
> > Regardless of whether we eventually fix every driver to eliminate
> > transfers that aren't aligned to cache line boundaries, or somehow
> > change the busdma code to automatically bounce unaligned requests, we
> > need efficient allocation of small buffers aligned and sized to cache
> > lines.
> 
> The issue can't be fixed in the busdma code because partial, unaligned
> transfers are fine, so long as the calling code avoids the entire
> cache line during the transfer.  Returning cache-line aligned buffers
> from the allocator will do that, of course, but it is also valid for
> the code to only use part of the buffer for the transfer.

Right.  My goal with the dma buffer pool changes isn't some sort of
magical automatic fix in the busdma layer; it's just a whittling away of
one small roadblock on the path to fixing this stuff.  When I first
started asking about how we should address these problems, the experts
said to keep platform-specific alignment and padding information
encapsulated within the busdma layer rather than inventing a new
mechanism to export that info to drivers.  That implies that drivers
should be allocating DMA buffers from busdma instead of allocating big
chunks of memory and sub-dividing them into smaller buffers.  

For that to work, the busdma implementation needs to be able to
efficiently allocate buffers that are properly aligned and padded and
especially that are guaranteed not to share a cache line with some other
unrelated data.  The busdma implementation can't get those guarantees
from malloc(9), and the alternatives (contigmalloc(), and the kmem_alloc
family) only work in page-sized chunks.  We're asking drivers to
allocate individual buffers of sometimes no more than a few bytes each.
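
As a rough userland illustration of the "aligned and padded" requirement
(this is not the patchset and not kernel code), the sketch below rounds a
small request up to whole cache lines and aligns its start, so nothing
else can share a line with it.  The 64-byte line size and the sketch_*
names are assumptions made for the example; the real values are
platform-specific and would stay inside busdma.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size; the real value is platform-specific. */
#define SKETCH_CACHE_LINE 64u

/* Round n up to a multiple of align, which must be a power of two. */
static size_t
sketch_roundup(size_t n, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((n + align - 1) & ~(align - 1));
}

/*
 * Hypothetical helper: return a buffer that starts on a cache line
 * boundary and occupies whole cache lines, so no unrelated data can
 * share a line with it.  Returns NULL on failure; release with free().
 */
static void *
sketch_dma_alloc(size_t size)
{
	size_t padded = sketch_roundup(size != 0 ? size : 1, SKETCH_CACHE_LINE);

	/* C11 aligned_alloc() wants size to be a multiple of alignment. */
	return (aligned_alloc(SKETCH_CACHE_LINE, padded));
}
```

Under these assumptions a 16-byte request occupies one 64-byte line
rather than a full page.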

So that's all I'm addressing in the patchset I submitted:  make sure
that when we fix a driver to allocate the 256 individual 16-byte IO
descriptors that it shares with the hardware, that doesn't result in
allocating 256 pages of memory.  Also, if the request is for
BUS_DMA_COHERENT memory, make sure that doesn't result in turning off
caching in up to 256 pages that each contain a 16-byte IO buffer and
4080 bytes of unrelated data.
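
The arithmetic behind those numbers can be sketched in a few lines of
plain C.  The 4096-byte page follows from the 16 + 4080 figures above;
the 64-byte cache line and the sketch_* names are assumptions for
illustration only, and the pooled figure ignores any bookkeeping
overhead a real allocator would add.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a multiple of align, which must be a power of two. */
static size_t
sketch_roundup2(size_t n, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((n + align - 1) & ~(align - 1));
}

/* Memory consumed when every buffer gets a page of its own. */
static size_t
sketch_page_per_buffer_bytes(size_t count, size_t page)
{
	return (count * page);
}

/* Memory consumed when buffers are padded to cache lines and packed. */
static size_t
sketch_pooled_bytes(size_t count, size_t size, size_t line, size_t page)
{
	return (sketch_roundup2(count * sketch_roundup2(size, line), page));
}

/* Bytes of a page left over next to one small buffer. */
static size_t
sketch_unrelated_bytes(size_t size, size_t page)
{
	return (page - size);
}
```

With count = 256, size = 16, line = 64 and page = 4096, the
page-per-buffer case costs 1048576 bytes (256 pages), the packed case
costs 16384 bytes (4 pages), and each single-buffer page carries 4080
bytes that have nothing to do with the buffer.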

-- Ian
