Date: Wed, 05 Sep 2012 09:09:59 -0600
From: Ian Lepore <freebsd@damnhippie.dyndns.org>
To: Warner Losh <imp@bsdimp.com>
Cc: freebsd-arm@freebsd.org, freebsd-mips@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: Some busdma stats
Message-ID: <1346857799.59094.66.camel@revolution.hippie.lan>
In-Reply-To: <3AFC763F-011C-46B4-B500-FE21B704259F@bsdimp.com>
References: <1346689154.1140.601.camel@revolution.hippie.lan> <3AFC763F-011C-46B4-B500-FE21B704259F@bsdimp.com>
On Wed, 2012-09-05 at 08:30 -0600, Warner Losh wrote:
> > Regardless of whether we eventually fix every driver to eliminate
> > transfers that aren't aligned to cache line boundaries, or somehow
> > change the busdma code to automatically bounce unaligned requests, we
> > need efficient allocation of small buffers aligned and sized to cache
> > lines.
>
> The issue can't be fixed in the busdma code because partial, unaligned
> transfers are fine, so long as the calling code avoids the entire
> cache line during the transfer. Returning cache-line aligned buffers
> from the allocator will do that, of course, but it is also valid for
> the code to only use part of the buffer for the transfer.

Right. My goal with the dma buffer pool changes isn't some sort of magical automatic fix in the busdma layer, it's just a whittling away of one small roadblock on the path to fixing this stuff.

When I first started asking about how we should address these problems, the experts said to keep platform-specific alignment and padding information encapsulated within the busdma layer rather than inventing a new mechanism to export that info to drivers. That implies that drivers should be allocating DMA buffers from busdma instead of allocating big chunks of memory and sub-dividing them into smaller buffers.

For that to work, the busdma implementation needs to be able to efficiently allocate buffers that are properly aligned and padded and especially that are guaranteed not to share a cache line with some other unrelated data. The busdma implementation can't get those guarantees from malloc(9), and the alternatives (contigmalloc(), and the kmem_alloc family) only work in page-sized chunks. We're asking drivers to allocate individual buffers of sometimes no more than a few bytes each.
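To put rough numbers on the difference between page-granular allocation and cache-line-granular allocation, here is a minimal userland sketch. It is not the busdma code and not the patchset; the ex_* names are hypothetical helpers made up for illustration, and the 32-byte cache line is an assumed value (the real line size is platform-specific). The 4096-byte page matches the 16 + 4080 byte split used elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the real ones are platform-specific. */
#define EX_PAGE_SIZE	4096u	/* page size implied by 16 + 4080 bytes */
#define EX_CACHE_LINE	32u	/* ASSUMPTION: 32-byte cache line */

/* Round size up to a multiple of align; align must be a power of two. */
static size_t
ex_roundup2(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((size + align - 1) & ~(align - 1));
}

/*
 * Bytes consumed by 'count' buffers of 'bufsize' bytes when every
 * buffer is padded out to a multiple of 'granule' bytes, so that no
 * two buffers share a granule.
 */
static size_t
ex_total_bytes(size_t count, size_t bufsize, size_t granule)
{
	return (count * ex_roundup2(bufsize, granule));
}

/* Number of whole pages needed to hold 'total' bytes. */
static size_t
ex_pages(size_t total)
{
	return (ex_roundup2(total, EX_PAGE_SIZE) / EX_PAGE_SIZE);
}
```

With those assumed values, ex_pages(ex_total_bytes(256, 16, EX_PAGE_SIZE)) is 256 pages for 256 buffers of 16 bytes, while ex_pages(ex_total_bytes(256, 16, EX_CACHE_LINE)) is 2 pages, and in both cases no buffer shares a cache line with unrelated data.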
So that's all I'm addressing in the patchset I submitted: make sure that when we start fixing a driver to allocate 256 individual 16-byte IO descriptors that it shares with the hardware, that doesn't result in allocating 256 pages of memory. Also, if the request is for BUS_DMA_COHERENT memory, make sure that doesn't result in turning off caching in up to 256 pages that each contain a 16-byte IO buffer and 4080 bytes of unrelated data.

-- Ian