Date: Wed, 08 Jan 2014 08:19:54 -0700
From: Ian Lepore <ian@FreeBSD.org>
To: Nathan Whitehorn <nwhitehorn@FreeBSD.org>
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org
Subject: Re: svn commit: r260440 - head/sys/arm/conf
Message-ID: <1389194394.1158.362.camel@revolution.hippie.lan>
In-Reply-To: <52CCD1DA.7010008@freebsd.org>
References: <201401080340.s083eIDG054652@svn.freebsd.org> <52CCD1DA.7010008@freebsd.org>
On Tue, 2014-01-07 at 23:19 -0500, Nathan Whitehorn wrote:
> On 01/07/14 22:40, Ian Lepore wrote:
> > Author: ian
> > Date: Wed Jan 8 03:40:18 2014
> > New Revision: 260440
> > URL: http://svnweb.freebsd.org/changeset/base/260440
> >
> > Log:
> >   Add option USB_HOST_ALIGN to configs that contain 'device usb'. Setting
> >   this to the cache line size is required to avoid data corruption on armv4
> >   and armv5, and improves performance on armv6, in both cases by avoiding
> >   partial cacheline flushes for USB IO.
> >
> >   All these configs already exist in 10-stable. A few that don't (and
> >   thus can't be MFC'd yet) will be committed separately.
> >
>
> There has to be -- and I do not mean this as a criticism of your patch
> -- a better solution to this problem than USB_HOST_ALIGN. Isn't busdma
> supposed to handle this kind of thing? Why is USB different?
> -Nathan
>

USB is different because it doesn't follow the busdma rules. It allocates one large buffer, then subdivides it internally into pieces that are used for DMA IO and adjacent pieces that are accessed by the CPU concurrently with the DMA. If it doesn't do that subdividing with an awareness of cache line boundaries, it ends up with concurrent CPU and DMA access to data in the same cache line, and there's no way a software-assisted cache coherency scheme can reliably do busdma sync ops that don't corrupt either the CPU data or the DMA data.

On armv6 we now automatically bounce IO that's not sized and aligned on cache line boundaries. The overhead for doing so is non-trivial, doubly so in the case of USB, because it's the only consumer of busdma in the system that requires the offset-within-page for a bounced IO to be the same as the offset in the original page. So a pool of small bounce buffers for small unaligned IOs is not an option; it must allocate full bounce pages for every IO.
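As an illustration of the cache-line-aware subdividing described above, here is a minimal C sketch. It is not the USB stack's code: `layout_subbuffers`, `needs_bounce`, `roundup_pow2` and the 32-byte `CACHE_LINE_SIZE` are hypothetical names and an assumed value chosen for the example, and the large buffer's base address is assumed to be cache-line aligned already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; USB_HOST_ALIGN would be set to this value. */
#define CACHE_LINE_SIZE 32u

/* Round x up to a multiple of align, which must be a power of two. */
static size_t
roundup_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((x + align - 1) & ~(align - 1));
}

/*
 * True if a DMA transfer at [addr, addr + len) does not both start and
 * end on a cache line boundary, i.e. it may share a line with adjacent
 * CPU-owned data and would have to be bounced (or risk corruption).
 */
static bool
needs_bounce(uintptr_t addr, size_t len, size_t line)
{
	return (((addr | len) & (line - 1)) != 0);
}

/*
 * Hypothetical helper: carve n sub-buffers out of one large allocation
 * (assumed to start on a line boundary). Each sub-buffer starts at a
 * line-aligned offset and is padded to a whole number of lines, so no
 * two sub-buffers ever share a cache line. Returns total bytes needed.
 */
static size_t
layout_subbuffers(const size_t *sizes, size_t *offsets, size_t n,
    size_t align)
{
	size_t off = 0;

	for (size_t i = 0; i < n; i++) {
		off = roundup_pow2(off, align);
		offsets[i] = off;
		off += roundup_pow2(sizes[i], align);
	}
	return (off);
}
```

With sub-buffer sizes of 8, 40 and 64 bytes, naive back-to-back packing puts the second buffer at offset 8, where `needs_bounce(8, 40, 32)` is true. The aligned layout places the buffers at offsets 0, 32 and 96 (160 bytes total), so none of them shares a line with its neighbours; the price is the padding between them.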
It used to be (on armv4) that when you used the busdma alloc functions to allocate small DMA buffers (a few bytes), the implementation allocated entire pages, which is pretty inefficient and can add up to a lot of allocation overhead. That was cited as a reason not to change USB's "allocate big then subdivide" scheme. I wrote new busdma allocators that use UMA pools to efficiently handle small aligned buffers of both normal and uncacheable (BUS_DMA_COHERENT) memory, so that's not a roadblock anymore. (Arm uses the new allocator; mips never got converted.)

So, since we keep getting reports on arm@ of data corruption that shows up as 32-byte chunks of bad data, and it costs real time and resources to try to debug each case, I figured we should just go with the fix that nobody likes but that actually works.

-- Ian
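The small-buffer allocator idea can be sketched as a size-class lookup. This is a hypothetical C illustration, not the actual arm busdma allocator: the 32-byte minimum alignment, the 4096-byte page size, and the names `size_class` and `class_bytes` are assumptions. Requests are rounded up to a power-of-two class no smaller than a cache line; in this sketch each class would be backed by its own UMA zone, and requests larger than a page would fall back to whole-page allocation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */
#define SKETCH_MIN_ALIGN 32u	/* assumed cache line size */

/*
 * Return the index of the smallest power-of-two size class that can hold
 * `size` bytes, or -1 if the request is larger than a page (such requests
 * would go to a whole-page allocator instead). Class 0 is
 * SKETCH_MIN_ALIGN bytes, class 1 is twice that, and so on up to a page.
 */
static int
size_class(size_t size)
{
	size_t cls_size = SKETCH_MIN_ALIGN;
	int idx = 0;

	if (size > SKETCH_PAGE_SIZE)
		return (-1);
	while (cls_size < size) {
		cls_size <<= 1;
		idx++;
	}
	return (idx);
}

/* Byte size of the buffers handed out by class `idx`. */
static size_t
class_bytes(int idx)
{
	assert(idx >= 0);
	return ((size_t)SKETCH_MIN_ALIGN << idx);
}
```

For example, a 100-byte request maps to class 2 (128 bytes). If each zone carves its items at multiples of the item size from page-aligned memory, every buffer starts on a cache-line boundary and occupies whole lines, so a small DMA buffer never shares a line with unrelated data and never needs a full bounce page.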