Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 20 Dec 2012 08:27:40 +0100
From:      Hans Petter Selasky <hselasky@c2i.net>
To:        Warner Losh <imp@bsdimp.com>
Cc:        Oleksandr Tymoshenko <gonzo@freebsd.org>, Andrew Turner <andrew@fubar.geek.nz>, freebsd-usb@freebsd.org
Subject:   Re: EHCI on armv6 with Write-Back caches
Message-ID:  <201212200827.40860.hselasky@c2i.net>
In-Reply-To: <4A66C6C9-22EC-45AA-987D-49F958D7A8F9@bsdimp.com>
References:  <20121218204931.5322922d@fubar.geek.nz> <201212190956.28609.hselasky@c2i.net> <4A66C6C9-22EC-45AA-987D-49F958D7A8F9@bsdimp.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wednesday 19 December 2012 20:12:55 Warner Losh wrote:
> On Dec 19, 2012, at 1:56 AM, Hans Petter Selasky wrote:
> > Hi again,
> > 
> > Different vendors use different naming conventions about sync operations.
> 
> The different names are often for subtly different types of cache flushing.
> 
> > Maybe we should start defining some names and agree about that first?
> > 
> > usb_pc_cpu_flush:
> > 
> > This function is a system abstraction which is supposed to ensure that
> > all CPU cached data for the given buffer is written to RAM before this
> > function returns.
> 

Hi,

> First off, why can't you just use the normal busdma interface?

Because the API was confusing when I wrote the USB code. It wasn't obvious 
what the different functions were doing.

> 
> Second, does this also invalidate the cache lines? There's many kinds of
> flushes as well, depending on the cache coherency model for MP..

No, flush as used in the USB code does not require cache line invalidation.

> 
> > usb_pc_cpu_invalidate:
> > 
> > This function is a system abstraction which is supposed to ensure that
> > all CPU cached data is cleared for the given buffer.
> 
> So does this throw the data that's in the cache away? Or does it write back
> before discarding it?

It should trow away all data in the cache for the given range. No write back 
first.

> 
> > These functions have been carefully added in all the USB drivers using
> > DMA.
> > 
> > Atomicity:
> > 
> > I understand that the ARM hardware is not always compatible to this
> > approach.
> 
> Yes, since it fails to capture the subtly of the ARM hardware which the
> busdma abstraction captures.
> 
> > 1) Flushing data to RAM is not a problem in any case? Do you agree?
> 
> As long as there's no DMA outstanding to the memory addressed by the cache
> line being flushed, yes.
> 
> > 2) Invalidating data is a problem, because invalidation can cause nearby
> > data to be cleared aswell. So basically for those systems which are not
> > handling this, flushing data means a lock of all CPU's until the
> > flush/invalidate sequence is complete? Any dispute about this?
> 
> Yes. You can generally only invalidate to the cache line level. That's why
> I was asking about the subtly in what you mean by invalidate.  ARM can and
> does flush data properly, but this might not match your usage.
> 
> You forgot
> 
> 3) Touching data in the same cache line while the DMA is going on.
> 
> This is a complete no-no and must never be done. Cache lines must not be
> polluted while the DMA is happening.
> 
> > If the CPU does not support certain features we cannot have an efficient
> > system. It is like having a CPU which doesn't support switching off
> > interrupt levels, like an 8-bit AVR.
> 
> Most RISC CPUs don't support a coherent cache, it is true. However, they do
> support efficient and succinct cache operations. The busdma abtraction
> handles this nicely for all other kinds of devices, why is usb so
> different?

> 
> > Then no matter how you twist it, you cannot
> > postpone an interrupt to a software thread. Same goes for DMA support. If
> > their DMA engine doesn't support byte granularity and possibility to
> > flush/invalidate on a per-byte basis, then implement a global system lock
> > to flush/invalidate data like I suggest, if this is not doable in the
> > hardware by the CPU instruction set.
> 
> The DMA engine support it. You can do byte-aligned transfers all day long.
> 
> However, the cache does not. You must segregate data to get proper
> operations. There's nothing that you can do about that. You must not mix
> accesses to memory within a cache line. To do so will cause problems, as
> evidence by the problems we are seeing.
> 
> > The approach that I was recommended several years ago, is that I can pass
> > a pointer to a buffer, which then can be transferred by the USB engine.
> > This pointer can be any pointer except NULL.
> > 
> > 3) As per my knowledge, using busdma to allocate a separate buffer for a
> > 13- byte buffer, results in having a buffer of PAGE_SIZE bytes
> > allocated.
> 
> Not with Ian's slab allocator...
> 
> > 4) BUSDMA experts: Please verify that the flags passed to
> > bus_dmamap_sync() causes the exact behaviour has listed on top of this
> > e-mail to occur in the following two functions in
> > sys/dev/usb/usb_busdma.c
> > 
> > void
> > usb_pc_cpu_invalidate(struct usb_page_cache *pc)
> > {
> > 
> >        if (pc->page_offset_end == pc->page_offset_buf) {
> >        
> >                /* nothing has been loaded into this page cache! */
> >                return;
> >        
> >        }
> >        
> >        /*
> >        
> >         * TODO: We currently do XXX_POSTREAD and XXX_PREREAD at the
> >         * same time, but in the future we should try to isolate the
> >         * different cases to optimise the code. --HPS
> >         */
> >        
> >        bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_POSTREAD);
> >        bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREREAD);
> > 
> > }
> 
> This is almost certainly wrong.  POSTREAD should be called right after the
> read had completed.  PREREAD should be called immediately before starting
> a DMA operation after all adjacent memory accesses have stopped.
> 
> > void
> > usb_pc_cpu_flush(struct usb_page_cache *pc)
> > {
> > 
> >        if (pc->page_offset_end == pc->page_offset_buf) {
> >        
> >                /* nothing has been loaded into this page cache! */
> >                return;
> >        
> >        }
> >        bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);
> > 
> > }
> 
> This is likely correct, but insufficient.  There's no POSTWRITE sync that's
> done after all the data has been transferred.

I can offer to implement a USB bounce buffer that fixes these issues. I think 
the existing USB API's and ways of doing stuff below the USB controllers is 
fine, and then check the drivers which pass pointers, like umass that the 
buffers we get are allocated correctly.

Is there a way in 9-stable and 10-stable to force allocation of non-cached 
memory for the platforms in question, ARM, RISC, SPARC ... Because really, I 
think that would be the best! For example it is very slow to update a word and 
then invalidate a 4K region, which is very frequently the use-case for USB.

--HPS



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201212200827.40860.hselasky>