Date:      Sun, 26 Aug 2012 11:42:02 -0600
From:      Ian Lepore <freebsd@damnhippie.dyndns.org>
To:        Warner Losh <imp@bsdimp.com>
Cc:        Adrian Chadd <adrian@freebsd.org>, Hans Petter Selasky <hans.petter.selasky@bitfrost.no>, freebsd-arm@freebsd.org, freebsd-mips@freebsd.org, freebsd-arch@freebsd.org
Subject:   Re: Partial cacheline flush problems on ARM and MIPS
Message-ID:  <1346002922.1140.56.camel@revolution.hippie.lan>
In-Reply-To: <F8C9E811-8597-4ED0-9F9D-786EB2301D6F@bsdimp.com>
References:  <1345757300.27688.535.camel@revolution.hippie.lan> <3A08EB08-2BBF-4B0F-97F2-A3264754C4B7@bsdimp.com> <1345763393.27688.578.camel@revolution.hippie.lan> <FD8DC82C-AD3B-4EBC-A625-62A37B9ECBF1@bsdimp.com> <1345765503.27688.602.camel@revolution.hippie.lan> <CAJ-VmonOwgR7TNuYGtTOhAbgz-opti_MRJgc8G%2BB9xB3NvPFJQ@mail.gmail.com> <1345766109.27688.606.camel@revolution.hippie.lan> <CAJ-VmomFhqV5rTDf-kKQfbSuW7SSiSnqPEjGPtxWjaHFA046kQ@mail.gmail.com> <F8C9E811-8597-4ED0-9F9D-786EB2301D6F@bsdimp.com>

On Thu, 2012-08-23 at 22:00 -0600, Warner Losh wrote:
> The bottom line is that you can't mix things like that when cache
> lines are involved.  The current code that tries is doomed to failure.
> Doomed. You just can't control all flushes, as Ian's missive
> demonstrates, and trying to accommodate code that does this I don't
> think can possibly work.  All the interrupt masking, copying in and
> out, etc I fear is doomed to utter and abject failure.  
> 
Until last weekend I was in the camp that thought the partial cacheline
flush problem was solvable with sufficiently clever code.  Now I agree
that we're doomed to failure and it's time to try another direction.

We're going to have some implementation work to do in arm and mips
busdma, but I think the larger part of the task is going to be defining
more rigorously how a driver must interact with the busdma system to
function correctly on all types of platforms, and then updating existing
drivers to conform.

The busdma manpage currently has some vague words about the usage and
sequencing of sync ops, such as "If read and write operations are not
preceded and followed by the appropriate synchronization operations,
behavior is undefined."  I think we should more explicitly spell out
what the appropriate sequences are.  In particular:

      * The PRE and POST operations must occur in pairs; a PREREAD must
        be followed eventually by a POSTREAD and a PREWRITE must be
        followed by a POSTWRITE.  
      * The CPU is not allowed to access the mapped memory after a PRE
        sync and before the corresponding POST sync.  
      * The DMA hardware is not allowed to access the mapped memory
        after a POST sync and before the next PRE sync. 
      * Read and write sync operators may be combined in a single call;
        PRE and POST operators may not be.  For example, PREREAD|PREWRITE
        is allowed, PREREAD|POSTREAD is not.  We should note that while
        read and write operations may be combined, on some platforms
        PREREAD|PREWRITE is needlessly expensive when only a read is
        being performed.
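
To make the proposed rules concrete, here is a rough userland model of
them, not busdma code.  Everything in it is hypothetical: the SYNC_*
values, struct sync_model, and the function names are invented for the
sketch and are not taken from any header.  It also assumes something the
list above does not say outright, namely that a new PRE sync is rejected
while an earlier one is still waiting for its POST.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only. */
enum {
	SYNC_PREREAD   = 1 << 0,
	SYNC_POSTREAD  = 1 << 1,
	SYNC_PREWRITE  = 1 << 2,
	SYNC_POSTWRITE = 1 << 3
};

#define	SYNC_PRE	(SYNC_PREREAD | SYNC_PREWRITE)
#define	SYNC_POST	(SYNC_POSTREAD | SYNC_POSTWRITE)

/* Tracks which PRE syncs are still waiting for their matching POST. */
struct sync_model {
	int pending;		/* SYNC_PRE* bits issued but not completed */
};

/*
 * Apply one sync call to the model.  Returns true and updates the model
 * if the call is legal under the proposed rules; returns false and
 * leaves the model unchanged otherwise.
 */
static bool
sync_model_op(struct sync_model *m, int op)
{
	int need = 0;

	assert(m != NULL);
	if (op == 0 || (op & ~(SYNC_PRE | SYNC_POST)) != 0)
		return (false);
	/* PRE and POST operators may not be combined in a single call. */
	if ((op & SYNC_PRE) != 0 && (op & SYNC_POST) != 0)
		return (false);
	if ((op & SYNC_PRE) != 0) {
		/* Assumption: no new PRE until the previous one is closed. */
		if (m->pending != 0)
			return (false);
		m->pending = op;
		return (true);
	}
	/* Each POST must close a PRE of the same direction. */
	if ((op & SYNC_POSTREAD) != 0)
		need |= SYNC_PREREAD;
	if ((op & SYNC_POSTWRITE) != 0)
		need |= SYNC_PREWRITE;
	if ((m->pending & need) != need)
		return (false);
	m->pending &= ~need;
	return (true);
}

/* The CPU may touch the memory only when no PRE sync is outstanding. */
static bool
sync_model_cpu_may_access(const struct sync_model *m)
{
	return (m->pending == 0);
}

/* The DMA hardware may touch it only between a PRE and its POST. */
static bool
sync_model_dma_may_access(const struct sync_model *m)
{
	return (m->pending != 0);
}
```

In this model PREREAD|PREWRITE followed by POSTREAD alone still leaves
the buffer off limits to the CPU until the POSTWRITE arrives, which is
one way to read the pairing rule.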

We also need some rules about working with buffers obtained from
bus_dmamem_alloc() and external buffers passed to bus_dmamap_load().  I
think the rule should be that a buffer obtained from bus_dmamem_alloc(),
or more formally any region of memory mapped by a bus_dmamap_load(), is
a single logical object which can only be accessed by one entity at a
time.  That means that there cannot be two concurrent DMA operations
happening in different regions of the same buffer, nor can DMA and CPU
access be happening concurrently even if in different parts of the
buffer.  
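
As an illustration of why "different parts of the buffer" is not enough
of a guarantee, the helper below checks whether two byte ranges within
one buffer touch a common cache line.  It is only a sketch: the function
names are made up, the line size is a parameter the caller supplies, and
offsets are assumed to be relative to a buffer that starts on a cache
line boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Index of the first cache line touched by a range starting at off. */
static size_t
first_line(size_t off, size_t line)
{
	return (off / line);
}

/* Index of the last cache line touched by [off, off + len), len > 0. */
static size_t
last_line(size_t off, size_t len, size_t line)
{
	return ((off + len - 1) / line);
}

/*
 * Returns true if the two ranges touch at least one common cache line,
 * even when the byte ranges themselves do not overlap.
 */
static bool
regions_share_cacheline(size_t off_a, size_t len_a, size_t off_b,
    size_t len_b, size_t line)
{
	assert(line > 0 && len_a > 0 && len_b > 0);
	return (first_line(off_a, line) <= last_line(off_b, len_b, line) &&
	    first_line(off_b, line) <= last_line(off_a, len_a, line));
}
```

With an assumed 32-byte line, two adjacent 16-byte regions at offsets 0
and 16 share a line, while two 32-byte regions at offsets 0 and 32 do
not.  Treating the whole mapped buffer as one object avoids having to
reason about this case by case.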

I've always thought that allocating a dma buffer feels like a big
hassle.  You sometimes have to create a tag for the sole purpose of
setting the maxsize to get the buffer size you need when you call
bus_dmamem_alloc().  If bus_dmamem_alloc() took a size parameter, you could
just use your parent tag, or a generic tag appropriate to all the IO
you're doing for a given device.  If you need a variety of buffers for
small control and command and status transfers of different sizes, you
end up having to manage up to a dozen tags and maps and buffers.  It's
all very clunky and inconvenient.  It's just the sort of thing that
makes you want to allocate a big buffer and subdivide it. Surely we
could do something to make it easier?

-- Ian




