Date:      Sun, 26 Aug 2012 17:13:31 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Ian Lepore <freebsd@damnhippie.dyndns.org>
Cc:        freebsd-arm@freebsd.org, freebsd-arch@freebsd.org, freebsd-mips@freebsd.org, Hans Petter Selasky <hans.petter.selasky@bitfrost.no>
Subject:   Re: Partial cacheline flush problems on ARM and MIPS
Message-ID:  <10307B47-13F3-45C0-87F7-66FD3ACA3F86@bsdimp.com>
In-Reply-To: <1346005507.1140.69.camel@revolution.hippie.lan>
References:  <1345757300.27688.535.camel@revolution.hippie.lan> <3A08EB08-2BBF-4B0F-97F2-A3264754C4B7@bsdimp.com> <1345763393.27688.578.camel@revolution.hippie.lan> <FD8DC82C-AD3B-4EBC-A625-62A37B9ECBF1@bsdimp.com> <1345765503.27688.602.camel@revolution.hippie.lan> <CAJ-VmonOwgR7TNuYGtTOhAbgz-opti_MRJgc8G%2BB9xB3NvPFJQ@mail.gmail.com> <1345766109.27688.606.camel@revolution.hippie.lan> <CAJ-VmomFhqV5rTDf-kKQfbSuW7SSiSnqPEjGPtxWjaHFA046kQ@mail.gmail.com> <F8C9E811-8597-4ED0-9F9D-786EB2301D6F@bsdimp.com> <1346002922.1140.56.camel@revolution.hippie.lan> <CAP%2BM-_HZ4yARwZA2koPJDeJWHT-1LORupjymuVnMtLBzeXe=DA@mail.gmail.com> <1346005507.1140.69.camel@revolution.hippie.lan>

On Aug 26, 2012, at 12:25 PM, Ian Lepore wrote:

> On Sun, 2012-08-26 at 13:05 -0500, Mark Tinguely wrote:
>> I did a quick look at the drivers last summer.
>>
>> Most drivers do the right thing and use memory allocated from
>> bus_dmamem_alloc(). It is easy for us to give them a cache aligned
>> buffer.
>>
>> Some drivers use mbufs - 256 bytes, which is cache safe.
>>
>> Some drivers directly or indirectly malloc() a buffer and then use it
>> to dma - rather than try to fix them all,  I was okay with making the
>> smallest malloc() amount equal to the cache line size. It amounts to
>> getting rid of the 16 byte allocation on some ARM architectures. The
>> power of 2 allocator will then give us cache line safe allocation.
>>
>> A few drivers take a small memory amount from the kernel stack and dma
>> to it <- broken driver.
>>
>> The few drivers that use data from a structure and that memory is not
>> cached aligned <- broken driver.
>>
>
> I disagree about those last two points -- drivers that choose to use
> stack memory or malloc'd memory as IO buffers are not broken.

Stack DMA is bad policy at best and broken at worst.  The reason is the
alignment of the underlying buffer.  Since there's no way to say that
something is aligned to a given boundary on the stack, you are asking
for random stack corruption.
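
To make the alignment concern concrete, here is a minimal userland
sketch of the check a buffer would have to pass to be free of partial
cache lines.  The 32-byte line size and the helper name
buf_is_cacheline_safe() are assumptions for illustration only; neither
comes from busdma.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size for illustration; real values vary by CPU. */
#define CACHE_LINE_SIZE 32u

/*
 * Hypothetical helper: return true if [addr, addr + len) starts on a
 * cache line boundary and covers a whole number of lines, so that no
 * line is shared with unrelated data.
 */
static bool
buf_is_cacheline_safe(uintptr_t addr, size_t len)
{
	const uintptr_t mask = CACHE_LINE_SIZE - 1;

	/* The mask arithmetic only works for power-of-two line sizes. */
	assert((CACHE_LINE_SIZE & mask) == 0);

	return (len != 0 && (addr & mask) == 0 && (len & mask) == 0);
}
```

Under these assumptions a 16-byte local variable fails the check at any
address, because the rest of its cache line holds other stack data that
a partial-line invalidate or writeback can clobber.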

Also, a malloc'd area is similarly problematic: the allocator knows
nothing about cache lines, so you can wind up with an allocation of
memory that's corrupted due to cache effects.
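
For comparison, a cache-line-aware allocation could look roughly like
the userland sketch below, in the spirit of the suggestion quoted above
to make the smallest malloc() size equal to the cache line size.  It
uses C11 aligned_alloc(); the 32-byte line size and the names
round_up_to_line() and line_safe_alloc() are assumptions, and this is
not how the kernel malloc(9) works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size for illustration; real values vary by CPU. */
#define CACHE_LINE_SIZE 32u

/* Round size up to the next multiple of the (power-of-two) line size. */
static size_t
round_up_to_line(size_t size)
{
	const size_t mask = CACHE_LINE_SIZE - 1;

	assert((CACHE_LINE_SIZE & mask) == 0);
	return ((size + mask) & ~mask);
}

/*
 * Hypothetical allocator wrapper: the returned buffer starts on a line
 * boundary and owns every line it touches.  Release it with free().
 */
static void *
line_safe_alloc(size_t size)
{
	/* Reject empty requests and sizes that would overflow on rounding. */
	if (size == 0 || size > SIZE_MAX - (CACHE_LINE_SIZE - 1))
		return (NULL);
	return (aligned_alloc(CACHE_LINE_SIZE, round_up_to_line(size)));
}
```

With these assumptions a 16-byte request consumes a full 32-byte line,
which is the cost of dropping the 16-byte allocation bucket.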

>  Drivers
> can do IO directly to/from userland buffers, do we say that an
> application that calls read(2) and passes the address of a stack
> variable is broken?

Yes, if it is smaller than a cache line and not aligned to the cache
line.  That's the point of the uio load variant.

> In this regard, it's the busdma implementation that's broken, because it
> should bounce those IOs through a DMA-safe buffer.  There's absolutely
> no rule that I've ever heard of in FreeBSD that says IO can only take
> place using memory allocated from busdma.

That's partially true.  Since busdma grew up in the storage area, the
de facto rule here has been that you must allocate the memory from
busdma, or it must be page aligned.  The mbuf and uio variants of load
were invented to cope with the common cases of mbufs and user I/O, and
to flag them properly.

How does busdma know that it is using memory that's not from its
allocator?

> The rule is only that the
> proper sequence of busdma operation must be called, and beyond that it's
> up to the busdma implementation to make it work.

No.  Bouncing is needed due to poor alignment of the underlying device,
not due to cache effects.

There's a limited number of things that we support with busdma.
Arbitrary data from malloc that might be shared with the CPU isn't on
that list.

> Our biggest problem, I think, is that we don't have a sufficient
> definition of "the proper sequence of busdma operations."

I disagree.  The sequence has been known for a long time.

> I don't think it will be very hard to make the arm and mips busdma
> implementations work correctly.  It won't even be too hard to make them
> fairly efficient at bouncing small IOs (my thinking is that we can make
> small bounces no more expensive than the current partial cacheline flush
> implementation which copies the data multiple times).  Bouncing large IO
> will never be efficient, but the inefficiency will be a powerful
> motivator to update drivers that do large IO to work better, such as
> using buffers allocated from busdma.

I don't think the cache line problem can be solved with bounce buffers.
Trying to accommodate broken drivers is what led us to this spot.  We
need to fix the broken drivers.  If that's impossible, then the best we
can do is have the driver set an 'always bounce' flag in the tag it
creates and use that to always bounce for operations through that tag.
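
As a rough sketch of what such a flag could mean, the fragment below
models the bounce decision with a per-tag override.  Everything in it is
hypothetical: struct hyp_dma_tag, HYP_ALWAYS_BOUNCE and
hyp_must_bounce() are illustrative names, not part of the busdma API,
and a real implementation also has to consider address limits and
segment constraints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag; no such flag is claimed to exist in busdma. */
#define HYP_ALWAYS_BOUNCE 0x1u

/* Hypothetical stand-in for the parts of a DMA tag used here. */
struct hyp_dma_tag {
	uint32_t flags;     /* HYP_ALWAYS_BOUNCE or 0 */
	size_t   alignment; /* device alignment requirement, power of 2 */
};

/*
 * Decide whether a transfer starting at addr goes through a bounce
 * buffer: always when the driver asked for it in the tag, otherwise
 * only when the address misses the device's alignment requirement.
 */
static bool
hyp_must_bounce(const struct hyp_dma_tag *tag, uintptr_t addr)
{
	assert(tag->alignment != 0 &&
	    (tag->alignment & (tag->alignment - 1)) == 0);

	if (tag->flags & HYP_ALWAYS_BOUNCE)
		return (true);
	return ((addr & (tag->alignment - 1)) != 0);
}
```

The point of the override is that the cost lands only on drivers that
opt in through their own tag, while well-behaved drivers keep the
alignment-only test.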

Warner



