Date:      Sun, 06 Oct 2013 11:09:52 -0600
From:      Ian Lepore <ian@FreeBSD.org>
To:        Adrian Chadd <adrian.chadd@gmail.com>
Cc:        "freebsd-mips@freebsd.org" <freebsd-mips@FreeBSD.org>
Subject:   Re: How's bus-space stuff supposed to work with superscalar MIPS?
Message-ID:  <1381079392.1152.45.camel@revolution.hippie.lan>
In-Reply-To: <CAJ-Vmom8FfmoNh2EM4v5CCYcHmpQG0xTLqDmicEhs9%2BA-bNMrg@mail.gmail.com>
References:  <CAJ-Vmo=PNSsW0eEAhc9LEDLswsj41VN%2BFX1vakQL=qGGdKqMuw@mail.gmail.com> <5AD9EE93-9D19-4A07-B189-43DA0C4A85E9@FreeBSD.org> <CAJ-Vmoky4Sc6DURPj_YeahUPe8=XurP_j7k1S_6L4gzhCXyPrw@mail.gmail.com> <21AC10EC-BAA6-4F1A-BC17-F781CF77D224@bsdimp.com> <CAJ-Vmom8FfmoNh2EM4v5CCYcHmpQG0xTLqDmicEhs9%2BA-bNMrg@mail.gmail.com>

On Sun, 2013-10-06 at 09:31 -0700, Adrian Chadd wrote:
> On Oct 6, 2013 12:22 AM, "Warner Losh" <imp@bsdimp.com> wrote:
> >
> >
> > On Oct 5, 2013, at 5:51 PM, Adrian Chadd wrote:
> >
> > > On 5 October 2013 16:06, Stanislav Sedov <stas@freebsd.org> wrote:
> > >
> > >>
> > >> On Oct 5, 2013, at 10:18 AM, Adrian Chadd <adrian@freebsd.org> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> I've been bringing up the AR9344 PHY and after a lot of digging, I
> > >>> discovered that I can fix things by changing ARGE_WRITE() (ie, write to
> > >>> the ethernet space registers) to:
> > >>>
> > >>> bus_write_4();
> > >>> bus_read_4();
> > >>>
> > >>> .. to (what I'm guessing here) flush the write out before the next
> > >>> instruction is run.
> > >>>
> > >>> So, given this particular hilarity has shown up, what's the story with
> > >>> doing IO accesses on a superscalar MIPS CPU? If it's going to kseg1, is
> > >>> it somehow going to magically enforce ordering? Or am I right in
> > >>> thinking we will need explicit barriers here?
> > >>>
> > >>
> > >> I don't know specifics of mips74k, but usually one indeed needs memory
> > >> barriers when performing read or write operation sequences that require
> > >> ordering on device I/O (e.g. changing the ring and writing the new ring
> > >> index afterwards).  I would not be surprised if the cpu reorders i/o bus
> > >> memory access, especially a multi-issue one.
> > >>
> > >> It is a good idea to have barriers where needed regardless.  We have
> > >> special macros for them which are defined to nothing on the in-order
> > >> platforms.
> > >
> > >
> > > Right. I know this stuff. What I really want to know, though, is this
> > > kind of stuff:
> > >
> > > * What the specifics are for superscalar MIPS CPUs;
> >
> > I believe they document that writes can be reordered unless there's an
> > intervening read or memory barrier. I've not looked it up.
> >
> > > * What the bus space stuff should be providing by default (and I've been
> > > down this path once, with ath(4) bugs, PPC, and the bus space macros not
> > > enforcing flushes after IO operations, even though the API requires
> > > drivers do it themselves..);
> >
> > It isn't so much flushes as barriers to prevent reordering. By doing the
> > read after write, you are forcing an expensive memory barrier. Drivers that
> > depend on a particular write ordering need to have explicit barriers.
> >
> > > * Whether it should be enough to map space COHERENT - then it's up to the
> > > underlying bus implementation to enforce ordering.
> >
> > The question here is whether there should be an implied barrier in write
> > operations. On x86 there is, but as you are discovering on other
> > architectures there isn't. While it would be convenient to force a memory
> > barrier between every write (something trivial to do with an explicit
> > barrier in your driver), it is not very performant to do so, since most
> > writes don't have an explicit ordering...
> 
> The other thing is how correct the shared driver code is, like pci, usb,
> etc.
> 
> I think that allocating bus space coherent means non cached, not non
> speculative/in order. So, what should we do?
> 
> And what does the busdma barrier method do? Is it a cache barrier, or did
> its definition include ordering? It's a stub in mips, with the cache
> invalidate call commented out.
> 
> My idea here is to change the definition of coherent, making it imply in
> order. Then add another flag saying space is potentially non ordered. That
> puts the onus on drivers to do the right thing if they want the performance
> boost, but buys us correctness now.
> 
> I know that ppc modified their bus space to enforce ordered writes.
> 
> Thanks,

There is mixing here between the concepts of bus_space and busdma, and
they're not miscible.  They're two separate subsystems that live side by
side in driverland.  The bus_space system is how the cpu accesses
peripherals that live on a memory or IO bus, and busdma is how the cpu
and peripherals share access to main memory.

You speak of "allocating bus space" and of "busdma barrier" -- that's
backwards.  You can allocate busdma memory, but not bus_space.  There
are barrier operations available in bus_space, not in busdma (there are
sync operations in busdma).

Normally I try not to be overly pedantic, but this is an area where you
really can't discuss things properly without using the correct
terminology, or the discussion will become hopelessly muddled.

So for bus_space, the documentation states that each individual driver
must call bus_space_barrier() as needed after other bus_space accesses.
Very few drivers currently do so, and it just seems to accidentally work
out okay on most platforms.  On ARM for example the memory-mapped
devices are mapped with MMU attributes that force all access to be
strongly ordered (each read or write happens in the order it was issued,
without caching or buffering, without speculative access or prefetching,
etc).
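
To make the driver-side obligation concrete, here is a rough userland
model, not kernel code.  In a real driver the calls would be
bus_space_write_4() and bus_space_barrier() with
BUS_SPACE_BARRIER_WRITE; everything named mock_* below, the register
layout, and the "newest first" delivery order are assumptions invented
purely to show what can go wrong when the barrier is missing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_NREGS 4
#define MOCK_QLEN  8

/* Hypothetical register layout for the model. */
enum { MOCK_REG_RING_BASE = 0, MOCK_REG_RING_INDEX = 1 };

struct mock_dev {
    uint32_t regs[MOCK_NREGS];   /* values the device has observed */
    size_t   seen[MOCK_QLEN];    /* order in which registers were written */
    size_t   nseen;
    struct { size_t reg; uint32_t val; } posted[MOCK_QLEN];
    size_t   nposted;            /* writes issued but not yet observed */
};

/* Issue a write: it is only queued, the device has not seen it yet. */
void mock_write_4(struct mock_dev *d, size_t reg, uint32_t val)
{
    assert(reg < MOCK_NREGS && d->nposted < MOCK_QLEN);
    d->posted[d->nposted].reg = reg;
    d->posted[d->nposted].val = val;
    d->nposted++;
}

/*
 * Deliver all queued writes to the device.  They are delivered newest
 * first to model the worst case: nothing orders writes that sit in the
 * queue together.
 */
void mock_drain(struct mock_dev *d)
{
    while (d->nposted > 0) {
        d->nposted--;
        assert(d->nseen < MOCK_QLEN);
        d->regs[d->posted[d->nposted].reg] = d->posted[d->nposted].val;
        d->seen[d->nseen++] = d->posted[d->nposted].reg;
    }
}

/* Barrier: everything issued so far is delivered before anything later. */
void mock_barrier(struct mock_dev *d)
{
    mock_drain(d);
}

/* Driver sequence: program the ring base, then publish the ring index. */
void mock_ring_start(struct mock_dev *d, uint32_t base, uint32_t index,
    int use_barrier)
{
    mock_write_4(d, MOCK_REG_RING_BASE, base);
    if (use_barrier)
        mock_barrier(d);
    mock_write_4(d, MOCK_REG_RING_INDEX, index);
    mock_drain(d);   /* eventually the hardware delivers what is left */
}
```

With use_barrier set to 0 the model device sees the ring index before
the ring base; with it set to 1 the base always arrives first.  That is
the kind of question each existing read/write call has to be checked
against.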

Fixing the lack of bus_space_barrier() calls would be a monumental task.
Pretty much every existing bus_space_read() and bus_space_write() call
in all the various flavors in the whole system has to be examined in the
context of the code that surrounds it with thoughts in mind such as
"what would happen if this read/write happened before the prior one?"
IMO, the right way to handle this kind of fix would be to change the
bus_space API so that every access function had a flag that said what to
do about barriers.  That's the only way you'll ever be sure that you've
fixed every existing driver and that new drivers in the future will
always be written correctly.
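
As a sketch of what such a flag-carrying API might look like (this is
not the existing bus_space interface; every hyp_* name and flag value
below is hypothetical, and the C11 fence is only a stand-in for
whatever barrier instruction the platform actually needs):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-access barrier flags. */
enum hyp_barrier {
    HYP_BARRIER_NONE   = 0,       /* caller states ordering is irrelevant */
    HYP_BARRIER_BEFORE = 1 << 0,  /* earlier accesses complete first */
    HYP_BARRIER_AFTER  = 1 << 1   /* this access completes before later ones */
};

/* Hypothetical handle for a mapped register window. */
struct hyp_space {
    volatile uint32_t *base;
    unsigned barriers;            /* barriers issued so far, for inspection */
};

/* Stand-in barrier; a real port would use the platform's instruction. */
void hyp_barrier(struct hyp_space *s)
{
    atomic_thread_fence(memory_order_seq_cst);
    s->barriers++;
}

/* Write one 32-bit register, with barriers as requested by the caller. */
void hyp_write_4(struct hyp_space *s, unsigned reg, uint32_t val, int flags)
{
    assert(s && s->base);
    if (flags & HYP_BARRIER_BEFORE)
        hyp_barrier(s);
    s->base[reg] = val;
    if (flags & HYP_BARRIER_AFTER)
        hyp_barrier(s);
}

/* Read one 32-bit register, with barriers as requested by the caller. */
uint32_t hyp_read_4(struct hyp_space *s, unsigned reg, int flags)
{
    uint32_t val;

    assert(s && s->base);
    if (flags & HYP_BARRIER_BEFORE)
        hyp_barrier(s);
    val = s->base[reg];
    if (flags & HYP_BARRIER_AFTER)
        hyp_barrier(s);
    return val;
}
```

The point of the extra argument is that changing the signature breaks
the build at every existing call site, so each one has to be looked at,
and a new driver cannot call the function without stating its intent.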

Or you could just implement the ordering at the bus_space implementation
layer and rewrite the docs to match the existing practice (and
effectively eliminate the bus_space_barrier() call).  To me, this makes
a lot more sense -- the bus_space implementation is closer to the host
hardware, and seems like the right place to know about things such as
the ordering of bus accesses.
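
A minimal sketch of that second approach, again as a userland stand-in
rather than a real bus_space implementation: the hyp_ordered_* names
are hypothetical, and the C11 fence substitutes for whatever the
platform layer would really do (a barrier instruction, a read-back, or
a strongly ordered mapping).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical handle for a mapped register window. */
struct hyp_ordered_space {
    volatile uint32_t *base;
    unsigned nregs;
};

/* Every write is followed by a fence, so callers never issue a barrier. */
void hyp_ordered_write_4(struct hyp_ordered_space *s, unsigned reg,
    uint32_t val)
{
    assert(reg < s->nregs);
    s->base[reg] = val;
    atomic_thread_fence(memory_order_seq_cst);
}

/* Every read is followed by a fence as well, for the same reason. */
uint32_t hyp_ordered_read_4(struct hyp_ordered_space *s, unsigned reg)
{
    uint32_t val;

    assert(reg < s->nregs);
    val = s->base[reg];
    atomic_thread_fence(memory_order_seq_cst);
    return val;
}
```

Driver code stays exactly as it is written today.  The cost is the one
raised earlier in the thread: every access pays for ordering, including
the many that do not need it.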

When it comes to busdma and coherent mappings, that's a whole different
can of worms, a whole 'nother area full of "works by accident" right
now.  But since I think bus_space is what you're really concerned with
in this thread, we probably shouldn't muddy the discussion with busdma
issues.

-- Ian




