Date: Sun, 06 Oct 2013 11:09:52 -0600
From: Ian Lepore <ian@FreeBSD.org>
To: Adrian Chadd <adrian.chadd@gmail.com>
Cc: "freebsd-mips@freebsd.org" <freebsd-mips@FreeBSD.org>
Subject: Re: How's bus-space stuff supposed to work with superscalar MIPS?
Message-ID: <1381079392.1152.45.camel@revolution.hippie.lan>
In-Reply-To: <CAJ-Vmom8FfmoNh2EM4v5CCYcHmpQG0xTLqDmicEhs9%2BA-bNMrg@mail.gmail.com>
References: <CAJ-Vmo=PNSsW0eEAhc9LEDLswsj41VN%2BFX1vakQL=qGGdKqMuw@mail.gmail.com>
 <5AD9EE93-9D19-4A07-B189-43DA0C4A85E9@FreeBSD.org>
 <CAJ-Vmoky4Sc6DURPj_YeahUPe8=XurP_j7k1S_6L4gzhCXyPrw@mail.gmail.com>
 <21AC10EC-BAA6-4F1A-BC17-F781CF77D224@bsdimp.com>
 <CAJ-Vmom8FfmoNh2EM4v5CCYcHmpQG0xTLqDmicEhs9%2BA-bNMrg@mail.gmail.com>

On Sun, 2013-10-06 at 09:31 -0700, Adrian Chadd wrote:
> On Oct 6, 2013 12:22 AM, "Warner Losh" <imp@bsdimp.com> wrote:
> >
> > On Oct 5, 2013, at 5:51 PM, Adrian Chadd wrote:
> >
> > > On 5 October 2013 16:06, Stanislav Sedov <stas@freebsd.org> wrote:
> > >
> > >>
> > >> On Oct 5, 2013, at 10:18 AM, Adrian Chadd <adrian@freebsd.org> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> I've been bringing up the AR9344 PHY and after a lot of digging, I
> > >>> discovered that I can fix things by changing ARGE_WRITE() (ie, write
> > >>> to the ethernet space registers) to:
> > >>>
> > >>> bus_write_4();
> > >>> bus_read_4();
> > >>>
> > >>> .. to (what I'm guessing here) flush the write out before the next
> > >>> instruction is run.
> > >>>
> > >>> So, given this particular hilarity has shown up, what's the story with
> > >>> doing IO accesses on a superscalar MIPS CPU? If it's going to kseg1,
> > >>> is it somehow going to magically enforce ordering? Or am I right in
> > >>> thinking we will need explicit barriers here?
> > >>>
> > >>
> > >> I don't know specifics of mips74k, but usually one indeed needs memory
> > >> barriers when performing read of write operation sequences that require
> > >> ordering on device I/O (e.g changing the ring and writing the new ring
> > >> index afterwards). I would not be surprised if the cpu reorders i/o bus
> > >> memory access, especially a multi-issue one.
> > >>
> > >> It is a good idea to have barriers where needed regardless. We have
> > >> special macros for them which are defined to nothing on the in-order
> > >> platforms.
> > >
> > >
> > > Right. I know this stuff. I really though want to know this kind of
> > > stuff:
> > >
> > > * What the specifics are for superscalar MIPS CPUs;
> >
> > I believe they document that writes can be reordered unless there's an
> > intervening read or memory barrier. I've not looked it up.
> >
> > > * What the bus space stuff should be be providing by default (and I've
> > > been down this path once, with ath(4) bugs, PPC, and the bus space
> > > macros not enforcing flushes after IO operations, even though the API
> > > requires drivers do it themselves..);
> >
> > It isn't so much flushes as barriers to prevent reordering. By doing the
> > read after write, you are forcing an expensive memory barrier. Drivers
> > that depend on a particular write ordering need to have explicit barriers.
> >
> > > * Whether it should be enough to map space COHERENT - then it's up to
> > > the underlying bus implementation to implement enforcing ordering.
> >
> > The question here is whether there should be an implied barrier in write
> > operations. On x86 there is, but as you are discovering on other
> > architectures there isn't. While it would be convenient to force a memory
> > barrier between every write (something trivial to do with an explicit
> > barrier in your driver), it is not very performant to do so, since most
> > writes don't have an explicit ordering...
>
> The other thing is how correct the shared driver code is, like pci, usb,
> etc.
>
> I think that allocing bus space coherent means non cached, not non
> speculative/in order. So, what should we do?
>
> And whats the busdma barrier method do? Is it a cache barrier, or did its
> definition include ordering? Its a stub in mips, with the cache invalidate
> call commented out.
>
> My idea here is to change the definition of coherent, making it imply in
> order. Then add another flag saying space is potentially non ordered. That
> puts the onus on drivers to do the right thing if they want the performance
> boost, but buys us correctness now.
>
> I know that ppc modified their bus space to enforce ordered writes.
>
> Thanks,

There is mixing here between the concepts of bus_space and busdma, and
they're not miscible. They're two separate subsystems that live side by
side in driverland.
The bus_space system is how the cpu accesses peripherals that live on a
memory or IO bus, and busdma is how the cpu and peripherals share access
to main memory. You speak of "allocating bus space" and of "busdma
barrier" -- that's backwards. You can allocate busdma memory, but not
bus_space. There are barrier operations available in bus_space, not in
busdma (there are sync operations in busdma).

Normally I try not to be overly pedantic, but this is an area where you
really can't discuss things properly without using the correct
terminology, or the discussion will become hopelessly muddled.

So for bus_space, the documentation states that each individual driver
must call bus_space_barrier() as needed after other bus_space accesses.
Very few drivers currently do so, and it just seems to accidentally work
out okay on most platforms. On ARM for example the memory-mapped devices
are mapped with MMU attributes that force all access to be strongly
ordered (each read or write happens in the order it was issued, without
caching or buffering, without speculative access or prefetching, etc).

Fixing the lack of bus_space_barrier() calls would be a monumental task.
Pretty much every existing bus_space_read() and bus_space_write() call
in all the various flavors in the whole system has to be examined in the
context of the code that surrounds it with thoughts in mind such as
"what would happen if this read/write happened before the prior one?"

IMO, the right way to handle this kind of fix would be to change the
bus_space API so that every access function had a flag that said what to
do about barriers. That's the only way you'll ever be sure that you've
fixed every existing driver and that new drivers in the future will
always be written correctly.

Or you could just implement the ordering at the bus_space implementation
layer and rewrite the docs to match the existing practice (and
effectively eliminate the bus_space_barrier() call).
To me, this makes a lot more sense -- the bus_space implementation is
closer to the host hardware, and seems like the right place to know
about things such as the ordering of bus accesses.

When it comes to busdma and coherent mappings, that's a whole different
can of worms, a whole 'nother area full of "works by accident" right
now. But since I think bus_space is what you're really concerned with in
this thread, we probably shouldn't muddy the discussion with busdma
issues.

-- Ian
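
For illustration, the two approaches weighed above (explicit barriers in
each driver versus ordering enforced inside the bus_space layer) might
be sketched roughly as below. This is a userland model only, not FreeBSD
kernel code: fake_regs, reg_write_4(), reg_read_4(), reg_write_barrier()
and the ring register names are all hypothetical stand-ins, and a C11
fence takes the place of bus_space_barrier() with
BUS_SPACE_BARRIER_WRITE so that the sketch compiles outside the kernel.
It only shows where the barrier sits in each design, not how any
particular port implements it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * ASSUMPTION: everything here is a userland stand-in. fake_regs models a
 * device register window; the register indices and all function names
 * are hypothetical and do not come from the original thread.
 */
#define NREGS          4
#define REG_RING_BASE  0	/* hypothetical: descriptor ring base */
#define REG_RING_INDEX 1	/* hypothetical: ring index, written last */

static volatile uint32_t fake_regs[NREGS];

/* Plain register write: by itself it makes no ordering promise. */
static void
reg_write_4(unsigned reg, uint32_t val)
{
	assert(reg < NREGS);
	fake_regs[reg] = val;
}

/* Plain register read, used here only to inspect the model. */
static uint32_t
reg_read_4(unsigned reg)
{
	assert(reg < NREGS);
	return (fake_regs[reg]);
}

/*
 * Stand-in for a write barrier. In a driver this is where
 * bus_space_barrier() with BUS_SPACE_BARRIER_WRITE would go; a C11 fence
 * is used here purely so the sketch builds outside the kernel.
 */
static void
reg_write_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Option 1: the driver states the ordering requirement explicitly. */
static void
ring_update_explicit(uint32_t base, uint32_t index)
{
	reg_write_4(REG_RING_BASE, base);
	reg_write_barrier();	/* base must be issued before index */
	reg_write_4(REG_RING_INDEX, index);
}

/* Option 2: the accessor layer orders every write; drivers add nothing. */
static void
reg_write_4_ordered(unsigned reg, uint32_t val)
{
	reg_write_4(reg, val);
	reg_write_barrier();
}

static void
ring_update_implicit(uint32_t base, uint32_t index)
{
	reg_write_4_ordered(REG_RING_BASE, base);
	reg_write_4_ordered(REG_RING_INDEX, index);
}
```

Calling ring_update_explicit(0x1000, 7) or ring_update_implicit(0x1000, 7)
leaves the same register contents; the difference is who carries the
responsibility for ordering. The second form pays for a barrier on every
write, which is the performance trade-off raised earlier in the thread.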
