Date: Wed, 29 Apr 2015 23:13:45 -0500
From: Jason Harmening <jason.harmening@gmail.com>
To: Ian Lepore <ian@freebsd.org>
Cc: Konstantin Belousov <kostikbel@gmail.com>, Adrian Chadd <adrian@freebsd.org>,
    Svatopluk Kraus <onwahe@gmail.com>, freebsd-arch <freebsd-arch@freebsd.org>
Subject: Re: bus_dmamap_sync() for bounced client buffers from user address space
Message-ID: <CAM=8qanFPinmBV3Sv4GD=mFRbyZc8tGHF_OJdJ%2BrwEDLPKu48w@mail.gmail.com>
In-Reply-To: <CAM=8qakVsbukSTVh5UVQEO2Vmtcmj36cqw6KJC3frvEQGGCQsg@mail.gmail.com>
References: <38574E63-2D74-4ECB-8D68-09AC76DFB30C@bsdimp.com>
 <CAJ-VmomqGkEFVauya%2BrmPGcD_-=Z-mmg1RSDf1D2bT_DfwPBGA@mail.gmail.com>
 <1761247.Bq816CMB8v@ralph.baldwin.cx>
 <CAFHCsPX9rgmCAPABct84a000NuBPQm5sprOAQr9BTT6Ev6KZcQ@mail.gmail.com>
 <20150429132017.GM2390@kib.kiev.ua>
 <CAFHCsPWjEFBF%2B-7SR7EJ3UHP6oAAa9xjbu0CbRaQvd_-6gKuAQ@mail.gmail.com>
 <20150429165432.GN2390@kib.kiev.ua>
 <CAM=8qakzkKX8TZNYE33H=JqL_r5z%2BAU9fyp5%2B7Z0mixmF5t63w@mail.gmail.com>
 <20150429185019.GO2390@kib.kiev.ua>
 <CAM=8qanPHbCwUeu0-zi-ccY4WprHaOGzCm44PwNSgb==nwgGGw@mail.gmail.com>
 <20150429193337.GQ2390@kib.kiev.ua>
 <CAM=8qak0qRw5MsSG4e1Zqxo_x9VFGQ2rQpjUBFX_UA6P9_-2cA@mail.gmail.com>
 <1430346204.1157.107.camel@freebsd.org>
 <CAM=8qakVsbukSTVh5UVQEO2Vmtcmj36cqw6KJC3frvEQGGCQsg@mail.gmail.com>
On Wed, Apr 29, 2015 at 6:10 PM, Jason Harmening <jason.harmening@gmail.com>
wrote:

>
>> For what we call armv6 (which is mostly armv7)...
>>
>> The cache maintenance operations require virtual addresses, which means
>> it looks a lot like a VIPT cache.  Under the hood the implementation
>> behaves as if it were a PIPT cache, so even in the presence of multiple
>> mappings of the same physical page at different virtual addresses, the
>> SMP coherency hardware works correctly.
>>
>> The ARM ARM says...
>>
>>         [Stuff about ARMv6 and page coloring when a cache way exceeds
>>         4K.]
>>
>>         ARMv7 does not support page coloring, and requires that all data
>>         and unified caches behave as Physically Indexed Physically
>>         Tagged (PIPT) caches.
>>
>> The only true armv6 chip we support isn't SMP and has a 16K/4-way cache
>> that neatly sidesteps the aliasing problem that requires page coloring
>> solutions.  So on modern arm chips we get to act like we've got PIPT
>> data caches, but with the quirk that cache ops are initiated by virtual
>> address.
>>
>
> Cool, thanks for the explanation!
> To satisfy my own curiosity, since it "looks like VIPT", does that mean we
> still have to flush the cache on context switch?
>
>
>>
>> Basically, when you perform a cache maintenance operation, a
>> translation table walk is done on the core that issued the cache op,
>> then from that point on the physical address is used within the cache
>> hardware, and that's what gets broadcast to the other cores by the snoop
>> control unit or cache coherency fabric (depending on the chip).
>
>
> So, if we go back to the original problem of wanting to do
> bus_dmamap_sync() on userspace buffers from some asynchronous context:
>
> Say the process that owns the buffer is running on one core and prefetches
> some data into a cacheline for the buffer, and bus_dmamap_sync(POSTREAD)
> is done by a kernel thread running on another core.  Since the core
> running the kernel thread is responsible for the TLB lookup to get the
> physical address, and that core has no UVA for the buffer, the cache ops
> will be treated as misses and the cacheline on the core that owns the UVA
> won't be invalidated, correct?
>
> That means the panic on !pmap_dmap_iscurrent() in busdma_machdep-v6.c
> should stay?
>
> Sort of the same problem would apply to drivers using
> vm_fault_quick_hold_pages + bus_dmamap_load_ma...no cache maintenance,
> since there are no VAs to operate on.  Indeed, both the arm and mips
> implementations of _bus_dmamap_load_phys don't do anything with the sync
> list.
>

It occurs to me that one way to deal with both the blocking-sfbuf problem
for physcopy and VIPT cache maintenance might be to have a reserved
per-CPU KVA page.  For arches that don't have a direct map, the idea would
be to grab a critical section, map the page into the per-CPU slot, copy
the bounce page or do cache maintenance on the synclist entry, then drop
the critical section.
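To make that a bit more concrete, here is a rough sketch of the bounce-copy
half of the idea.  None of this exists in the tree -- busdma_cpu_kva and
busdma_copy_bounce_out are made-up names, and it assumes one page of KVA per
CPU is set aside at boot (e.g. via DPCPU):

    /*
     * Sketch only: copy data out of a bounce page given just its physical
     * address, using a reserved per-CPU KVA page instead of an sfbuf.
     */
    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/pcpu.h>
    #include <vm/vm.h>
    #include <vm/pmap.h>

    static DPCPU_DEFINE(vm_offset_t, busdma_cpu_kva);  /* reserved at boot */

    static void
    busdma_copy_bounce_out(vm_paddr_t src_pa, void *dst, size_t len)
    {
            vm_offset_t kva;

            KASSERT((src_pa & PAGE_MASK) + len <= PAGE_SIZE,
                ("busdma_copy_bounce_out: copy crosses page boundary"));

            /*
             * Pin to this CPU so nothing else can reuse the per-CPU
             * mapping while we hold it, then map, copy, and unmap.
             */
            critical_enter();
            kva = DPCPU_GET(busdma_cpu_kva);
            pmap_kenter(kva, trunc_page(src_pa));
            bcopy((void *)(kva + (src_pa & PAGE_MASK)), dst, len);
            pmap_kremove(kva);
            critical_exit();
    }

The cache-maintenance side would look much the same, just with the dcache
write-back/invalidate op on the temporary VA in place of the bcopy, which is
more or less what Linux's kmap_atomic gives you.

That brought up a dim memory I had of Linux doing something similar, and in
fact it seems to use kmap_atomic for both cache ops and bounce buffers.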