Date:      Wed, 29 Apr 2015 23:13:45 -0500
From:      Jason Harmening <jason.harmening@gmail.com>
To:        Ian Lepore <ian@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, Adrian Chadd <adrian@freebsd.org>,  Svatopluk Kraus <onwahe@gmail.com>, freebsd-arch <freebsd-arch@freebsd.org>
Subject:   Re: bus_dmamap_sync() for bounced client buffers from user address space
Message-ID:  <CAM=8qanFPinmBV3Sv4GD=mFRbyZc8tGHF_OJdJ+rwEDLPKu48w@mail.gmail.com>
In-Reply-To: <CAM=8qakVsbukSTVh5UVQEO2Vmtcmj36cqw6KJC3frvEQGGCQsg@mail.gmail.com>
References:  <38574E63-2D74-4ECB-8D68-09AC76DFB30C@bsdimp.com> <CAJ-VmomqGkEFVauya+rmPGcD_-=Z-mmg1RSDf1D2bT_DfwPBGA@mail.gmail.com> <1761247.Bq816CMB8v@ralph.baldwin.cx> <CAFHCsPX9rgmCAPABct84a000NuBPQm5sprOAQr9BTT6Ev6KZcQ@mail.gmail.com> <20150429132017.GM2390@kib.kiev.ua> <CAFHCsPWjEFBF+-7SR7EJ3UHP6oAAa9xjbu0CbRaQvd_-6gKuAQ@mail.gmail.com> <20150429165432.GN2390@kib.kiev.ua> <CAM=8qakzkKX8TZNYE33H=JqL_r5z+AU9fyp5+7Z0mixmF5t63w@mail.gmail.com> <20150429185019.GO2390@kib.kiev.ua> <CAM=8qanPHbCwUeu0-zi-ccY4WprHaOGzCm44PwNSgb==nwgGGw@mail.gmail.com> <20150429193337.GQ2390@kib.kiev.ua> <CAM=8qak0qRw5MsSG4e1Zqxo_x9VFGQ2rQpjUBFX_UA6P9_-2cA@mail.gmail.com> <1430346204.1157.107.camel@freebsd.org> <CAM=8qakVsbukSTVh5UVQEO2Vmtcmj36cqw6KJC3frvEQGGCQsg@mail.gmail.com>

On Wed, Apr 29, 2015 at 6:10 PM, Jason Harmening <jason.harmening@gmail.com>
wrote:

>
>> For what we call armv6 (which is mostly armv7)...
>>
>> The cache maintenance operations require virtual addresses, which means
>> it looks a lot like a VIPT cache.  Under the hood the implementation
>> behaves as if it were a PIPT cache so even in the presence of multiple
>> mappings of the same physical page into different virtual addresses, the
>> SMP coherency hardware works correctly.
>>
>> The ARM ARM says...
>>
>>         [Stuff about ARMv6 and page coloring when a cache way exceeds
>>         4K.]
>>
>>         ARMv7 does not support page coloring, and requires that all data
>>         and unified caches behave as Physically Indexed Physically
>>         Tagged (PIPT) caches.
>>
>> The only true armv6 chip we support isn't SMP, and it has a 16K/4-way
>> cache that neatly sidesteps the aliasing problem that requires page
>> coloring solutions.  So on modern arm chips we get to act like we've got
>> PIPT data caches, but with the quirk that cache ops are initiated by
>> virtual address.
>>
>
> Cool, thanks for the explanation!
> To satisfy my own curiosity, since it "looks like VIPT", does that mean we
> still have to flush the cache on context switch?
>
>
>>
>> Basically, when you perform a cache maintenance operation, a
>> translation table walk is done on the core that issued the cache op,
>> then from that point on the physical address is used within the cache
>> hardware and that's what gets broadcast to the other cores by the snoop
>> control unit or cache coherency fabric (depending on the chip).
>
>
> So, if we go back to the original problem of wanting to do
> bus_dmamap_sync() on userspace buffers from some asynchronous context:
>
> Say the process that owns the buffer is running on one core and prefetches
> some data into a cacheline for the buffer, and bus_dmamap_sync(POSTREAD) is
> done by a kernel thread running on another core.  Since the core running
> the kernel thread is responsible for the TLB lookup to get the physical
> address, and that core has no mapping for the UVA, the cache ops will be
> treated as misses and the cacheline on the core that owns the UVA won't
> be invalidated, correct?
>
> That means the panic on !pmap_dmap_iscurrent() in busdma_machdep-v6.c
> should stay?
>
> Sort of the same problem would apply to drivers using
> vm_fault_quick_hold_pages + bus_dmamap_load_ma: no cache maintenance is
> done, since there are no VAs to operate on.  Indeed, both the arm and
> mips implementations of _bus_dmamap_load_phys do nothing with the sync
> list.
>
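For concreteness, the unmapped path I'm referring to looks roughly like
this (a sketch only: load_user_buf and MAXPAGES are made-up names, error
handling is trimmed, and I haven't compiled it):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <machine/bus.h>
    #include <vm/vm.h>
    #include <vm/vm_extern.h>
    #include <vm/vm_map.h>
    #include <vm/vm_page.h>

    #define MAXPAGES 16

    static int
    load_user_buf(bus_dma_tag_t tag, bus_dmamap_t map, vm_offset_t uva,
        vm_size_t len, bus_dmamap_callback_t *cb, void *cbarg)
    {
            vm_page_t ma[MAXPAGES];
            int count, error;

            /* Wire the user pages; no KVA mapping is ever created. */
            count = vm_fault_quick_hold_pages(&curproc->p_vmspace->vm_map,
                uva, len, VM_PROT_READ | VM_PROT_WRITE, ma, MAXPAGES);
            if (count == -1)
                    return (EFAULT);

            /*
             * Load by vm_page_t.  On arm and mips this bottoms out in
             * _bus_dmamap_load_phys, which never adds sync list entries,
             * so a later bus_dmamap_sync() has no VA to do cache ops on.
             */
            error = bus_dmamap_load_ma(tag, map, ma, len, uva & PAGE_MASK,
                BUS_DMA_NOWAIT, cb, cbarg);
            if (error != 0)
                    vm_page_unhold_pages(ma, count);
            return (error);
    }
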
It occurs to me that one way to deal with both the blocking sfbuf for
physcopy and VIPT cache maintenance might be to have a reserved per-CPU KVA
page.  For arches that don't have a direct map, the idea would be to enter
a critical section, map the physical page at the CPU's reserved VA, copy
the bounce page or do cache maintenance on the synclist entry through that
mapping, then tear down the mapping and exit the critical section.  That
brought up a dim memory I had of Linux doing something similar, and in
fact it seems to use kmap_atomic for both cache ops and bounce buffers.
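
Roughly what I'm picturing is below.  This is just a sketch: every name in
it is made up, the cache primitive shown is the arm one, and none of it
has been tested:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/pcpu.h>
    #include <sys/smp.h>
    #include <vm/vm.h>
    #include <vm/vm_extern.h>
    #include <vm/vm_page.h>
    #include <vm/pmap.h>
    #include <machine/cpufunc.h>

    static vm_offset_t dma_kva[MAXCPU];    /* one reserved page per CPU */

    /* Called once during busdma setup, after the VM system is up. */
    static void
    dma_kva_init(void)
    {
            int i;

            CPU_FOREACH(i)
                    dma_kva[i] = kva_alloc(PAGE_SIZE);
    }

    /*
     * Invalidate part of a page for a POSTREAD sync when we have only
     * the vm_page_t.  The critical section pins the thread to one CPU,
     * so the reserved VA can't be reused underneath us, and the mapping
     * is never accessed from any other CPU.
     */
    static void
    dma_page_inv(vm_page_t m, vm_offset_t off, vm_size_t len)
    {
            vm_offset_t va;

            critical_enter();
            va = dma_kva[curcpu];
            pmap_qenter(va, &m, 1);                 /* map the page */
            cpu_dcache_inv_range(va + off, len);    /* cache op by VA */
            pmap_qremove(va, 1);    /* a local-only TLB inval would do */
            critical_exit();
    }

The same per-CPU page would let us physcopy a bounce page without sleeping
for an sfbuf, since the whole map/copy/unmap happens inside the critical
section.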


