Date: Tue, 7 Jul 2015 11:37:55 -0500
From: Jason Harmening <jason.harmening@gmail.com>
To: FreeBSD Arch <freebsd-arch@freebsd.org>
Subject: RFC: New KPI for fast temporary single-page KVA mappings
Message-ID: <CAM=8qanB11WEWHZZfxyOT7VeL%2BOLqZ47bg=1TKp5c-W=VHNZnw@mail.gmail.com>
Hi everyone,

I'd like to propose a couple of new pmap functions:

vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)

These functions will create and destroy a temporary, usually CPU-local mapping of the specified page. Where available, they will use the direct map. Otherwise, they will use a per-CPU pageframe that's allocated at boot.

Guarantees:
--Will not sleep
--Will not fail
--Safe to call under a non-spin lock or from an ithread

Restrictions:
--Not safe to call from interrupt filter or under a spin mutex on all arches
--Mappings should be held for as little time as possible; don't do any locking or sleeping while holding a mapping
--Current implementation only guarantees a single page of mapping space across all arches. MI code should not make nested calls to pmap_quick_enter_page().

My idea is that the first consumer of this would be busdma. All non-iommu implementations would use this for bounce buffer copies of pages that don't have resident mappings. Currently busdma uses physcopy[in|out] for unmapped buffers, which on most arches uses sf_bufs that can sleep, making bus_dmamap_sync() unsafe to call in a lot of cases.

busdma would also use this for virtually-indexed cache maintenance on arm and mips. It currently ignores cache maintenance for buffers that don't have a KVA or resident UVA mapping, which may not be correct for buffers that don't belong to curproc or have cache-resident VAs on other cores.

I've created 2 Differential reviews:
https://reviews.freebsd.org/D3013: the implementation
https://reviews.freebsd.org/D3014: the kmod I've been using to test it

I'd like any and all feedback, both on the general approach and the implementation details. Some things to note on the implementation:
--I've intentionally avoided touching existing pmap code for the time being. Some of the new code could likely be shared with other pmap KPIs in a lot of cases.
--I've structured the KPI to make it easy to extend to guarantee more than one per-CPU page in the future. I could see that being useful for copying between pages, for example
--There's no immediate consumer for the sparc64 implementation, since busdma there needs neither bounce buffers nor cache maintenance.
--I would very much like feedback and testing from experts on non-x86 arches. I only have hardware to test the i386 and amd64 implementations; I've only cross-compiled it for everything else. Some of the non-x86 details, like the Book E powerpc TLB invalidation code, are a bit scary and probably not quite right.

Thanks,
Jason
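
P.S. To illustrate the calling pattern I have in mind for a bounce-buffer copy, here is a rough sketch. It is a userspace mock-up, not the code in D3013: the vm_page and vm_offset_t definitions, the quick_slot_busy flag, and the bounce_copy_to_page() helper are all hypothetical stand-ins, and the mock pmap_quick_enter_page() simply returns the address of the page's backing store, loosely imitating the direct-map case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Userspace stand-ins for kernel types; NOT the real definitions. */
typedef uintptr_t vm_offset_t;
struct vm_page {
	unsigned char backing[PAGE_SIZE];	/* mock "physical" page */
};
typedef struct vm_page *vm_page_t;

/* Hypothetical flag modeling the single guaranteed mapping slot. */
static int quick_slot_busy;

/* Mock: "map" the page by returning the address of its backing store. */
static vm_offset_t
pmap_quick_enter_page(vm_page_t m)
{
	assert(!quick_slot_busy);	/* nested calls are not allowed */
	quick_slot_busy = 1;
	return ((vm_offset_t)m->backing);
}

/* Mock: release the slot; a real implementation would tear down the mapping. */
static void
pmap_quick_remove_page(vm_offset_t kva)
{
	(void)kva;
	quick_slot_busy = 0;
}

/*
 * Hypothetical consumer: copy len bytes from a bounce buffer into a page
 * that has no resident mapping.  The mapping is held only for the copy,
 * with no locking or sleeping in between.
 */
static void
bounce_copy_to_page(vm_page_t m, size_t offset, const void *src, size_t len)
{
	vm_offset_t kva;

	assert(offset <= PAGE_SIZE && len <= PAGE_SIZE - offset);
	kva = pmap_quick_enter_page(m);
	memcpy((void *)(kva + offset), src, len);
	pmap_quick_remove_page(kva);
}
```

The point of the sketch is only the shape of the call sequence: enter, do a short copy, remove, with nothing that could block or take a lock while the mapping is held, and no second pmap_quick_enter_page() before the first mapping is removed.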