Date: Tue, 17 Sep 2019 17:28:44 +0000 (UTC)
From: Alan Cox <alc@FreeBSD.org>
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-12@freebsd.org
Subject: svn commit: r352452 - in stable/12/sys/arm64: arm64 include
Message-ID: <201909171728.x8HHSiwe007582@repo.freebsd.org>
Author: alc
Date: Tue Sep 17 17:28:44 2019
New Revision: 352452
URL: https://svnweb.freebsd.org/changeset/base/352452

Log:
  MFC r349117, r349122, r349183, r349897, r349943, r350004, r350029,
  r350038, r350191, r350202, r350422, r350427, r350525

  r349117:
  Three enhancements to arm64's pmap_protect():

  Implement protection changes on superpage mappings.  Previously, a
  superpage mapping was unconditionally demoted by pmap_protect(), even
  if the protection change applied to the entire superpage mapping.

  Precompute the bit mask describing the protection changes rather than
  recomputing it for every page table entry that is changed.

  Skip page table entries that already have the requested protection
  changes in place.

  r349122:
  Three changes to arm64's pmap_unwire():

  Implement wiring changes on superpage mappings.  Previously, a
  superpage mapping was unconditionally demoted by pmap_unwire(), even
  if the wiring change applied to the entire superpage mapping.

  Rewrite a comment to use the arm64 names for bits in a page table
  entry.  Previously, the bits were referred to by their x86 names.

  Use atomic_"op"_64() instead of atomic_"op"_long() to update a page
  table entry in order to match the prevailing style in this file.

  r349183:
  Correct an error in r349122.  pmap_unwire() should update the pmap's
  wired count, not its resident count.

  r349897: (by markj)
  Rename pmap_page_dirty() to pmap_pte_dirty().

  This is a precursor to implementing dirty bit management.

  r349943: (by markj)
  Apply some light cleanup to uses of pmap_pte_dirty().

  - Check for ATTR_SW_MANAGED before anything else.
  - Use pmap_pte_dirty() in pmap_remove_pages().

  r350004: (by markj)
  Implement software access and dirty bit management for arm64.

  Previously the arm64 pmap did no reference or modification tracking;
  all mappings were treated as referenced and all read-write mappings
  were treated as dirty.  This change implements software management
  of these attributes.

  Dirty bit management is implemented to emulate ARMv8.1's optional
  hardware dirty bit modifier management, following a suggestion from
  alc.  In particular, a mapping with ATTR_SW_DBM set is logically
  writeable and is dirty if the ATTR_AP_RW_BIT bit is clear.  Mappings
  with ATTR_AP_RW_BIT set are write-protected, and a write access will
  trigger a permission fault.  pmap_fault() handles permission faults
  for such mappings and marks the page dirty by clearing
  ATTR_AP_RW_BIT, thus mapping the page read-write.

  r350029: (by markj)
  Propagate attribute changes during demotion.

  After r349117 and r349122, some mapping attribute changes do not
  trigger superpage demotion.  However, pmap_demote_l2() was not
  updated to ensure that the replacement L3 entries carry any attribute
  changes that occurred since promotion.

  r350038: (by markj)
  Always use the software DBM bit for now.

  r350004 added most of the machinery needed to support hardware DBM
  management, but it did not intend to actually enable use of the
  hardware DBM bit.

  r350191:
  Introduce pmap_store(), and use it to replace pmap_load_store() in
  places where the page table entry was previously invalid.  (Note that
  I did not replace pmap_load_store() when it was followed by a TLB
  invalidation, even if we are not using the return value from
  pmap_load_store().)

  Correct an error in pmap_enter().  A test for determining when to set
  PGA_WRITEABLE was always true, even if the mapping was read only.

  In pmap_enter_l2(), when replacing an empty kernel page table page by
  a superpage mapping, clear the old l2 entry and issue a TLB
  invalidation.
  My reading of the ARM architecture manual leads me to believe that
  the TLB could hold an intermediate entry referencing the empty kernel
  page table page even though it contains no valid mappings.

  Replace a couple direct uses of atomic_clear_64() by the new
  pmap_clear_bits().

  In a couple comments, replace the term "paging-structure caches",
  which is an Intel-specific term for the caches that hold intermediate
  entries in the page table, with wording that is more consistent with
  the ARM architecture manual.

  r350202:
  With the introduction of software dirty bit emulation for managed
  mappings, we should test ATTR_SW_DBM, not ATTR_AP_RW, to determine
  whether to set PGA_WRITEABLE.  In effect, we are currently setting
  PGA_WRITEABLE based on whether the dirty bit is preset, not whether
  the mapping is writeable.  Correct this mistake.

  r350422: (by markj)
  Remove an unneeded trunc_page() in pmap_fault().

  r350427: (by markj)
  Have arm64's pmap_fault() handle WnR faults on dirty PTEs.

  If we take a WnR permission fault on a managed, writeable and dirty
  PTE, simply return success without calling the main fault handler.
  This situation can occur if multiple threads simultaneously access a
  clean writeable mapping and trigger WnR faults; losers of the race to
  mark the PTE dirty would end up calling the main fault handler, which
  had no work to do.

  r350525: (by markj)
  Use ATTR_DBM even when hardware dirty bit management is not enabled.

  The ARMv8 reference manual only states that the bit is reserved in
  this case; following Linux's example, use it instead of a
  software-defined bit for the purpose of indicating that a managed
  mapping is writable.

Modified:
  stable/12/sys/arm64/arm64/pmap.c
  stable/12/sys/arm64/arm64/trap.c
  stable/12/sys/arm64/include/pte.h

Directory Properties:
  stable/12/   (props changed)

Modified: stable/12/sys/arm64/arm64/pmap.c
==============================================================================
--- stable/12/sys/arm64/arm64/pmap.c	Tue Sep 17 16:16:46 2019	(r352451)
+++ stable/12/sys/arm64/arm64/pmap.c	Tue Sep 17 17:28:44 2019	(r352452)
@@ -217,6 +217,16 @@ __FBSDID("$FreeBSD$");
 #define	VM_PAGE_TO_PV_LIST_LOCK(m)	\
 	PHYS_TO_PV_LIST_LOCK(VM_PAGE_TO_PHYS(m))
 
+/*
+ * The presence of this flag indicates that the mapping is writeable.
+ * If the ATTR_AP_RO bit is also set, then the mapping is clean, otherwise it is
+ * dirty.  This flag may only be set on managed mappings.
+ *
+ * The DBM bit is reserved on ARMv8.0 but it seems we can safely treat it
+ * as a software managed bit.
+ */
+#define	ATTR_SW_DBM	ATTR_DBM
+
 struct pmap kernel_pmap_store;
 
 /* Used for mapping ACPI memory before VM is initialized */
@@ -315,11 +325,13 @@ static __inline vm_page_t pmap_remove_pt_page(pmap_t p
  * They need to be atomic as the System MMU may write to the table at
  * the same time as the CPU.
*/ -#define pmap_clear(table) atomic_store_64(table, 0) -#define pmap_load_store(table, entry) atomic_swap_64(table, entry) -#define pmap_set(table, mask) atomic_set_64(table, mask) -#define pmap_load_clear(table) atomic_swap_64(table, 0) -#define pmap_load(table) (*table) +#define pmap_clear(table) atomic_store_64(table, 0) +#define pmap_clear_bits(table, bits) atomic_clear_64(table, bits) +#define pmap_load(table) (*table) +#define pmap_load_clear(table) atomic_swap_64(table, 0) +#define pmap_load_store(table, entry) atomic_swap_64(table, entry) +#define pmap_set_bits(table, bits) atomic_set_64(table, bits) +#define pmap_store(table, entry) atomic_store_64(table, entry) /********************/ /* Inline functions */ @@ -532,15 +544,18 @@ pmap_l3_valid(pt_entry_t l3) CTASSERT(L1_BLOCK == L2_BLOCK); /* - * Checks if the page is dirty. We currently lack proper tracking of this on - * arm64 so for now assume is a page mapped as rw was accessed it is. + * Checks if the PTE is dirty. */ static inline int -pmap_page_dirty(pt_entry_t pte) +pmap_pte_dirty(pt_entry_t pte) { - return ((pte & (ATTR_AF | ATTR_AP_RW_BIT)) == - (ATTR_AF | ATTR_AP(ATTR_AP_RW))); + KASSERT((pte & ATTR_SW_MANAGED) != 0, ("pte %#lx is unmanaged", pte)); + KASSERT((pte & (ATTR_AP_RW_BIT | ATTR_SW_DBM)) != 0, + ("pte %#lx is writeable and missing ATTR_SW_DBM", pte)); + + return ((pte & (ATTR_AP_RW_BIT | ATTR_SW_DBM)) == + (ATTR_AP(ATTR_AP_RW) | ATTR_SW_DBM)); } static __inline void @@ -626,7 +641,7 @@ pmap_bootstrap_dmap(vm_offset_t kern_l1, vm_paddr_t mi (vm_offset_t)l2); freemempos += PAGE_SIZE; - pmap_load_store(&pagetable_dmap[l1_slot], + pmap_store(&pagetable_dmap[l1_slot], (l2_pa & ~Ln_TABLE_MASK) | L1_TABLE); memset(l2, 0, PAGE_SIZE); @@ -644,7 +659,7 @@ pmap_bootstrap_dmap(vm_offset_t kern_l1, vm_paddr_t mi l2_slot = pmap_l2_index(va); KASSERT(l2_slot != 0, ("...")); - pmap_load_store(&l2[l2_slot], + pmap_store(&l2[l2_slot], (pa & ~L2_OFFSET) | ATTR_DEFAULT | ATTR_XN | ATTR_IDX(CACHED_MEMORY) | L2_BLOCK); } @@ -656,7 +671,7 @@ pmap_bootstrap_dmap(vm_offset_t kern_l1, vm_paddr_t mi (physmap[i + 1] - pa) >= L1_SIZE; pa += L1_SIZE, va += L1_SIZE) { l1_slot = ((va - DMAP_MIN_ADDRESS) >> L1_SHIFT); - pmap_load_store(&pagetable_dmap[l1_slot], + pmap_store(&pagetable_dmap[l1_slot], (pa & ~L1_OFFSET) | ATTR_DEFAULT | ATTR_XN | ATTR_IDX(CACHED_MEMORY) | L1_BLOCK); } @@ -671,7 +686,7 @@ pmap_bootstrap_dmap(vm_offset_t kern_l1, vm_paddr_t mi (vm_offset_t)l2); freemempos += PAGE_SIZE; - pmap_load_store(&pagetable_dmap[l1_slot], + pmap_store(&pagetable_dmap[l1_slot], (l2_pa & ~Ln_TABLE_MASK) | L1_TABLE); memset(l2, 0, PAGE_SIZE); @@ -681,7 +696,7 @@ pmap_bootstrap_dmap(vm_offset_t kern_l1, vm_paddr_t mi for (; va < DMAP_MAX_ADDRESS && pa < physmap[i + 1]; pa += L2_SIZE, va += L2_SIZE) { l2_slot = pmap_l2_index(va); - pmap_load_store(&l2[l2_slot], + pmap_store(&l2[l2_slot], (pa & ~L2_OFFSET) | ATTR_DEFAULT | ATTR_XN | ATTR_IDX(CACHED_MEMORY) | L2_BLOCK); } @@ -716,7 +731,7 @@ pmap_bootstrap_l2(vm_offset_t l1pt, vm_offset_t va, vm KASSERT(l1_slot < Ln_ENTRIES, ("Invalid L1 index")); pa = pmap_early_vtophys(l1pt, l2pt); - pmap_load_store(&l1[l1_slot], + pmap_store(&l1[l1_slot], (pa & ~Ln_TABLE_MASK) | L1_TABLE); l2pt += PAGE_SIZE; } @@ -746,7 +761,7 @@ pmap_bootstrap_l3(vm_offset_t l1pt, vm_offset_t va, vm KASSERT(l2_slot < Ln_ENTRIES, ("Invalid L2 index")); pa = pmap_early_vtophys(l1pt, l3pt); - pmap_load_store(&l2[l2_slot], + pmap_store(&l2[l2_slot], (pa & ~Ln_TABLE_MASK) | L2_TABLE); l3pt += PAGE_SIZE; } @@ -765,11 +780,11 @@ 
pmap_bootstrap(vm_offset_t l0pt, vm_offset_t l1pt, vm_ vm_size_t kernlen) { u_int l1_slot, l2_slot; - uint64_t kern_delta; pt_entry_t *l2; vm_offset_t va, freemempos; vm_offset_t dpcpu, msgbufpv; vm_paddr_t start_pa, pa, min_pa; + uint64_t kern_delta; int i; kern_delta = KERNBASE - kernstart; @@ -1520,7 +1535,7 @@ _pmap_alloc_l3(pmap_t pmap, vm_pindex_t ptepindex, str l0index = ptepindex - (NUL2E + NUL1E); l0 = &pmap->pm_l0[l0index]; - pmap_load_store(l0, VM_PAGE_TO_PHYS(m) | L0_TABLE); + pmap_store(l0, VM_PAGE_TO_PHYS(m) | L0_TABLE); } else if (ptepindex >= NUL2E) { vm_pindex_t l0index, l1index; pd_entry_t *l0, *l1; @@ -1546,7 +1561,7 @@ _pmap_alloc_l3(pmap_t pmap, vm_pindex_t ptepindex, str l1 = (pd_entry_t *)PHYS_TO_DMAP(pmap_load(l0) & ~ATTR_MASK); l1 = &l1[ptepindex & Ln_ADDR_MASK]; - pmap_load_store(l1, VM_PAGE_TO_PHYS(m) | L1_TABLE); + pmap_store(l1, VM_PAGE_TO_PHYS(m) | L1_TABLE); } else { vm_pindex_t l0index, l1index; pd_entry_t *l0, *l1, *l2; @@ -1588,7 +1603,7 @@ _pmap_alloc_l3(pmap_t pmap, vm_pindex_t ptepindex, str l2 = (pd_entry_t *)PHYS_TO_DMAP(pmap_load(l1) & ~ATTR_MASK); l2 = &l2[ptepindex & Ln_ADDR_MASK]; - pmap_load_store(l2, VM_PAGE_TO_PHYS(m) | L2_TABLE); + pmap_store(l2, VM_PAGE_TO_PHYS(m) | L2_TABLE); } pmap_resident_count_inc(pmap, 1); @@ -1761,7 +1776,7 @@ pmap_growkernel(vm_offset_t addr) if ((nkpg->flags & PG_ZERO) == 0) pmap_zero_page(nkpg); paddr = VM_PAGE_TO_PHYS(nkpg); - pmap_load_store(l1, paddr | L1_TABLE); + pmap_store(l1, paddr | L1_TABLE); continue; /* try again */ } l2 = pmap_l1_to_l2(l1, kernel_vm_end); @@ -1952,7 +1967,7 @@ reclaim_pv_chunk(pmap_t locked_pmap, struct rwlock **l tpte = pmap_load_clear(pte); pmap_invalidate_page(pmap, va); m = PHYS_TO_VM_PAGE(tpte & ~ATTR_MASK); - if (pmap_page_dirty(tpte)) + if (pmap_pte_dirty(tpte)) vm_page_dirty(m); if ((tpte & ATTR_AF) != 0) vm_page_aflag_set(m, PGA_REFERENCED); @@ -2449,7 +2464,7 @@ pmap_remove_l2(pmap_t pmap, pt_entry_t *l2, vm_offset_ eva = sva + L2_SIZE; for (va = sva, m = PHYS_TO_VM_PAGE(old_l2 & ~ATTR_MASK); va < eva; va += PAGE_SIZE, m++) { - if (pmap_page_dirty(old_l2)) + if (pmap_pte_dirty(old_l2)) vm_page_dirty(m); if (old_l2 & ATTR_AF) vm_page_aflag_set(m, PGA_REFERENCED); @@ -2478,7 +2493,7 @@ pmap_remove_l2(pmap_t pmap, pt_entry_t *l2, vm_offset_ /* * pmap_remove_l3: do the things to unmap a page in a process */ -static int +static int __unused pmap_remove_l3(pmap_t pmap, pt_entry_t *l3, vm_offset_t va, pd_entry_t l2e, struct spglist *free, struct rwlock **lockp) { @@ -2494,7 +2509,7 @@ pmap_remove_l3(pmap_t pmap, pt_entry_t *l3, vm_offset_ pmap_resident_count_dec(pmap, 1); if (old_l3 & ATTR_SW_MANAGED) { m = PHYS_TO_VM_PAGE(old_l3 & ~ATTR_MASK); - if (pmap_page_dirty(old_l3)) + if (pmap_pte_dirty(old_l3)) vm_page_dirty(m); if (old_l3 & ATTR_AF) vm_page_aflag_set(m, PGA_REFERENCED); @@ -2542,7 +2557,7 @@ pmap_remove_l3_range(pmap_t pmap, pd_entry_t l2e, vm_o pmap_resident_count_dec(pmap, 1); if ((old_l3 & ATTR_SW_MANAGED) != 0) { m = PHYS_TO_VM_PAGE(old_l3 & ~ATTR_MASK); - if (pmap_page_dirty(old_l3)) + if (pmap_pte_dirty(old_l3)) vm_page_dirty(m); if ((old_l3 & ATTR_AF) != 0) vm_page_aflag_set(m, PGA_REFERENCED); @@ -2771,7 +2786,7 @@ retry: /* * Update the vm_page_t clean and reference bits. 
*/ - if (pmap_page_dirty(tpte)) + if (pmap_pte_dirty(tpte)) vm_page_dirty(m); pmap_unuse_pt(pmap, pv->pv_va, tpde, &free); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); @@ -2785,6 +2800,53 @@ retry: } /* + * pmap_protect_l2: do the things to protect a 2MB page in a pmap + */ +static void +pmap_protect_l2(pmap_t pmap, pt_entry_t *l2, vm_offset_t sva, pt_entry_t mask, + pt_entry_t nbits) +{ + pd_entry_t old_l2; + vm_page_t m, mt; + + PMAP_LOCK_ASSERT(pmap, MA_OWNED); + KASSERT((sva & L2_OFFSET) == 0, + ("pmap_protect_l2: sva is not 2mpage aligned")); + old_l2 = pmap_load(l2); + KASSERT((old_l2 & ATTR_DESCR_MASK) == L2_BLOCK, + ("pmap_protect_l2: L2e %lx is not a block mapping", old_l2)); + + /* + * Return if the L2 entry already has the desired access restrictions + * in place. + */ +retry: + if ((old_l2 & mask) == nbits) + return; + + /* + * When a dirty read/write superpage mapping is write protected, + * update the dirty field of each of the superpage's constituent 4KB + * pages. + */ + if ((old_l2 & ATTR_SW_MANAGED) != 0 && + (nbits & ATTR_AP(ATTR_AP_RO)) != 0 && pmap_pte_dirty(old_l2)) { + m = PHYS_TO_VM_PAGE(old_l2 & ~ATTR_MASK); + for (mt = m; mt < &m[L2_SIZE / PAGE_SIZE]; mt++) + vm_page_dirty(mt); + } + + if (!atomic_fcmpset_64(l2, &old_l2, (old_l2 & ~mask) | nbits)) + goto retry; + + /* + * Since a promotion must break the 4KB page mappings before making + * the 2MB page mapping, a pmap_invalidate_page() suffices. + */ + pmap_invalidate_page(pmap, sva); +} + +/* * Set the physical protection on the * specified range of this map as requested. */ @@ -2793,7 +2855,7 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t { vm_offset_t va, va_next; pd_entry_t *l0, *l1, *l2; - pt_entry_t *l3p, l3, nbits; + pt_entry_t *l3p, l3, mask, nbits; KASSERT((prot & ~VM_PROT_ALL) == 0, ("invalid prot %x", prot)); if (prot == VM_PROT_NONE) { @@ -2801,8 +2863,16 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t return; } - if ((prot & (VM_PROT_WRITE | VM_PROT_EXECUTE)) == - (VM_PROT_WRITE | VM_PROT_EXECUTE)) + mask = nbits = 0; + if ((prot & VM_PROT_WRITE) == 0) { + mask |= ATTR_AP_RW_BIT | ATTR_SW_DBM; + nbits |= ATTR_AP(ATTR_AP_RO); + } + if ((prot & VM_PROT_EXECUTE) == 0) { + mask |= ATTR_XN; + nbits |= ATTR_XN; + } + if (mask == 0) return; PMAP_LOCK(pmap); @@ -2833,9 +2903,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t continue; if ((pmap_load(l2) & ATTR_DESCR_MASK) == L2_BLOCK) { - l3p = pmap_demote_l2(pmap, l2, sva); - if (l3p == NULL) + if (sva + L2_SIZE == va_next && eva >= va_next) { + pmap_protect_l2(pmap, l2, sva, mask, nbits); continue; + } else if (pmap_demote_l2(pmap, l2, sva) == NULL) + continue; } KASSERT((pmap_load(l2) & ATTR_DESCR_MASK) == L2_TABLE, ("pmap_protect: Invalid L2 entry after demotion")); @@ -2847,29 +2919,36 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t for (l3p = pmap_l2_to_l3(l2, sva); sva != va_next; l3p++, sva += L3_SIZE) { l3 = pmap_load(l3p); - if (!pmap_l3_valid(l3)) { +retry: + /* + * Go to the next L3 entry if the current one is + * invalid or already has the desired access + * restrictions in place. (The latter case occurs + * frequently. For example, in a "buildworld" + * workload, almost 1 out of 4 L3 entries already + * have the desired restrictions.) 
+ */ + if (!pmap_l3_valid(l3) || (l3 & mask) == nbits) { if (va != va_next) { pmap_invalidate_range(pmap, va, sva); va = va_next; } continue; } - if (va == va_next) - va = sva; - nbits = 0; - if ((prot & VM_PROT_WRITE) == 0) { - if ((l3 & ATTR_SW_MANAGED) && - pmap_page_dirty(l3)) { - vm_page_dirty(PHYS_TO_VM_PAGE(l3 & - ~ATTR_MASK)); - } - nbits |= ATTR_AP(ATTR_AP_RO); - } - if ((prot & VM_PROT_EXECUTE) == 0) - nbits |= ATTR_XN; + /* + * When a dirty read/write mapping is write protected, + * update the page's dirty field. + */ + if ((l3 & ATTR_SW_MANAGED) != 0 && + (nbits & ATTR_AP(ATTR_AP_RO)) != 0 && + pmap_pte_dirty(l3)) + vm_page_dirty(PHYS_TO_VM_PAGE(l3 & ~ATTR_MASK)); - pmap_set(l3p, nbits); + if (!atomic_fcmpset_64(l3p, &l3, (l3 & ~mask) | nbits)) + goto retry; + if (va == va_next) + va = sva; } if (va != va_next) pmap_invalidate_range(pmap, va, sva); @@ -2934,7 +3013,7 @@ pmap_update_entry(pmap_t pmap, pd_entry_t *pte, pd_ent pmap_invalidate_range_nopin(pmap, va, va + size); /* Create the new mapping */ - pmap_load_store(pte, newpte); + pmap_store(pte, newpte); dsb(ishst); critical_exit(); @@ -3004,17 +3083,32 @@ pmap_promote_l2(pmap_t pmap, pd_entry_t *l2, vm_offset firstl3 = pmap_l2_to_l3(l2, sva); newl2 = pmap_load(firstl3); - /* Check the alingment is valid */ - if (((newl2 & ~ATTR_MASK) & L2_OFFSET) != 0) { +setl2: + if (((newl2 & (~ATTR_MASK | ATTR_AF)) & L2_OFFSET) != ATTR_AF) { atomic_add_long(&pmap_l2_p_failures, 1); CTR2(KTR_PMAP, "pmap_promote_l2: failure for va %#lx" " in pmap %p", va, pmap); return; } + if ((newl2 & (ATTR_AP_RW_BIT | ATTR_SW_DBM)) == + (ATTR_AP(ATTR_AP_RO) | ATTR_SW_DBM)) { + if (!atomic_fcmpset_64(l2, &newl2, newl2 & ~ATTR_SW_DBM)) + goto setl2; + newl2 &= ~ATTR_SW_DBM; + } + pa = newl2 + L2_SIZE - PAGE_SIZE; for (l3 = firstl3 + NL3PG - 1; l3 > firstl3; l3--) { oldl3 = pmap_load(l3); +setl3: + if ((oldl3 & (ATTR_AP_RW_BIT | ATTR_SW_DBM)) == + (ATTR_AP(ATTR_AP_RO) | ATTR_SW_DBM)) { + if (!atomic_fcmpset_64(l3, &oldl3, oldl3 & + ~ATTR_SW_DBM)) + goto setl3; + oldl3 &= ~ATTR_SW_DBM; + } if (oldl3 != pa) { atomic_add_long(&pmap_l2_p_failures, 1); CTR2(KTR_PMAP, "pmap_promote_l2: failure for va %#lx" @@ -3097,8 +3191,14 @@ pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, v new_l3 |= ATTR_SW_WIRED; if (va < VM_MAXUSER_ADDRESS) new_l3 |= ATTR_AP(ATTR_AP_USER) | ATTR_PXN; - if ((m->oflags & VPO_UNMANAGED) == 0) + if ((m->oflags & VPO_UNMANAGED) == 0) { new_l3 |= ATTR_SW_MANAGED; + if ((prot & VM_PROT_WRITE) != 0) { + new_l3 |= ATTR_SW_DBM; + if ((flags & VM_PROT_WRITE) == 0) + new_l3 |= ATTR_AP(ATTR_AP_RO); + } + } CTR2(KTR_PMAP, "pmap_enter: %.16lx -> %.16lx", va, pa); @@ -3199,12 +3299,9 @@ havel3: /* * No, might be a protection or wiring change. */ - if ((orig_l3 & ATTR_SW_MANAGED) != 0) { - if ((new_l3 & ATTR_AP(ATTR_AP_RW)) == - ATTR_AP(ATTR_AP_RW)) { - vm_page_aflag_set(m, PGA_WRITEABLE); - } - } + if ((orig_l3 & ATTR_SW_MANAGED) != 0 && + (new_l3 & ATTR_SW_DBM) != 0) + vm_page_aflag_set(m, PGA_WRITEABLE); goto validate; } @@ -3223,7 +3320,7 @@ havel3: * concurrent calls to pmap_page_test_mappings() and * pmap_ts_referenced(). 
*/ - if (pmap_page_dirty(orig_l3)) + if (pmap_pte_dirty(orig_l3)) vm_page_dirty(om); if ((orig_l3 & ATTR_AF) != 0) vm_page_aflag_set(om, PGA_REFERENCED); @@ -3258,7 +3355,7 @@ havel3: CHANGE_PV_LIST_LOCK_TO_PHYS(&lock, pa); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); m->md.pv_gen++; - if ((new_l3 & ATTR_AP_RW_BIT) == ATTR_AP(ATTR_AP_RW)) + if ((new_l3 & ATTR_SW_DBM) != 0) vm_page_aflag_set(m, PGA_WRITEABLE); } @@ -3286,10 +3383,11 @@ validate: KASSERT(opa == pa, ("pmap_enter: invalid update")); if ((orig_l3 & ~ATTR_AF) != (new_l3 & ~ATTR_AF)) { /* same PA, different attributes */ + /* XXXMJ need to reload orig_l3 for hardware DBM. */ pmap_load_store(l3, new_l3); pmap_invalidate_page(pmap, va); - if (pmap_page_dirty(orig_l3) && - (orig_l3 & ATTR_SW_MANAGED) != 0) + if ((orig_l3 & ATTR_SW_MANAGED) != 0 && + pmap_pte_dirty(orig_l3)) vm_page_dirty(m); } else { /* @@ -3309,7 +3407,7 @@ validate: } } else { /* New mapping */ - pmap_load_store(l3, new_l3); + pmap_store(l3, new_l3); dsb(ishst); } @@ -3348,8 +3446,10 @@ pmap_enter_2mpage(pmap_t pmap, vm_offset_t va, vm_page new_l2 = (pd_entry_t)(VM_PAGE_TO_PHYS(m) | ATTR_DEFAULT | ATTR_IDX(m->md.pv_memattr) | ATTR_AP(ATTR_AP_RO) | L2_BLOCK); - if ((m->oflags & VPO_UNMANAGED) == 0) + if ((m->oflags & VPO_UNMANAGED) == 0) { new_l2 |= ATTR_SW_MANAGED; + new_l2 &= ~ATTR_AF; + } if ((prot & VM_PROT_EXECUTE) == 0 || m->md.pv_memattr == DEVICE_MEMORY) new_l2 |= ATTR_XN; if (va < VM_MAXUSER_ADDRESS) @@ -3409,12 +3509,16 @@ pmap_enter_l2(pmap_t pmap, vm_offset_t va, pd_entry_t vm_page_free_pages_toq(&free, true); if (va >= VM_MAXUSER_ADDRESS) { /* - * Both pmap_remove_l2() and pmap_remove_l3() will - * leave the kernel page table page zero filled. + * Both pmap_remove_l2() and pmap_remove_l3_range() + * will leave the kernel page table page zero filled. + * Nonetheless, the TLB could have an intermediate + * entry for the kernel page table page. */ mt = PHYS_TO_VM_PAGE(pmap_load(l2) & ~ATTR_MASK); if (pmap_insert_pt_page(pmap, mt, false)) panic("pmap_enter_l2: trie insert failed"); + pmap_clear(l2); + pmap_invalidate_page(pmap, va); } else KASSERT(pmap_load(l2) == 0, ("pmap_enter_l2: non-zero L2 entry %p", l2)); @@ -3428,10 +3532,13 @@ pmap_enter_l2(pmap_t pmap, vm_offset_t va, pd_entry_t SLIST_INIT(&free); if (pmap_unwire_l3(pmap, va, l2pg, &free)) { /* - * Although "va" is not mapped, paging-structure - * caches could nonetheless have entries that + * Although "va" is not mapped, the TLB could + * nonetheless have intermediate entries that * refer to the freed page table pages. * Invalidate those entries. + * + * XXX redundant invalidation (See + * _pmap_unwire_l3().) */ pmap_invalidate_page(pmap, va); vm_page_free_pages_toq(&free, true); @@ -3441,7 +3548,7 @@ pmap_enter_l2(pmap_t pmap, vm_offset_t va, pd_entry_t va, pmap); return (KERN_RESOURCE_SHORTAGE); } - if ((new_l2 & ATTR_AP_RW_BIT) == ATTR_AP(ATTR_AP_RW)) + if ((new_l2 & ATTR_SW_DBM) != 0) for (mt = m; mt < &m[L2_SIZE / PAGE_SIZE]; mt++) vm_page_aflag_set(mt, PGA_WRITEABLE); } @@ -3456,7 +3563,7 @@ pmap_enter_l2(pmap_t pmap, vm_offset_t va, pd_entry_t /* * Map the superpage. 
*/ - (void)pmap_load_store(l2, new_l2); + pmap_store(l2, new_l2); dsb(ishst); atomic_add_long(&pmap_l2_mappings, 1); @@ -3649,15 +3756,17 @@ pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va, v /* * Now validate mapping with RO protection */ - if ((m->oflags & VPO_UNMANAGED) == 0) + if ((m->oflags & VPO_UNMANAGED) == 0) { l3_val |= ATTR_SW_MANAGED; + l3_val &= ~ATTR_AF; + } /* Sync icache before the mapping is stored to PTE */ if ((prot & VM_PROT_EXECUTE) && pmap != kernel_pmap && m->md.pv_memattr == VM_MEMATTR_WRITE_BACK) cpu_icache_sync_range(PHYS_TO_DMAP(pa), PAGE_SIZE); - pmap_load_store(l3, l3_val); + pmap_store(l3, l3_val); dsb(ishst); return (mpte); @@ -3721,9 +3830,21 @@ pmap_unwire(pmap_t pmap, vm_offset_t sva, vm_offset_t continue; if ((pmap_load(l2) & ATTR_DESCR_MASK) == L2_BLOCK) { - l3 = pmap_demote_l2(pmap, l2, sva); - if (l3 == NULL) + if ((pmap_load(l2) & ATTR_SW_WIRED) == 0) + panic("pmap_unwire: l2 %#jx is missing " + "ATTR_SW_WIRED", (uintmax_t)pmap_load(l2)); + + /* + * Are we unwiring the entire large page? If not, + * demote the mapping and fall through. + */ + if (sva + L2_SIZE == va_next && eva >= va_next) { + pmap_clear_bits(l2, ATTR_SW_WIRED); + pmap->pm_stats.wired_count -= L2_SIZE / + PAGE_SIZE; continue; + } else if (pmap_demote_l2(pmap, l2, sva) == NULL) + panic("pmap_unwire: demotion failed"); } KASSERT((pmap_load(l2) & ATTR_DESCR_MASK) == L2_TABLE, ("pmap_unwire: Invalid l2 entry after demotion")); @@ -3739,11 +3860,11 @@ pmap_unwire(pmap_t pmap, vm_offset_t sva, vm_offset_t "ATTR_SW_WIRED", (uintmax_t)pmap_load(l3)); /* - * PG_W must be cleared atomically. Although the pmap - * lock synchronizes access to PG_W, another processor - * could be setting PG_M and/or PG_A concurrently. + * ATTR_SW_WIRED must be cleared atomically. Although + * the pmap lock synchronizes access to ATTR_SW_WIRED, + * the System MMU may write to the entry concurrently. */ - atomic_clear_long(l3, ATTR_SW_WIRED); + pmap_clear_bits(l3, ATTR_SW_WIRED); pmap->pm_stats.wired_count--; } } @@ -3767,7 +3888,7 @@ pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_ struct rwlock *lock; struct spglist free; pd_entry_t *l0, *l1, *l2, srcptepaddr; - pt_entry_t *dst_pte, ptetemp, *src_pte; + pt_entry_t *dst_pte, mask, nbits, ptetemp, *src_pte; vm_offset_t addr, end_addr, va_next; vm_page_t dst_l2pg, dstmpte, srcmpte; @@ -3818,8 +3939,11 @@ pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_ ((srcptepaddr & ATTR_SW_MANAGED) == 0 || pmap_pv_insert_l2(dst_pmap, addr, srcptepaddr, PMAP_ENTER_NORECLAIM, &lock))) { - (void)pmap_load_store(l2, srcptepaddr & - ~ATTR_SW_WIRED); + mask = ATTR_AF | ATTR_SW_WIRED; + nbits = 0; + if ((srcptepaddr & ATTR_SW_DBM) != 0) + nbits |= ATTR_AP_RW_BIT; + pmap_store(l2, (srcptepaddr & ~mask) | nbits); pmap_resident_count_inc(dst_pmap, L2_SIZE / PAGE_SIZE); atomic_add_long(&pmap_l2_mappings, 1); @@ -3863,11 +3987,12 @@ pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_ /* * Clear the wired, modified, and accessed * (referenced) bits during the copy. 
- * - * XXX not yet */ - (void)pmap_load_store(dst_pte, ptetemp & - ~ATTR_SW_WIRED); + mask = ATTR_AF | ATTR_SW_WIRED; + nbits = 0; + if ((ptetemp & ATTR_SW_DBM) != 0) + nbits |= ATTR_AP_RW_BIT; + pmap_store(dst_pte, (ptetemp & ~mask) | nbits); pmap_resident_count_inc(dst_pmap, 1); } else { SLIST_INIT(&free); @@ -3875,8 +4000,8 @@ pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_ &free)) { /* * Although "addr" is not mapped, - * paging-structure caches could - * nonetheless have entries that refer + * the TLB could nonetheless have + * intermediate entries that refer * to the freed page table pages. * Invalidate those entries. * @@ -4218,8 +4343,7 @@ pmap_remove_pages(pmap_t pmap) /* * Update the vm_page_t clean/reference bits. */ - if ((tpte & ATTR_AP_RW_BIT) == - ATTR_AP(ATTR_AP_RW)) { + if (pmap_pte_dirty(tpte)) { switch (lvl) { case 1: for (mt = m; mt < &m[L2_SIZE / PAGE_SIZE]; mt++) @@ -4494,7 +4618,7 @@ retry_pv_loop: } va = pv->pv_va; pte = pmap_pte(pmap, pv->pv_va, &lvl); - if ((pmap_load(pte) & ATTR_AP_RW_BIT) == ATTR_AP(ATTR_AP_RW)) + if ((pmap_load(pte) & ATTR_SW_DBM) != 0) (void)pmap_demote_l2_locked(pmap, pte, va, &lock); KASSERT(lock == VM_PAGE_TO_PV_LIST_LOCK(m), ("inconsistent pv lock %p %p for page %p", @@ -4517,13 +4641,14 @@ retry_pv_loop: } } pte = pmap_pte(pmap, pv->pv_va, &lvl); -retry: oldpte = pmap_load(pte); - if ((oldpte & ATTR_AP_RW_BIT) == ATTR_AP(ATTR_AP_RW)) { - if (!atomic_cmpset_long(pte, oldpte, - oldpte | ATTR_AP(ATTR_AP_RO))) +retry: + if ((oldpte & ATTR_SW_DBM) != 0) { + if (!atomic_fcmpset_long(pte, &oldpte, + (oldpte | ATTR_AP_RW_BIT) & ~ATTR_SW_DBM)) goto retry; - if ((oldpte & ATTR_AF) != 0) + if ((oldpte & ATTR_AP_RW_BIT) == + ATTR_AP(ATTR_AP_RW)) vm_page_dirty(m); pmap_invalidate_page(pmap, pv->pv_va); } @@ -4533,13 +4658,6 @@ retry: vm_page_aflag_clear(m, PGA_WRITEABLE); } -static __inline boolean_t -safe_to_clear_referenced(pmap_t pmap, pt_entry_t pte) -{ - - return (FALSE); -} - /* * pmap_ts_referenced: * @@ -4565,12 +4683,10 @@ pmap_ts_referenced(vm_page_t m) struct rwlock *lock; pd_entry_t *pde, tpde; pt_entry_t *pte, tpte; - pt_entry_t *l3; vm_offset_t va; vm_paddr_t pa; - int cleared, md_gen, not_cleared, lvl, pvh_gen; + int cleared, lvl, md_gen, not_cleared, pvh_gen; struct spglist free; - bool demoted; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_ts_referenced: page %p is not managed", m)); @@ -4609,7 +4725,7 @@ retry: ("pmap_ts_referenced: found an invalid l1 table")); pte = pmap_l1_to_l2(pde, pv->pv_va); tpte = pmap_load(pte); - if (pmap_page_dirty(tpte)) { + if (pmap_pte_dirty(tpte)) { /* * Although "tpte" is mapping a 2MB page, because * this function is called at a 4KB page granularity, @@ -4617,17 +4733,18 @@ retry: */ vm_page_dirty(m); } + if ((tpte & ATTR_AF) != 0) { /* - * Since this reference bit is shared by 512 4KB - * pages, it should not be cleared every time it is - * tested. Apply a simple "hash" function on the - * physical page number, the virtual superpage number, - * and the pmap address to select one 4KB page out of - * the 512 on which testing the reference bit will - * result in clearing that reference bit. This - * function is designed to avoid the selection of the - * same 4KB page for every 2MB page mapping. + * Since this reference bit is shared by 512 4KB pages, + * it should not be cleared every time it is tested. 
+ * Apply a simple "hash" function on the physical page + * number, the virtual superpage number, and the pmap + * address to select one 4KB page out of the 512 on + * which testing the reference bit will result in + * clearing that reference bit. This function is + * designed to avoid the selection of the same 4KB page + * for every 2MB page mapping. * * On demotion, a mapping that hasn't been referenced * is simply destroyed. To avoid the possibility of a @@ -4639,39 +4756,9 @@ retry: if ((((pa >> PAGE_SHIFT) ^ (pv->pv_va >> L2_SHIFT) ^ (uintptr_t)pmap) & (Ln_ENTRIES - 1)) == 0 && (tpte & ATTR_SW_WIRED) == 0) { - if (safe_to_clear_referenced(pmap, tpte)) { - /* - * TODO: We don't handle the access - * flag at all. We need to be able - * to set it in the exception handler. - */ - panic("ARM64TODO: " - "safe_to_clear_referenced\n"); - } else if (pmap_demote_l2_locked(pmap, pte, - pv->pv_va, &lock) != NULL) { - demoted = true; - va += VM_PAGE_TO_PHYS(m) - - (tpte & ~ATTR_MASK); - l3 = pmap_l2_to_l3(pte, va); - pmap_remove_l3(pmap, l3, va, - pmap_load(pte), NULL, &lock); - } else - demoted = true; - - if (demoted) { - /* - * The superpage mapping was removed - * entirely and therefore 'pv' is no - * longer valid. - */ - if (pvf == pv) - pvf = NULL; - pv = NULL; - } + pmap_clear_bits(pte, ATTR_AF); + pmap_invalidate_page(pmap, pv->pv_va); cleared++; - KASSERT(lock == VM_PAGE_TO_PV_LIST_LOCK(m), - ("inconsistent pv lock %p %p for page %p", - lock, VM_PAGE_TO_PV_LIST_LOCK(m), m)); } else not_cleared++; } @@ -4713,32 +4800,13 @@ small_mappings: ("pmap_ts_referenced: found an invalid l2 table")); pte = pmap_l2_to_l3(pde, pv->pv_va); tpte = pmap_load(pte); - if (pmap_page_dirty(tpte)) + if (pmap_pte_dirty(tpte)) vm_page_dirty(m); if ((tpte & ATTR_AF) != 0) { - if (safe_to_clear_referenced(pmap, tpte)) { - /* - * TODO: We don't handle the access flag - * at all. We need to be able to set it in - * the exception handler. - */ - panic("ARM64TODO: safe_to_clear_referenced\n"); - } else if ((tpte & ATTR_SW_WIRED) == 0) { - /* - * Wired pages cannot be paged out so - * doing accessed bit emulation for - * them is wasted effort. We do the - * hard work for unwired pages only. - */ - pmap_remove_l3(pmap, pte, pv->pv_va, tpde, - &free, &lock); + if ((tpte & ATTR_SW_WIRED) == 0) { + pmap_clear_bits(pte, ATTR_AF); + pmap_invalidate_page(pmap, pv->pv_va); cleared++; - if (pvf == pv) - pvf = NULL; - pv = NULL; - KASSERT(lock == VM_PAGE_TO_PV_LIST_LOCK(m), - ("inconsistent pv lock %p %p for page %p", - lock, VM_PAGE_TO_PV_LIST_LOCK(m), m)); } else not_cleared++; } @@ -4773,6 +4841,14 @@ pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t void pmap_clear_modify(vm_page_t m) { + struct md_page *pvh; + struct rwlock *lock; + pmap_t pmap; + pv_entry_t next_pv, pv; + pd_entry_t *l2, oldl2; + pt_entry_t *l3, oldl3; + vm_offset_t va; + int md_gen, pvh_gen; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_clear_modify: page %p is not managed", m)); @@ -4781,14 +4857,81 @@ pmap_clear_modify(vm_page_t m) ("pmap_clear_modify: page %p is exclusive busied", m)); /* - * If the page is not PGA_WRITEABLE, then no PTEs can have PG_M set. - * If the object containing the page is locked and the page is not + * If the page is not PGA_WRITEABLE, then no PTEs can have ATTR_SW_DBM + * set. If the object containing the page is locked and the page is not * exclusive busied, then PGA_WRITEABLE cannot be concurrently set. 
*/ if ((m->aflags & PGA_WRITEABLE) == 0) return; - - /* ARM64TODO: We lack support for tracking if a page is modified */ + pvh = (m->flags & PG_FICTITIOUS) != 0 ? &pv_dummy : + pa_to_pvh(VM_PAGE_TO_PHYS(m)); + lock = VM_PAGE_TO_PV_LIST_LOCK(m); + rw_wlock(lock); +restart: + TAILQ_FOREACH_SAFE(pv, &pvh->pv_list, pv_next, next_pv) { + pmap = PV_PMAP(pv); + if (!PMAP_TRYLOCK(pmap)) { + pvh_gen = pvh->pv_gen; + rw_wunlock(lock); + PMAP_LOCK(pmap); + rw_wlock(lock); + if (pvh_gen != pvh->pv_gen) { + PMAP_UNLOCK(pmap); + goto restart; + } + } + va = pv->pv_va; + l2 = pmap_l2(pmap, va); + oldl2 = pmap_load(l2); + if ((oldl2 & ATTR_SW_DBM) != 0) { + if (pmap_demote_l2_locked(pmap, l2, va, &lock)) { + if ((oldl2 & ATTR_SW_WIRED) == 0) { + /* + * Write protect the mapping to a + * single page so that a subsequent + * write access may repromote. + */ + va += VM_PAGE_TO_PHYS(m) - + (oldl2 & ~ATTR_MASK); + l3 = pmap_l2_to_l3(l2, va); + oldl3 = pmap_load(l3); + if (pmap_l3_valid(oldl3)) { + while (!atomic_fcmpset_long(l3, + &oldl3, (oldl3 & ~ATTR_SW_DBM) | + ATTR_AP(ATTR_AP_RO))) + cpu_spinwait(); + vm_page_dirty(m); + pmap_invalidate_page(pmap, va); + } + } + } + } + PMAP_UNLOCK(pmap); + } + TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { + pmap = PV_PMAP(pv); + if (!PMAP_TRYLOCK(pmap)) { + md_gen = m->md.pv_gen; + pvh_gen = pvh->pv_gen; + rw_wunlock(lock); + PMAP_LOCK(pmap); + rw_wlock(lock); + if (pvh_gen != pvh->pv_gen || md_gen != m->md.pv_gen) { + PMAP_UNLOCK(pmap); + goto restart; + } + } + l2 = pmap_l2(pmap, pv->pv_va); + l3 = pmap_l2_to_l3(l2, pv->pv_va); + oldl3 = pmap_load(l3); + if (pmap_l3_valid(oldl3) && + (oldl3 & (ATTR_AP_RW_BIT | ATTR_SW_DBM)) == ATTR_SW_DBM) { + pmap_set_bits(l3, ATTR_AP(ATTR_AP_RO)); + pmap_invalidate_page(pmap, pv->pv_va); + } + PMAP_UNLOCK(pmap); + } + rw_wunlock(lock); } void * @@ -5168,6 +5311,17 @@ pmap_demote_l1(pmap_t pmap, pt_entry_t *l1, vm_offset_ } static void +pmap_fill_l3(pt_entry_t *firstl3, pt_entry_t newl3) +{ + pt_entry_t *l3; + + for (l3 = firstl3; l3 - firstl3 < Ln_ENTRIES; l3++) { + *l3 = newl3; + newl3 += L3_SIZE; + } +} + +static void pmap_demote_l2_abort(pmap_t pmap, vm_offset_t va, pt_entry_t *l2, struct rwlock **lockp) { @@ -5188,9 +5342,8 @@ pmap_demote_l2_locked(pmap_t pmap, pt_entry_t *l2, vm_ { pt_entry_t *l3, newl3, oldl2; vm_offset_t tmpl2; - vm_paddr_t l3phys, phys; + vm_paddr_t l3phys; vm_page_t ml3; - int i; PMAP_LOCK_ASSERT(pmap, MA_OWNED); l3 = NULL; @@ -5262,28 +5415,19 @@ pmap_demote_l2_locked(pmap_t pmap, pt_entry_t *l2, vm_ pmap_resident_count_inc(pmap, 1); } } - l3phys = VM_PAGE_TO_PHYS(ml3); l3 = (pt_entry_t *)PHYS_TO_DMAP(l3phys); + newl3 = (oldl2 & ~ATTR_DESCR_MASK) | L3_PAGE; + KASSERT((oldl2 & (ATTR_AP_RW_BIT | ATTR_SW_DBM)) != + (ATTR_AP(ATTR_AP_RO) | ATTR_SW_DBM), + ("pmap_demote_l2: L2 entry is writeable but not dirty")); - /* Address the range points at */ - phys = oldl2 & ~ATTR_MASK; - /* The attributed from the old l2 table to be copied */ - newl3 = (oldl2 & (ATTR_MASK & ~ATTR_DESCR_MASK)) | L3_PAGE; - /* *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
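
[Editor's illustrative sketch, not part of the commit.]  For readers skimming
the truncated diff, the following is a minimal, self-contained model of the
dirty-bit encoding that r350004 and r350525 describe.  The X_ATTR_* bit
positions are illustrative placeholders, not the real definitions from
arm64/include/pte.h; only the relationships between the bits follow the
scheme in the log above.

#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits; the real values live in arm64/include/pte.h. */
#define	X_ATTR_SW_MANAGED	(1ULL << 56)	/* mapping is managed */
#define	X_ATTR_SW_DBM		(1ULL << 51)	/* logically writeable */
#define	X_ATTR_AP_RO		(1ULL << 7)	/* hardware write protection */

/*
 * A managed mapping is in one of three states:
 *   1. read-only:            X_ATTR_AP_RO set,   X_ATTR_SW_DBM clear
 *   2. writeable and clean:  X_ATTR_AP_RO set,   X_ATTR_SW_DBM set
 *   3. writeable and dirty:  X_ATTR_AP_RO clear, X_ATTR_SW_DBM set
 * A write to a state-2 mapping raises a permission (WnR) fault; the fault
 * handler clears X_ATTR_AP_RO, moving the mapping to state 3 and thereby
 * recording the modification in software.
 */
static bool
pte_is_dirty(uint64_t pte)
{
	return ((pte & X_ATTR_SW_MANAGED) != 0 &&
	    (pte & X_ATTR_SW_DBM) != 0 &&
	    (pte & X_ATTR_AP_RO) == 0);
}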
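
[Editor's illustrative sketch, not part of the commit.]  The pmap_fault()/
trap.c side of r350004 and r350427 falls outside the truncated diff, so here
is a rough sketch of the write-fault fast path under the same placeholder
bits.  handle_wnr_fault() and its use of C11 atomics are assumptions made for
illustration; the kernel code uses the pmap's own accessors, locking, and TLB
invalidation.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define	X_ATTR_SW_MANAGED	(1ULL << 56)
#define	X_ATTR_SW_DBM		(1ULL << 51)
#define	X_ATTR_AP_RO		(1ULL << 7)

/*
 * Resolve a write permission (WnR) fault on the PTE at *ptep.  Returns true
 * if the fault was handled here, so the caller can skip the main VM fault
 * handler; returns false if the slow path is required.
 */
static bool
handle_wnr_fault(_Atomic uint64_t *ptep)
{
	uint64_t pte;

	pte = atomic_load(ptep);
	for (;;) {
		/* Only managed, logically writeable mappings qualify. */
		if ((pte & (X_ATTR_SW_MANAGED | X_ATTR_SW_DBM)) !=
		    (X_ATTR_SW_MANAGED | X_ATTR_SW_DBM))
			return (false);
		/*
		 * Already dirty: another thread won the race to clear the
		 * read-only bit (the r350427 case).  Nothing left to do.
		 */
		if ((pte & X_ATTR_AP_RO) == 0)
			return (true);
		/* Clear the read-only bit, marking the mapping dirty. */
		if (atomic_compare_exchange_weak(ptep, &pte,
		    pte & ~X_ATTR_AP_RO))
			break;
	}
	/* The real handler also invalidates the stale TLB entry here. */
	return (true);
}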
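
[Editor's illustrative sketch, not part of the commit.]  Finally, a compact
sketch of the precomputation idea from r349117, again with placeholder
constants: the mask/value pair is computed once per pmap_protect() call, each
entry is tested with (pte & mask) == nbits, and entries that need changing
are rewritten as (pte & ~mask) | nbits.

#include <stdbool.h>
#include <stdint.h>

#define	X_ATTR_SW_DBM	(1ULL << 51)
#define	X_ATTR_AP_RO	(1ULL << 7)
#define	X_ATTR_XN	(1ULL << 54)

#define	X_PROT_WRITE	0x2
#define	X_PROT_EXECUTE	0x4

/*
 * Compute which attribute bits may change (mask) and the value they should
 * take (nbits) for the requested protection, once, instead of recomputing
 * the new attributes for every page table entry.
 */
static void
protect_bits(int prot, uint64_t *mask, uint64_t *nbits)
{
	*mask = *nbits = 0;
	if ((prot & X_PROT_WRITE) == 0) {
		/* Revoking write access also clears the logical-write bit. */
		*mask |= X_ATTR_AP_RO | X_ATTR_SW_DBM;
		*nbits |= X_ATTR_AP_RO;
	}
	if ((prot & X_PROT_EXECUTE) == 0) {
		*mask |= X_ATTR_XN;
		*nbits |= X_ATTR_XN;
	}
}

/* An entry that already satisfies the request can be skipped entirely. */
static bool
already_protected(uint64_t pte, uint64_t mask, uint64_t nbits)
{
	return ((pte & mask) == nbits);
}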