Date: Wed, 9 Jun 2010 23:41:57 +0530 From: "Jayachandran C." <c.jayachandran@gmail.com> To: Alan Cox <alc@cs.rice.edu> Cc: Kostik Belousov <kostikbel@gmail.com>, "Jayachandran C." <jchandra@freebsd.org>, mips@freebsd.org Subject: Re: svn commit: r208589 - head/sys/mips/mips Message-ID: <AANLkTimRND9G9udHWzhN06wTJcRCN-OEjPPXctOoyj9_@mail.gmail.com> In-Reply-To: <AANLkTimWh77REpi3ZD0BOihZit5qKNYNbtAx5PWQRYBX@mail.gmail.com> References: <AANLkTimIa3jmBPMhWIOcY6DenGpZ2ZYmqwDTWspVx0-u@mail.gmail.com> <AANLkTil2gE1niUWCHnsTlQvibhxBh7QYwD0TTWo0rj5c@mail.gmail.com> <AANLkTinA2D5iTDGPbflHVzLyAZW-ZewjJkUWWL8FVskr@mail.gmail.com> <4C07E07B.9060802@cs.rice.edu> <AANLkTimjyPc_AXKP1yaJaF1BN7CAGBeNikVzcp9OCb4P@mail.gmail.com> <4C09345F.9040300@cs.rice.edu> <AANLkTinmFOZY3OlaoKStxlNIRBt2G2I4ILkQ1P0CjozG@mail.gmail.com> <4C0D2BEA.6060103@cs.rice.edu> <AANLkTikZxx_30H9geHvZYkYd0sE-wiuZljEd0PAi14ca@mail.gmail.com> <4C0D3F40.2070101@cs.rice.edu> <20100607202844.GU83316@deviant.kiev.zoral.com.ua> <4C0D64B7.7060604@cs.rice.edu> <AANLkTilBxdXxXrWC1cAT0wX9ubmFrvaAdk4feG6PwDYQ@mail.gmail.com> <4C0DE424.9080601@cs.rice.edu> <AANLkTinzIUOykgwtHlJ2vDwYS9as3ha_BYiy_qRd5h2Q@mail.gmail.com> <AANLkTimWh77REpi3ZD0BOihZit5qKNYNbtAx5PWQRYBX@mail.gmail.com>
index | next in thread | previous in thread | raw e-mail
[-- Attachment #1 --] On Wed, Jun 9, 2010 at 11:20 AM, Jayachandran C. <c.jayachandran@gmail.com> wrote: > On Wed, Jun 9, 2010 at 3:01 AM, Jayachandran C. > <c.jayachandran@gmail.com> wrote: >> On Tue, Jun 8, 2010 at 12:03 PM, Alan Cox <alc@cs.rice.edu> wrote: >>> >>> VM_FREEPOOL_DIRECT is used by at least amd64 and ia64 for page table pages >>> and small kernel memory allocations. Unlike mips, these machines don't have >>> MMU support for a direct map. Their direct maps are just a range of >>> mappings in the regular (kernel) page table. So, unlike mips, accesses >>> through their direct map may still miss in the TLB and require a page table >>> walk. VM_FREEPOOL_* is a way to increase the physical locality (or >>> clustering) of page allocations, so that, for example, page table page >>> accesses by the pmap on amd64 are less likely to miss in the TLB. However, >>> it doesn't place a hard restriction on the range of physical addresses that >>> will be used, which you need for mips. >>> >>> The impact of this clustering can be significant. For example, on amd64 we >>> use 2MB page mappings to implement the direct map. However, old Opterons >>> only had 8 data TLB entries for 2MB page mappings. For a uniprocessor >>> kernel running on such an Opteron, I measured an 18% reduction in system >>> time during a buildworld with the introduction of VM_FREEPOOL_DIRECT. (See >>> the commit logs for vm/vm_phys.c and the comment that precedes the >>> VM_NFREEORDER definition on amd64.) >>> >>> Until such time as superpage support is ported to mips from the amd64/i386 >>> pmaps, I don't think there is a point in having more than one VM_FREEPOOL_* >>> on mips. And then, the point would be to reduce fragmentation of the >>> physical memory that could be caused by small allocations, such as page >>> table pages. >> >> Thanks for the detailed explanation. >> >> Also, after looking at the code again, I think vm_phys_alloc_contig() >> can optimized not to look into segments which lie outside the area of >> interest. The patch is: > [...] >> This change, along with the vmparam.h changes for HIGHMEM, I think we >> should be able to use vm_phys_alloc_contig() for page table pages (or >> have I again missed something fundamental?). > > That patch was obviously wrong - many segments can map to a freelist > as the comment right above my change noted. > > But the idea of eliminating freelists for which all the segments are > outside (low,high) may still be useful, will look at this some more. I have attached a patch (also at http://people.freebsd.org/~jchandra/pmap-with-HIGHMEM-freelist.patch) which reverts most of the changes I did to convert the page table page allocation to use UMA zone, and replaces it with an implementation using vm_phys_alloc_contig() and vm_contig_grow_cache(). This creates a new HIGHMEM freelist for mips for memory outside the KSEG0 area, and makes a few changes in vm_phys_alloc_contig() to skip freelists for which all the segments fall outside the address range requested. With this the buildworld perf on MIPS is similar to what I got with the older code with zones. If this approach is okay, I will do another round of testing(buildworld passes, but I haven't really tested the case where grow_cache is called). If the changes are not okay, I will add another page allocator which takes freelist as argument as you had suggested earlier, instead of the vm_phys_alloc_contig() changes. Thanks, JC. [-- Attachment #2 --] Index: sys/mips/include/vmparam.h =================================================================== --- sys/mips/include/vmparam.h (revision 208890) +++ sys/mips/include/vmparam.h (working copy) @@ -103,8 +103,9 @@ #define VM_MAXUSER_ADDRESS ((vm_offset_t)0x80000000) #define VM_MAX_MMAP_ADDR VM_MAXUSER_ADDRESS -#define VM_MIN_KERNEL_ADDRESS ((vm_offset_t)0xC0000000) -#define VM_MAX_KERNEL_ADDRESS ((vm_offset_t)0xFFFFC000) +#define VM_MIN_KERNEL_ADDRESS ((vm_offset_t)0xC0000000) +#define VM_MAX_KERNEL_ADDRESS ((vm_offset_t)0xFFFFC000) +#define VM_HIGHMEM_ADDRESS ((vm_paddr_t)0x20000000) #if 0 #define KERNBASE (VM_MIN_KERNEL_ADDRESS) #else @@ -168,13 +169,15 @@ #define VM_FREEPOOL_DIRECT 1 /* - * we support 1 free list: + * we support 2 free lists: * - * - DEFAULT for all systems + * - DEFAULT for direct mapped (KSEG0) pages + * - HIGHMEM for other pages */ -#define VM_NFREELIST 1 -#define VM_FREELIST_DEFAULT 0 +#define VM_NFREELIST 2 +#define VM_FREELIST_DEFAULT 1 +#define VM_FREELIST_HIGHMEM 0 /* * The largest allocation size is 1MB. Index: sys/mips/mips/pmap.c =================================================================== --- sys/mips/mips/pmap.c (revision 208890) +++ sys/mips/mips/pmap.c (working copy) @@ -184,8 +184,6 @@ static int init_pte_prot(vm_offset_t va, vm_page_t m, vm_prot_t prot); static void pmap_TLB_invalidate_kernel(vm_offset_t); static void pmap_TLB_update_kernel(vm_offset_t, pt_entry_t); -static vm_page_t pmap_alloc_pte_page(pmap_t, unsigned int, int, vm_offset_t *); -static void pmap_release_pte_page(vm_page_t); #ifdef SMP static void pmap_invalidate_page_action(void *arg); @@ -193,10 +191,6 @@ static void pmap_update_page_action(void *arg); #endif -static void pmap_ptpgzone_dtor(void *mem, int size, void *arg); -static void *pmap_ptpgzone_allocf(uma_zone_t, int, u_int8_t *, int); -static uma_zone_t ptpgzone; - struct local_sysmaps { struct mtx lock; vm_offset_t base; @@ -329,7 +323,7 @@ } /* - * Bootstrap the system enough to run with virtual memory. This + * Bootstrap the system enough to run with virtual memory. This * assumes that the phys_avail array has been initialized. */ void @@ -535,10 +529,6 @@ pv_entry_max = PMAP_SHPGPERPROC * maxproc + cnt.v_page_count; pv_entry_high_water = 9 * (pv_entry_max / 10); uma_zone_set_obj(pvzone, &pvzone_obj, pv_entry_max); - - ptpgzone = uma_zcreate("PT ENTRY", PAGE_SIZE, NULL, pmap_ptpgzone_dtor, - NULL, NULL, PAGE_SIZE - 1, UMA_ZONE_NOFREE | UMA_ZONE_ZINIT); - uma_zone_set_allocf(ptpgzone, pmap_ptpgzone_allocf); } /*************************************************** @@ -885,12 +875,8 @@ /* * If the page is finally unwired, simply free it. */ + vm_page_free_zero(m); atomic_subtract_int(&cnt.v_wire_count, 1); - PMAP_UNLOCK(pmap); - vm_page_unlock_queues(); - pmap_release_pte_page(m); - vm_page_lock_queues(); - PMAP_LOCK(pmap); return (1); } @@ -949,96 +935,36 @@ bzero(&pmap->pm_stats, sizeof pmap->pm_stats); } + static void -pmap_ptpgzone_dtor(void *mem, int size, void *arg) +pmap_grow_page_cache(int wait) { -#ifdef INVARIANTS - static char zeropage[PAGE_SIZE]; - - KASSERT(size == PAGE_SIZE, - ("pmap_ptpgzone_dtor: invalid size %d", size)); - KASSERT(bcmp(mem, zeropage, PAGE_SIZE) == 0, - ("pmap_ptpgzone_dtor: freeing a non-zeroed page")); -#endif + printf("[%s] wait %x\n", __func__, wait); + vm_contig_grow_cache(3, 0, MIPS_KSEG0_LARGEST_PHYS); } -static void * -pmap_ptpgzone_allocf(uma_zone_t zone, int bytes, u_int8_t *flags, int wait) -{ - vm_page_t m; - vm_paddr_t paddr; - int tries; - - KASSERT(bytes == PAGE_SIZE, - ("pmap_ptpgzone_allocf: invalid allocation size %d", bytes)); - *flags = UMA_SLAB_PRIV; - tries = 0; -retry: - m = vm_phys_alloc_contig(1, 0, MIPS_KSEG0_LARGEST_PHYS, - PAGE_SIZE, PAGE_SIZE); - if (m == NULL) { - if (tries < ((wait & M_NOWAIT) != 0 ? 1 : 3)) { - vm_contig_grow_cache(tries, 0, MIPS_KSEG0_LARGEST_PHYS); - tries++; - goto retry; - } else - return (NULL); - } - - paddr = VM_PAGE_TO_PHYS(m); - return ((void *)MIPS_PHYS_TO_KSEG0(paddr)); -} - static vm_page_t -pmap_alloc_pte_page(pmap_t pmap, unsigned int index, int wait, vm_offset_t *vap) +pmap_alloc_pte_page(unsigned int index, int wait) { - vm_paddr_t paddr; - void *va; vm_page_t m; - int locked; - locked = mtx_owned(&pmap->pm_mtx); - if (locked) { - mtx_assert(&vm_page_queue_mtx, MA_OWNED); - PMAP_UNLOCK(pmap); - vm_page_unlock_queues(); - } - va = uma_zalloc(ptpgzone, wait); - if (locked) { - vm_page_lock_queues(); - PMAP_LOCK(pmap); - } - if (va == NULL) + m = vm_phys_alloc_contig(1, 0, MIPS_KSEG0_LARGEST_PHYS, + PAGE_SIZE, 0); + if (m == NULL) return (NULL); - paddr = MIPS_KSEG0_TO_PHYS(va); - m = PHYS_TO_VM_PAGE(paddr); - - if (!locked) - vm_page_lock_queues(); + if ((m->flags & PG_ZERO) == 0) + pmap_zero_page(m); + m->pindex = index; m->valid = VM_PAGE_BITS_ALL; - m->wire_count = 1; - if (!locked) - vm_page_unlock_queues(); - atomic_add_int(&cnt.v_wire_count, 1); - *vap = (vm_offset_t)va; + m->wire_count = 1; return (m); } -static void -pmap_release_pte_page(vm_page_t m) -{ - void *va; - vm_paddr_t paddr; - paddr = VM_PAGE_TO_PHYS(m); - va = (void *)MIPS_PHYS_TO_KSEG0(paddr); - uma_zfree(ptpgzone, va); -} - /* * Initialize a preallocated and zeroed pmap structure, * such as one in a vmspace structure. @@ -1055,10 +981,10 @@ /* * allocate the page directory page */ - ptdpg = pmap_alloc_pte_page(pmap, NUSERPGTBLS, M_WAITOK, &ptdva); - if (ptdpg == NULL) - return (0); + while ((ptdpg = pmap_alloc_pte_page(NUSERPGTBLS, M_WAITOK)) == NULL) + pmap_grow_page_cache(M_WAITOK); + ptdva = MIPS_PHYS_TO_KSEG0(VM_PAGE_TO_PHYS(ptdpg)); pmap->pm_segtab = (pd_entry_t *)ptdva; pmap->pm_active = 0; pmap->pm_ptphint = NULL; @@ -1089,15 +1015,28 @@ /* * Find or fabricate a new pagetable page */ - m = pmap_alloc_pte_page(pmap, ptepindex, flags, &pteva); - if (m == NULL) + if ((m = pmap_alloc_pte_page(ptepindex, flags)) == NULL) { + if (flags & M_WAITOK) { + PMAP_UNLOCK(pmap); + vm_page_unlock_queues(); + pmap_grow_page_cache(flags); + vm_page_lock_queues(); + PMAP_LOCK(pmap); + } + + /* + * Indicate the need to retry. While waiting, the page + * table page may have been allocated. + */ return (NULL); + } /* * Map the pagetable page into the process address space, if it * isn't already there. */ + pteva = MIPS_PHYS_TO_KSEG0(VM_PAGE_TO_PHYS(m)); pmap->pm_stats.resident_count++; pmap->pm_segtab[ptepindex] = (pd_entry_t)pteva; @@ -1193,7 +1132,7 @@ ptdpg->wire_count--; atomic_subtract_int(&cnt.v_wire_count, 1); - pmap_release_pte_page(ptdpg); + vm_page_free_zero(ptdpg); PMAP_LOCK_DESTROY(pmap); } @@ -1203,7 +1142,6 @@ void pmap_growkernel(vm_offset_t addr) { - vm_offset_t pageva; vm_page_t nkpg; pt_entry_t *pte; int i; @@ -1238,13 +1176,13 @@ /* * This index is bogus, but out of the way */ - nkpg = pmap_alloc_pte_page(kernel_pmap, nkpt, M_NOWAIT, &pageva); + nkpg = pmap_alloc_pte_page(nkpt, M_NOWAIT); if (!nkpg) panic("pmap_growkernel: no memory to grow kernel"); nkpt++; - pte = (pt_entry_t *)pageva; + pte = (pt_entry_t *)MIPS_PHYS_TO_KSEG0(VM_PAGE_TO_PHYS(nkpg)); segtab_pde(kernel_segmap, kernel_vm_end) = (pd_entry_t)pte; /* Index: sys/vm/vm_phys.c =================================================================== --- sys/vm/vm_phys.c (revision 208890) +++ sys/vm/vm_phys.c (working copy) @@ -66,6 +66,7 @@ vm_paddr_t end; vm_page_t first_page; struct vm_freelist (*free_queues)[VM_NFREEPOOL][VM_NFREEORDER]; + int flind; }; static struct vm_phys_seg vm_phys_segs[VM_PHYSSEG_MAX]; @@ -188,6 +189,7 @@ seg = &vm_phys_segs[vm_phys_nsegs++]; seg->start = start; seg->end = end; + seg->flind = flind; #ifdef VM_PHYSSEG_SPARSE seg->first_page = &vm_page_array[pages]; #else @@ -595,7 +597,8 @@ vm_object_t m_object; vm_paddr_t pa, pa_last, size; vm_page_t deferred_vdrop_list, m, m_ret; - int flind, i, oind, order, pind; + int flind, i, oind, order, pind, segind; + int fl_good[VM_NFREELIST]; size = npages << PAGE_SHIFT; KASSERT(size != 0, @@ -607,11 +610,22 @@ deferred_vdrop_list = NULL; /* Compute the queue that is the best fit for npages. */ for (order = 0; (1 << order) < npages; order++); + /* Compute the freelists that should be searched */ + for (flind = 0; flind < vm_nfreelists; flind++) + fl_good[flind] = 0; + for (segind = 0; segind < vm_phys_nsegs; segind++) { + seg = &vm_phys_segs[segind]; + if (low < seg->end && high > seg->start) + fl_good[seg->flind]++; + } mtx_lock(&vm_page_queue_free_mtx); #if VM_NRESERVLEVEL > 0 retry: #endif for (flind = 0; flind < vm_nfreelists; flind++) { + if (fl_good[flind] == 0) + continue; + for (oind = min(order, VM_NFREEORDER - 1); oind < VM_NFREEORDER; oind++) { for (pind = 0; pind < VM_NFREEPOOL; pind++) { fl = vm_phys_free_queues[flind][pind];home | help
Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AANLkTimRND9G9udHWzhN06wTJcRCN-OEjPPXctOoyj9_>
