From owner-freebsd-mips@FreeBSD.ORG Mon Jun 7 18:49:47 2010 Return-Path: Delivered-To: mips@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 19B471065679; Mon, 7 Jun 2010 18:49:47 +0000 (UTC) (envelope-from alc@cs.rice.edu) Received: from mail.cs.rice.edu (mail.cs.rice.edu [128.42.1.31]) by mx1.freebsd.org (Postfix) with ESMTP id CCD678FC16; Mon, 7 Jun 2010 18:49:46 +0000 (UTC) Received: from mail.cs.rice.edu (localhost.localdomain [127.0.0.1]) by mail.cs.rice.edu (Postfix) with ESMTP id 578C22C2ACA; Mon, 7 Jun 2010 13:49:46 -0500 (CDT) X-Virus-Scanned: by amavis-2.4.0 at mail.cs.rice.edu Received: from mail.cs.rice.edu ([127.0.0.1]) by mail.cs.rice.edu (mail.cs.rice.edu [127.0.0.1]) (amavisd-new, port 10024) with LMTP id JpwGxz+w3Agl; Mon, 7 Jun 2010 13:49:38 -0500 (CDT) Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net (adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.cs.rice.edu (Postfix) with ESMTP id 9D0662C2A02; Mon, 7 Jun 2010 13:49:37 -0500 (CDT) Message-ID: <4C0D3F40.2070101@cs.rice.edu> Date: Mon, 07 Jun 2010 13:49:36 -0500 From: Alan Cox User-Agent: Thunderbird 2.0.0.24 (X11/20100501) MIME-Version: 1.0 To: "C. Jayachandran" References: <201005271005.o4RA5eVu032269@svn.freebsd.org> <4C058145.3000707@cs.rice.edu> <4C05F9BC.40409@cs.rice.edu> <4C0731A5.7090301@cs.rice.edu> <4C07E07B.9060802@cs.rice.edu> <4C09345F.9040300@cs.rice.edu> <4C0D2BEA.6060103@cs.rice.edu> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "Jayachandran C." , Konstantin Belousov , Alan Cox , mips@freebsd.org Subject: Re: svn commit: r208589 - head/sys/mips/mips X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 18:49:47 -0000 C. Jayachandran wrote: > On Mon, Jun 7, 2010 at 10:57 PM, Alan Cox wrote: > >> C. Jayachandran wrote: >> >>> On Fri, Jun 4, 2010 at 10:44 PM, Alan Cox wrote: >>> >>> >>>> C. Jayachandran wrote: >>>> >>>> >>>>> On Thu, Jun 3, 2010 at 10:33 PM, Alan Cox wrote: >>>>> >>>>> >>>>> >>>> [snip] >>>> >>>> >>>>>> Add a static counter to pmap_ptpgzone_allocf(). When the counter >>>>>> reaches >>>>>> some not-too-small prime number, don't call vm_phys_alloc_contig(), >>>>>> instead >>>>>> set "m" to NULL and reset the counter to zero. Setting "m" to NULL >>>>>> should >>>>>> then trigger the vm_contig_grow_cache() call. Make sense? >>>>>> >>>>>> >>>>>> >>>>> Is this to move the vm_contig_grow_cache() out of the >>>>> pmap_ptpgzone_allocf() and into the function calling uma_zalloc()? I >>>>> am wondering why the prime number is needed. >>>>> >>>>> Another implementation I had thought about was to have a kernel >>>>> process maintain a pool of direct mapped pages for MIPS. This will be >>>>> woken up if the pool goes below a low water mark - any comments on >>>>> this? >>>>> >>>>> >>>>> >>>>> >>>> Here is how the page table page allocation should be done. It's not much >>>> harder than the vm_contig_grow_cache() change. >>>> >>>> 1. Look at vm/vm_phys.c, in particular, vm_phys_init(). Create a new >>>> freelist, like VM_FREELIST_ISADMA, for the direct-mapped memory on >>>> MIPS. Perhaps, call it VM_FREELIST_LOWMEM. The controlling macros >>>> should >>>> be defined in mips/include/vmparam.h. >>>> >>>> 2. Trivially refactor vm_phys_alloc_pages(). Specifically, take the body >>>> of >>>> the outer for loop and make it a new function, vm_phys_alloc_freelist(), >>>> that takes the variable "flind" as a parameter. >>>> >>>> 3. Eliminate the UMA zone for page table pages. In place of >>>> vm_phys_alloc_contig() call your new function with VM_FREELIST_LOWMEM. >>>> (The >>>> vm_contig_grow_cache() remains unchanged.) Go back to calling >>>> vm_page_free() to release page table pages. >>>> >>>> For the record, the changes that you made to start using a zone for page >>>> table page allocation introduced at least one possible race condition >>>> between pmap_remove_pages() and pmap_remove_all(). Specifically, by >>>> dropping and reacquiring the page queues and pmap lock when you free a >>>> page >>>> table page, you allow a pmap_remove_all() call while pmap_remove_pages() >>>> is >>>> running to leave the variable "npv" in pmap_remove_pages() pointing at a >>>> freed pv entry. >>>> >>>> >>> My first attempt is attached, it comes up multiuser but crashes if I >>> stress it a bit (panic: Bad link elm 0xc0078f00 prev->next != elm). >>> Will look at this tomorrow, and see if I can find the cause. >>> >>> >>> >> Where exactly is this occurring? >> >> >>> In the meantime, it may be worth looking at the patch to see if this >>> is in line with the approach you had suggested. I have tried to use >>> VM_FREELIST_HIGHMEM which is already there, instead of adding >>> VM_FREELIST_LOWMEM. >>> >>> >>> >> Basically, yes. At first, I was taken aback by defining VM_FREELIST_DEFAULT >> as other than zero, but it shouldn't break anything. >> > > I can change this to add VM_FREELIST_LOWMEM. This works okay so far, > even though I haven't really checked the code to see if it works when > there is no memory in HIGHMEM. > > I think that it will. It will just waste a little time looking at empty queues. >> The one issue that I see with the patch is that you'll need to release and >> reacquire the pmap and page queues locks around the call to >> vm_contig_grow_cache(). However, if you release these locks, you need to >> restart the pmap_allocpte(), i.e., perform the "goto retry;", because while >> you slept another processor/thread might have already allocated and >> installed the page table page that you are trying to allocate into the page >> table. >> >> For what it's worth, this same race condition exists in the current code in >> subversion using an UMA zone, but not in the code that predates use of an >> UMA zone. >> > > I have been testing with an updated version of the patch - my -j128 > buildworld works on this. The sys/vm side changes are simple enough, > but the he page table page allocation part is still not complete. The > version I have now is (with gmail whitespace damage): > > --- > static vm_page_t > pmap_alloc_pte_page(pmap_t pmap, unsigned int index, int wait, vm_offset_t *vap) > { > vm_paddr_t paddr; > vm_page_t m; > int tries, flags; > > tries = 0; > retry: > mtx_lock(&vm_page_queue_free_mtx); > m = vm_phys_alloc_freelist_pages(VM_FREELIST_DEFAULT, > VM_FREEPOOL_DEFAULT, 0); > if (m == NULL) { > mtx_unlock(&vm_page_queue_free_mtx); > if (tries < ((wait & M_NOWAIT) != 0 ? 1 : 3)) { > vm_contig_grow_cache(tries, 0, MIPS_KSEG0_LARGEST_PHYS); > tries++; > goto retry; > } else { > printf("[%d] %s wait %x, fail!\n", tries, __func__, > wait); > return (NULL); > } > } > > m->pindex = index; > m->valid = VM_PAGE_BITS_ALL; > atomic_add_int(&cnt.v_wire_count, 1); > m->wire_count = 1; > if (m->flags & PG_ZERO) > vm_page_zero_count--; > else > pmap_zero_page(m); > flags = PG_ZERO | PG_UNMANAGED; > m->flags = flags; > m->oflags = 0; > m->act_count = 0; > if ((m->flags & PG_CACHED) != 0) > printf("Not yet! handle cached page %p flags 0x%x\n", > m, m->flags); > else > cnt.v_free_count--; > > m->act_count = 0; > mtx_unlock(&vm_page_queue_free_mtx); > } > > > ---- > > I tried to match the vm_page_alloc() code here. I'm not sure if I have > to handle the PG_CACHED case, and if it is okay to change the VM > counters here. > > You do. In the end, I suspect that this function will be added to the machine-independent layer. > Also, I will change the code to return NULL all the way and retry > allocation (similar to the old code which called VM_WAIT and then > returned NULL). > > Probably I should get the whole code in line with the old code, > replacing the old vm_page_alloc() call with the function above without > retry, and the VM_WAIT with a new function which does the > vm_contig_grow_cache(). > > You may find it easier if you move the call to vm_contig_grow_cache() to pmap_allocpte(). Alan