From owner-freebsd-mips@FreeBSD.ORG Mon Jun 7 18:17:49 2010
From: "C. Jayachandran" <c.jayachandran@gmail.com>
Date: Mon, 7 Jun 2010 23:47:47 +0530
To: Alan Cox
Cc: "Jayachandran C.", Konstantin Belousov, mips@freebsd.org
Subject: Re: svn commit: r208589 - head/sys/mips/mips
In-Reply-To: <4C0D2BEA.6060103@cs.rice.edu>

On Mon, Jun 7, 2010 at 10:57 PM, Alan Cox wrote:
> C. Jayachandran wrote:
>> On Fri, Jun 4, 2010 at 10:44 PM, Alan Cox wrote:
>>> C. Jayachandran wrote:
>>>> On Thu, Jun 3, 2010 at 10:33 PM, Alan Cox wrote:
>>>
>>> [snip]
>>>
>>>>> Add a static counter to pmap_ptpgzone_allocf().  When the counter
>>>>> reaches some not-too-small prime number, don't call
>>>>> vm_phys_alloc_contig(); instead set "m" to NULL and reset the
>>>>> counter to zero.  Setting "m" to NULL should then trigger the
>>>>> vm_contig_grow_cache() call.  Make sense?
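(If I am reading that right, it would be roughly the following - a
hypothetical sketch, with 13 standing in for the not-too-small prime
and the current allocf signature assumed:)

---
static void *
pmap_ptpgzone_allocf(uma_zone_t zone, int bytes, u_int8_t *flags, int wait)
{
	/* Unlocked by design; an occasional lost update is harmless. */
	static int count;
	vm_page_t m;

	*flags = UMA_SLAB_PRIV;
	if (++count >= 13) {
		/*
		 * Fail on purpose every 13th call, so that the existing
		 * m == NULL path runs vm_contig_grow_cache().
		 */
		count = 0;
		m = NULL;
	} else
		m = vm_phys_alloc_contig(1, 0, MIPS_KSEG0_LARGEST_PHYS,
		    PAGE_SIZE, 0);
	if (m == NULL)
		return (NULL);	/* NULL triggers the grow-cache path */
	return ((void *)MIPS_PHYS_TO_KSEG0(VM_PAGE_TO_PHYS(m)));
}
---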
>>>>
>>>> Is this to move the vm_contig_grow_cache() out of
>>>> pmap_ptpgzone_allocf() and into the function calling uma_zalloc()?
>>>> I am wondering why the prime number is needed.
>>>>
>>>> Another implementation I had thought about was to have a kernel
>>>> process maintain a pool of direct-mapped pages for MIPS, which
>>>> would be woken up if the pool went below a low-water mark - any
>>>> comments on this?
>>>
>>> Here is how the page table page allocation should be done.  It's
>>> not much harder than the vm_contig_grow_cache() change.
>>>
>>> 1. Look at vm/vm_phys.c, in particular, vm_phys_init().  Create a
>>> new freelist, like VM_FREELIST_ISADMA, for the direct-mapped memory
>>> on MIPS.  Perhaps call it VM_FREELIST_LOWMEM.  The controlling
>>> macros should be defined in mips/include/vmparam.h.
>>>
>>> 2. Trivially refactor vm_phys_alloc_pages().  Specifically, take
>>> the body of the outer for loop and make it a new function,
>>> vm_phys_alloc_freelist(), that takes the variable "flind" as a
>>> parameter.
>>>
>>> 3. Eliminate the UMA zone for page table pages.  In place of
>>> vm_phys_alloc_contig(), call your new function with
>>> VM_FREELIST_LOWMEM.  (The vm_contig_grow_cache() remains
>>> unchanged.)  Go back to calling vm_page_free() to release page
>>> table pages.
>>>
>>> For the record, the changes that you made to start using a zone for
>>> page table page allocation introduced at least one possible race
>>> condition between pmap_remove_pages() and pmap_remove_all().
>>> Specifically, by dropping and reacquiring the page queues and pmap
>>> lock when you free a page table page, you allow a pmap_remove_all()
>>> call while pmap_remove_pages() is running to leave the variable
>>> "npv" in pmap_remove_pages() pointing at a freed pv entry.
>>
>> My first attempt is attached; it comes up multiuser but crashes if I
>> stress it a bit (panic: Bad link elm 0xc0078f00 prev->next != elm).
>> Will look at this tomorrow, and see if I can find the cause.
>
> Where exactly is this occurring?
>
>> In the meantime, it may be worth looking at the patch to see if this
>> is in line with the approach you had suggested.  I have tried to use
>> VM_FREELIST_HIGHMEM, which is already there, instead of adding
>> VM_FREELIST_LOWMEM.
>
> Basically, yes.  At first, I was taken aback by defining
> VM_FREELIST_DEFAULT as other than zero, but it shouldn't break
> anything.

I can change this to add VM_FREELIST_LOWMEM.  This works okay so far,
even though I haven't really checked the code to see if it works when
there is no memory in HIGHMEM.

> The one issue that I see with the patch is that you'll need to
> release and reacquire the pmap and page queues locks around the call
> to vm_contig_grow_cache().  However, if you release these locks, you
> need to restart the pmap_allocpte(), i.e., perform the "goto retry;",
> because while you slept another processor/thread might have already
> allocated and installed the page table page that you are trying to
> allocate into the page table.
>
> For what it's worth, this same race condition exists in the current
> code in subversion using a UMA zone, but not in the code that
> predates the use of a UMA zone.

I have been testing with an updated version of the patch - my -j128
buildworld works with it.  The sys/vm side changes are simple enough,
but the page table page allocation part is still not complete.
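On the sys/vm side, the refactoring ends up as little more than this -
vm_phys_alloc_pages() keeps its behavior, with the old loop body moved
into the new function (roughly; I've named it
vm_phys_alloc_freelist_pages(), to match vm_phys_alloc_pages()):

---
/*
 * The body of the old outer loop in vm_phys_alloc_pages() now lives
 * in vm_phys_alloc_freelist_pages(), which takes "flind" as a
 * parameter and so can also be called directly for one freelist.
 */
vm_page_t
vm_phys_alloc_pages(int pool, int order)
{
	vm_page_t m;
	int flind;

	for (flind = 0; flind < vm_nfreelists; flind++) {
		m = vm_phys_alloc_freelist_pages(flind, pool, order);
		if (m != NULL)
			return (m);
	}
	return (NULL);
}
---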
The version I have now is:

---
static vm_page_t
pmap_alloc_pte_page(pmap_t pmap, unsigned int index, int wait, vm_offset_t *vap)
{
	vm_page_t m;
	int tries;

	tries = 0;
retry:
	mtx_lock(&vm_page_queue_free_mtx);
	m = vm_phys_alloc_freelist_pages(VM_FREELIST_DEFAULT,
	    VM_FREEPOOL_DEFAULT, 0);
	if (m == NULL) {
		mtx_unlock(&vm_page_queue_free_mtx);
		if (tries < ((wait & M_NOWAIT) != 0 ? 1 : 3)) {
			vm_contig_grow_cache(tries, 0,
			    MIPS_KSEG0_LARGEST_PHYS);
			tries++;
			goto retry;
		} else {
			printf("[%d] %s wait %x, fail!\n", tries, __func__,
			    wait);
			return (NULL);
		}
	}
	m->pindex = index;
	m->valid = VM_PAGE_BITS_ALL;
	atomic_add_int(&cnt.v_wire_count, 1);
	m->wire_count = 1;
	if ((m->flags & PG_ZERO) != 0)
		vm_page_zero_count--;
	else
		pmap_zero_page(m);
	/* Test PG_CACHED before m->flags is overwritten below. */
	if ((m->flags & PG_CACHED) != 0)
		printf("Not yet! handle cached page %p flags 0x%x\n",
		    m, m->flags);
	else
		cnt.v_free_count--;
	m->flags = PG_ZERO | PG_UNMANAGED;
	m->oflags = 0;
	m->act_count = 0;
	mtx_unlock(&vm_page_queue_free_mtx);
	return (m);
}
---

I tried to match the vm_page_alloc() code here.  I'm not sure whether
I have to handle the PG_CACHED case, or whether it is okay to change
the VM counters here.  Also, I will change the code to return NULL all
the way up and retry the allocation (similar to the old code, which
called VM_WAIT and then returned NULL).  Probably I should bring the
whole thing in line with the old code, replacing the old
vm_page_alloc() call with the function above minus the retry, and the
VM_WAIT with a new function that does the vm_contig_grow_cache().

Thanks,
JC.
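PS: for that retry, the call site would become something like the
following - a rough sketch only, assuming _pmap_allocpte() keeps its
current shape and the page queues lock is held on entry, per Alan's
locking note above:

---
static vm_page_t
_pmap_allocpte(pmap_t pmap, unsigned ptepindex, int flags)
{
	vm_offset_t pteva;
	vm_page_t m;

retry:
	m = pmap_alloc_pte_page(pmap, ptepindex, flags, &pteva);
	if (m == NULL && (flags & M_NOWAIT) == 0) {
		/*
		 * Drop the locks before sleeping in
		 * vm_contig_grow_cache(), and restart afterwards: while
		 * we slept, another thread may have allocated and
		 * installed this page table page.
		 */
		PMAP_UNLOCK(pmap);
		vm_page_unlock_queues();
		vm_contig_grow_cache(3, 0, MIPS_KSEG0_LARGEST_PHYS);
		vm_page_lock_queues();
		PMAP_LOCK(pmap);
		goto retry;
	}
	if (m == NULL)
		return (NULL);
	/* ... enter the new page table page into the page directory ... */
	return (m);
}
---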