From owner-freebsd-mips@FreeBSD.ORG Mon Jun 7 20:46:36 2010 Return-Path: Delivered-To: mips@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E16E4106570A for ; Mon, 7 Jun 2010 20:46:36 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 47C198FC14 for ; Mon, 7 Jun 2010 20:46:35 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o57KSwgI095528 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 7 Jun 2010 23:28:58 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o57KSiEH084484; Mon, 7 Jun 2010 23:28:44 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o57KSiet084483; Mon, 7 Jun 2010 23:28:44 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 7 Jun 2010 23:28:44 +0300 From: Kostik Belousov To: Alan Cox Message-ID: <20100607202844.GU83316@deviant.kiev.zoral.com.ua> References: <4C07E07B.9060802@cs.rice.edu> <4C09345F.9040300@cs.rice.edu> <4C0D2BEA.6060103@cs.rice.edu> <4C0D3F40.2070101@cs.rice.edu> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="eM1qf7WYQBf+P1UJ" Content-Disposition: inline In-Reply-To: <4C0D3F40.2070101@cs.rice.edu> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_50, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: "Jayachandran C." , mips@freebsd.org Subject: Re: svn commit: r208589 - head/sys/mips/mips X-BeenThere: freebsd-mips@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Porting FreeBSD to MIPS List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2010 20:46:37 -0000 --eM1qf7WYQBf+P1UJ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jun 07, 2010 at 01:49:36PM -0500, Alan Cox wrote: > C. Jayachandran wrote: > >On Mon, Jun 7, 2010 at 10:57 PM, Alan Cox wrote: > > =20 > >>C. Jayachandran wrote: > >> =20 > >>>On Fri, Jun 4, 2010 at 10:44 PM, Alan Cox wrote: > >>> > >>> =20 > >>>>C. Jayachandran wrote: > >>>> > >>>> =20 > >>>>>On Thu, Jun 3, 2010 at 10:33 PM, Alan Cox wrote: > >>>>> > >>>>> > >>>>> =20 > >>>>[snip] > >>>> > >>>> =20 > >>>>>>Add a static counter to pmap_ptpgzone_allocf(). When the counter > >>>>>>reaches > >>>>>>some not-too-small prime number, don't call vm_phys_alloc_contig(), > >>>>>>instead > >>>>>>set "m" to NULL and reset the counter to zero. Setting "m" to NULL > >>>>>>should > >>>>>>then trigger the vm_contig_grow_cache() call. Make sense? > >>>>>> > >>>>>> > >>>>>> =20 > >>>>>Is this to move the vm_contig_grow_cache() out of the > >>>>>pmap_ptpgzone_allocf() and into the function calling uma_zalloc()? I > >>>>>am wondering why the prime number is needed. > >>>>> > >>>>>Another implementation I had thought about was to have a kernel > >>>>>process maintain a pool of direct mapped pages for MIPS. This will be > >>>>>woken up if the pool goes below a low water mark - any comments on > >>>>>this? > >>>>> > >>>>> > >>>>> > >>>>> =20 > >>>>Here is how the page table page allocation should be done. It's not= =20 > >>>>much > >>>>harder than the vm_contig_grow_cache() change. > >>>> > >>>>1. Look at vm/vm_phys.c, in particular, vm_phys_init(). Create a new > >>>>freelist, like VM_FREELIST_ISADMA, for the direct-mapped memory = on > >>>>MIPS. Perhaps, call it VM_FREELIST_LOWMEM. The controlling macros > >>>>should > >>>>be defined in mips/include/vmparam.h. > >>>> > >>>>2. Trivially refactor vm_phys_alloc_pages(). Specifically, take the= =20 > >>>>body > >>>>of > >>>>the outer for loop and make it a new function, vm_phys_alloc_freelist= (), > >>>> that takes the variable "flind" as a parameter. > >>>> > >>>>3. Eliminate the UMA zone for page table pages. In place of > >>>>vm_phys_alloc_contig() call your new function with VM_FREELIST_LOWMEM. > >>>> (The > >>>>vm_contig_grow_cache() remains unchanged.) Go back to calling > >>>>vm_page_free() to release page table pages. > >>>> > >>>>For the record, the changes that you made to start using a zone for p= age > >>>>table page allocation introduced at least one possible race condition > >>>>between pmap_remove_pages() and pmap_remove_all(). Specifically, by > >>>>dropping and reacquiring the page queues and pmap lock when you free a > >>>>page > >>>>table page, you allow a pmap_remove_all() call while pmap_remove_page= s() > >>>>is > >>>>running to leave the variable "npv" in pmap_remove_pages() pointing a= t a > >>>>freed pv entry. > >>>> > >>>> =20 > >>>My first attempt is attached, it comes up multiuser but crashes if I > >>>stress it a bit (panic: Bad link elm 0xc0078f00 prev->next !=3D elm). > >>>Will look at this tomorrow, and see if I can find the cause. > >>> > >>> > >>> =20 > >>Where exactly is this occurring? > >> > >> =20 > >>>In the meantime, it may be worth looking at the patch to see if this > >>>is in line with the approach you had suggested. I have tried to use > >>>VM_FREELIST_HIGHMEM which is already there, instead of adding > >>>VM_FREELIST_LOWMEM. > >>> > >>> > >>> =20 > >>Basically, yes. At first, I was taken aback by defining=20 > >>VM_FREELIST_DEFAULT > >>as other than zero, but it shouldn't break anything. > >> =20 > > > >I can change this to add VM_FREELIST_LOWMEM. This works okay so far, > >even though I haven't really checked the code to see if it works when > >there is no memory in HIGHMEM. > > > > =20 >=20 > I think that it will. It will just waste a little time looking at empty= =20 > queues. >=20 > >>The one issue that I see with the patch is that you'll need to release = and > >>reacquire the pmap and page queues locks around the call to > >>vm_contig_grow_cache(). However, if you release these locks, you need = to > >>restart the pmap_allocpte(), i.e., perform the "goto retry;", because= =20 > >>while > >>you slept another processor/thread might have already allocated and > >>installed the page table page that you are trying to allocate into the= =20 > >>page > >>table. > >> > >>For what it's worth, this same race condition exists in the current cod= e=20 > >>in > >>subversion using an UMA zone, but not in the code that predates use of = an > >>UMA zone. > >> =20 > > > >I have been testing with an updated version of the patch - my -j128 > >buildworld works on this. The sys/vm side changes are simple enough, > >but the he page table page allocation part is still not complete. The > >version I have now is (with gmail whitespace damage): > > > >--- > >static vm_page_t > >pmap_alloc_pte_page(pmap_t pmap, unsigned int index, int wait, vm_offset= _t=20 > >*vap) > >{ > > vm_paddr_t paddr; > > vm_page_t m; > > int tries, flags; > > > > tries =3D 0; > >retry: > > mtx_lock(&vm_page_queue_free_mtx); > > m =3D vm_phys_alloc_freelist_pages(VM_FREELIST_DEFAULT, > > VM_FREEPOOL_DEFAULT, 0); > > if (m =3D=3D NULL) { > > mtx_unlock(&vm_page_queue_free_mtx); > > if (tries < ((wait & M_NOWAIT) !=3D 0 ? 1 : 3)) { > > vm_contig_grow_cache(tries, 0,=20 > > MIPS_KSEG0_LARGEST_PHYS); > > tries++; > > goto retry; > > } else { > > printf("[%d] %s wait %x, fail!\n", tries, __func_= _, > > wait); > > return (NULL); > > } > > } > > > > m->pindex =3D index; > > m->valid =3D VM_PAGE_BITS_ALL; > > atomic_add_int(&cnt.v_wire_count, 1); > > m->wire_count =3D 1; > > if (m->flags & PG_ZERO) > > vm_page_zero_count--; > > else > > pmap_zero_page(m); > > flags =3D PG_ZERO | PG_UNMANAGED; > > m->flags =3D flags; > > m->oflags =3D 0; > > m->act_count =3D 0; > > if ((m->flags & PG_CACHED) !=3D 0) > > printf("Not yet! handle cached page %p flags 0x%x\n", > >m, m->flags); > > else > > cnt.v_free_count--; > > > > m->act_count =3D 0; > > mtx_unlock(&vm_page_queue_free_mtx); > >} > > > > > >---- > > > >I tried to match the vm_page_alloc() code here. I'm not sure if I have > >to handle the PG_CACHED case, and if it is okay to change the VM > >counters here. > > > > =20 >=20 > You do. In the end, I suspect that this function will be added to the=20 > machine-independent layer. >=20 > >Also, I will change the code to return NULL all the way and retry > >allocation (similar to the old code which called VM_WAIT and then > >returned NULL). > > > >Probably I should get the whole code in line with the old code, > >replacing the old vm_page_alloc() call with the function above without > >retry, and the VM_WAIT with a new function which does the > >vm_contig_grow_cache(). > > > > =20 >=20 > You may find it easier if you move the call to vm_contig_grow_cache() to= =20 > pmap_allocpte(). Selecting a random message in the thread to ask my question. Is the issue that page table pages should be allocated from the specific physical region of the memory ? If yes, doesn't i386 PAE has similar issue with page directory pointer table ? I see a KASSERT in i386 pmap that verifies that the allocated table is below 4G, but I do not understand how uma ensures the constraint (I suspect that it does not). --eM1qf7WYQBf+P1UJ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkwNVnsACgkQC3+MBN1Mb4gM3QCdHNkTwnwGd8Yb3M3XdZDpUtHb ACgAn0DBpuImieTVCYtqMtGuZO0HWiha =Ji95 -----END PGP SIGNATURE----- --eM1qf7WYQBf+P1UJ--