Date: Mon, 7 Jun 2010 23:28:44 +0300 From: Kostik Belousov <kostikbel@gmail.com> To: Alan Cox <alc@cs.rice.edu> Cc: "Jayachandran C." <jchandra@freebsd.org>, mips@freebsd.org Subject: Re: svn commit: r208589 - head/sys/mips/mips Message-ID: <20100607202844.GU83316@deviant.kiev.zoral.com.ua> In-Reply-To: <4C0D3F40.2070101@cs.rice.edu> References: <AANLkTimIa3jmBPMhWIOcY6DenGpZ2ZYmqwDTWspVx0-u@mail.gmail.com> <AANLkTil2gE1niUWCHnsTlQvibhxBh7QYwD0TTWo0rj5c@mail.gmail.com> <AANLkTinA2D5iTDGPbflHVzLyAZW-ZewjJkUWWL8FVskr@mail.gmail.com> <4C07E07B.9060802@cs.rice.edu> <AANLkTimjyPc_AXKP1yaJaF1BN7CAGBeNikVzcp9OCb4P@mail.gmail.com> <4C09345F.9040300@cs.rice.edu> <AANLkTinmFOZY3OlaoKStxlNIRBt2G2I4ILkQ1P0CjozG@mail.gmail.com> <4C0D2BEA.6060103@cs.rice.edu> <AANLkTikZxx_30H9geHvZYkYd0sE-wiuZljEd0PAi14ca@mail.gmail.com> <4C0D3F40.2070101@cs.rice.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
--eM1qf7WYQBf+P1UJ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jun 07, 2010 at 01:49:36PM -0500, Alan Cox wrote: > C. Jayachandran wrote: > >On Mon, Jun 7, 2010 at 10:57 PM, Alan Cox <alc@cs.rice.edu> wrote: > > =20 > >>C. Jayachandran wrote: > >> =20 > >>>On Fri, Jun 4, 2010 at 10:44 PM, Alan Cox <alc@cs.rice.edu> wrote: > >>> > >>> =20 > >>>>C. Jayachandran wrote: > >>>> > >>>> =20 > >>>>>On Thu, Jun 3, 2010 at 10:33 PM, Alan Cox <alc@cs.rice.edu> wrote: > >>>>> > >>>>> > >>>>> =20 > >>>>[snip] > >>>> > >>>> =20 > >>>>>>Add a static counter to pmap_ptpgzone_allocf(). When the counter > >>>>>>reaches > >>>>>>some not-too-small prime number, don't call vm_phys_alloc_contig(), > >>>>>>instead > >>>>>>set "m" to NULL and reset the counter to zero. Setting "m" to NULL > >>>>>>should > >>>>>>then trigger the vm_contig_grow_cache() call. Make sense? > >>>>>> > >>>>>> > >>>>>> =20 > >>>>>Is this to move the vm_contig_grow_cache() out of the > >>>>>pmap_ptpgzone_allocf() and into the function calling uma_zalloc()? I > >>>>>am wondering why the prime number is needed. > >>>>> > >>>>>Another implementation I had thought about was to have a kernel > >>>>>process maintain a pool of direct mapped pages for MIPS. This will be > >>>>>woken up if the pool goes below a low water mark - any comments on > >>>>>this? > >>>>> > >>>>> > >>>>> > >>>>> =20 > >>>>Here is how the page table page allocation should be done. It's not= =20 > >>>>much > >>>>harder than the vm_contig_grow_cache() change. > >>>> > >>>>1. Look at vm/vm_phys.c, in particular, vm_phys_init(). Create a new > >>>>freelist, like VM_FREELIST_ISADMA, for the direct-mapped memory = on > >>>>MIPS. Perhaps, call it VM_FREELIST_LOWMEM. The controlling macros > >>>>should > >>>>be defined in mips/include/vmparam.h. > >>>> > >>>>2. Trivially refactor vm_phys_alloc_pages(). Specifically, take the= =20 > >>>>body > >>>>of > >>>>the outer for loop and make it a new function, vm_phys_alloc_freelist= (), > >>>> that takes the variable "flind" as a parameter. > >>>> > >>>>3. Eliminate the UMA zone for page table pages. In place of > >>>>vm_phys_alloc_contig() call your new function with VM_FREELIST_LOWMEM. > >>>> (The > >>>>vm_contig_grow_cache() remains unchanged.) Go back to calling > >>>>vm_page_free() to release page table pages. > >>>> > >>>>For the record, the changes that you made to start using a zone for p= age > >>>>table page allocation introduced at least one possible race condition > >>>>between pmap_remove_pages() and pmap_remove_all(). Specifically, by > >>>>dropping and reacquiring the page queues and pmap lock when you free a > >>>>page > >>>>table page, you allow a pmap_remove_all() call while pmap_remove_page= s() > >>>>is > >>>>running to leave the variable "npv" in pmap_remove_pages() pointing a= t a > >>>>freed pv entry. > >>>> > >>>> =20 > >>>My first attempt is attached, it comes up multiuser but crashes if I > >>>stress it a bit (panic: Bad link elm 0xc0078f00 prev->next !=3D elm). > >>>Will look at this tomorrow, and see if I can find the cause. > >>> > >>> > >>> =20 > >>Where exactly is this occurring? > >> > >> =20 > >>>In the meantime, it may be worth looking at the patch to see if this > >>>is in line with the approach you had suggested. I have tried to use > >>>VM_FREELIST_HIGHMEM which is already there, instead of adding > >>>VM_FREELIST_LOWMEM. > >>> > >>> > >>> =20 > >>Basically, yes. At first, I was taken aback by defining=20 > >>VM_FREELIST_DEFAULT > >>as other than zero, but it shouldn't break anything. > >> =20 > > > >I can change this to add VM_FREELIST_LOWMEM. This works okay so far, > >even though I haven't really checked the code to see if it works when > >there is no memory in HIGHMEM. > > > > =20 >=20 > I think that it will. It will just waste a little time looking at empty= =20 > queues. >=20 > >>The one issue that I see with the patch is that you'll need to release = and > >>reacquire the pmap and page queues locks around the call to > >>vm_contig_grow_cache(). However, if you release these locks, you need = to > >>restart the pmap_allocpte(), i.e., perform the "goto retry;", because= =20 > >>while > >>you slept another processor/thread might have already allocated and > >>installed the page table page that you are trying to allocate into the= =20 > >>page > >>table. > >> > >>For what it's worth, this same race condition exists in the current cod= e=20 > >>in > >>subversion using an UMA zone, but not in the code that predates use of = an > >>UMA zone. > >> =20 > > > >I have been testing with an updated version of the patch - my -j128 > >buildworld works on this. The sys/vm side changes are simple enough, > >but the he page table page allocation part is still not complete. The > >version I have now is (with gmail whitespace damage): > > > >--- > >static vm_page_t > >pmap_alloc_pte_page(pmap_t pmap, unsigned int index, int wait, vm_offset= _t=20 > >*vap) > >{ > > vm_paddr_t paddr; > > vm_page_t m; > > int tries, flags; > > > > tries =3D 0; > >retry: > > mtx_lock(&vm_page_queue_free_mtx); > > m =3D vm_phys_alloc_freelist_pages(VM_FREELIST_DEFAULT, > > VM_FREEPOOL_DEFAULT, 0); > > if (m =3D=3D NULL) { > > mtx_unlock(&vm_page_queue_free_mtx); > > if (tries < ((wait & M_NOWAIT) !=3D 0 ? 1 : 3)) { > > vm_contig_grow_cache(tries, 0,=20 > > MIPS_KSEG0_LARGEST_PHYS); > > tries++; > > goto retry; > > } else { > > printf("[%d] %s wait %x, fail!\n", tries, __func_= _, > > wait); > > return (NULL); > > } > > } > > > > m->pindex =3D index; > > m->valid =3D VM_PAGE_BITS_ALL; > > atomic_add_int(&cnt.v_wire_count, 1); > > m->wire_count =3D 1; > > if (m->flags & PG_ZERO) > > vm_page_zero_count--; > > else > > pmap_zero_page(m); > > flags =3D PG_ZERO | PG_UNMANAGED; > > m->flags =3D flags; > > m->oflags =3D 0; > > m->act_count =3D 0; > > if ((m->flags & PG_CACHED) !=3D 0) > > printf("Not yet! handle cached page %p flags 0x%x\n", > >m, m->flags); > > else > > cnt.v_free_count--; > > > > m->act_count =3D 0; > > mtx_unlock(&vm_page_queue_free_mtx); > >} > > > > > >---- > > > >I tried to match the vm_page_alloc() code here. I'm not sure if I have > >to handle the PG_CACHED case, and if it is okay to change the VM > >counters here. > > > > =20 >=20 > You do. In the end, I suspect that this function will be added to the=20 > machine-independent layer. >=20 > >Also, I will change the code to return NULL all the way and retry > >allocation (similar to the old code which called VM_WAIT and then > >returned NULL). > > > >Probably I should get the whole code in line with the old code, > >replacing the old vm_page_alloc() call with the function above without > >retry, and the VM_WAIT with a new function which does the > >vm_contig_grow_cache(). > > > > =20 >=20 > You may find it easier if you move the call to vm_contig_grow_cache() to= =20 > pmap_allocpte(). Selecting a random message in the thread to ask my question. Is the issue that page table pages should be allocated from the specific physical region of the memory ? If yes, doesn't i386 PAE has similar issue with page directory pointer table ? I see a KASSERT in i386 pmap that verifies that the allocated table is below 4G, but I do not understand how uma ensures the constraint (I suspect that it does not). --eM1qf7WYQBf+P1UJ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkwNVnsACgkQC3+MBN1Mb4gM3QCdHNkTwnwGd8Yb3M3XdZDpUtHb ACgAn0DBpuImieTVCYtqMtGuZO0HWiha =Ji95 -----END PGP SIGNATURE----- --eM1qf7WYQBf+P1UJ--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100607202844.GU83316>