Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 07 Jun 2010 13:49:36 -0500
From:      Alan Cox <alc@cs.rice.edu>
To:        "C. Jayachandran" <c.jayachandran@gmail.com>
Cc:        "Jayachandran C." <jchandra@freebsd.org>, Konstantin Belousov <kostikbel@gmail.com>, Alan Cox <alc@cs.rice.edu>, mips@freebsd.org
Subject:   Re: svn commit: r208589 - head/sys/mips/mips
Message-ID:  <4C0D3F40.2070101@cs.rice.edu>
In-Reply-To: <AANLkTikZxx_30H9geHvZYkYd0sE-wiuZljEd0PAi14ca@mail.gmail.com>
References:  <201005271005.o4RA5eVu032269@svn.freebsd.org>	<4C058145.3000707@cs.rice.edu>	<AANLkTil955Ek-a3tek4Ony9NqrK5l2j7lNA4baRVPBzb@mail.gmail.com>	<4C05F9BC.40409@cs.rice.edu>	<AANLkTikdnvXsTwm8onl3MZPQmVfV-2GovB9--KMNnMgC@mail.gmail.com>	<4C0731A5.7090301@cs.rice.edu>	<AANLkTimIa3jmBPMhWIOcY6DenGpZ2ZYmqwDTWspVx0-u@mail.gmail.com>	<AANLkTil2gE1niUWCHnsTlQvibhxBh7QYwD0TTWo0rj5c@mail.gmail.com>	<AANLkTinA2D5iTDGPbflHVzLyAZW-ZewjJkUWWL8FVskr@mail.gmail.com>	<4C07E07B.9060802@cs.rice.edu>	<AANLkTimjyPc_AXKP1yaJaF1BN7CAGBeNikVzcp9OCb4P@mail.gmail.com>	<4C09345F.9040300@cs.rice.edu>	<AANLkTinmFOZY3OlaoKStxlNIRBt2G2I4ILkQ1P0CjozG@mail.gmail.com>	<4C0D2BEA.6060103@cs.rice.edu> <AANLkTikZxx_30H9geHvZYkYd0sE-wiuZljEd0PAi14ca@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
C. Jayachandran wrote:
> On Mon, Jun 7, 2010 at 10:57 PM, Alan Cox <alc@cs.rice.edu> wrote:
>   
>> C. Jayachandran wrote:
>>     
>>> On Fri, Jun 4, 2010 at 10:44 PM, Alan Cox <alc@cs.rice.edu> wrote:
>>>
>>>       
>>>> C. Jayachandran wrote:
>>>>
>>>>         
>>>>> On Thu, Jun 3, 2010 at 10:33 PM, Alan Cox <alc@cs.rice.edu> wrote:
>>>>>
>>>>>
>>>>>           
>>>> [snip]
>>>>
>>>>         
>>>>>> Add a static counter to pmap_ptpgzone_allocf().  When the counter
>>>>>> reaches
>>>>>> some not-too-small prime number, don't call vm_phys_alloc_contig(),
>>>>>> instead
>>>>>> set "m" to NULL and reset the counter to zero.  Setting "m" to NULL
>>>>>> should
>>>>>> then trigger the vm_contig_grow_cache() call.  Make sense?
>>>>>>
>>>>>>
>>>>>>             
>>>>> Is this to move the vm_contig_grow_cache() out of the
>>>>> pmap_ptpgzone_allocf() and into the function calling uma_zalloc()?  I
>>>>> am wondering why the prime number is needed.
>>>>>
>>>>> Another implementation I had thought about was to have a kernel
>>>>> process maintain a pool of direct mapped pages for MIPS. This will be
>>>>> woken up if the pool goes below a low water mark - any comments on
>>>>> this?
>>>>>
>>>>>
>>>>>
>>>>>           
>>>> Here is how the page table page allocation should be done.  It's not much
>>>> harder than the vm_contig_grow_cache() change.
>>>>
>>>> 1. Look at vm/vm_phys.c, in particular, vm_phys_init().  Create a new
>>>> freelist, like      VM_FREELIST_ISADMA, for the direct-mapped memory on
>>>> MIPS.  Perhaps, call it VM_FREELIST_LOWMEM.  The controlling macros
>>>> should
>>>> be defined in mips/include/vmparam.h.
>>>>
>>>> 2. Trivially refactor vm_phys_alloc_pages().  Specifically, take the body
>>>> of
>>>> the outer for loop and make it a new function, vm_phys_alloc_freelist(),
>>>>  that takes the variable "flind" as a parameter.
>>>>
>>>> 3. Eliminate the UMA zone for page table pages.  In place of
>>>> vm_phys_alloc_contig() call your new function with VM_FREELIST_LOWMEM.
>>>>  (The
>>>> vm_contig_grow_cache() remains unchanged.)  Go back to calling
>>>> vm_page_free() to release page table pages.
>>>>
>>>> For the record, the changes that you made to start using a zone for page
>>>> table page allocation introduced at least one possible race condition
>>>> between pmap_remove_pages() and pmap_remove_all().  Specifically, by
>>>> dropping and reacquiring the page queues and pmap lock when you free a
>>>> page
>>>> table page, you allow a pmap_remove_all() call while pmap_remove_pages()
>>>> is
>>>> running to leave the variable "npv" in pmap_remove_pages() pointing at a
>>>> freed pv entry.
>>>>
>>>>         
>>> My first attempt is attached, it comes up multiuser but crashes if I
>>> stress it a bit (panic: Bad link elm 0xc0078f00 prev->next != elm).
>>> Will look at this tomorrow, and see if I can find the cause.
>>>
>>>
>>>       
>> Where exactly is this occurring?
>>
>>     
>>> In the meantime, it may be worth looking at the patch to see if this
>>> is in line with the approach you had suggested. I have tried to  use
>>> VM_FREELIST_HIGHMEM which is already there, instead of adding
>>> VM_FREELIST_LOWMEM.
>>>
>>>
>>>       
>> Basically, yes.  At first, I was taken aback by defining VM_FREELIST_DEFAULT
>> as other than zero, but it shouldn't break anything.
>>     
>
> I can change this to add VM_FREELIST_LOWMEM. This works okay so far,
> even though I haven't really checked the code to see if it works when
> there is no memory in HIGHMEM.
>
>   

I think that it will.  It will just waste a little time looking at empty 
queues.

>> The one issue that I see with the patch is that you'll need to release and
>> reacquire the pmap and page queues locks around the call to
>> vm_contig_grow_cache().  However, if you release these locks, you need to
>> restart the pmap_allocpte(), i.e., perform the "goto retry;", because while
>> you slept another processor/thread might have already allocated and
>> installed the page table page that you are trying to allocate into the page
>> table.
>>
>> For what it's worth, this same race condition exists in the current code in
>> subversion using an UMA zone, but not in the code that predates use of an
>> UMA zone.
>>     
>
> I have been testing with an updated version of the patch - my -j128
> buildworld works on this. The sys/vm side changes are simple enough,
> but the he page table page allocation part is still not complete. The
> version I have now is (with gmail whitespace damage):
>
> ---
> static vm_page_t
> pmap_alloc_pte_page(pmap_t pmap, unsigned int index, int wait, vm_offset_t *vap)
> {
>        vm_paddr_t paddr;
>        vm_page_t m;
>        int tries, flags;
>
>        tries = 0;
> retry:
>        mtx_lock(&vm_page_queue_free_mtx);
>        m = vm_phys_alloc_freelist_pages(VM_FREELIST_DEFAULT,
>            VM_FREEPOOL_DEFAULT, 0);
>        if (m == NULL) {
>                mtx_unlock(&vm_page_queue_free_mtx);
>                if (tries < ((wait & M_NOWAIT) != 0 ? 1 : 3)) {
>                        vm_contig_grow_cache(tries, 0, MIPS_KSEG0_LARGEST_PHYS);
>                        tries++;
>                        goto retry;
>                } else {
>                        printf("[%d] %s wait %x, fail!\n", tries, __func__,
>                            wait);
>                        return (NULL);
>                }
>        }
>
>        m->pindex = index;
>        m->valid = VM_PAGE_BITS_ALL;
>        atomic_add_int(&cnt.v_wire_count, 1);
>        m->wire_count = 1;
>        if (m->flags & PG_ZERO)
>                vm_page_zero_count--;
>        else
>                pmap_zero_page(m);
>        flags = PG_ZERO | PG_UNMANAGED;
>        m->flags = flags;
>        m->oflags = 0;
>        m->act_count = 0;
>        if ((m->flags & PG_CACHED) != 0)
>                printf("Not yet! handle cached  page %p flags 0x%x\n",
> m, m->flags);
>        else
>                cnt.v_free_count--;
>
>        m->act_count = 0;
>        mtx_unlock(&vm_page_queue_free_mtx);
> }
>
>
> ----
>
> I tried to match the vm_page_alloc() code here. I'm not sure if I have
> to handle the PG_CACHED case, and if it is okay to change the VM
> counters here.
>
>   

You do.  In the end, I suspect that this function will be added to the 
machine-independent layer.

> Also, I will change the code to  return NULL all the way and retry
> allocation (similar to the old code which called VM_WAIT and then
> returned NULL).
>
> Probably I should get the whole code in line with the old code,
> replacing the old vm_page_alloc() call with the function above without
> retry, and the VM_WAIT with a new function which does the
> vm_contig_grow_cache().
>
>   

You may find it easier if you move the call to vm_contig_grow_cache() to 
pmap_allocpte().

Alan




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4C0D3F40.2070101>