Date: Wed, 9 Jun 2010 03:01:12 +0530
In-Reply-To: <4C0DE424.9080601@cs.rice.edu>
From: "Jayachandran C."
To: Alan Cox
Cc: Kostik Belousov, "Jayachandran C.", mips@freebsd.org
Subject: Re: svn commit: r208589 - head/sys/mips/mips

On Tue, Jun 8, 2010 at 12:03 PM, Alan Cox wrote:
> C. Jayachandran wrote:
>>
>> On Tue, Jun 8, 2010 at 2:59 AM, Alan Cox wrote:
>>
>>>
>>> On 6/7/2010 3:28 PM, Kostik Belousov wrote:
>>>
>>>>
>>>> Selecting a random message in the thread to ask my question.
>>>> Is the issue that page table pages should be allocated from the specific
>>>> physical region of the memory ? If yes, doesn't i386 PAE has similar
>>>> issue with page directory pointer table ? I see a KASSERT in i386
>>>> pmap that verifies that the allocated table is below 4G, but I do not
>>>> understand how uma ensures the constraint (I suspect that it does not).
>>>>
>>>>
>>>
>>> For i386 PAE, the UMA backend allocator uses kmem_alloc_contig() to
>>> ensure that the memory is below 4G.  The crucial difference between
>>> i386 PAE and MIPS is that for i386 PAE only the top-level table needs
>>> to be below a specific address threshold.  Moreover, this level is
>>> allocated in a place, pmap_pinit(), where we are allowed to sleep.
>>>
>>
>> Yes. I saw the PAE top level page table code and thought I could use
>> that mechanism for allocating MIPS page table pages in the direct
>> mapped memory. The other reference I used was
>> pmap_alloc_zeroed_contig_pages() function in sun4v/sun4v/pmap.c which
>> uses the vm_phys_alloc_contig() and VM_WAIT.
>
> That's unfortunate.  :-(  Since sun4v is essentially dead code, I've never
> spent much time thinking about its pmap implementation.
> I'll mechanically apply changes to it, but that's about it.  I wouldn't
> recommend using it as a reference.
>
>> ...  I had also thought of
>> using the VM_FREEPOOL_DIRECT which seemed to be for a similar purpose,
>> but could find see any usage in the kernel.
>>
>>
>
> VM_FREEPOOL_DIRECT is used by at least amd64 and ia64 for page table pages
> and small kernel memory allocations.  Unlike mips, these machines don't have
> MMU support for a direct map.  Their direct maps are just a range of
> mappings in the regular (kernel) page table.  So, unlike mips, accesses
> through their direct map may still miss in the TLB and require a page table
> walk.  VM_FREEPOOL_* is a way to increase the physical locality (or
> clustering) of page allocations, so that, for example, page table page
> accesses by the pmap on amd64 are less likely to miss in the TLB.  However,
> it doesn't place a hard restriction on the range of physical addresses that
> will be used, which you need for mips.
>
> The impact of this clustering can be significant.  For example, on amd64 we
> use 2MB page mappings to implement the direct map.  However, old Opterons
> only had 8 data TLB entries for 2MB page mappings.  For a uniprocessor
> kernel running on such an Opteron, I measured an 18% reduction in system
> time during a buildworld with the introduction of VM_FREEPOOL_DIRECT.  (See
> the commit logs for vm/vm_phys.c and the comment that precedes the
> VM_NFREEORDER definition on amd64.)
>
> Until such time as superpage support is ported to mips from the amd64/i386
> pmaps, I don't think there is a point in having more than one VM_FREEPOOL_*
> on mips.  And then, the point would be to reduce fragmentation of the
> physical memory that could be caused by small allocations, such as page
> table pages.

Thanks for the detailed explanation.

Also, after looking at the code again, I think vm_phys_alloc_contig()
can be optimized not to look into segments which lie outside the area of
interest. The patch is:

Index: sys/vm/vm_phys.c
===================================================================
--- sys/vm/vm_phys.c	(revision 208890)
+++ sys/vm/vm_phys.c	(working copy)
@@ -595,7 +595,7 @@
 	vm_object_t m_object;
 	vm_paddr_t pa, pa_last, size;
 	vm_page_t deferred_vdrop_list, m, m_ret;
-	int flind, i, oind, order, pind;
+	int segind, i, oind, order, pind;
 
 	size = npages << PAGE_SHIFT;
 	KASSERT(size != 0,
@@ -611,21 +611,20 @@
 #if VM_NRESERVLEVEL > 0
 retry:
 #endif
-	for (flind = 0; flind < vm_nfreelists; flind++) {
+	for (segind = 0; segind < vm_phys_nsegs; segind++) {
+		/*
+		 * A free list may contain physical pages
+		 * from one or more segments.
+		 */
+		seg = &vm_phys_segs[segind];
+		if (seg->start > high || low >= seg->end)
+			continue;
+
 		for (oind = min(order, VM_NFREEORDER - 1); oind < VM_NFREEORDER; oind++) {
 			for (pind = 0; pind < VM_NFREEPOOL; pind++) {
-				fl = vm_phys_free_queues[flind][pind];
+				fl = (*seg->free_queues)[pind];
 				TAILQ_FOREACH(m_ret, &fl[oind].pl, pageq) {
 					/*
-					 * A free list may contain physical pages
-					 * from one or more segments.
-					 */
-					seg = &vm_phys_segs[m_ret->segind];
-					if (seg->start > high ||
-					    low >= seg->end)
-						continue;
-
-					/*
 					 * Is the size of this allocation request
 					 * larger than the largest block size?
 					 */
-----

With this change, along with the vmparam.h changes for HIGHMEM, I think
we should be able to use vm_phys_alloc_contig() for page table pages (or
have I again missed something fundamental?).

JC.
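
For illustration only, the per-segment range test that the patch hoists
out of the inner loop can be sketched in isolation as below. This is not
the kernel code: struct phys_seg, seg_is_candidate() and
count_candidate_segs() are hypothetical names made up for this sketch,
and a segment is assumed to cover [start, end) as the comparison in the
patch suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's segment record: [start, end). */
struct phys_seg {
	uint64_t start;
	uint64_t end;
};

/*
 * Same comparison as in the patch: a segment is skipped when it starts
 * above 'high' or ends at or below 'low'.  Returns nonzero otherwise.
 */
static int
seg_is_candidate(const struct phys_seg *seg, uint64_t low, uint64_t high)
{
	return (!(seg->start > high || low >= seg->end));
}

/*
 * Walk the segments once and count those that would still be searched.
 * In the patch, the free-queue scan would sit where n++ is.
 */
static size_t
count_candidate_segs(const struct phys_seg *segs, size_t nsegs,
    uint64_t low, uint64_t high)
{
	size_t i, n = 0;

	assert(low <= high);
	for (i = 0; i < nsegs; i++) {
		if (!seg_is_candidate(&segs[i], low, high))
			continue;	/* whole segment is out of range */
		n++;
	}
	return (n);
}
```

The point of the sketch is only that the range check is done once per
segment, rather than once per page found on a free queue.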