Date:      Mon, 01 Oct 2012 11:57:03 -0500
From:      Alan Cox <alc@rice.edu>
To:        "Jayachandran C." <c.jayachandran@gmail.com>
Cc:        mips@freebsd.org, Alan Cox <alc@rice.edu>
Subject:   Re: optimizing TLB invalidations
Message-ID:  <5069CB5F.40100@rice.edu>
In-Reply-To: <CA+7sy7CdcGzpOG6ou4vE30GSvz+wMQ1XAaUGhuGxasC1oQm5Gw@mail.gmail.com>
References:  <505DE9D4.5010204@rice.edu> <CA+7sy7CdcGzpOG6ou4vE30GSvz+wMQ1XAaUGhuGxasC1oQm5Gw@mail.gmail.com>

On 10/01/2012 11:16, Jayachandran C. wrote:
> On Sat, Sep 22, 2012 at 10:09 PM, Alan Cox <alc@rice.edu> wrote:
>> Can you please test the attached patch?  It introduces a new TLB
>> invalidation function for efficiently invalidating address ranges and uses
>> this function in pmap_remove().
>>
>> Basically, the function looks at the size of the address range in order to
>> decide how best to perform the invalidation.  If the range is small compared
>> to the TLB size, it probes the TLB for pages in the range.  That said, the
>> function understands that pages come in pairs, and so it won't probe for odd
>> page numbers.  In contrast, the current code in pmap_remove() will probe for
>> both the even and odd page.  On the other hand, if the range is large, then
>> the function changes its approach.  It iterates over the TLB entries
>> checking each to see if it falls within the range.  This can eliminate an
>> enormous number of TLB probes when a large virtual address range is
>> unmapped.  Finally, on a multiprocessor, this change will reduce the number
>> of IPIs to invalidate TLB entries.  There will be one IPI per range rather
>> than one per page.
>>
>> Ultimately, this new function could be applied elsewhere, like
>> pmap_protect(), but that's a patch for another day.
> Tested this on my XLP 64-bit SMP config, and did not see any issues.  The
> compilation test did not show much change in performance, but I think
> I need to run a multi-threaded benchmark to see the performance
> improvement.
>

Yes, I agree.  Under a compilation test, the FreeBSD malloc(3)/free(3) 
implementation will occasionally release a large chunk of memory (4MB) 
back to the kernel.  If all of that chunk was used, then we'll save 
about 900 or so TLB probes.  But, this doesn't happen very often.  Under 
a compilation workload, most of the bulk destruction of mappings happens 
in pmap_remove_pages(), not pmap_remove().
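For illustration, a minimal sketch of the range-size heuristic described in
the quoted message might look like the following.  The helper names
(tlb_probe_and_invalidate(), tlb_read_and_invalidate_if_in_range()) and the
128-entry TLB size are placeholders for this sketch, not the interfaces
introduced by the actual patch:

	/*
	 * Sketch only: choose a strategy based on the size of the range
	 * relative to the TLB.  Helper names and TLB_ENTRIES are assumed.
	 */
	#include <stdint.h>

	typedef uintptr_t vm_offset_t;
	#define	PAGE_SIZE	4096UL
	#define	TLB_ENTRIES	128	/* assumed; varies by CPU model */

	/* Placeholder: probe the TLB for va and invalidate a matching entry. */
	void	tlb_probe_and_invalidate(vm_offset_t va);
	/* Placeholder: read entry i; invalidate it if its VA lies in [sva, eva). */
	void	tlb_read_and_invalidate_if_in_range(int i, vm_offset_t sva,
		    vm_offset_t eva);

	void
	tlb_invalidate_range_sketch(vm_offset_t sva, vm_offset_t eva)
	{
		vm_offset_t va;
		int i;

		if (eva - sva < (vm_offset_t)TLB_ENTRIES * 2 * PAGE_SIZE) {
			/*
			 * Small range: each MIPS TLB entry maps an even/odd
			 * pair of pages, so probe only once per pair.
			 */
			for (va = sva & ~(2 * PAGE_SIZE - 1); va < eva;
			    va += 2 * PAGE_SIZE)
				tlb_probe_and_invalidate(va);
		} else {
			/*
			 * Large range: walk the TLB once and drop any entry
			 * that falls within [sva, eva), instead of probing
			 * for every page in the range.
			 */
			for (i = 0; i < TLB_ENTRIES; i++)
				tlb_read_and_invalidate_if_in_range(i, sva, eva);
		}
	}

With 4KB pages, a 4MB chunk covers 1024 pages, so the old per-page probing
costs on the order of 1024 probes; walking a TLB of roughly a hundred-odd
entries instead is where the savings of about 900 probes comes from.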

Probably the place where you'll see an easily discernible effect is when 
pmap_qremove() is modified to use the ranged TLB invalidation.  
pmap_qremove() is used when we unmap data from the buffer cache, and 
those mappings must be shot down on every CPU.
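
To illustrate the IPI savings, here is a rough sketch of per-page versus
ranged shootdowns.  ipi_tlb_shootdown(), tlb_invalidate_range_local(), and
struct tlb_shootdown_args are hypothetical names used only for this sketch,
not FreeBSD interfaces:

	#include <stdint.h>

	typedef uintptr_t vm_offset_t;
	#define	PAGE_SIZE	4096UL

	struct tlb_shootdown_args {
		vm_offset_t	sva;
		vm_offset_t	eva;
	};

	/* Placeholder: run fn(arg) on the other CPUs and wait for completion. */
	void	ipi_tlb_shootdown(void (*fn)(void *), void *arg);
	/* Placeholder: invalidate the range in the local CPU's TLB. */
	void	tlb_invalidate_range_local(vm_offset_t sva, vm_offset_t eva);

	static void
	shootdown_action(void *arg)
	{
		struct tlb_shootdown_args *a = arg;

		tlb_invalidate_range_local(a->sva, a->eva);
	}

	/* Per-page shootdowns: one IPI round trip for every page in the range. */
	void
	invalidate_per_page(vm_offset_t sva, vm_offset_t eva)
	{
		struct tlb_shootdown_args a;
		vm_offset_t va;

		for (va = sva; va < eva; va += PAGE_SIZE) {
			a.sva = va;
			a.eva = va + PAGE_SIZE;
			ipi_tlb_shootdown(shootdown_action, &a);
		}
	}

	/* Ranged shootdown: a single IPI covers the whole range. */
	void
	invalidate_ranged(vm_offset_t sva, vm_offset_t eva)
	{
		struct tlb_shootdown_args a = { sva, eva };

		ipi_tlb_shootdown(shootdown_action, &a);
	}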

Alan



