Date: Sat, 12 Apr 2003 00:40:49 -0400
From: Jake Burkholder
To: Narvi
Cc: freebsd-sparc64@freebsd.org
Subject: Re: tlb, tsb & ...stuff

Hmm, so someone else has read that code. :)

Apparently, On Sat, Apr 12, 2003 at 02:12:43AM +0300, Narvi said words to
the effect of;

> ok, I'm a lamer and couldn't think of a nice & spiffy subject line.
>
> TLB / TSB statistics:
>
> Presently we only get statistics on entries being moved into the TSB,
> with no dtlb/itlb separation.  Unless people think this is a bad idea,
> I'd like to make an option that would expose dTLB/iTLB and related TSB
> misses as statistics.  This would allow you to get Solaris 9 style
> 'trapstat -t' information.  The counters would need to be per-processor.

Well, the problem is that the current tlb fault handlers are really tight
on space in the trap table.  I think tl0_immu_miss and tl0_dmmu_prot have
0 or 1 instructions free.  Incrementing counters to track dTLB misses will
take 3 instructions minimum, so you'd have to do something like ifdef the
handlers to just branch to code at the end of the trap table when the
counters are enabled, which gets pretty ugly.  You're welcome to do this
and report results, but I'm not sure I want it committed.  There are some
ad hoc statistics on tsb replacements with options PMAP_STATS, under
sysctl debug.pmap_stats.  In my experience few replacements occur unless
you are using a lot of memory.  Adding statistics in the page fault path
sounds fine, but lower than that I'm not so sure.

> TSB & replacement:
>
> From what I gather (please correct me if I'm wrong!) the present TSB
> consists of 2K entries, organised into buckets with each bucket
> containing 4 entries.  On replacement/entry we enter into an entry that
> was empty/invalid or pick one "randomly" based on the lower digits of
> tick.  We try 4 times (for each page size) so up to 16 places get probed
> before a miss / hit.
>
> Making it a 4-way random replacement software managed unified L2 tlb
> (with slight oddness for multiple page sizes).

Yes, this is correct.  The multiple page size stuff doesn't work as well
as I'd like, and the vm system isn't set up to use it yet (this is a lot
of work).
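For reference, the bucket selection and the victim choice boil down to
roughly this in C (a simplified sketch only; the macro and function names
are made up, and the real code also folds in the context number and does
the extra probes for the larger page sizes):

#define TSB_BUCKET_SIZE         (4)     /* ttes per bucket */
#define TSB_BUCKET_SHIFT        (2)
#define TSB_BUCKET_COUNT        (512)   /* 2K entries total */
#define TSB_BUCKET_MASK         (TSB_BUCKET_COUNT - 1)

/* Hash the virtual page number down to a 4 entry bucket. */
static __inline struct tte *
tsb_vtobucket(struct tte *tsb, vm_offset_t va)
{
        return (&tsb[((va >> TAR_VPN_SHIFT) & TSB_BUCKET_MASK) <<
            TSB_BUCKET_SHIFT]);
}

/* On replacement, pick a victim "randomly" from the low bits of tick. */
static __inline struct tte *
tsb_victim(struct tte *bucket, u_long tick)
{
        return (&bucket[tick & (TSB_BUCKET_SIZE - 1)]);
}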
I consider the current tsb implementation to be a bit of an experiment
(I'd never dealt with pmap or tlb fault handlers when I started) and worth
throwing out completely if we can think of something better.  It's decent
and fast in most cases, but the fixed size of the tsb, which causes the
replacements, limits the RSS that a process can have without causing soft
faults into the vm system.  The kernel gets bogged down with soft faults
pretty fast if you go beyond the RSS that fits in the tsb.  You can really
see this if you reduce the size of the tsb.  It works well enough for
current workloads, but once we start supporting things like X I'm not sure
that it will fly.

I've been planning to replace it with something that's more like page
tables and not so reliant on hashing in the same sense.  The way it works
is that in the base case you have a 1 page direct mapped tsb (i.e. no
buckets), indexed by the first 8 bits of virtual address above the page
size (call this level 0).  On a miss in the tsb the tlb fault handler
would check a bit in the tte which indicates that there's actually another
level, and the tte just loaded (the "miss" tte) contains a pointer to it.
It would then restart the lookup using the new tsb page.  The twist is
that as you go to the next level you use the next higher "page spread" of
virtual address bits to index the tsb pages.  Basically, collisions in the
address bits used to index a given level cause another level to be added,
indexed by the next higher set of virtual address bits, instead of causing
replacements.

The lookup function for an arbitrary level looks something like this:

#define TSB_MAX_LEVEL           (3)
#define TSB_PAGE_ADDRESS_BITS   (8)

static __inline struct tte *
tsb_vpntotte(struct tte *tsb, vm_offset_t vpn, int level)
{
        return (&tsb[(vpn >> (TSB_PAGE_ADDRESS_BITS * level)) &
            ((1 << TSB_PAGE_ADDRESS_BITS) - 1)]);
}

static __inline struct tte *
tsb_vtotte(struct tte *tsb, vm_offset_t va, int level)
{
        return (tsb_vpntotte(tsb, va >> TAR_VPN_SHIFT, level));
}

With 3 levels this can support a 32 gigabyte virtual address space (fully
resident), but it doesn't penalize processes with sparse address spaces,
because intermediate "page table pages" aren't required in all cases.
Basically like traditional page tables, but backwards :).

This has the added advantage of not requiring more than 1 page of
contiguous virtual or physical address space for any part of the tsb.
With the current implementation you can't increase the tsb size too much,
because it allocates a large chunk of contiguous virtual address space,
and as the kernel address space gets fragmented you start to run out.
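To make the miss path of the multi-level scheme concrete, the walk over
the levels would look roughly like this in C (again just a sketch; TD_NEXT
and tte_next_tsb() are made-up names for the "there's another level" bit
and the pointer it carries, and the caller still has to compare the tag in
the returned tte against the fault address):

/*
 * Walk the levels until we hit a slot that isn't a pointer to another
 * tsb page.  TD_NEXT and tte_next_tsb() are placeholders for the bit
 * and pointer described above.
 */
static __inline struct tte *
tsb_lookup(struct tte *tsb, vm_offset_t va)
{
        struct tte *tp;
        int level;

        for (level = 0; level <= TSB_MAX_LEVEL; level++) {
                tp = tsb_vtotte(tsb, va, level);
                if ((tp->tte_data & TD_NEXT) == 0)
                        return (tp);    /* leaf tte; valid or not */
                /* The "miss" tte points at the next level's tsb page. */
                tsb = tte_next_tsb(tp);
        }
        return (NULL);
}

In the common case a process stays entirely at level 0, so the fault
handler never takes the extra hop.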
I'm not sure if you've looked at the kernel tlb fault handlers, but the
same technique that's used in MIPS and alpha kernels is used to provide a
direct mapped address space region, which corresponds to the region above
the VA hole on UltraSPARC II.  What this does is map all of physical
memory into the upper portion of the address space using 4 meg tlb
entries.  The physical address is encoded in the virtual address, so the
fault handler just needs to extract it and whip up a tlb entry on the fly.
No page tables, no lookups, no nothing.  This would allow the tsb pages to
be mapped with the direct mapped address space, so no mappable kva would
be required for the tsb.

> It would imho be interesting to support a couple of different and
> selectable entry indexing policies, say at least:
>
> * hashed
> * skew-associative
>
> to cater for various access patterns & tsb lookup loads.  Again, if this
> would be a bad idea, let me know.

It's a good idea and I'd be interested in the results, but what I'm more
interested in is new data structures to support virtual memory that give
improvements in design or architecture, rather than heuristics such as
tweaking the hashing algorithms.

> Usparc3(cu)
>
> What will happen there?  Do we use any of the large page sizes enough to
> make one of the large TLBs cache a large(r) page size?

Yes, see above about the direct mapped address space.  I've read papers on
generalized schemes for using large page sizes for user mappings, but I'm
not sure if we'll see this anytime soon in FreeBSD in a big way.  However,
the two 512 entry tlbs with programmable page sizes on USIII+ should work
very well with one programmed for 4 meg tlb entries and the other for 8K.
The direct mapping technique is hooked into the kernel zone allocator,
uma, which is also the back end allocator for malloc(9), so allocations of
objects that are less than a page minus some overhead use it; for the most
part this would give the kernel an entire 512 entry tlb to itself.  This
may or may not be faster than just using it as a single 1024 entry tlb for
8K mappings; we'll have to see.

Anyway, hope I didn't completely blow over your question.

Jake