Date: Sat, 29 Apr 2006 10:21:11 -0700 From: Peter Wemm <peter@wemm.org> To: Nate Lawson <nate@root.org> Cc: Peter Jeremy <peterjeremy@optushome.com.au>, src-committers@freebsd.org, cvs-all@freebsd.org, cvs-src@freebsd.org Subject: Re: cvs commit: src/sys/i386/i386 pmap.c Message-ID: <200604291021.12379.peter@wemm.org> In-Reply-To: <4452FB64.7030308@root.org> References: <200604281905.k3SJ59MZ008806@repoman.freebsd.org> <20060429024850.GB997@turion.vk2pj.dyndns.org> <4452FB64.7030308@root.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Friday 28 April 2006 10:36 pm, Nate Lawson wrote: > Peter Jeremy wrote: > > On Fri, 2006-Apr-28 14:22:34 -0700, Peter Wemm wrote: > >> On Friday 28 April 2006 12:05 pm, Peter Wemm wrote: > >>> ups@ had a truely evil idea that I'll investigate. It should > >>> allow freeing unused pages again by giving us a no-cost way to > >>> track the holes in the kva block. > >> > >> FWIW, this idea appears to work. For the curious: > >> http://people.freebsd.org/~peter/freepv.diff > > > > Care to explain how this works in slightly more detail than "truely > > evil". > > The PTE (page-table entry) is an appropriate size to store a pointer. > So you link them all together in a freelist. In more detail. There is one PTE per page of virtual address space. It is used by the hardware to translate a virtual address access to a physical address. One normally stores the corresponding physical address in there with various control bits (read/write mode etc) and most importantly, the 'valid' (PG_V) bit. This tells the hardware of the cpu that the physical address it found is valid. The problem we have is that we have a block of virtual address space. If we allow random pages to be freed and unmapped from that block, there is no easy way to keep track of what has been freed. You can't thread a freelist through the unused blocks like one would do with malloc(3), because we'd like to return the pages if possible and you can't store something in nothing. Storing a 'first free' pointer doesn't scale because we free pages randomly, and after the first-free has been used, we have no way to find the next one without an exhaustive search. We could use a tree or list structure to track the free address space, except that requires memory allocation, which we cannot do from this level of pmap. We could reserve another tailq and length in the headers of the pv chunks, and use the chunk proceding a freed block to record the length of the free space and then use the tailq to link the blocks together. This is non-trivial code though and incurs a constant runtime cost maintaining it. A bitmap doesn't scale because there are potentially hundreds of thousands of entries - an exhaustive search there hurts. And so on. Stephan realized that the kernel already allocates one PTE per virtual page. Although it normally holds physical addresses plus attributes, as long as we don't set PG_V, then there are 31 other bits that we could use for data storage. We could put virtual addresses in there so long as we didn't set PG_V, and abuse that to have a singly linked freelist threaded through the PTEs. It turned out even easier though. As long as the virtual addresses are page aligned, we neatly avoid all the PG_* mode bits as well. We can avoid locking and atomic ops when updating these because they are only accessed under the page queues mutex. We can avoid doing pte invalidations (tlb shootdowns) because there can never be any tlb entries corresponding to them. So I wrapped this in pmap_ptelist_{alloc/free/init} functions and it turned out quite neat and tidy. And its virtually free - we can grab a page of address space by unlinking the head of the ptelist. Freeing a page is as simple as storing it on the head. I'm running my laptop with that code right now. -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200604291021.12379.peter>