From owner-freebsd-hackers Thu Oct 3 06:57:38 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3)
	id GAA28117 for hackers-outgoing; Thu, 3 Oct 1996 06:57:38 -0700 (PDT)
Received: from minnow.render.com (render.demon.co.uk [158.152.30.118])
	by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id GAA28092;
	Thu, 3 Oct 1996 06:57:22 -0700 (PDT)
Received: from minnow.render.com (minnow.render.com [193.195.178.1])
	by minnow.render.com (8.6.12/8.6.9) with SMTP id OAA26130;
	Thu, 3 Oct 1996 14:54:58 +0100
Date: Thu, 3 Oct 1996 14:54:56 +0100 (BST)
From: Doug Rabson
To: dyson@freebsd.org
cc: phk@critter.tfs.com, heo@cslsun10.sogang.ac.kr,
	freebsd-hackers@freebsd.org, freebsd-fs@freebsd.org
Subject: Re: vnode and cluster read-ahead
In-Reply-To: <199610031312.IAA00602@dyson.iquest.net>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Thu, 3 Oct 1996, John S. Dyson wrote:

> >
> > On the subject of saving memory, I firmly believe that significant
> > performance improvements can be made just by reducing the memory footprint
> > of algorithms. In our 3D graphics work, we have found that making
> > important datastructures fit into cache lines (and using an aligning
> > allocator to make sure that they start on cache line boundaries) can
> > improve performance by as much as 20%.
> >
> The pmap code is a perfect example of that. There are times that I have
> "improved" the code, and noted a net slowdown, because it has grown.
> Soon, I intend to chop out another 1-2k out of pmap.o. Smaller is
> definitely better sometimes.

You may find that increasing the size of struct pv_entry to 32 bytes and
arranging get_pv_entry() to return new pv_entries on 32-byte boundaries
will improve performance for operations that traverse pmaps which contain
a large number of entries.
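
A minimal userland sketch of that idea in C11, not kernel code. The field
layout of struct pv_entry shown here is hypothetical (the real structure may
differ), pv_refill() and the chunk size of 128 are made up for illustration,
and aligned_alloc() stands in for whatever the kernel allocator would use.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 32   /* 8 words of 4 bytes, as described for the Pentium */

/* Hypothetical layout: alignas() pads the struct out to one cache line. */
struct pv_entry {
    alignas(CACHE_LINE) struct pv_entry *pv_next; /* next entry in the chain */
    void      *pv_pmap;  /* owning pmap (opaque in this sketch) */
    uintptr_t  pv_va;    /* virtual address of the mapping */
    void      *pv_ptem;  /* page table page (opaque in this sketch) */
};

static_assert(sizeof(struct pv_entry) == CACHE_LINE,
              "pv_entry must occupy exactly one cache line");

static struct pv_entry *pv_freelist;

/* Hypothetical helper: carve n cache-line-aligned entries into the free list. */
static int pv_refill(size_t n)
{
    struct pv_entry *chunk = aligned_alloc(CACHE_LINE, n * sizeof *chunk);
    if (chunk == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        chunk[i].pv_next = pv_freelist;
        pv_freelist = &chunk[i];
    }
    return 0;
}

/* Pop one entry; every entry returned starts on a 32-byte boundary. */
struct pv_entry *get_pv_entry(void)
{
    if (pv_freelist == NULL && pv_refill(128) != 0)
        return NULL;
    struct pv_entry *pv = pv_freelist;
    pv_freelist = pv->pv_next;
    pv->pv_next = NULL;
    return pv;
}
```

Because the chunk base is aligned and each entry is exactly one line long,
no entry straddles two cache lines, so touching one entry costs at most one
miss.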

Making structures like this fit cleanly into cache lines reduces the
average number of cache misses needed to access a large quantity of data.
If, in addition, you arrange those functions to access the fields of
struct pv_entry sequentially from start to end, you will benefit from the
fact that the Pentium reads the 8 words of a cache line sequentially after
a cache miss and makes each word available to instructions as soon as it
has been read, i.e. you can use the first couple of words in the cache
line while the processor reads the rest. Looking at pmap_remove_entry(),
it seems to do this already, but you can only benefit from it if the
structure starts on a cache line boundary.

--
Doug Rabson, Microsoft RenderMorphics Ltd.
Mail:  dfr@render.com
Phone: +44 171 734 3761
FAX:   +44 171 734 6426