From owner-freebsd-hackers  Sun Jul 14 16:20:59 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id QAA09153
          for hackers-outgoing; Sun, 14 Jul 1996 16:20:59 -0700 (PDT)
Received: from dyson.iquest.net (dyson.iquest.net [198.70.144.127])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id QAA09146;
          Sun, 14 Jul 1996 16:20:56 -0700 (PDT)
Received: (from root@localhost) by dyson.iquest.net (8.7.5/8.6.9) id SAA04064; Sun, 14 Jul 1996 18:20:43 -0500 (EST)
From: "John S. Dyson" <toor@dyson.iquest.net>
Message-Id: <199607142320.SAA04064@dyson.iquest.net>
Subject: Re: Preach it (was Some recent changes to GENERIC)
To: bde@zeta.org.au (Bruce Evans)
Date: Sun, 14 Jul 1996 18:20:43 -0500 (EST)
Cc: sos@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG,
        joerg_wunsch@uriah.heep.sax.de, jonny@gaia.coppe.ufrj.br,
        pjchilds@imforei.apana.org.au
In-Reply-To: <199607142231.IAA08296@godzilla.zeta.org.au> from "Bruce Evans" at Jul 15, 96 08:31:00 am
X-Mailer: ELM [version 2.4 PL24 ME8]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> 
> >If I remove all __inlines in pmap.c, I can save about 3K.  Maybe we
> >should/shouldn't have a "SMALL_KERNEL" option?
> 
> It would probably be faster too.  Large inlines are often slower because
> they bust caches.  3K is too much for the 8K combined I&D L1 cache on
> 486's - if the 3K is all executed often then it busts the cache, and if
> it isn't all executed often then inlining (all of) it just wastes space
> when it isn't executed and depletes the caches when it is executed (if a
> function version of it would be in a cache).
> 
I usually make guesses as to the applicability of __inline, and then
benchmark to check performance.  Sometimes, I look at the generated
code to make sure it doesn't produce gross or hugely expanded code.
My benchmarks are done on a P5-166, so of course my results are not
directly applicable to a 486.  For highest performance, 486 isn't the
answer anymore anyway.
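
A minimal sketch of that kind of check, for illustration only.  This is
not the pmap code: the helper names (pte_index_inline, pte_index_call,
bench_inline, bench_call), the 4K page size and the 1024-entry mask are
all assumptions made up for the example, and the noinline attribute is
a GCC/Clang extension.  It times the same small function once inlined
and once forced out of line:

```c
#include <stdint.h>
#include <time.h>

#define PAGE_SHIFT 12      /* assumed 4K pages */
#define PTE_MASK   0x3ffu  /* assumed 1024 entries per page table */

/* Hypothetical small helper: the kind of function one might inline. */
static inline uint32_t pte_index_inline(uint32_t va)
{
    return (va >> PAGE_SHIFT) & PTE_MASK;
}

/* Same body, forced out of line so every use is a real call. */
__attribute__((noinline)) uint32_t pte_index_call(uint32_t va)
{
    return (va >> PAGE_SHIFT) & PTE_MASK;
}

/*
 * Run the inlined helper `iters` times and return elapsed CPU seconds.
 * The volatile accumulator keeps the compiler from deleting the loop;
 * its final value is handed back through `checksum`.
 */
double bench_inline(uint32_t iters, uint32_t *checksum)
{
    volatile uint32_t sink = 0;
    clock_t start = clock();

    for (uint32_t i = 0; i < iters; i++)
        sink += pte_index_inline(i << PAGE_SHIFT);
    *checksum = sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Identical loop, but calling the out-of-line version. */
double bench_call(uint32_t iters, uint32_t *checksum)
{
    volatile uint32_t sink = 0;
    clock_t start = clock();

    for (uint32_t i = 0; i < iters; i++)
        sink += pte_index_call(i << PAGE_SHIFT);
    *checksum = sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call both with the same iteration count, compare the returned times,
and confirm the two checksums match.  A loop this tight fits in any
cache, so it only shows call overhead; it says nothing about the cache
pressure a large inline causes in a real kernel path, which is why the
generated code and a whole-system benchmark still need a look.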

However, the performance improvement from careful inlining is about 5%
at best (in pmap), as measured with lmbench lat_proc.  But with
Linux'ers looking at that kind of difference to distinguish the OSes,
I believe that we should be careful to squeeze where we can.  Feel free
to suggest otherwise...  I really don't mind guidelines, as long as
they are well thought out.

John