Date: Sun, 14 Jul 1996 18:20:43 -0500 (EST)
From: "John S. Dyson" <toor@dyson.iquest.net>
To: bde@zeta.org.au (Bruce Evans)
Cc: sos@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG, joerg_wunsch@uriah.heep.sax.de, jonny@gaia.coppe.ufrj.br, pjchilds@imforei.apana.org.au
Subject: Re: Preach it (was Some recent changes to GENERIC)
Message-ID: <199607142320.SAA04064@dyson.iquest.net>
In-Reply-To: <199607142231.IAA08296@godzilla.zeta.org.au> from "Bruce Evans" at Jul 15, 96 08:31:00 am
>
> >If I remove all __inlines in pmap.c, I can save about 3K. Maybe we
> >should/shouldn't have a "SMALL_KERNEL" option?
>
> It would probably be faster too. Large inlines are often slower because
> they bust caches. 3K is too much for the 8K combined I&D L1 cache on
> 486's - if the 3K is all executed often then it busts the cache, and if
> it isn't all executed often then inlining (all of) it just wastes space
> when it isn't executed and depletes the caches when it is executed (if a
> function version of it would be in a cache).
>

I usually make guesses as to the applicability of __inline, and then
benchmark to check performance. Sometimes, I look at the generated code
to make sure it doesn't produce gross or hugely expanded code. My
benchmarks are done on a P5-166, so of course my results are not directly
applicable to a 486. For highest performance, 486 isn't the answer
anymore anyway.

However, the performance improvement from careful inlining is about 5% at
best (in pmap) using lmbench lat_proc. But, with Linux'ers looking at
that kind of difference to distinguish the OSes, I believe that we should
be careful to squeeze where we can.

Feel free to suggest otherwise... I really don't mind guidelines, as long
as they are well thought out.

John
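
As a rough illustration of the "SMALL_KERNEL" idea raised in the quoted text, the choice between inlining and a single out-of-line copy could be hidden behind one macro. This is only a sketch under stated assumptions: the macro name PMAP_INLINE and the helper page_index() are hypothetical and are not taken from pmap.c, and SMALL_KERNEL is the option name proposed in the thread, not an existing one.

```c
/*
 * Hypothetical sketch: SMALL_KERNEL is the option name floated in the
 * thread; PMAP_INLINE and page_index() are made up for illustration.
 */
#ifdef SMALL_KERNEL
#define PMAP_INLINE             /* ordinary function: one shared copy */
#else
#define PMAP_INLINE __inline    /* GCC spelling: may expand at each call site */
#endif

/*
 * A deliberately tiny helper. Small bodies like this are the usual
 * candidates for inlining; large ones are the cases where the cache
 * concerns in the quoted text apply.
 */
static PMAP_INLINE unsigned long
page_index(unsigned long va)
{
	/* Assumes 4K pages and a 1024-entry table, as on i386. */
	return (va >> 12) & 0x3ff;
}
```

Building the same file with and without -DSMALL_KERNEL and comparing both the object size and a benchmark run would be one way to follow the guess-then-measure approach described above.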