Date: Mon, 15 Jul 1996 10:44:46 +1000
From: Bruce Evans <bde@zeta.org.au>
To: bde@zeta.org.au, toor@dyson.iquest.net
Cc: freebsd-hackers@FreeBSD.ORG, joerg_wunsch@uriah.heep.sax.de, jonny@gaia.coppe.ufrj.br, pjchilds@imforei.apana.org.au, sos@FreeBSD.ORG
Subject: Re: Preach it (was Some recent changes to GENERIC)
Message-ID: <199607150044.KAA12373@godzilla.zeta.org.au>

>> >If I remove all __inlines in pmap.c, I can save about 3K.  Maybe we
>> >should/shouldn't have a "SMALL_KERNEL" option?
>>
>> It would probably be faster too.  Large inlines are often slower because
>> they bust caches.  3K is too much for the 8K combined I&D L1 cache on
>> ...

>I usually make guesses as to the applicability of __inline, and then
>benchmark to check performance.  Sometimes, I look at the generated

Remember, it's very hard to benchmark.  Benchmarks tend to test the
unloaded case and caches work better in the unloaded case :-).

>However, the performance improvement of careful inlining is about 5%
>at best (in pmap) using lmbench lat_proc.  But, with Linux'ers looking
>at that kind of difference to distinguish the OSes, I believe that we
>should be careful to squeeze where we can.  Feel free to suggest
>otherwise...  I really don't mind guidelines, as well as they are
>well thought out.

A simple guideline: don't inline anything that makes a function call,
perhaps even to an inline function.  pmap.c more or less follows this
rule except it calls nested inline functions a lot.  Most of the 3K
seems to be caused by nesting.  Perhaps more functions should be split
up like pmap_allocpte() (to have a small usual case and call a
non-inline function for special cases).

Bruce
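
The split suggested above (a small inline usual case that calls a
non-inline function for special cases) might look roughly like the sketch
below. This is not the pmap.c code: struct toy_pmap, toy_allocpte() and
toy_allocpte_slow() are hypothetical names invented for illustration, and
the noinline attribute assumes a GCC-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPTE 16                     /* hypothetical table size */

/* Toy stand-in for a pmap; not the real FreeBSD structure. */
struct toy_pmap {
    uint32_t *pte[NPTE];            /* cached entry pointers, NULL if absent */
    uint32_t  backing[NPTE];        /* storage handed out by the slow path */
    unsigned  slow_calls;           /* counts trips through the slow path */
};

/*
 * Special case: kept out of line so that each caller of the inline
 * wrapper does not carry its own copy of this code.
 */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
static uint32_t *
toy_allocpte_slow(struct toy_pmap *pm, unsigned idx)
{
    assert(idx < NPTE);
    pm->slow_calls++;
    pm->pte[idx] = &pm->backing[idx];
    return pm->pte[idx];
}

/*
 * Usual case: one load and one test.  The only function call is on the
 * miss path, and it goes to a non-inline function.
 */
static inline uint32_t *
toy_allocpte(struct toy_pmap *pm, unsigned idx)
{
    uint32_t *p = pm->pte[idx];

    if (p != NULL)
        return p;
    return toy_allocpte_slow(pm, idx);
}
```

With this shape, a second toy_allocpte() call for the same index returns
the cached pointer without entering toy_allocpte_slow(), so the code
expanded at each call site stays small.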