Date: Sat, 7 Dec 1996 18:51:47 +1100
From: Bruce Evans <bde@zeta.org.au>
To: bde@zeta.org.au, peter@spinner.DIALix.COM
Cc: cvs-all@freefall.freebsd.org, CVS-committers@freefall.freebsd.org, cvs-sys@freefall.freebsd.org, dyson@freefall.freebsd.org, toor@dyson.iquest.net
Subject: Re: cvs commit: src/sys/i386/include endian.h
Message-ID: <199612070751.SAA19808@godzilla.zeta.org.au>
>Just as a thought, is it possible to handle the illegal instruction
>trap on the i386 and emulate the bswap instruction?  Then we could just
>use bswap everywhere and be done with it.  Obviously this would be a

We would have to replace it by code that doesn't trap.  Always trapping
would be too slow.  Unfortunately, bswap is a short instruction (3 bytes
IIRC) so there is no room for replacing it unless it is padded to begin
with.

>penalty on the i386 (I wonder how much?), but it'd simplify the

I guess about 50 us.

>environment on the "current" mainstream cpu's.  Perhaps this would
>also be worth doing for invlpg() and other instructions?  It would
>eliminate a runtime overhead for testing cpu_class on >= i486 cpu's

Using function calls would provide maximum flexibility at a small cost.
A function call+ret takes only 1+2 cycles (+more for cache misses and
BTB misses).  That's not much more than the 2 cycles (+more ...) for
testing cpu_class.

BTW, I have found a case where non-inline spls cause a reproducible
slowdown - `ping localhost' on an idle P5/133 takes about 3 us longer.
Each ping takes about 16 splnnn()s and 16 splx()s, so the call+ret
overhead doesn't completely account for the slowdown.  I guess this is
caused by more BTB and cache misses caused by the extra function calls
and lack of locality.  16 splnnn()'s per second is probably too few to
allow the function to stay in the L1 cache any better than inline code.

Bruce
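
For illustration only, here is a minimal user-space sketch of the two
dispatch styles compared above: testing cpu_class on every use versus
going through a function call chosen once at startup.  It is not the
kernel's code.  The enum values, swap_by_test(), swap_by_call() and
swap_init() are hypothetical names, the function pointer is just one way
to get "a function call", and __builtin_bswap32 (a GCC/Clang builtin)
stands in for the bswap instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU classes; only the name cpu_class comes from the mail. */
enum cpu_class_t { CPUCLASS_386, CPUCLASS_486 };

static enum cpu_class_t cpu_class = CPUCLASS_386;

/* Shift-and-mask swap: needs no bswap, so it is safe on an i386. */
static uint32_t bswap32_generic(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Stand-in for a bswap-based swap on >= i486 (assumption: GCC/Clang). */
static uint32_t bswap32_fast(uint32_t x)
{
#if defined(__GNUC__)
    return __builtin_bswap32(x);
#else
    return bswap32_generic(x);
#endif
}

/* Style 1: test cpu_class at every use (the "2 cycles" case). */
static uint32_t swap_by_test(uint32_t x)
{
    return cpu_class == CPUCLASS_386 ? bswap32_generic(x) : bswap32_fast(x);
}

/* Style 2: pay for a call+ret, but choose the target only once. */
static uint32_t (*swap_by_call)(uint32_t) = bswap32_generic;

static void swap_init(enum cpu_class_t c)
{
    assert(c == CPUCLASS_386 || c == CPUCLASS_486);
    cpu_class = c;
    swap_by_call = (c == CPUCLASS_386) ? bswap32_generic : bswap32_fast;
}
```

Both styles return the same result for either class; they differ only in
where the per-use cost lands (a compare and branch, or a call and
return), which is the trade-off the cycle counts above are about.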
