From owner-cvs-sys Fri Dec 6 23:58:43 1996
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id XAA16011 for cvs-sys-outgoing; Fri, 6 Dec 1996 23:58:43 -0800 (PST)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id XAA16006; Fri, 6 Dec 1996 23:58:20 -0800 (PST)
Received: (from bde@localhost) by godzilla.zeta.org.au (8.8.3/8.6.9) id SAA19808; Sat, 7 Dec 1996 18:51:47 +1100
Date: Sat, 7 Dec 1996 18:51:47 +1100
From: Bruce Evans
Message-Id: <199612070751.SAA19808@godzilla.zeta.org.au>
To: bde@zeta.org.au, peter@spinner.DIALix.COM
Subject: Re: cvs commit: src/sys/i386/include endian.h
Cc: cvs-all@freefall.freebsd.org, CVS-committers@freefall.freebsd.org, cvs-sys@freefall.freebsd.org, dyson@freefall.freebsd.org, toor@dyson.iquest.net
Sender: owner-cvs-sys@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

>Just as a thought, is it possible to handle the illegal instruction
>trap on the i386 and emulate the bswap instruction?  Then we could just
>use bswap everywhere and be done with it.  Obviously this would be a

We would have to replace it with code that doesn't trap.  Always trapping
would be too slow.  Unfortunately, bswap is a short instruction (only 2
bytes), so there is no room to patch in a call (5 bytes) unless the
instruction is padded to begin with.

>penalty on the i386 (I wonder how much?), but it'd simplify the

I guess about 50 us.

>environment on the "current" mainstream cpu's.  Perhaps this would
>also be worth doing for invlpg() and other instructions?  It would
>eliminate a runtime overhead for testing cpu_class on >= i486 cpu's

Using function calls would provide maximum flexibility at a small cost.
A function call+ret takes only 1+2 cycles (+ more for cache misses and
BTB misses).  That's not much more than the 2 cycles (+ more ...) for
testing cpu_class.

BTW, I have found a case where non-inline spls cause a reproducible
slowdown - `ping localhost' on an idle P5/133 takes about 3 us longer.
Each ping takes about 16 splnnn()s and 16 splx()s, so the call+ret
overhead doesn't completely account for the slowdown (32 call/ret pairs
at about 3 cycles each come to well under 1 us at 133 MHz).  I guess the
extra function calls and loss of locality cause additional BTB and cache
misses.  16 splnnn()s per second is probably too few for the function
to stay in the L1 cache any better than inline code would.

Bruce
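
To make concrete what replacing bswap with "code that doesn't trap"
would compute, here is a minimal sketch in C.  The function name
byteswap32 is hypothetical and not from the tree; plain shifts and
masks perform the same 32-bit byte reversal on any i386-class CPU, at
the cost of a few extra instructions.

	/*
	 * Hypothetical non-trapping bswap replacement (illustration
	 * only): isolate each byte of the 32-bit word and move it to
	 * the mirror-image position.
	 */
	static __inline unsigned long
	byteswap32(unsigned long x)
	{
		return ((x & 0x000000ffUL) << 24 |
			(x & 0x0000ff00UL) <<  8 |
			(x & 0x00ff0000UL) >>  8 |
			(x & 0xff000000UL) >> 24);
	}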
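
To make the call-vs-test tradeoff concrete, here is a sketch of the two
dispatch styles in C.  All of the names (invlpg_by_test, invlpg_i386,
invlpg_i486, invlpg_fn) are hypothetical, and the CPUCLASS_486 value is
illustrative; this is not the kernel's actual interface.

	/*
	 * Placeholder declarations; the real cpu_class and CPUCLASS_*
	 * values live in the kernel headers.
	 */
	extern int cpu_class;
	#define	CPUCLASS_486	2		/* illustrative value */

	void	invlpg_i386(void *addr);	/* e.g. flush the whole TLB */
	void	invlpg_i486(void *addr);	/* e.g. use the invlpg insn */

	/* Style 1: test cpu_class at every call site (2 cycles + misses). */
	static __inline void
	invlpg_by_test(void *addr)
	{
		if (cpu_class >= CPUCLASS_486)
			invlpg_i486(addr);
		else
			invlpg_i386(addr);
	}

	/*
	 * Style 2: call through a pointer set once at boot
	 * (1+2 cycles for call+ret, plus possible BTB misses).
	 */
	extern void (*invlpg_fn)(void *addr);
	#define	invlpg(addr)	((*invlpg_fn)(addr))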