From: Bruce Evans
Date: Tue, 21 Mar 1995 23:49:34 +1000
To: phk@ref.tfs.com, pst@shockwave.com
Cc: CVS-commiters@time.cdrom.com, bde@zeta.org.au, cvs-etc@time.cdrom.com, jkh@freebsd.org, rgrimes@gndrsh.aac.dev.com
Subject: Re: cvs commit: src/etc make.conf
Message-Id: <199503211349.XAA16990@godzilla.zeta.org.au>

> > We also need dynamic support for the i387 functions.  -DHAVE_FPU is no
> > good because it can't be used for the distribution libraries.  Something
> > like
> >
> >     if (_have_i387)
> >         result = _i387_pow(x, y);
> >     else
> >         result = __ieee754_pow(x, y);
> >
> > would add less time overhead than shared linkage.
>
>The extra test on every operation is bad.

Let's replace `pow' by `sin'.  pow() isn't an i387 function and is too
complicated to synthesize from a few i387 functions.  To be precise, the
extra test costs 6 cycles on a 486 for the _i387_sin case and 5 cycles
for the __ieee754_sin case (plus cache misses...).

>Consider the following fragment or high-speed linkages with shared libraries
>instead (I don't know how fast or slow shared linkages are):

Shared linkage costs 4 cycles (1 of them wasted on a stupidly placed nop),
and much more for the first call (plus cache misses...).
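For comparison, the runtime-test dispatch quoted at the top of this message (rewritten for sin() rather than pow(), as suggested) might look like the minimal C sketch below.  Every name in it is hypothetical: `have_i387` mirrors the quoted `_have_i387`, and the two `*_stub` functions stand in for _i387_sin and __ieee754_sin.  Both stubs just evaluate a short Taylor series so the sketch needs neither libm nor x87 assembly; the call counters exist only to show which branch ran.

```c
/* Hypothetical flag; a real libm would set it once at startup from
 * the system's report of whether an i387 is present. */
static int have_i387 = 0;

/* Counters so the example can show which branch was taken. */
static int i387_calls, soft_calls;

/* Shared Taylor series used by both stand-ins (good for small |x| only). */
static double series_sin(double x)
{
    double term = x, sum = x;
    for (int n = 1; n <= 8; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

/* Stand-in for _i387_sin (would wrap the fsin instruction). */
static double i387_sin_stub(double x) { i387_calls++; return series_sin(x); }

/* Stand-in for __ieee754_sin (the software msun routine). */
static double ieee754_sin_stub(double x) { soft_calls++; return series_sin(x); }

/* Runtime dispatch: one compare-and-branch on every call. */
double my_sin(double x)
{
    if (have_i387)
        return i387_sin_stub(x);
    else
        return ieee754_sin_stub(x);
}
```

The flag test is the 5 to 6 cycle per-call cost discussed above; it is paid on every call, which is the objection raised in the reply.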
> static vec_pow = pow_init;
> pow (base, exp)
> {
>     return (*vec_pow)(base, exp);
> }

This would only cost 2 cycles (plus cache misses...).  Anyone for
self-modifying code? :-)  The shared library already uses it to avoid
these 2 cycles, and it might not be too hard to get the shared library
to patch in the addresses of the i387-specific functions instead of the
generic ones.  Unfortunately, this won't work for statically linked
programs.

The hardware sin() takes 193-279 cycles on a 486 and the msun wrappers
take many more (especially for shared libraries; position-independent
code costs about 10 cycles just for loading the global register), so
another 5 cycles would hardly be noticeable.

Bruce
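The quoted vector fragment omits its types (pre-ANSI implicit int), so here is a minimal compilable C sketch of the same idea, again written for sin().  Assumptions: `vec_sin`, `sin_init`, `probe_i387` and the two `*_stub` functions are hypothetical names, the probe simply returns 1, and both stubs share one Taylor series instead of real fsin or msun code.  The first call goes through the initializer, which picks an implementation and overwrites the pointer; later calls pay only the indirect call.

```c
#include <assert.h>

/* Shared Taylor series standing in for both implementations. */
static double series_sin(double x)
{
    double term = x, sum = x;
    for (int n = 1; n <= 8; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

/* Hypothetical stand-ins for _i387_sin and __ieee754_sin. */
static double i387_sin_stub(double x) { return series_sin(x); }
static double ieee754_sin_stub(double x) { return series_sin(x); }

/* Hypothetical FPU probe; a real one would ask the system. */
static int probe_i387(void) { return 1; }

static double sin_init(double x);

/* Starts out pointing at the initializer, like "vec_pow = pow_init". */
static double (*vec_sin)(double) = sin_init;

/* Runs on the first call only: choose an implementation, patch the
 * vector so later calls bypass this function, then finish this call. */
static double sin_init(double x)
{
    vec_sin = probe_i387() ? i387_sin_stub : ieee754_sin_stub;
    assert(vec_sin != sin_init);
    return (*vec_sin)(x);
}

/* Public entry point: one indirect call and no flag test. */
double my_sin(double x)
{
    return (*vec_sin)(x);
}
```

This patches a data pointer rather than code, so it keeps the indirect call that the shared library's patching avoids; being plain C, it also works in statically linked programs.  The sketch ignores concurrent first calls.  For scale, the 5 extra cycles of a flag test are 5/279 ≈ 1.8% to 5/193 ≈ 2.6% of the hardware sin() alone, before counting the msun wrapper overhead.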