Date:      Fri, 19 May 2000 02:50:02 -0700 (PDT)
From:      Bruce Evans <bde@zeta.org.au>
To:        freebsd-bugs@FreeBSD.org
Subject:   Re: i386/18560: libm's log1p not working as designed on Intel architectures.
Message-ID:  <200005190950.CAA66980@freefall.freebsd.org>

The following reply was made to PR i386/18560; it has been noted by GNATS.

From: Bruce Evans <bde@zeta.org.au>
To: neis@cdc.informatik.tu-darmstadt.de
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: i386/18560: libm's log1p not working as designed on Intel
 architectures.
Date: Fri, 19 May 2000 19:43:30 +1000 (EST)

 On Mon, 15 May 2000 neis@cdc.informatik.tu-darmstadt.de wrote:
 
 > While "porting" current libm to OS/2 (i.e. recompiling and running
 > various tests), I noticed that assembler code for log1p(x) is basically
 > as follows: Compute x+1, then call fyl2x.
 
 This is only an efficiency bug under FreeBSD.  log1p.S is too broken to
 use, so FreeBSD doesn't use it:
 
 RCS file: /home/ncvs/src/lib/msun/Makefile,v
 Working file: Makefile
 head: 1.23
 ...
 ----------------------------
 revision 1.14
 date: 1997/02/15 05:21:16;  author: bde;  state: Exp;  lines: +4 -2
 Disabled the i387 version of log1p().  It just evaluates log(1 + x).
 This defeats the point of log1p().  ucbtest reports errors of +-5e+15
 ULPs.  A correct version would use the i387 fyl2xp1 instruction for
 small x and maybe scale to small x.  The C version does the scaling
 reasonably efficiently, and fyl2xp1 is slow (at least on P5s), so not
 much is lost by always using the C version (only 25% for small x even
 with the broken i387 version; 50% for large x).
 ----------------------------
 
 You can find a correct version in glibc (version 2.1.1 at least).
 
 > On a related issue, the various wrapper functions around assembler
 > code cause an additional function call which really causes a
 > performance loss.
 > I have been able to speed up e.g. "acos" by more than ten percent
 > by replacing the assembler routine __ieee754_acos with inline
 > assembler code.
 
 A non-inline version of (the i387 version of) __ieee754_acos() is only
 about 2% slower than the inline version.  (Inlining acos doesn't help
 much because the inlined code is quite large and slow; the speedup for
 sqrt() is relatively much larger.)  I've never worried much about even
 10% speedups for inlining.  Usually you only get the 10% speedups for
 simplistic benchmarks where everything is cached.
 
 Bruce
 
 

