Date: Fri, 19 May 2000 02:50:02 -0700 (PDT)
From: Bruce Evans <bde@zeta.org.au>
To: freebsd-bugs@FreeBSD.org
Subject: Re: i386/18560: libm's log1p not working as designed on Intel architectures.
Message-ID: <200005190950.CAA66980@freefall.freebsd.org>
The following reply was made to PR i386/18560; it has been noted by GNATS.

From: Bruce Evans <bde@zeta.org.au>
To: neis@cdc.informatik.tu-darmstadt.de
Cc: freebsd-gnats-submit@FreeBSD.ORG
Subject: Re: i386/18560: libm's log1p not working as designed on Intel architectures.
Date: Fri, 19 May 2000 19:43:30 +1000 (EST)

On Mon, 15 May 2000 neis@cdc.informatik.tu-darmstadt.de wrote:

> While "porting" current libm to OS/2 (i.e. recompiling and running
> various tests), I noticed that assembler code for log1p(x) is basically
> as follows: Compute x+1, then call fyl2x.

This is only an efficiency bug under FreeBSD.  log1p.S is too broken to
use, so FreeBSD doesn't use it:

RCS file: /home/ncvs/src/lib/msun/Makefile,v
Working file: Makefile
head: 1.23
...
----------------------------
revision 1.14
date: 1997/02/15 05:21:16;  author: bde;  state: Exp;  lines: +4 -2
Disabled the i387 version of log1p().  It just evaluates log(1 + x).
This defeats the point of log1p().  ucbtest reports errors of +-5e+15
ULPs.  A correct version would use the i387 fyl2xp1 instruction for
small x and maybe scale to small x.  The C version does the scaling
reasonably efficiently, and fyl2xp1 is slow (at least on P5s), so not
much is lost by always using the C version (only 25% for small x even
with the broken i387 version; 50% for large x).
----------------------------

You can find a correct version in glibc (version 2.1.1 at least).

> On a related issue, the various wrapper functions around assembler
> code cause an additional function call which really causes a
> performance loss.
> I have been able to speed up e.g. "acos" by more than ten percent
> by replacing the assembler routine __ieee754_acos with inline
> assembler code.

A non-inline version of (the i387 version of) __ieee754_acos() is only
about 2% slower than the inline version.  (Inlining acos doesn't help
much because the inlined code is quite large and slow; the speedup for
sqrt() is relatively much larger.)
I've never worried much about even 10% speedups for inlining.  Usually
you only get the 10% speedups for simplistic benchmarks where everything
is cached.

Bruce