Date:      Tue, 23 Oct 2012 17:00:17 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Warner Losh <imp@bsdimp.com>
Cc:        src-committers@freebsd.org, Steve Kargl <sgk@troutmask.apl.washington.edu>, svn-src-all@freebsd.org, Bruce Evans <brde@optusnet.com.au>, svn-src-head@freebsd.org, Warner Losh <imp@freebsd.org>
Subject:   Re: svn commit: r241755 - head/lib/msun/src
Message-ID:  <20121023160721.O1282@besplex.bde.org>
In-Reply-To: <45589524-E249-43E3-91B7-6A78068208AD@bsdimp.com>
References:  <201210192246.q9JMkm4R092929@svn.freebsd.org> <20121020150917.I1095@besplex.bde.org> <18177777-6EE0-4103-98B0-272EFF98FE96@bsdimp.com> <20121022040651.GA49632@troutmask.apl.washington.edu> <B3BBD842-59C1-46FB-8E83-9DED9657A4D9@bsdimp.com> <20121022134003.GA52156@troutmask.apl.washington.edu> <45589524-E249-43E3-91B7-6A78068208AD@bsdimp.com>

On Mon, 22 Oct 2012, Warner Losh wrote:

> On Oct 22, 2012, at 7:40 AM, Steve Kargl wrote:
>> ...
>> BTW, besides bde's technical points, your change made
>> our sources different from OpenBSD, NetBSD, and new
>> project openlibm.  Diffing against the other trees
>> would become cluttered.
>
> BDE's technical points vary in quality and are difficult to argue with since they are so nit-picky. :(  I'd be happy to work through them, but some of the issues I just fundamentally disagree with.  Since I backed out the comments, I've decided not to spend the time arguing, but do think that documenting the differences between the precisions would be good.  I started down this path because I thought expf was broken because it didn't match exp exactly...
>
> However, since he's implementing a new one, wouldn't that also have diffability issues too?

Steve is implementing it :-).  It would be completely different.  It
is already implemented, not quite perfectly, for expl(), and diffs
between that version and the float and double versions are unreadable.
Changing the float and double versions to be like it would make the
diffs readable again.  One imperfection is that its documentation
consists mainly of a pointer, not to a nearby source file but to
somewhere much further away: a paper by Tang which is still behind the
ACM paywall AFAIK.  It is slightly simpler and more general than the fdlibm
version (no transformation through the apparently-magic R(z)), and well
documented by the paper, so it is easier to understand iff you have seen
the paper or know the general details.

Starting from scratch, I wouldn't go this way.  Translating the fdlibm
exp() directly to expl() would have given a good enough version.
Similarly for all functions in fdlibm.  The double precision versions
aren't perfect, but they are mostly good or very good.
   (It's interesting that they keep getting better with each
   generation of x86, because each generation has better support for
   the type of bit fiddling that the fdlibm functions like to do.
   Better often means taking 1/2 or 1/3 as many cycles as on the
   2002 generation of x86, mainly because of fewer pipeline stalls
   in instructions that fdlibm probably liked to use because they
   were fast on sparc in 1992, but which were slow on x86 in 1992
   and became relatively slower with pipelines on x86.)
But Steve didn't understand the fdlibm version when he started, and
didn't like the looks of it, so he wrote a completely different version.
We now have a version that is so much better than the fdlibm version
that it is silly to keep using the fdlibm version.  expl() takes about
the same time as expf(), while creating about 3 times as many accurate
bits internally and delivering 64 of them (it would be useful
to deliver more, but the API for this isn't established and expf()
doesn't have any more to deliver).  (This is for ld80; ld128 on at
least sparc64 is so slow that it is unusable for almost all purposes
and especially unusable for optimizing expf().)  Normally, using the
same algorithm, you have to work hard for long double precision to be
less than 4 times slower than float precision.
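
As an illustration of the kind of bit fiddling mentioned in the
parenthetical above, here is a minimal sketch of reading the high
32-bit word of a double and classifying the argument with integer
compares.  The helper names high_word and abs_ge_one are hypothetical
(fdlibm uses its own macros for this), and the code assumes a 64-bit
IEEE 754 double.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return the upper 32 bits of a double (sign,
 * exponent and the top 20 mantissa bits).  Assumes IEEE 754 binary64.
 */
static uint32_t
high_word(double x)
{
	uint64_t bits;

	memcpy(&bits, &x, sizeof(bits));	/* well-defined type pun */
	return (uint32_t)(bits >> 32);
}

/*
 * Hypothetical helper: nonzero if |x| >= 1, decided with an integer
 * compare on the high word instead of a floating-point compare.
 * NaNs and infinities also compare as "large" here; a real function
 * would filter them out separately.
 */
static int
abs_ge_one(double x)
{
	return (high_word(x) & 0x7fffffff) >= 0x3ff00000;
}
```

Moving a value between the floating-point and integer sides like this
is cheap on some machines and expensive on others, which is why the
cost of such code varies so much between CPU generations.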

Note that i386 doesn't even use fdlibm for exp().  It uses the i387
for "efficiency".  But with newer x86, even fdlibm's slow version is
faster than the i387.  We never used the i387 for expf() on i386
because optimizing expf() wasn't considered important until after
x86's became new enough for their hardware expf to be slower than
fdlibm software expf, though we almost imported this from NetBSD.
The i387 is unusable for expl() on i386 since it is barely accurate
enough for exp().

Lots of other i387 "optimized" versions on i386 should be removed.
There are just a couple of them that are more efficient or more accurate
(usually not both) than the fdlibm versions on modern x86.  amd64 never
used most of the ones that should be removed, though they would have
been relatively more accurate for amd64.  But it takes courage to axe
working versions :-).

Bruce


