Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 15 Aug 2012 08:15:52 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        Stephen Montgomery-Smith <stephen@missouri.edu>, freebsd-numerics@freebsd.org
Subject:   Re: Status of expl logl
Message-ID:  <20120815070807.B3431@besplex.bde.org>
In-Reply-To: <20120814195659.GA70571@troutmask.apl.washington.edu>
References:  <502A8CCC.5080606@missouri.edu> <20120814175257.GA69865@troutmask.apl.washington.edu> <20120814183518.GA70092@troutmask.apl.washington.edu> <502AA971.4010403@missouri.edu> <20120814195659.GA70571@troutmask.apl.washington.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 14 Aug 2012, Steve Kargl wrote:

> On Tue, Aug 14, 2012 at 02:39:29PM -0500, Stephen Montgomery-Smith wrote:
>> On 08/14/2012 01:35 PM, Steve Kargl wrote:
>>> On Tue, Aug 14, 2012 at 10:52:57AM -0700, Steve Kargl wrote:
>>>> On Tue, Aug 14, 2012 at 12:37:16PM -0500, Stephen Montgomery-Smith wrote:
>>>>> Are people working on expl, logl and log1pl?

-rw-r--r--  1 bde  wheel  37470 Aug 10 00:08 ld128/s_logl.c
-rw-r--r--  1 bde  wheel  32946 Aug 10 00:07 ld80/s_logl.c
-rw-r--r--  1 bde  wheel  32056 Aug 10 00:06 s_log.c
-rw-r--r--  1 bde  wheel  30053 Aug 10 00:05 s_logf.c

These are mostly large comments and larger tables.  Each file implements
log, log10, log2, log1p for the given precision.  Internal accuracy is
about 7 extra bits of precision.  Speed on core2 with optimal CFLAGS is
30-70 cycles depending on the precision.

>> ...
>> So I am looking through src/e_pow.c.
>>
>> It seems to me that the constants L1, L2, L3, etc, are 3/5, 3/7, 3/9,
>> etc, but not exactly these constants.  So they must have used some
>> process where they jiggled the constants around, perhaps using trial and
>> error, to get a few extra ulp.  Is that right?

Hopefully not by trial and error.  There is the Remes algorithm.  I
use a 1000 line (plus infrastructure) pari program to implement a form
of this algorithm.  In general, for nice functions like cos through
not so nice functions like tan near 0, the best possible (minimax)
polynomial approximation seems to need about 1 term less than a Taylor
approximation, and are only about 1 bit better than a Chebyshev
approximation.

>> Also, I am trying to see what P1, P2, P3, etc are.  They seem to be
>> related to the factorial (maybe a power series related to exp(x)), but I
>> must admit that I am not getting it.

These must be for exp().  Indeed, they are identical with the Pn's in
e_exp.c.  They are essentially Bernoulli numbers.  One generating
function for Bernoulli numbers with a certain normalization is

     z/(exp(z) - 1) := sum(n = 0, Inf, B[n]/n!*z^n)

and e_exp.c transforms exp(x) to essentially x/(exp(x) - 1).  Applying
Remes to them makes Pn not quite a nice fraction.

>> Is there a more detailed reference to how these numbers were obtained?
>> A paper somewhere?

e_exp.c mentions Remes.

>> Finally, what is ovfl (in the definition of ovt) meant to be?

Something to do with exp's overflow threshold I think.  The comment
about ovt doesn't seem to match the value (I don't see why the value
is so small), but the comment is very similar to the one that I wrote
for exp's overflow threshold in ld80/s_expl.c.  (fdlibm e_exp.c only
gives the magic number for the threshold.)

> I haven't looked too closely at the details of pow[fl]().  I am
> not aware of any published paper that gives the details. AFAIK,
> the comment in e_pow.c is only detailed description (other than
> the code).  I tried to find a paper about pow() implementations
> on Sunday with a very cursory google search.  Came up empty.
>
> The L and P constants are used in lines 235 and 299,
> line 235:  r = s2*s2*(L1+s2*(L2+s2*(L3+s2*(L4+s2*(L5+s2*L6)))));
> line 299: t1  = z - t*(P1+t*(P2+t*(P3+t*(P4+t*P5))));
>
> These are polynomials that are evaluated via Horner's method.
> I suspect the jiggling that you mention is actually a result
> of a Remes minimax procedure.
>
> ovfl looks like it's used to define an overflow threshold (ie, ovt).

I hardly looked at e_pow.c before.  It is apparently half about repeating
e_log.c and e_exp.c, to get at their extra internal precision.  In old
BSD libm (which FreeBSD used briefly before fdlibm was imported), pow()
and some other functions got their extra precision from __log__D() and
__exp__D().  These functions are still in msun/bsdsrc and are used in
tgamma().  My logl() and Steve's expl() have similar extra precision
internally, and run about 4 times faster than the old BSD functions.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120815070807.B3431>