Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 25 Sep 2012 13:22:28 -0700
From:      Steve Kargl <sgk@troutmask.apl.washington.edu>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        numerics@freebsd.org
Subject:   Re: [PATCH] implementation for expm1l()
Message-ID:  <20120925202228.GA31099@troutmask.apl.washington.edu>
In-Reply-To: <20120926021900.R2081@besplex.bde.org>
References:  <20120925154606.GA28919@troutmask.apl.washington.edu> <20120926021900.R2081@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Sep 26, 2012 at 04:39:36AM +1000, Bruce Evans wrote:
> On Tue, 25 Sep 2012, Steve Kargl wrote:
> 
> > Now, the problem!  On i386 in the interval [-0.287682:0.223144],
> > I see up to 15.5 ULP for the ld80 patch.  Note, if I don't
>                     ulps or ULPs
> > use ENTERI and RETURNI in the ld80 code, I see ULP that are
> > consistent with the amd64 results.  This of course leads back
> > to the 53- versus 64-bit precision settings on i386.

My laptop (ie, i386) is at home, so I'm going from memory.  My
testl generates a report of the form

Interval tested: [-0.287682:0.223144]
    Max ULP: 15.4554
  x Max ULP: -2.some_set_of_digits_e-01
      Count: 10000000
> 
> I thought I fixed it, and you have all my patches, but...
> 

Perhaps, I do.  I'll go look through my email archive.

> It can't possibly work better without ENTERI.  Maybe a bug in the
> test code.

I could have been looking at the wrong xterm. I check tonight.

> You should test the internal values before they are added together,
> and expect an error of < ~1/256 ulps (or ~0.5 ulps in 72-bit precision).
> I thought I did that.  Maybe I only tested amd64.  Now that I look at
> it again, I see that it wasn't worth testing on i386 since it can't
> possibly work there...
> 
> > I have tried re-arranging the order of evaluation of the polynomial
> > and some of the intermediate results with little improvement.  I've
> > also increased the order of the polynomial.  Everything I've tried
> > seems to fail on i386.  So, anyone got a suggestion?
> >
> > PS: the relevant section of code is in the 'if (T1 < x && x < T2)'
> > block of code.
> >
> > Index: ld80/s_expl.c
> > ===================================================================
> > --- ld80/s_expl.c	(revision 240889)
> > +++ ld80/s_expl.c	(working copy)
> > @@ -301,3 +301,149 @@
> > ...
> > +#if 0
> > +static const long double
> > +T1 = -0x1.269621134db92786p-2L, /* log(1-1/4) = -2.876820724517809275e-01L */
> > +T2 =  0x1.c8ff7c79a9a21ac4p-3L; /* log(1+1/4) =  2.231435513142097558e-01L */
> > +#else
> > +static const long double
> > +T1 = -0.25L,
> > +T2 =  0.25L;
> > +#endif
> 
> Apparently these thresholds aren't critical (since fuzzy ones work
> on amd64). The fuzzy ones should be declared double.

The fuzzy ones are left over from an attempt to use a 2nd lookup
table on the range [-0.25:0.25].   This has the nice property that
T2 - T1 = 0.5 and dividing this up into 2**n segments gives a 
delete of the form 0x1.0p-7.

The log(1+-1/4) expressions are adopted from Tang's paper where he
gives an error analysis to establish an upper bound on the max
ULP.

> > +
> > +/*
> > + * Remes polynomial coefficients on the interval [T1:T2] with an error
> > + * estimate of log2(|expm1(x)-p(x)|) ~ -74.  These are the coefficients
> 
> It's expm1(x)/x that must be approximated by p(x)/x, not expm1(x) by
> p(x).  This combined with the nonstandard comment style confused me for
> a while.  (Plain expm1(x) is just as easy to approximate as exp(x) and
> doesn't require any long double coefficients.)

The actual function that I approximated was (expm1(x)-1-x-x**2/2)/x**3.
Tang's algorithm assumes that one compute x+x**2/2 with extra precision.
He does this by decomposing x into high and low parts.  I suppose I
should update the comment to reflect the function I approximate.

> > + * rounded to long double precision where B5 through B14 can be used with
> > + * a double type.
> > + */
> > +static const long double
> > +B3 = 0x1.5555555555555556p-3L,	/* 1.66666666666666666671e-01 */
> > +B4 = 0x1.5555555555555556p-5L;	/* 4.16666666666666666678e-02 */
> 
> These 2 aren't actually rounded to long double precision on i386, so the
> point of having them in more precision than the others is completely lost.
> gcc rounds them to nearest, so the result is the same as:
> 
> static const non_long double
> B3 = 0x1.5555555555555800p-3,	/* some wrong value */
> B4 = 0x1.5555555555555800p-5;	/* some wrong value */
> 
> This is the only obvious problem.
> 
> For a quick check that this is the only problem, try compiling with clang.
> I think it is incompatible with gcc here and rounds to 64 bits.

I'll try compiling with clang tonight.   I know that B3 and B4
cannot be double as this leads to problems on amd64.  I'll see
about using LD80C for B3 and B4 or simply split these into
high and low parts.

> LD80C must be used for most long double constants in ld80 (unless the
> constants can be double).
> 
> I generated the following polynomial.  Using only 2 full-precision coeffs
> limit the accuracy, so B14 doesn't improve accuracy significantly.

(coefficients deleted)

> 
> Note that my P12 is your B13 (different by 1 for dividing by x in
> expm1(x)/x).

I see if chaning to your set of coefficients is on any help tonight.

> > Index: ld128/s_expl.c
> > ===================================================================
> > --- ld128/s_expl.c	(revision 240889)
> > +++ ld128/s_expl.c	(working copy)
> > @@ -259,3 +259,132 @@
> >  		return (t * twopkp10000 * twom10000);
> >  	}
> >  }
> > +
> > +static const long double
> > +T1 = -2.87682072451780927439219005993827491e-01L, /* log(1-1/4) */
> > +T2 =  2.23143551314209755766295090309834524e-01L; /* log(1+1/4) */
> 
> Now I see that these aren't critical.  But isn't T1 = -0.25 not far
> enough negative?  Probably T2 being larger than necessary compensates.

The ld128 patch is hot off the keyboard as I wrote it on Sunday and
tested it yesterday.  I'm suspect the coefficients can use some tweaking.
I'll look over the coefficients that you sent later this week.

-- 
Steve



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120925202228.GA31099>