Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 26 Sep 2012 15:06:33 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        numerics@freebsd.org
Subject:   Re: [PATCH] implementation for expm1l()
Message-ID:  <20120926140942.K4521@besplex.bde.org>
In-Reply-To: <20120926031407.GA33693@troutmask.apl.washington.edu>
References:  <20120925154606.GA28919@troutmask.apl.washington.edu> <20120926021900.R2081@besplex.bde.org> <20120926031407.GA33693@troutmask.apl.washington.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 25 Sep 2012, Steve Kargl wrote:

> On Wed, Sep 26, 2012 at 04:39:36AM +1000, Bruce Evans wrote:
>> On Tue, 25 Sep 2012, Steve Kargl wrote:
>>> +static const long double
>>> +B3 = 0x1.5555555555555556p-3L,	/* 1.66666666666666666671e-01 */
>>> +B4 = 0x1.5555555555555556p-5L;	/* 4.16666666666666666678e-02 */
>>
>> These 2 aren't actually rounded to long double precision on i386, so the
>> point of having them in more precision than the others is completely lost.
>> gcc rounds them to nearest, so the result is the same as:
>>
>> static const non_long double
>> B3 = 0x1.5555555555555800p-3,	/* some wrong value */
>> B4 = 0x1.5555555555555800p-5;	/* some wrong value */
>>
>> This is the only obvious problem.
>
> You nailed it on the head.  I haven't tried your coefficient, yet.
> I just regenerated a new set and split B3 and B4 into high and
> low parts.  With these coefficients:

Don't do that.  Use LD80C() to simplify generation of the constants and
avoid bugs.

> /*
> * Remes polynomial coefficients on the interval [T1:T2] with an error
> * estimate of log2(|f(x)-p(x)|) ~ -68 with f(x)=(expm1(x)-1-x-x**2/2)/x**3.
> */
> static const long double
> B3hi = 0x1.5555555555555p-3, B3lo = 0x1.5554d1fb0e211p-57,
> B4hi = 0x1.5555555555555p-5, B4lo = 0x1.66bec322050bep-59,

> ...
> I needed to re-arrange the relevant section of code to add
> terms in the correct order:
>
> 		x2 = x * x;
> 		t1 = (long double)B3lo + x * B4lo + x2 * B5 + x * B4hi + B3hi;

This is slower and more complicated than using LD80C(), especially on
arches (all except i386) where LD80C() is not needed.  Before LD80C()
existed, this should have been written as:

 		t1 = B3 + x * B4;

(no change), where B3 and B4 give the constants with ifdefs:

#ifdef __i386__
static const double		/* split values */
B3hi = ...,			/* values in bits */
B4hi = ...;
static const volatile double	/* volatile for run-time addition */
B3lo = ...,
B4lo = ...;
#define	B3	((long double)B3hi + B3lo)
#define	B4	((long double)B4hi + B4lo)
#else
static const long double	/* unsplit values */
B3 = ...,			/* slightly different values in bits */
B4 = ...;
#endif

Copy this method from a nearby file, e.g. k_sinl.c, but omit the __amd64__
ifdef there (except for testing; it was intended as an optimization,
but made little difference even on non-Intel CPUs where loading long
doubles is slow, because the load latency is usually hidden by
scheduling).  But you shouldn't want to do this.  Use LD80C() so that
the declarations take 1 line per additional constant with no visible
ifdefs (they are in the implementation of LD80C(), instead of 5 lines
each like the above, with ifdefs.

See the nearby files invtrig.[ch] and the non-nearby files ../*/invtrig.c
for an alternative old not-so-good way of declaring these values.  The
other way produces long doubles directly using bits much like LD80C()
internally, but has the equivalent of even more convoluted ifdefs and
macros then the above.  For example, for the long double value pio2_hi:
- it is #defined as atanhi[3]
- for ld128, there no further layers (splitting is not supported); for
   ld80, there are the following complications:
- atanhi is #defined as _ItL_atanhi
- atanhi is declared as extern const LONGDOUBLE in ld*/invtrig.h
- LONGDOUBLE is long double for !i386 and a mispacked struct for i386.
   Mispacking gives slowness via misalignment, especially in arrays
   like atanhi, where 8-byte alignment of the first element ensures
   perfect misalignment for odd elements lik atanhi[3].  This strcuct
   just contains bits and the long double values are obtained by a
   type pun, so the compiler can't align things properly (alignment
   for the uint64_t in the struct is 4 bytes, but long doubles should
   have at least 8-byte alignment and preferably 16-byte alignment.
- for !i386, ld80/invtrig.c declares _ItL_atanhi as an array of long
   doubles with ordinary long double constants.  It has a lot of style
   bugs (mainly missing hex values in comments.  Another feature of
   LD80C() is that it doesn't permit omitting either the hex or the
   decimal value.  Both are (not quite) active code).
- for i386, i387/invtrig.c declares _ItL_atanhi as an array of
   LONGDOUBLE, where LONGDOUBLE is now a struct and the initializers
   give values in bits.

So to add a new constant using this method, you need to understand some
of above and perform the following edits:
- if not for invtrig, first duplicate invtrig's messes in the relevant
   subsystem (don't pollute invtrig files with !invtrig declarations).
   Then apply the following to the relevant files.
- add a macro and an extern declaration in ld80/invtrig.h.  (This complication
   is mostly related.  It is mostly to use extern constants.  There are
   namespace problems then...)
- add an extern declaration in ld128/invtrig.h
- add a normal const long double declaration to ld80/invtrig.c and
   ld128/invtrig.c
- add a const LONGDOUBLE declaration using bits to i387/invtrig.c.

You shouldn't want to do this.

I have only partly cleaned up invtrig.  I fixed the misalignment and
removed 1 layer of macros.  I don't use LD80C() there yet.

> With these changes, I see
>
> [... correct results]

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120926140942.K4521>