Date: Wed, 26 Sep 2012 15:06:33 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Steve Kargl <sgk@troutmask.apl.washington.edu> Cc: numerics@freebsd.org Subject: Re: [PATCH] implementation for expm1l() Message-ID: <20120926140942.K4521@besplex.bde.org> In-Reply-To: <20120926031407.GA33693@troutmask.apl.washington.edu> References: <20120925154606.GA28919@troutmask.apl.washington.edu> <20120926021900.R2081@besplex.bde.org> <20120926031407.GA33693@troutmask.apl.washington.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 25 Sep 2012, Steve Kargl wrote: > On Wed, Sep 26, 2012 at 04:39:36AM +1000, Bruce Evans wrote: >> On Tue, 25 Sep 2012, Steve Kargl wrote: >>> +static const long double >>> +B3 = 0x1.5555555555555556p-3L, /* 1.66666666666666666671e-01 */ >>> +B4 = 0x1.5555555555555556p-5L; /* 4.16666666666666666678e-02 */ >> >> These 2 aren't actually rounded to long double precision on i386, so the >> point of having them in more precision than the others is completely lost. >> gcc rounds them to nearest, so the result is the same as: >> >> static const non_long double >> B3 = 0x1.5555555555555800p-3, /* some wrong value */ >> B4 = 0x1.5555555555555800p-5; /* some wrong value */ >> >> This is the only obvious problem. > > You nailed it on the head. I haven't tried your coefficient, yet. > I just regenerated a new set and split B3 and B4 into high and > low parts. With these coefficients: Don't do that. Use LD80C() to simplify generation of the constants and avoid bugs. > /* > * Remes polynomial coefficients on the interval [T1:T2] with an error > * estimate of log2(|f(x)-p(x)|) ~ -68 with f(x)=(expm1(x)-1-x-x**2/2)/x**3. > */ > static const long double > B3hi = 0x1.5555555555555p-3, B3lo = 0x1.5554d1fb0e211p-57, > B4hi = 0x1.5555555555555p-5, B4lo = 0x1.66bec322050bep-59, > ... > I needed to re-arrange the relevant section of code to add > terms in the correct order: > > x2 = x * x; > t1 = (long double)B3lo + x * B4lo + x2 * B5 + x * B4hi + B3hi; This is slower and more complicated than using LD80C(), especially on arches (all except i386) where LD80C() is not needed. Before LD80C() existed, this should have been written as: t1 = B3 + x * B4; (no change), where B3 and B4 give the constants with ifdefs: #ifdef __i386__ static const double /* split values */ B3hi = ..., /* values in bits */ B4hi = ...; static const volatile double /* volatile for run-time addition */ B3lo = ..., B4lo = ...; #define B3 ((long double)B3hi + B3lo) #define B4 ((long double)B4hi + B4lo) #else static const long double /* unsplit values */ B3 = ..., /* slightly different values in bits */ B4 = ...; #endif Copy this method from a nearby file, e.g. k_sinl.c, but omit the __amd64__ ifdef there (except for testing; it was intended as an optimization, but made little difference even on non-Intel CPUs where loading long doubles is slow, because the load latency is usually hidden by scheduling). But you shouldn't want to do this. Use LD80C() so that the declarations take 1 line per additional constant with no visible ifdefs (they are in the implementation of LD80C(), instead of 5 lines each like the above, with ifdefs. See the nearby files invtrig.[ch] and the non-nearby files ../*/invtrig.c for an alternative old not-so-good way of declaring these values. The other way produces long doubles directly using bits much like LD80C() internally, but has the equivalent of even more convoluted ifdefs and macros then the above. For example, for the long double value pio2_hi: - it is #defined as atanhi[3] - for ld128, there no further layers (splitting is not supported); for ld80, there are the following complications: - atanhi is #defined as _ItL_atanhi - atanhi is declared as extern const LONGDOUBLE in ld*/invtrig.h - LONGDOUBLE is long double for !i386 and a mispacked struct for i386. Mispacking gives slowness via misalignment, especially in arrays like atanhi, where 8-byte alignment of the first element ensures perfect misalignment for odd elements lik atanhi[3]. This strcuct just contains bits and the long double values are obtained by a type pun, so the compiler can't align things properly (alignment for the uint64_t in the struct is 4 bytes, but long doubles should have at least 8-byte alignment and preferably 16-byte alignment. - for !i386, ld80/invtrig.c declares _ItL_atanhi as an array of long doubles with ordinary long double constants. It has a lot of style bugs (mainly missing hex values in comments. Another feature of LD80C() is that it doesn't permit omitting either the hex or the decimal value. Both are (not quite) active code). - for i386, i387/invtrig.c declares _ItL_atanhi as an array of LONGDOUBLE, where LONGDOUBLE is now a struct and the initializers give values in bits. So to add a new constant using this method, you need to understand some of above and perform the following edits: - if not for invtrig, first duplicate invtrig's messes in the relevant subsystem (don't pollute invtrig files with !invtrig declarations). Then apply the following to the relevant files. - add a macro and an extern declaration in ld80/invtrig.h. (This complication is mostly related. It is mostly to use extern constants. There are namespace problems then...) - add an extern declaration in ld128/invtrig.h - add a normal const long double declaration to ld80/invtrig.c and ld128/invtrig.c - add a const LONGDOUBLE declaration using bits to i387/invtrig.c. You shouldn't want to do this. I have only partly cleaned up invtrig. I fixed the misalignment and removed 1 layer of macros. I don't use LD80C() there yet. > With these changes, I see > > [... correct results] Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120926140942.K4521>