Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 14 Sep 2012 16:05:30 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Stephen Montgomery-Smith <stephen@missouri.edu>
Cc:        freebsd-numerics@freebsd.org
Subject:   Re: cexp error
Message-ID:  <20120914143553.X870@besplex.bde.org>
In-Reply-To: <5052A923.9030806@missouri.edu>
References:  <5017111E.6060003@missouri.edu> <501D9C36.2040207@missouri.edu> <20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu> <20120805191954.GA50379@troutmask.apl.washington.edu> <20120807205725.GA10572@server.rulingia.com> <20120809025220.N4114@besplex.bde.org> <5027F07E.9060409@missouri.edu> <20120814003614.H3692@besplex.bde.org> <50295F5C.6010800@missouri.edu> <20120814072946.S5260@besplex.bde.org> <50297CA5.5010900@missouri.edu> <50297E43.7090309@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5052A923.9030806@missouri.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 13 Sep 2012, Stephen Montgomery-Smith wrote:

>> % rcexp: max_er = 0x1658ca3e2ce252 11716177.9430, avg_er = 0.197,
>> #>=1:0.5 = 15750254:115895992
>> 
>> Everything except this works fairly well.  cexp() passed tests on only
>> 2**12 x 2**12 args.
>
> Do you have any idea why you are getting this huge error?

It turned out to be just a bug in the test program.

> All I could think of is that cos(y) or sin(y) is subnormal, and exp(x) is 
> large.  Is it possible for sin(y) or cos(y) to be subnormal if, say, y is 
> close to a non-zero multiple of PI/2?  In other words, can rem_pio2 give 
> subnormal result (unless of course y is subnormal, but I would think that 
> wouldn't generate the above error).

It was a problem with denormals, but not this.  One of the parts of cexpf()
was incorrectly rounded to the smallest denormal, when the correctly rounded
result of 0 is given by (float)(part(cexp()).  This gives a complication
scaling the error, and the scaling step was in fact omitted because there
is no natural scale for reference results of 0 and the code takes too many
shortcuts.

Denormals are prevented for the reduced y by the theoretical result
that representable FP values cannot lie very close to a nonzero multiple
of Pi/2.  IIRC, the minimum distance is ~2**-73 (LDBL_EPSILON / 2**10)
for ld80.  2**-73 is far from being denormal even in float precision,
and in float precision it is easy to test all cases and find the exact
minimum, and it is more like FLT_EPSILON than LDBL_EPSILON.  I should
rerun my tests to find this minimum.

My tests also determined the exact minimum for all multiples up to the
thresholds for "large" multiples in e_rem_pio2.c, except for ld128,
to verify that there are enough bits in the special approximations for
Pi/2 there, except for ld128.  The maximum multiples handled there are
2**28*Pi/2, except for ld128 they are 2**45*Pi/2.  2**45 is too many
to check exhaustively.  It is obvious that e_rem_pio2.c cannot produce
a denormal, since they only subtract a limited number of bits which
does not exceed the relevant |MIN_EXP|.  But the general case handled
in k_rem_pio2.c has to consider subtracting just a few more than
|MIN_EXP| bits.  Its comments are about the number of bits supported
are confusing:

% /*
%  * Table of constants for 2/pi, 396 Hex digits (476 decimal) of 2/pi

This is an old comment.  It only applies to the old table, before the
table was extended to support long doubles.  396 hex digits is 1584
bits.  This is significantly more that |MIN_EXP| = |DBL_MIN_EXP| = 1022.
I think it has a large safety margin.

%  * ...
%  * NB: This table must have at least (e0-3)/24 + jk terms.
%  *     For quad precision (e0 <= 16360, jk = 6), this is 686.

The result of "this" expression is actually 687.  Did someone expand
jk by 1 without expanding the 686?

%  */
% static const int32_t ipio2[] = {
% 0xA2F983, 0x6E4E44, 0x1529FC, 0x2757D1, 0xF534DD, 0xC0DB62, 
% ...

The old table ends here and still has 396 hex digits, formatted in 396/6
= 66 24-bit values with 6 values per line in 11 lines.

% #if LDBL_MAX_EXP > 1024
% #if LDBL_MAX_EXP > 16384
% #error "ipio2 table needs to be expanded"
% #endif
% 0x47C419, 0xC367CD, 0xDCE809, 0x2A8359, 0xC4768B, 0x961CA6,
% ...
% 0x5B2746, 0xED3400, 0x7700D2, 0x55F4FC, 0x4D5901, 0x8071E0,
% #endif
% 
% };

To find the current number of digits, you have to count the number of
entries in the full table.  It is 115 lines with 6 values per line, =
690 24-bit values (this is 686 or 687 rounded up a bit) = 16560 bits.
This 16560 exceeds |LDBL_MIN_EXP| = 16382 by only a few.  If the 686
is correct, then the number of extra bits is only (686 * 24 = 16484) -
16382 = 82.  Now the table size is dominated by LDBL_MAX_EXP (it is
really LDBL_MAX_EXP and not LDBL_MIN_EXP that is the main parameter,
since LDBL_MAX_EXP is what tells us how man bits the values need to
be shifted (max shift count ~= log2(LDBL_MAX / (Pi/2)) = LDBL_MAX_EXP +-
<a couple>.  Then we need just LDBL_MANT_DIG + ~73 more bits
(LDBL_MANT_DIG bits for the result and ~73 more for the worst case of
cancelations).  64 + 73 = 137 exceeds 82, so I don't see how the
documented magic 686, or even the 687 given by the formula can be correct,
but the 690 in the actual table is:

     686   * 24 - 16382 = 82
     687   * 24 - 16382 = 106  (687   is from the formula, rounded down to int)
     687.5 * 24 - 16382 = 118  (687.5 is from the formula, not rounded so much)
     687   * 24 - 16382 = 130  (688   is from the formula, rounded up to int)
     690   * 24 - 16382 = 178

Another bug in trig functions turned up: on i386, at least without
-msse, clang now inlines sinf and cosf to use the slow and inaccurate
i387 functions, and it does this without even doing the necessary arg
reduction above 2**64 (* Pi/2 ?).  FreeBSD libm had exactly the same
bugs until the asm versions of float trig functions were removed in
2005.  FreeBSD libm still has all these bugs except for the missing
reduction from using asm versions of the double trig functions.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120914143553.X870>