Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 13 Sep 2012 22:55:00 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Stephen Montgomery-Smith <stephen@missouri.edu>
Cc:        freebsd-numerics@FreeBSD.org
Subject:   Re: Complex arg-trig functions
Message-ID:  <20120913204808.T1964@besplex.bde.org>
In-Reply-To: <50511B40.3070009@missouri.edu>
References:  <5017111E.6060003@missouri.edu> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu> <20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu> <20120805191954.GA50379@troutmask.apl.washington.edu> <20120807205725.GA10572@server.rulingia.com> <20120809025220.N4114@besplex.bde.org> <5027F07E.9060409@missouri.edu> <20120814003614.H3692@besplex.bde.org> <50295F5C.6010800@missouri.edu> <20120814072946.S5260@besplex.bde.org> <50297CA5.5010900@missouri.edu> <50297E43.7090309@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 12 Sep 2012, Stephen Montgomery-Smith wrote:

> OK, I believe that I have implemented all your suggested changes - but with 
> style changes that I prefer.
>
> Test it now, and see if it works.

Yes, it mostly works now (most of my last round of NaN fixes were
consistently wrong, since I applied the spec for cacosh() to cacos()...).

> As for formatting problems caused by the perl scripts, I would anticipate 
> that by the time it gets committed, that people would no longer use the perl 
> scripts.  But it is a realyl convenient way to make big changes to catrig.c, 
> and then copy these same changes to catrigf.c and catrigl.c.

Yes, but I find them not so convenient for small changes.  I didn't
actually use them, but edited all versions manually, with somehwhat
different changes in each.

> As far as I can see, the only formatting mess ups are multiple double spaced 
> blank lines.  I think I have got rid of them now.

There are still single double spaced blank lines.

The following set of changes starts fixing long doubles for i386, and has
1 fix and many cosmetic changes related to the previous round of fixes.

% diff -u2 catrig.c~ catrig.c
% --- catrig.c~	2012-09-13 01:34:25.000000000 +0000
% +++ catrig.c	2012-09-13 10:40:46.760362000 +0000
% @@ -30,19 +30,18 @@
%  			(bx).parts.lsw) == 0)
% 
% -static const volatile double
% -tiny =			0x1p-2000;
% -

This was broken for doubles (it underflowed to 0).

%  /* We need that DBL_EPSILON^2 is larger than FOUR_SQRT_MIN. */
%  static const double
% +A_crossover =		10, /* Hull et al suggest 1.5, but 10 works better */
% +B_crossover =		0.6417,		/* suggested by Hull et al */

Sort the declarations.  Don't use type suffixes if possible (for 0.6417 here).

%  FOUR_SQRT_MIN =		0x1p-500,	/* less than 4 * sqrt(DBL_MIN) */
%  FOURTH_SQRT_MAX =	0x1p500,	/* 1 / FOUR_SQRT_MIN */

Oops, this patch is missing my change of these to match the comments.
0x1p-500 is very far from matching the comment which says that
FOUR_SQRT_MIN is _less_ than 4 * sqrt(DBL_MIN).  DBL_MIN_EXP is -1021,
so DBL_MIN is about 2**-1022 and 4 * sqrt(DBL_MIN) is about 4 * 2**-511
= 2**-509.  Your FOUR_SQRT_MIN is about 512 times larger than that.

Changing the 500's to 509's made no difference in my test, but the
corresponding change for float precision (restore my 61's from your 60's)
gave a small improvevement.

%  half =			0.5,

I thought that we were going to change all the multiplications by 0.5 by
divisions by 2.  Then we would not need this.

% -quarter =		0.25,

Similarly for this.

Similarly for inverses.  You already use lots of 1/DBL_EPSILONs, and
this is efficient soince it can be evaluated at compile time even by
non-broken compilers, since DBL_EPSILON is a power of 2.  Now that
FOUR_SQRT_MIN is a power of 2, its inverse FOURTH_SQRT_MAX can also
be spelled as an inverse.  Except it is not actually an inverse like
its comment claims. DBL_MAX is not actually the inverse of DBL_MIN;
it is just that within one or 2 powers of 2 in supported formats, and
maybe we should worry about other formats in which it is more different
-- the different name for FOURTH_SQRT_MAX supports that, and the comment
shouldn't say that it is the inverse of FOUR_SQRT_MIN.

% -A_crossover =		10, /* Hull et al suggest 1.5, but 10 works better. */
% -B_crossover =		0.6417L, /* suggested by Hull et al. */
% -m_pi_2 =		1.5707963267948966192313216916397514420985846996876L,
% -m_e =			2.7182818284590452353602874713526624977572470937000L,
% -m_ln2 =			0.69314718055994530941723212145817656807550013436026L;
% +m_e =			2.7182818284590452e0,	/*  0x15bf0a8b145769.0p-51 */
% +m_ln2 =			6.9314718055994531e-1,	/*  0x162e42fefa39ef.0p-53 */
% +m_pi_2 =		1.5707963267948966e0,	/*  0x1921fb54442d18.0p-52 */

I prefer my style for the declarations (except possibly for the
whitespace around '='.  I have formatting routines which print the
constants in decimal and hex using the correct number of digits and
formats everything including initializers and comments fairly well.
Automatic generation is essential for larger tables, but I don't have
it set up for these particular constants, and just manually formatted
the above after printing just the constants using my pari formatting
routines.

% +quarter =		0.25;

Sort this.

% 
% +static const volatile double
% +tiny =			0x1p-1000;

Sort this.

% 
%  #include <complex.h>
% @@ -280,8 +279,8 @@
% 
%  	if (ISNAN(x) || ISNAN(y)) {
% -		/* casinh(inf + I*NaN) = inf + I*NaN */
% +		/* casinh(+-Inf + I*NaN) = +-Inf + I*NaN */

I don't like spelling "infinity" as "inf", and started fixing this.

I prefer my comments giving details on the signs of the Infs.  The sign
rules follow from odd/even and/or conjugacy rules, but I find these
confusing in the special cases for Infs and NaNs.

I refrained from changing this to give the details of the transformation
of the NaN are.  Here they are that the result NaN is the default result
of operating on the (single) arg NaN.  In s_ccosh.c, the result NaN for
this is written as d(NaN), where NaN is implicitly the arg NaN.

%  		if (ISINF(bx))
%  			return (cpack(x, y+y));
% -		/* casinh(NaN + I*inf) = inf + I*NaN */
% +		/* casinh(NaN + I*+-Inf) = +-Inf + I*NaN (sign optional) */
%  		if (ISINF(by))
%  			return (cpack(y, x+x));

Now the handling of the sign of the infinity is subtly different.  The
comment says that we can choose any sign, but not the choice.  The code
chooses the same sign as the arg.  Now I don't see why C99 allows either
sign.  C99 also requires casinh() to be odd, so the sign should toggle
with the sign of y -- although the oddness rule doesn't applys to NaNs,
this case specifically ignores the whole NaN including its sign when
creating the infinity for the real part.

% @@ -360,10 +359,10 @@
% 
%  	if (ISNAN(x) || ISNAN(y)) {
% -		/* cacos(inf + I*NaN) = NaN + I*inf */
% -		/* cacos(NaN + I*inf) = NaN + I*inf */
% +		/* cacos(+-Inf + I*NaN) = NaN + I*-+Inf (sign optional) */
%  		if (ISINF(bx))
% -			return (cpack(x+y, -x));
% +			return (cpackf(y+y, -INFINITY));

Use the logically correct expression for the NaN part of the result, like
we do elsewhere.

Now the handling of the sign of the infinity is not so subtly different.
Now there is only a congugacy rule, not an oddness rule.  We are in the
case of a NaN y.  Conjugating a NaN doesn't really change it (it will
in fact change its sign depending on whether it is implemented as -y
or 0-y).  So always return infinity with a fixed sign (-Inf).  If it
is to depend on the sign of an arg, then it should be on the sign of
the NaN y, since for non-NaN y, the sign of x has no effect on the
sign of the resulting imaginary part.

% +		/* cacos(NaN + I*+-Inf) = NaN + I*-+Inf */
%  		if (ISINF(by))
% -			return (cpack(x+y, -y));
% +			return (cpackf(x+x, -y));

Use the logically correct expression for the NaN part of the result, as
above.

Comment changes as above.

%  		/* cacos(0 + I*NaN) = PI/2 + I*NaN */
%  		if (x == 0) return (cpack(m_pi_2, y+y));
% @@ -416,11 +415,20 @@
%  cacosh(double complex z)
%  {
% -	double complex w = cacos(z);
% -	double rx = creal(w);
% -	double ry = cimag(w);
% +	double complex w;
% +	double rx;
% +	double ry;
% +
% +	w = cacos(z);
% +	rx = creal(w);
% +	ry = cimag(w);
% +

Start fixing the largest style bugs.

% +	/* cacosh(NaN + I*NaN) = NaN + I*NaN */
%  	if (ISNAN(rx) && ISNAN(ry))
%  		return (cpack(ry, rx));
% +	/* cacosh(NaN + I*+-Inf) = +Inf + I*NaN */
% +	/* cacosh(+-Inf + I*NaN) = +Inf + I*NaN */
%  	if (ISNAN(rx))
%  		return (cpack(fabs(ry), rx));
% +	/* cacosh(0 + I*NaN) = PI/2 + I*NaN */
%  	if (ISNAN(ry))
%  		return (cpack(ry, copysign(rx, cimag(z))));

The sign rules are especially confusing here, since we look at NaNs in
the result.  I forgot to expand these comments further, by saying exactly
which combinations of NaNs with finite rx and ry occur.  I would prefer
to classify all the special cases from the args.

% @@ -429,6 +437,5 @@
% 
%  /*
% - * This clog is a stop-gap function.  It is designed to work well only if
% - * |z| is larger than 1/DBL_EPSILON.
% + * Optimized version of clog() for |z| finite and larger than ~1/DBL_EPSILON.
%   */
%  static double complex

Start cleaning this up.  Mainly just change its comments to match its code
for now.

% @@ -440,5 +447,4 @@
%  	x = creal(z);
%  	y = cimag(z);
% -
%  	ax = fabs(x);
%  	ay = fabs(y);
% @@ -450,9 +456,10 @@
% 
%  	/*
% -	 * To avoid unnecessary overflow, if x or y are very large, divide x
% +	 * Avoid overflow in hypot() when x and y are both very large.
% +	 * Divide x
%  	 * and y by E, and then add 1 to the logarithm.  This depends on
%  	 * E being larger than sqrt(2).
% -	 * There is a potential loss of accuracy caused by dividing by E,
% -	 * but this case should happen extremely rarely.
% +	 * Dividing by E causes an insignificant loss of accuracy; however
% +	 * this method is still poor since it is uneccessarily slow.
%  	 */
%  	if (ax > half*DBL_MAX)

Multiplying by 1/E would be much better, but I want to get rid of the
m_e constant.  I tried dividing by 2 and adding m_ln2, but this lost
accuracy unnecessarily.  A more complete conversion to the methods in
clog() is needed.

% @@ -460,8 +467,8 @@
% 
%  	/*
% -	 * Because atan2 and hypot conform to C99, this also covers all the
% -	 * edge cases when x or y are 0 or infinite.
% +	 * Avoid overflow when x or y is large.  Avoid underflow when x or
% +	 * y is small.
%  	 */
% -	if (ax < FOUR_SQRT_MIN || ay < FOUR_SQRT_MIN || ax > FOURTH_SQRT_MAX || ay > FOURTH_SQRT_MAX)
% +	if (ax > FOURTH_SQRT_MAX || ay < FOUR_SQRT_MIN)
%  		return (cpack(log(hypot(x, y)), atan2(y, x)));
%

Minor optimization.

% @@ -480,5 +487,5 @@
% 
%  /*
% - * sum_squares(x,y) = x*x + y*y.
% + * sum_squares(x,y) = x*x + y*y (or just x*x if y*y would underflow).
%   * Assumes x*x and y*y will not overflow.
%   * Assumes x and y are finite.
% @@ -488,12 +495,9 @@
%  sum_squares(double x, double y)
%  {
% -	/*
% -	 * The following code seems to do better than
% -	 * return (hypot(x, y) * hypot(x, y));
% -	 */
% -

Of course we can do better than using hypot().

% +	/* Avoid underflow when y is small. */
%  	if (fabs(y) < FOUR_SQRT_MIN)

The correct cutoff for this is now (more) obviously sqrt(DBL_MIN).
Similarly in clog_for_large_values().  The sloppy cutoffs are just
minor pessimizations.

%  		if ((int)y==0) /* raise inexact */
%  			return (x*x);
% +
%  	return (x*x + y*y);
%  }
% diff -u2 catrigf.c~ catrigf.c
% --- catrigf.c~	2012-09-13 01:34:33.000000000 +0000
% +++ catrigf.c	2012-09-13 10:30:41.284444000 +0000
% @@ -35,17 +35,17 @@
%   */
% 
% -static const volatile float
% -tiny =			0x1p-100;
% -
%  static const float
% -FOUR_SQRT_MIN =		0x1p-60,
% -FOURTH_SQRT_MAX =	0x1p60,
% -half =			0.5,
% -quarter =		0.25,
%  A_crossover =		10,
% -B_crossover =		0.6417L,
% -m_pi_2 =		1.5707963267948966192313216916397514420985846996876L,
% -m_e =			2.7182818284590452353602874713526624977572470937000L,
% -m_ln2 =			0.69314718055994530941723212145817656807550013436026L;
% +B_crossover =		0.6417,
% +FOUR_SQRT_MIN =		0x1p-61,
% +FOURTH_SQRT_MAX =	0x1p61,
% +half =			0.5,
% +m_e =			2.7182818285e0,	/*  0xadf854.0p-22 */
% +m_ln2 =			6.9314718056e-1,	/*  0xb17218.0p-24 */
% +m_pi_2 =		1.5707963268e0,	/*  0xc90fdb.0p-23 */
% +quarter =		0.25;
% +
% +static const volatile float
% +tiny =			0x1p-100;
% 
%  #include <complex.h>
% @@ -196,7 +196,7 @@
% 
%  		if (isinf(x))
% -			return (cpackf(x+y, -x));
% +			return (cpackf(y+y, -INFINITY));
%  		if (isinf(y))
% -			return (cpackf(x+y, -y));
% +			return (cpackf(x+x, -y));
% 
%  		if (x == 0) return (cpackf(m_pi_2, y+y));
% @@ -240,7 +240,11 @@
%  cacoshf(float complex z)
%  {
% -	float complex w = cacosf(z);
% -	float rx = crealf(w);
% -	float ry = cimagf(w);
% +	float complex w;
% +	float rx, ry;
% +
% +	w = cacosf(z);
% +	rx = crealf(w);
% +	ry = cimagf(w);
% +
%  	if (isnan(rx) && isnan(ry))
%  		return (cpackf(ry, rx));
% @@ -260,5 +264,4 @@
%  	x = crealf(z);
%  	y = cimagf(z);
% -
%  	ax = fabsf(x);
%  	ay = fabsf(y);

Changes as in the double version.  I didn't do them all here.

% diff -u2 catrigl.c~ catrigl.c
% --- catrigl.c~	2012-09-13 01:34:30.000000000 +0000
% +++ catrigl.c	2012-09-13 09:54:32.886453000 +0000
% @@ -35,22 +35,38 @@
%   */
% 
% -static const volatile long double
% -tiny =			0x1p-16000L;
% +#include <complex.h>
% +#include <float.h>
% +
% +#include "fpmath.h"
% +#include "math.h"
% +#include "math_private.h"

The #includes have to be first so that LD80C() can be used in the
constant table.  We could also use M_E, etc., in the constant table
for other precisions after moving the table after the includes.

% 
%  static const long double
% -FOUR_SQRT_MIN =		0x1p-8000L,
% -FOURTH_SQRT_MAX =	0x1p8000L,

Change these to match the comment.  Made no obvious difference.

% -half =			0.5,
% -quarter =		0.25,
%  A_crossover =		10,
% -B_crossover =		0.6417L,

Sort things.  Some should be named differently so that they sort better
or aren't so verbose.  E.g., the m_ prefixes are vestiges of the M_
prefixes, which were for math.h but are not needed here and make some
of the names more prefixed than others.  fdlibm mostly spells Pi/2 as
pio2 or pi_o_2, but we spell it as m_pi_2 here.

% -m_pi_2 =		1.5707963267948966192313216916397514420985846996876L,
% -m_e =			2.7182818284590452353602874713526624977572470937000L,
% -m_ln2 =			0.69314718055994530941723212145817656807550013436026L;
% +B_crossover =		0.6417,
% +FOUR_SQRT_MIN =		0x1p-8189L,
% +FOURTH_SQRT_MAX =	0x1p8189L,

"FOURTH SQUARE [ROOT]" is confusing (it sounds like a 4th or 8th root,
or maybe a 4th power of a squre root).  Do you want to keep this name
matching a name in the paper?  If not, "quarter_sqrt_max" might be
better.  Or use the pio2 naming scheme and name it sqrt_max_o4.  "FOUR"
is a verbose spelling of "4", and is used mainly since identifiers
can't start with "4".

% +half =			0.5,
% +quarter =		0.25;
% 
% -#include <complex.h>
% -#include <float.h>
% -#include <math.h>
% -#include "math_private.h"
% +#if LDBL_MANT_DIG == 64
% +static const union IEEEl2bits
% +um_e =		LD80C(0xadf85458a2bb4a9b,  1, 0, 2.71828182845904523536e0L),
% +um_ln2 =	LD80C(0xb17217f7d1cf79ac, -1, 0, 6.93147180559945309417e-1L),
% +um_pi_2 =	LD80C(0xc90fdaa22168c235,  0, 0, 1.57079632679489661923e0L);
% +#define	m_e	um_e.e
% +#define	m_ln2	um_ln2.e
% +#define	m_pi_2	um_pi_2.e
% +#elif LDBL_MANT_DIG == 113
% +static const long double
% +m_e =			2.71828182845904523536028747135266250e0L,	/* 0x15bf0a8b1457695355fb8ac404e7a.0p-111 */
% +m_ln2 =			6.93147180559945309417232121458176568e-1L,	/* 0x162e42fefa39ef35793c7673007e6.0p-113 */
% +m_pi_2 =		1.57079632679489661923132169163975144e0L;	/* 0x1921fb54442d18469898cc51701b8.0p-112 */
% +#else
% +#error "Unsupported long double format"
% +#endif

Long double constants are painful to declare correctly.  First, they can
be any length.  Second, on i386 gcc truncates them to double precision
(but long double exponent range).  The LD80C macro hides most of the
details of working around the latter.  After that, it gives only minor
messes to ifdef declare the hex values in comments and to ifdef for
the unsupported case.

% +
% +static const volatile long double
% +tiny =			0x1p-10000L;
% 
%  static long double complex clog_for_large_values(long double complex z);
% @@ -196,7 +212,7 @@
% 
%  		if (isinf(x))
% -			return (cpackl(x+y, -x));
% +			return (cpackl(y+y, -INFINITY));
%  		if (isinf(y))
% -			return (cpackl(x+y, -y));
% +			return (cpackl(x+x, -y));
% 
%  		if (x == 0) return (cpackl(m_pi_2, y+y));
% @@ -240,7 +256,12 @@
%  cacoshl(long double complex z)
%  {
% -	long double complex w = cacosl(z);
% -	long double rx = creall(w);
% -	long double ry = cimagl(w);
% +	long double complex w;
% +	long double rx;
% +	long double ry;
% +
% +	w = cacosl(z);
% +	rx = creall(w);
% +	ry = cimagl(w);
% +
%  	if (isnan(rx) && isnan(ry))
%  		return (cpackl(ry, rx));

Other changes as in the double version.  I didn't do them all here.

Error table for complex functions, with more tests than before (takes
a day or two to run on 1 CPU):

% amd64 float prec, on 2**16 x 2**16 args:
% rcacos:max_er = 0x70f82558 3.5303, avg_er = 0.327, #>=1:0.5 = 12113072:86215320
% rcacosh:max_er = 0x642421ce 3.1294, avg_er = 0.262, #>=1:0.5 = 4492356:851070980
% rcasin:max_er = 0x6aaaeb38 3.3334, avg_er = 0.225, #>=1:0.5 = 13545196:105163176
% rcasinh:max_er = 0x642421ce 3.1294, avg_er = 0.262, #>=1:0.5 = 4492356:851070980
% rcatan:max_er = 0x65573b1a 3.1669, avg_er = 0.291, #>=1:0.5 = 11857524:92253376
% rcatanh:max_er = 0x5994bb70 2.7994, avg_er = 0.188, #>=1:0.5 = 42972472:357234756
% rclog: max_er = 0x305820c5 1.5108, avg_er = 0.247, #>=1:0.5 = 106660:32521284
% rcsqrt:max_er = 0x1fffffd0 1.0000, avg_er = 0.239, #>=1:0.5 = 0:27082
% rccos: max_er = 0x55b98673 2.6789, avg_er = 0.090, #>=1:0.5 = 15162940:82894084
% rccosh:max_er = 0x55b98673 2.6789, avg_er = 0.090, #>=1:0.5 = 15162940:82894084
% rcexp: max_er = 0x1658ca3e2ce252 11716177.9430, avg_er = 0.197, #>=1:0.5 = 15750254:115895992

Everything except this works fairly well.  cexp() passed tests on only
2**12 x 2**12 args.

% rcsin: max_er = 0x56838440 2.7036, avg_er = 0.095, #>=1:0.5 = 21624908:122655056
% rcsinh:max_er = 0x52a0d11e 2.5821, avg_er = 0.105, #>=1:0.5 = 14563920:304694328
% rctan: max_er = 0xc1c16ec2 6.0549, avg_er = 0.105, #>=1:0.5 = 38983948:270561304
% rctanh:max_er = 0xc89c698a 6.2691, avg_er = 0.165, #>=1:0.5 = 176171768:584759008

ctan() isn't too good.  Its max error expanded from ~5 ulps to ~6 ulps with
more tests.

% icacos:max_er = 0x642421ce 3.1294, avg_er = 0.262, #>=1:0.5 = 4492356:851070980
% icacosh:max_er = 0x70f82558 3.5303, avg_er = 0.327, #>=1:0.5 = 12113072:86215320
% icasin:max_er = 0x642421ce 3.1294, avg_er = 0.262, #>=1:0.5 = 4492356:851070980
% icasinh:max_er = 0x6aaaeb38 3.3334, avg_er = 0.225, #>=1:0.5 = 13545196:105163176
% icatan:max_er = 0x5994bb70 2.7994, avg_er = 0.188, #>=1:0.5 = 42972472:357234756
% icatanh:max_er = 0x65573b1a 3.1669, avg_er = 0.291, #>=1:0.5 = 11857524:92253376
% iclog: max_er = 0x2fbaebea 1.4916, avg_er = 0.313, #>=1:0.5 = 103490:88996762
% icsqrt:max_er = 0x1fffffd0 1.0000, avg_er = 0.239, #>=1:0.5 = 0:27082
% iccos: max_er = 0x524fa191 2.5722, avg_er = 0.144, #>=1:0.5 = 18085136:346402008
% iccosh:max_er = 0x524fa191 2.5722, avg_er = 0.144, #>=1:0.5 = 18085136:346402008
% icsin: max_er = 0x52a0d11e 2.5821, avg_er = 0.105, #>=1:0.5 = 14563920:304694328
% icsinh:max_er = 0x56838440 2.7036, avg_er = 0.095, #>=1:0.5 = 21624908:122655056
% ictan: max_er = 0x9c7f786c 4.8906, avg_er = 0.159, #>=1:0.5 = 635992:2189748
% ictanh:max_er = 0x9708bb4b 4.7198, avg_er = 0.104, #>=1:0.5 = 177976:980356
% 
% amd64 double prec, on 2**16 x 2**16 args:
% not done yet
% 
% i386 float prec, on 2**16 x 2**16 args:
% rcacos:max_er = 0x50899c13 2.5168, avg_er = 0.323, #>=1:0.5 = 1109224:63935432
% rcacosh:max_er = 0x38676cc7 1.7626, avg_er = 0.256, #>=1:0.5 = 888252:812489596
% rcasin:max_er = 0x5ad0db95 2.8380, avg_er = 0.223, #>=1:0.5 = 8301060:96238644
% rcasinh:max_er = 0x38676cc7 1.7626, avg_er = 0.256, #>=1:0.5 = 888252:812489596
% rcatan:max_er = 0x3dfcde4c 1.9371, avg_er = 0.287, #>=1:0.5 = 7573764:74078480
% rcatanh:max_er = 0x2f90804a 1.4864, avg_er = 0.155, #>=1:0.5 = 493896:87730140
% rclog: max_er = 0x2f626046 1.4808, avg_er = 0.247, #>=1:0.5 = 69520:7720848
% rcsqrt:max_er = 0x1fffffd0 1.0000, avg_er = 0.239, #>=1:0.5 = 0:27082
% rccos: max_er = 0x56fb2dd4 2.7182, avg_er = 0.089, #>=1:0.5 = 14830032:79662628
% rccosh:max_er = 0x56fb2dd4 2.7182, avg_er = 0.089, #>=1:0.5 = 14830032:79662628
% rcexp: max_er = 0x1658ca3e2ce252 11716177.9430, avg_er = 0.196, #>=1:0.5 = 15191710:107310178
% rcsin: max_er = 0x560146c2 2.6877, avg_er = 0.095, #>=1:0.5 = 21025048:121830712
% rcsinh:max_er = 0x52a0d11e 2.5821, avg_er = 0.105, #>=1:0.5 = 14018592:299616036
% rctan: max_er = 0xa0945cb2 5.0181, avg_er = 0.095, #>=1:0.5 = 18625048:215669808
% rctanh:max_er = 0x86db180b 4.2142, avg_er = 0.142, #>=1:0.5 = 112980788:493723688
% icacos:max_er = 0x38676cc7 1.7626, avg_er = 0.256, #>=1:0.5 = 888252:812489596
% icacosh:max_er = 0x50899c13 2.5168, avg_er = 0.323, #>=1:0.5 = 1109224:63935432
% not redone recently past here:
% icasin:max_er = 0x40a05dd0 2.0196, avg_er = 0.261, #>=1:0.5 = 5281044:852735716
% icasinh:max_er = 0x67e2ed1b 3.2465, avg_er = 0.224, #>=1:0.5 = 8821392:96082548
% icatan:max_er = 0x2f1b30d9 1.4721, avg_er = 0.150, #>=1:0.5 = 473412:98823608
% icatanh:max_er = 0x3e698bd3 1.9504, avg_er = 0.284, #>=1:0.5 = 2928868:53999712
% iclog: max_er = 0x2f3ab89e 1.4759, avg_er = 0.314, #>=1:0.5 = 85208:86449406
% icsqrt:max_er =  0xfffffd1 0.5000, avg_er = 0.243, #>=1:0.5 = 0:0
% iccos: max_er = 0xb8a89cc9da6f8ecf 24784463438.8261, avg_er = 4.375, #>=1:0.5 = 985914352:1145676100
% iccosh:max_er = 0xb8a89cc9da6f8ecf 24784463438.8261, avg_er = 4.375, #>=1:0.5 = 985914352:1145676100
% icexp: max_er = 0xb8a3a12532b83c66 24781850921.5850, avg_er = 3.275, #>=1:0.5 = 989776068:1097527556
% icsin: max_er = 0xb89efbb1b6eff235 24779414925.7168, avg_er = 0.772, #>=1:0.5 = 984470420:1108121412
% icsinh:max_er = 0xb8a89cc9da6f8ecf 24784463438.8261, avg_er = 3.803, #>=1:0.5 = 990393948:1068699356
% ictan: max_er = 0x1a19443fc048dbb 218931743.8756, avg_er = 4.826, #>=1:0.5 = 748442224:916580252
% ictanh:max_er = 0xb68ffe828cb533f6 24503120916.3971, avg_er = 5.660, #>=1:0.5 = 1002055892:1086633900

All the old tests use the -current, broken i387 cos, sin and tan.  I had
to kill these to get tests to pass, and did this earlier in most previously
reported results.

% 
% i386 double prec, on 2**16 x 2**16 args:
% rcacos:max_er =     0x12df 2.3589, avg_er = 0.192, #>=1:0.5 = 170980:25565816
% rcacosh:max_er =      0xd29 1.6450, avg_er = 0.257, #>=1:0.5 = 35456:693380268
% rcasin:max_er =     0x1751 2.9146, avg_er = 0.166, #>=1:0.5 = 1964988:25241352
% rcasinh:max_er =      0xd29 1.6450, avg_er = 0.257, #>=1:0.5 = 35456:693380268
% rcatan:max_er =      0xd8b 1.6929, avg_er = 0.015, #>=1:0.5 = 1105980:15127868
% rcatanh:max_er =      0xb56 1.4170, avg_er = 0.118, #>=1:0.5 = 65700:22533940
% rclog: max_er =      0xa52 1.2900, avg_er = 0.250, #>=1:0.5 = 776:6095880
% rcsqrt:max_er =      0x7ff 0.9995, avg_er = 0.250, #>=1:0.5 = 18:322256
% icacos:max_er =      0xd29 1.6450, avg_er = 0.257, #>=1:0.5 = 35456:693380268
% icacosh:max_er =     0x12df 2.3589, avg_er = 0.192, #>=1:0.5 = 170980:25565816
% icasin:max_er =      0xd29 1.6450, avg_er = 0.257, #>=1:0.5 = 35456:693380268
% icasinh:max_er =     0x1751 2.9146, avg_er = 0.166, #>=1:0.5 = 1964988:25241352
% icatan:max_er =      0xb56 1.4170, avg_er = 0.118, #>=1:0.5 = 65700:22533940
% icatanh:max_er =      0xd8b 1.6929, avg_er = 0.015, #>=1:0.5 = 1105980:15127868
% iclog: max_er =      0x981 1.1880, avg_er = 0.251, #>=1:0.5 = 28630:39211728
% icsqrt:max_er =      0x400 0.5000, avg_er = 0.250, #>=1:0.5 = 0:304172

Test coverage is not really adequate to find results in double precision
-- it tests little more than values of the form 2**m + I * 2**n for
integer values m and n.  But it tests all of these values, so is almost
sure to find problems near 0 and infinity, and with NaNs.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120913204808.T1964>