Date: Sun, 23 Sep 2012 06:09:02 +1000 (EST)
From: Bruce Evans
To: Bruce Evans
Cc: Stephen Montgomery-Smith, freebsd-numerics@FreeBSD.org
Subject: Re: Complex arg-trig functions

On Sat, 22 Sep 2012, Bruce Evans wrote:

> On Fri, 21 Sep 2012, Stephen Montgomery-Smith wrote:
>
>> On 09/21/2012 06:18 PM, Bruce Evans wrote:
>>> ...
>>> The attachment was larger than intended.  It had my complete patch set
>>> for catrig*.c.
>>
>> Will there be another complete patch set tomorrow, or did you just send
>> it today?
>
> I sent it all and won't change much more for a while.  I might describe
> it more tomorrow.  Already made a small change: always use float for
> `tiny' (it is now only used in raise_inexact), and in raise_inexact
> assign (1 + tiny) to volatile float instead of volatile int.

Just 1 detail in the old patch needs more description.
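Before the patches, for context, a minimal stand-alone sketch (not part of
any patch; the test harness around the macro is made up here) of what the
raise_inexact() idiom being changed does: tiny is nonzero but far below
half an ulp of 1, so 1 + tiny is inexact, and storing the sum through a
volatile float keeps the compiler from optimizing the addition away, so
FE_INEXACT is raised as a side effect.  A volatile int destination raises
the same exception but adds a pointless float->int conversion.

#include <fenv.h>
#include <stdio.h>

/*
 * Sketch of the msun idiom.  tiny is volatile so the compiler cannot
 * pre-evaluate 1 + tiny at compile time, and junk is volatile so the
 * addition (and hence its inexact exception) cannot be optimized away.
 * Strict ISO C would also want #pragma STDC FENV_ACCESS ON here.
 */
static const volatile float tiny = 0x1p-100;
#define	raise_inexact()	do { volatile float junk = 1 + tiny; } while (0)

int
main(void)
{
	feclearexcept(FE_ALL_EXCEPT);
	raise_inexact();
	printf("inexact raised: %d\n", fetestexcept(FE_INEXACT) != 0);
	return (0);
}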
First a new patch to finish merging recent changes:

% diff -u2 catrig.c~ catrig.c
% --- catrig.c~	2012-09-22 04:49:51.000000000 +0000
% +++ catrig.c	2012-09-22 18:41:34.779454000 +0000
% @@ -35,5 +35,5 @@
%  #undef isnan
%  #define isnan(x)	((x) != (x))
% -#define raise_inexact()	do { volatile int junk = 1 + tiny; } while(0)
% +#define raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
%  #undef signbit
%  #define signbit(x)	(__builtin_signbit(x))

No reason to convert it to int.

(The (int)(1 + tiny) case is not quite similar.  (float)(1 + tiny) == 1
would have failed due to compiler bugfeatures unless tiny is double_t or
larger, since the bugfeatures elide the cast.  There would have to have
been an assignment to a volatile FP variable, directly as here or via
STRICT_ASSIGN (whose purpose is to avoid going through the volatile
variable when this is unnecessary).  The cast to int is really needed when
we have a small x and want to set inexact iff x != 0.  Perhaps
'if (x != 0) raise_inexact();' is a more efficient way to do that too, as
well as being unobfuscated.)

% @@ -48,12 +48,12 @@
%   m_ln2 =	6.9314718055994531e-1,		/*  0x162e42fefa39ef.0p-53 */
%  /*
% - * We no longer use M_PI_2 or m_pi_2.  In float precision, rounding to
% + * We no longer use M_PI_2 or m_pi_2.  In some precisions (although not
% + * in double precision where this comment is attached), rounding to
%   * nearest of PI/2 happens to round up, but we want rounding down so
%   * that the expressions for approximating PI/2 and (PI/2 - z) work in all
% - * rounding modes.  This is not very important, but it is necessary for
% - * the same quality of implementation that fdlibm had in 1992 and that
% - * real functions mostly still have.  This is known to be broken only in
% - * ld80 acosl() via invtrig.c and in some invalid optimizations in code
% - * under development, and now in all functions in catrigl.c via invtrig.c.
% + * rounding modes.  This is not very important, but the real inverse trig
% + * functions always took great care to do it, and all inverse trig
% + * functions are close working right in all rounding modes for their
% + * other approximations (unlike the non-inverse ones).
%   */
%  pio2_hi =	1.5707963267948966e0,		/*  0x1921fb54442d18.0p-52 */

Tone down this comment a bit.  You might want to remove it.

% @@ -64,6 +64,7 @@
% 
%  static const volatile double
% -pio2_lo =	6.1232339957367659e-17,		/*  0x11a62633145c07.0p-106 */
% -tiny =		0x1p-1000;
% +pio2_lo =	6.1232339957367659e-17;		/*  0x11a62633145c07.0p-106 */
% +static const volatile float
% +tiny =		0x1p-100;
% 
%  static double complex clog_for_large_values(double complex z);

`tiny' is now always float.  It was just wasteful for it to be larger.

% @@ -550,5 +551,5 @@
%  	if (ix <= (BIAS + DBL_MAX_EXP / 2 - CUTOFF) << 20)
%  		return (x/(x*x + y*y));
% -	scale = 0;
% +	scale = 1;
%  	SET_HIGH_WORD(scale, 0x7ff00000 - ix);	/* 2**(1-ilogb(x)) */
%  	x *= scale;

scale = 0 makes no sense for doubles either.  For floats, the mantissa is
part of the high word, so no separate initialization is needed, and none is
used.  I broke the long double case by copying the float code and not
initializing the mantissa bits at all (scale = 0 would have given
pseudo-zero, but uninitialized scale gives almost anything).
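To make the scale = 1 point concrete, here is a hypothetical stand-alone
version of the SET_HIGH_WORD() idiom (msun's real macro in math_private.h
uses a union of two 32-bit words; the helper name and example values below
are mine): only the high 32 bits of the double are rewritten, so the low 32
mantissa bits keep whatever was there before.  Starting from scale = 1 they
are zero and the result is the exact power of two that the comment
promises; an uninitialized scale (the broken long double copy) leaves them
as garbage.

#include <stdint.h>
#include <stdio.h>

/*
 * Stand-alone stand-in for msun's SET_HIGH_WORD: replace the upper 32
 * bits (sign + exponent + top of the mantissa) and leave the low 32
 * mantissa bits untouched.
 */
static void
set_high_word(double *d, uint32_t hi)
{
	union { double f; uint64_t i; } u;

	u.f = *d;
	u.i = ((uint64_t)hi << 32) | (u.i & 0xffffffffu);
	*d = u.f;
}

int
main(void)
{
	double scale = 1;		/* low mantissa word is now 0 */
	uint32_t ix = 0x40800000;	/* high word of x == 512, say */

	set_high_word(&scale, 0x7ff00000 - ix);	/* 2**(1-ilogb(x)) */
	printf("%a\n", scale);		/* prints 0x1p-8, an exact power of 2 */
	return (0);
}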
% @@ -618,12 +619,7 @@
%  	}
% 
% -	if (ax == 1 && ay < DBL_EPSILON) {
% -#if 0	/* this only improves accuracy in an already relative accurate case */
% -		if (ay > 2*DBL_MIN)
% -			rx = - log(ay/2) / 2;
% -		else
% -#endif
% -		rx = - (log(ay) - m_ln2) / 2;
% -	} else
% +	if (ax == 1 && ay < DBL_EPSILON)
% +		rx = - (log(ay) - m_ln2) / 2;
% +	else
%  		rx = log1p(4*ax / sum_squares(ax-1, ay)) / 4;
% 

I think this can be removed.  I explained the details of this a week or 2
ago.  Here log(ay) is large compared with m_ln2, so there is an extra error
of less than half an ulp for adding m_ln2.  The error for log(ay) is < 1
ulp, so the total error is < 1.5 ulps (in practice, < 1.2 ulps).  Since
other parts of catanh() have errors of 2-3 ulps, we shouldn't care about
going above 1.2 ulps here.

I now understand catanh() well enough to see how to make its errors < 1 ulp
using not much more than what clog() needs to do the same things:
- use an extra-precision log() and log1p()
- evaluate |z-1|**2 accurately (already done in clog())
- divide accurately by the accurate |z-1|**2.

I peeked at the Intel ia64 math library atanh() and it reminded me that
Newton's method is good for extra-precision division, and that I already
use this method in an unfinished naive implementation of gamma().  (The
Intel ia64 math library is insanely complicated, efficient, accurate and
large.  It takes about 30K of asm code for each of atanhf(), atanh() and
atanhl(), each with optimizations specialized for the precision, including
a specialized inline log1p.)

(The naive implementation of gamma() uses the functional equation to shift
the arg to a large one so that the asymptotic formula is accurate.  This
takes lots of divisions to convert the result for the shifted arg to the
result for the unshifted arg, and each division must be very accurate for
the final result to be even moderately accurate.  Not a good method, since
even 1 non-extra-precision division is slow.  But I was interested in
seeing how far this method could be pushed.  It was barely good enough for
lgammaf() near its first negative zero, when all intermediate calculations
were done in sesqui-double precision.)

(The Intel ia64 math library is of course insanely complicated, etc., for
*gamma*().  Instead of 30K of asm per function, it takes 220K for lgammal()
and significantly less for lower precisions.  It even uses large asm for
the wrapper functions (pre-C90 support which we axed long ago).  It doesn't
do any complex functions, at least in the 2005 glibc version.  Altogether,
in the glibc 2005 version, Intel *gamma*.S takes 630K, which is slightly
larger than all of msun/src in FreeBSD, and we do some complex functions.)
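To illustrate the extra-precision division mentioned above (only a sketch
of the general technique, not code from any patch; the function name is
made up): one residual step, which is where the Newton/reciprocal
iterations end up, using fma() so that the residual a - q*b is computed
exactly.  The quotient comes back as a hi+lo pair; adding the halves back
together in double would just reproduce hi, so the low part is only useful
when it feeds further extra-precision arithmetic such as an extra-precision
log1p().

#include <math.h>

/*
 * Hypothetical sketch of extra-precision division: hi = fl(a/b), and,
 * barring overflow/underflow, fma(-hi, b, a) computes the residual
 * a - hi*b exactly, so lo recovers the rounding error of hi.
 */
static void
div_extra(double a, double b, double *hi, double *lo)
{
	double q = a / b;		/* about 0.5 ulp of error */
	double r = fma(-q, b, a);	/* exact residual a - q*b */

	*hi = q;
	*lo = r / b;			/* correction term, |lo| <= 0.5 ulp(hi) */
}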
% diff -u2 catrigf.c~ catrigf.c
% --- catrigf.c~	2012-09-22 04:49:51.000000000 +0000
% +++ catrigf.c	2012-09-22 00:38:55.503733000 +0000
% @@ -45,5 +45,5 @@
%  #undef isnan
%  #define isnan(x)	((x) != (x))
% -#define raise_inexact()	do { volatile int junk = 1 + tiny; } while(0)
% +#define raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
%  #undef signbit
%  #define signbit(x)	(__builtin_signbitf(x))
% diff -u2 catrigl.c~ catrigl.c
% --- catrigl.c~	2012-09-22 05:42:13.000000000 +0000
% +++ catrigl.c	2012-09-22 18:23:27.597349000 +0000
% @@ -46,5 +46,5 @@
%  #undef isnan
%  #define isnan(x)	((x) != (x))
% -#define raise_inexact()	do { volatile int junk = 1 + tiny; } while(0)
% +#define raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
%  #undef signbit
%  #define signbit(x)	(__builtin_signbitl(x))
% @@ -78,5 +78,5 @@
%  #endif
% 
% -static const volatile long double
% +static const volatile float
%  tiny =		0x1p-10000L;
% 

That's all the new changes.  Now from the old patch:

@ diff -u2 catrig.c~ catrig.c
@ --- catrig.c~	2012-09-21 15:51:00.000000000 +0000
@ +++ catrig.c	2012-09-22 18:41:34.779454000 +0000
@ @@ -577,20 +607,24 @@
@ ...
@  		if (ax == 1)
@  			ry = atan2(2, -ay) / 2;
@ -		else if (ay < FOUR_SQRT_MIN)
@ +		else if (ay < DBL_EPSILON)
@  			ry = atan2(2*ay, (1-ax)*(1+ax)) / 2;
@  		else

You accepted this without comment.  My calculation is that since ax != 1,
|1-ax*ax| is at least 2*DBL_EPSILON; ay < DBL_EPSILON makes ay*ay <
DBL_EPSILON**2, so it is insignificant.  This threshold might be off by a
small factor.

SQRT_MIN makes some sense as a threshold below which ay*ay would underflow.
FOUR_SQRT_MIN makes less sense (I think it was just a nearby handy
constant).  Both need the estimate on |1-ax*ax| to show that a gradually
underflowing ay*ay can be dropped since it is insignificant.  I think we
would prefer to always evaluate the full |z-1|**2, but can't do it because
we want to avoid spurious underflow.

The complications in catrig seem to be just as large for avoiding overflow
and underflow as for getting enough accuracy.  I now understand how to make
the float case significantly more efficient than the double case: calculate
everything in extra precision and exponent range, and depend on the extra
exponent range preventing underflow and overflow, so that everything can be
simpler and faster.  More accuracy occurs even more automatically.  But
this would be too much work for the unimportant float case.  The double
case is more interesting, but optimizations for it using long double are
only possible on arches that have long doubles larger than doubles, and are
only an optimization on arches that have efficient long doubles.

The Intel ia64 math library of course has complications to do this.  It
generally uses extra precision in double precision routines, with
algorithms specialized for this, and then has to work harder in long double
precision and use different algorithms since no extra precision is
available.

Bruce
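As a concrete (and purely hypothetical, not the committed code)
illustration of the "calculate everything in extra precision and exponent
range" idea for the float case: the imaginary part of catanhf() evaluated
in double.  For any finite float inputs, ax*ax and ay*ay can neither
overflow nor underflow in double, so the full expression can always be used
and the FOUR_SQRT_MIN / DBL_EPSILON style thresholds disappear.  Signs,
NaNs, infinities and the ax == 1 pole are ignored here.

#include <complex.h>
#include <math.h>

/*
 * Hypothetical sketch only (not the msun code): magnitude of the
 * imaginary part of catanhf(), evaluated in double so that neither
 * ay*ay nor ax*ax can overflow or underflow for any finite float input,
 * which removes the need for threshold tests on ay.
 */
static float
catanhf_ry(float complex z)
{
	double ax = fabs((double)crealf(z));
	double ay = fabs((double)cimagf(z));

	/* Im catanh(z) = atan2(2*y, (1-x)*(1+x) - y*y) / 2 */
	return ((float)(atan2(2 * ay, (1 - ax) * (1 + ax) - ay * ay) / 2));
}

The atan2() argument then carries roughly double accuracy, so the final
rounding to float dominates the error, which is the "more accuracy occurs
even more automatically" part.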