From owner-freebsd-numerics@FreeBSD.ORG  Sun Sep 16 05:14:54 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id BC6BA1065672
	for <freebsd-numerics@FreeBSD.org>;
	Sun, 16 Sep 2012 05:14:54 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 34BF58FC08
	for <freebsd-numerics@FreeBSD.org>;
	Sun, 16 Sep 2012 05:14:53 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8G5EoHD024205
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 16 Sep 2012 15:14:52 +1000
Date: Sun, 16 Sep 2012 15:14:50 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith <stephen@missouri.edu>
In-Reply-To: <50553424.2080902@missouri.edu>
Message-ID: <20120916134730.Y957@besplex.bde.org>
References: <5017111E.6060003@missouri.edu>
	<20120814072946.S5260@besplex.bde.org>
	<50297CA5.5010900@missouri.edu> <50297E43.7090309@missouri.edu>
	<20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu>
	<20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu>
	<20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu>
	<504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu>
	<20120912191556.F1078@besplex.bde.org>
	<20120912225847.J1771@besplex.bde.org>
	<50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org>
	<5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org>
	<50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org>
	<50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org>
	<50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu>
	<5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org>
	<50553424.2080902@missouri.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-numerics@FreeBSD.org
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Sep 2012 05:14:55 -0000

On Sat, 15 Sep 2012, Stephen Montgomery-Smith wrote:

> One more thing I would like an opinion on.
>
> In my code I check for |z| being small, and then use the approximations:
> casinh(z) = z
> cacos(z) = Pi - z

Actually Pi/2 - z.

> catanh(z) = z
>
> However these approximations are not used in the papers by Hull et al, and 
> the code works just fine if I don't include these in the code.

Probably a bug in the papers.

casinh(z) = formula(z) would probably spuriously underflow for small z.
Avoiding underflow in the formula would probably reduce to returning z
with special code to raise inexact.  A formula like casinh(z) = z - z**3/6
would raise inexact as a side effect, at least for the same small z that
it underflows, but would also raise underflow and possibly denormal.  A
formula like z * (1 - z**2/6) would avoid underflow in more cases but
would probably be slower and less accurate when both are valid.

cacos(z) = Pi/2 - z - z**3/6 should be parenthesized as Pi/2 - (z + z**3/6)
for accuracy.  This gives the same underflow problem for the parenthesized
part.  You should actually use more like Pi/2 than Pi/2 - z.  See below.

The corresponding real functions in fdlibm of course use the approximations,
with a threshold higher than need to avoid underflow so as as to get a
free optimization when the approximation applies.

Similarly for any function represented by a power series about 0:

     f(z) = <zero terms> + f<nth deriv>(z) * z**n/n! + o(z**n)

The first term would underflow for small z.  When the first term is a
nonzero constant, it won't underflow, but higher terms would, and
higher terms should be added to each other first for accuracy..  For
expansion about z0 != 0, (z - z0) won't underflow, but the first term
involving it might.

C didn't support FP exceptions when the paper was written, but fdlibm did.

I just noticed minor bufs and pessimizations in your code for Pi/2 - z:
from cacos():

% 	if (ax < DBL_EPSILON && ay < DBL_EPSILON)
% 		if ((int)ax==0 && (int)ay==0) { /* raise inexact */
% 			if (sy == 0)
% 				return (cpack(m_pi_2 - x, copysign(ay, -1)));
% 			return (cpack(m_pi_2 - x, ay));
% 		}

(1) The real result of m_pi_2 is inexact even when z = 0, so inexact should
     be raised in all cases and the tricky extra code to avoid setting it
     when z = 0 is just a bug.

(2) At least if ax is a little smaller than DBL_EPSILON and the rounding mode
     is to nearest, m_pi_2 - x is just m_pi_2.  I think subtracting x raises
     underflow, but inexact is already raised for x != 0 in another way.

(3) The other way is slower, so subtracting x should be preferred.

(4) The corresponding fdlibm code for real acos() essentially adds a
     constant to Pi/2 instead of x.  It is 'return pio2_hi + pio2_lo;'
     where pio2_lo is volatile so that the addition is hopefully done
     at run time.  This gives subtle differences in the result in nonstandard
     rounding modes.  Mostly we don't support nonstandard rounding modes,
     but this method is better for them.  Your method is sensitive to the
     sign of x, but should not be.  With perfect rounding in all modes,
     the result should be Pi/2 rounded according to the mode, and not
     depend on the sign of x.  I don't know if the fdlibm constants are
     magic enough for this to work in all modes.  Normally, a 'hi' term
     is the result rounded to nearest in the ambient precision, and the
     'lo' term is the residual (rounded to nearest...), but here we
     want the final rounding to depend on the mode and it isn't clear
     that this can be expressed with a pair of constants each rounded in
     a single mode.

(5) fdlibm real acos() uses a threshold of DBL_MIN / 32 for returning
     pio2_hi + pio2_lo, I think just because it isn't clear where the
     exact threshold for this approximation being valid is.  The general
     formula works (doesn't underflow) for DBL_MAX / 32 <= |x| < 0.5,
     since it is a rational approximation written in a form that doesn't
     involve any terms smaller than Const*x**2.  Raising x to a higher
     power requires more care.  I happen to have rewritten this
     approximation in the float case to use a polynomial written more
     efficiently using higher powers, and just noticed that I wasn't
     careful enough.  I have an 11th power, and in float precision
     the threshold is 2**-26 and raising 2**-26 to just the 5th power
     underflows in float precision.

Complex acos() still has to avoid underflow in in the code following
the above when only one of ax and ay is small, so perhaps a special
case for this isn't actually optimal.

Bruce