From owner-freebsd-numerics@FreeBSD.ORG  Mon Sep 17 22:50:28 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id C951F106564A
	for <freebsd-numerics@freebsd.org>;
	Mon, 17 Sep 2012 22:50:28 +0000 (UTC)
	(envelope-from stephen@missouri.edu)
Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu
	[128.206.184.213])
	by mx1.freebsd.org (Postfix) with ESMTP id 57FDC8FC0A
	for <freebsd-numerics@freebsd.org>;
	Mon, 17 Sep 2012 22:50:28 +0000 (UTC)
Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213])
	by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id
	q8HMoQkw083746; Mon, 17 Sep 2012 17:50:26 -0500 (CDT)
	(envelope-from stephen@missouri.edu)
Message-ID: <5057A932.3000603@missouri.edu>
Date: Mon, 17 Sep 2012 17:50:26 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:15.0) Gecko/20120827 Thunderbird/15.0
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
References: <5017111E.6060003@missouri.edu> <502C0CF8.8040003@missouri.edu>
	<20120906221028.O1542@besplex.bde.org>
	<5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu>
	<504FF726.9060001@missouri.edu>
	<20120912191556.F1078@besplex.bde.org>
	<20120912225847.J1771@besplex.bde.org>
	<50511B40.3070009@missouri.edu>
	<20120913204808.T1964@besplex.bde.org>
	<5051F59C.6000603@missouri.edu>
	<20120914014208.I2862@besplex.bde.org>
	<50526050.2070303@missouri.edu>
	<20120914212403.H1983@besplex.bde.org>
	<50538E28.6050400@missouri.edu>
	<20120915231032.C2669@besplex.bde.org>
	<50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu>
	<5054C200.7090307@missouri.edu>
	<20120916041132.D6344@besplex.bde.org>
	<50553424.2080902@missouri.edu>
	<20120916134730.Y957@besplex.bde.org>
	<5055ECA8.2080008@missouri.edu>
	<20120917022614.R2943@besplex.bde.org>
	<50562213.9020400@missouri.edu>
	<20120917060116.G3825@besplex.bde.org>
	<50563C57.60806@missouri.edu>
	<20120918012459.V5094@besplex.bde.org>
In-Reply-To: <20120918012459.V5094@besplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-numerics@freebsd.org
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 22:50:28 -0000

OK, I am struggling a bit with the latest suggestions.

First, I have completely removed all the code related to when |z| is 
small.  I have just lost it all.  So I didn't perform any changes 
related to that code.  If you want me to put it back with appropriate 
"#if 0", can you email those code segments back to me?

On 09/17/2012 12:07 PM, Bruce Evans wrote:

> @ @@ -321,30 +334,28 @@
> @      }
> @ @ -    if (isinf(x) || isinf(y))
> @ -        return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y)));
> @ +    /* Raise inexact unless z == 0; return for z == 0 as a side
> effect. */
> @ +    if ((x == 0 && y == 0) || (int)(1 + tiny) != 1)
> @ +        return (z);

I'm not too sure where this code is meant to be.  It looks like it 
should be part of testing |z| small, but it seems to be placed where |z| 
is large.  When |z| is large, z=0 will never happen.

> cacos*() and casin*() should benefit even more from an up-front raising
> of inexact, since do_hard_work() has 7 magic statements to raise inexact
> where sum_squares has only 1.

Where is the code that raises inexact up-front?

> There are no magic expressions (int)(1+tiny) left except the new up-front
> one.  There are still not-so- magic expressions (m_pi_2 + tiny).  BTW,
> most or all of the recent fixes to use the latter expressions don't
> have a comment about raising inexact in catrig.c, while most or all
> older expressions for setting inexact do have such a comment.

I put the comments in.


> Previous optimization not turned off for debugging.  It is simpler now
> that it can depend on inexact being raised up-front.

Ditto.  Which code turns on inexact up front?

> @      } else
> @          rx = log1pf(4*ax / sum_squares(ax-1, ay)) / 4;
> @ @ -    if (ax == 1) {
> @ -        if (ay==0)
> @ -            ry = 0;
> @ -        else
> @ -            ry = atan2f(2, -ay) / 2;
> @ -    } else if (ay < FOUR_SQRT_MIN) {
> @ -        if ((int)ay==0)
> @ -            ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2;
> @ -    } else
> @ +    if (ax == 1)
> @ +        ry = atan2f(2, ay) / 2;
> @ +    else if (ay < FOUR_SQRT_MIN)
> @ +        ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2;
> @ +    else
> @          ry = atan2f(2*ay, (1-ax)*(1+ax) - ay*ay) / 2;
> @
>
> Remove the special case for (ax == 1, ay == 0).  The general case gives
> the same result.

I don't think your code works.  It should be ry = atan2f(2, -ay) / 2, 
not ry = atan2f(2, ay) / 2.

In your tests, you should include cases where x or y is equal or close 
to 1.  These are important special cases that I think your test code is 
very unlikely to hit.  These are difficult edge cases for all the 
arc-trig functions.

> Remove negation of ay for ax == 1.  The sign will be copied into the result
> later for all cases, so it doesn't matter in the arg.  I didn't check the
> branch cut details for this, but runtime tests passed.

See above.

> ...  I now understand what the threshold should be.  You have
> filtered out ax == 1.  This makes 1 - ax*ax at least ~2*EPSILON, so
> ay*ay can be dropped if ay is less than sqrt(2*EPSILON*EPSILON) *
> 2**-GUARD_DIGITS = EPSILON * 2**-5 say.  SQRT_MIN is way smaller
> than that, so FOUR_SQRT_MIN works too.  We should use a larger
> threshold for efficiency, or avoid the special case for ax == 1.
> Testing shows that this analysis is off by a factor of about
> sqrt(EPSILON), since a threshold of EPSILON * 2**7 is optimal.
> The optimization made no difference to speed; it is just an
> optimization for understanding.  Maybe the special case for ax == 1
> can be avoided, or folded together with the same special case for
> evaluation of the real part.  This special case is similar to the
> one in clog(), but easier.

This was one of the clever ideas in the paper by Hull et al, which I 
only understood recently.  Their code was closer to your approach, I think.

Let me think about what you wrote some more.

>
> Further optimization: in sum_squares(), y is always ay >= 0, so there
> is no need to apply fabs*() to it.  I think the compiler does this
> optimization.  It can see that y == ay via the inline.

Well spotted.