From owner-freebsd-numerics@FreeBSD.ORG Mon Sep 17 22:50:28 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C951F106564A for ; Mon, 17 Sep 2012 22:50:28 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 57FDC8FC0A for ; Mon, 17 Sep 2012 22:50:28 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8HMoQkw083746; Mon, 17 Sep 2012 17:50:26 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5057A932.3000603@missouri.edu> Date: Mon, 17 Sep 2012 17:50:26 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> In-Reply-To: <20120918012459.V5094@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 22:50:28 -0000 OK, I am struggling a bit with the latest suggestions. First, I have completely removed all the code related to when |z| is small. I have just lost it all. So I didn't perform any changes related to that code. If you want me to put it back with appropriate "#if 0", can you email those code segments back to me? On 09/17/2012 12:07 PM, Bruce Evans wrote: > @ @@ -321,30 +334,28 @@ > @ } > @ @ - if (isinf(x) || isinf(y)) > @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y))); > @ + /* Raise inexact unless z == 0; return for z == 0 as a side > effect. */ > @ + if ((x == 0 && y == 0) || (int)(1 + tiny) != 1) > @ + return (z); I'm not too sure where this code is meant to be. It looks like it should be part of testing |z| small, but it seems to be placed where |z| is large. When |z| is large, z=0 will never happen. > cacos*() and casin*() should benefit even more from an up-front raising > of inexact, since do_hard_work() has 7 magic statements to raise inexact > where sum_squares has only 1. Where is the code that raises inexact up-front? > There are no magic expressions (int)(1+tiny) left except the new up-front > one. There are still not-so- magic expressions (m_pi_2 + tiny). BTW, > most or all of the recent fixes to use the latter expressions don't > have a comment about raising inexact in catrig.c, while most or all > older expressions for setting inexact do have such a comment. I put the comments in. > Previous optimization not turned off for debugging. It is simpler now > that it can depend on inexact being raised up-front. Ditto. Which code turns on inexact up front? > @ } else > @ rx = log1pf(4*ax / sum_squares(ax-1, ay)) / 4; > @ @ - if (ax == 1) { > @ - if (ay==0) > @ - ry = 0; > @ - else > @ - ry = atan2f(2, -ay) / 2; > @ - } else if (ay < FOUR_SQRT_MIN) { > @ - if ((int)ay==0) > @ - ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2; > @ - } else > @ + if (ax == 1) > @ + ry = atan2f(2, ay) / 2; > @ + else if (ay < FOUR_SQRT_MIN) > @ + ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2; > @ + else > @ ry = atan2f(2*ay, (1-ax)*(1+ax) - ay*ay) / 2; > @ > > Remove the special case for (ax == 1, ay == 0). The general case gives > the same result. I don't think your code works. It should be ry = atan2f(2, -ay) / 2, not ry = atan2f(2, ay) / 2. In your tests, you should include cases where x or y is equal or close to 1. These are important special cases that I think your test code is very unlikely to hit. These are difficult edge cases for all the arc-trig functions. > Remove negation of ay for ax == 1. The sign will be copied into the result > later for all cases, so it doesn't matter in the arg. I didn't check the > branch cut details for this, but runtime tests passed. See above. > ... I now understand what the threshold should be. You have > filtered out ax == 1. This makes 1 - ax*ax at least ~2*EPSILON, so > ay*ay can be dropped if ay is less than sqrt(2*EPSILON*EPSILON) * > 2**-GUARD_DIGITS = EPSILON * 2**-5 say. SQRT_MIN is way smaller > than that, so FOUR_SQRT_MIN works too. We should use a larger > threshold for efficiency, or avoid the special case for ax == 1. > Testing shows that this analysis is off by a factor of about > sqrt(EPSILON), since a threshold of EPSILON * 2**7 is optimal. > The optimization made no difference to speed; it is just an > optimization for understanding. Maybe the special case for ax == 1 > can be avoided, or folded together with the same special case for > evaluation of the real part. This special case is similar to the > one in clog(), but easier. This was one of the clever ideas in the paper by Hull et al, which I only understood recently. Their code was closer to your approach, I think. Let me think about what you wrote some more. > > Further optimization: in sum_squares(), y is always ay >= 0, so there > is no need to apply fabs*() to it. I think the compiler does this > optimization. It can see that y == ay via the inline. Well spotted.