From owner-freebsd-numerics@FreeBSD.ORG Tue Sep 18 06:19:26 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0E491065673 for ; Tue, 18 Sep 2012 06:19:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au [211.29.132.187]) by mx1.freebsd.org (Postfix) with ESMTP id 451178FC16 for ; Tue, 18 Sep 2012 06:19:25 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8I6JMOn031723 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 18 Sep 2012 16:19:23 +1000 Date: Tue, 18 Sep 2012 16:19:22 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <5057A932.3000603@missouri.edu> Message-ID: <20120918150551.Y820@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 06:19:26 -0000 On Mon, 17 Sep 2012, Stephen Montgomery-Smith wrote: > OK, I am struggling a bit with the latest suggestions. > > First, I have completely removed all the code related to when |z| is small. > I have just lost it all. So I didn't perform any changes related to that > code. If you want me to put it back with appropriate "#if 0", can you email > those code segments back to me? I have not :-). It is also quoted in the mail archives. Will sent it in my next patch. > On 09/17/2012 12:07 PM, Bruce Evans wrote: > >> @ @@ -321,30 +334,28 @@ >> @ } >> @ @ - if (isinf(x) || isinf(y)) >> @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y))); >> @ + /* Raise inexact unless z == 0; return for z == 0 as a side >> effect. */ >> @ + if ((x == 0 && y == 0) || (int)(1 + tiny) != 1) >> @ + return (z); > > I'm not too sure where this code is meant to be. It looks like it should be > part of testing |z| small, but it seems to be placed where |z| is large. > When |z| is large, z=0 will never happen. As its comment says, this raises inexact [up front] unless z == 0, and [so that the test is not optimized away] returns [z] for z == 0 as a side effect. This is for any z that has not been previously classified (mainly ones with NaNs). Its operation is: - z == 0: find x == 0 and y == 0 and return z - z != 0: find !(x == 0 and y == 0); evaluate (int)(1 + tiny) != 1 and find it to be false while raising inexact; don't return z, but continue with inexact set. >> cacos*() and casin*() should benefit even more from an up-front raising >> of inexact, since do_hard_work() has 7 magic statements to raise inexact >> where sum_squares has only 1. > > Where is the code that raises inexact up-front? As quoted above. Later I tried removing all the 7 magic statements in do_hard_work(), without adding code like the above. This made very little difference. OTOH, the above code costs a cycle or 2, and removing the additions in all magic expressions (m_pi_2 + tiny) gave a small improvement. I think I can explain this, and it shows that we should be using fenv (optimized) and not "optimizing" using magic constext-sensitive expressions. The point is that the code that sets inexact can run in parallel so that the main path can run faster because it doesn't involve an operation like (m_pi_2 + tiny). Good ways for raising exceptions: FE_INEXACT: Your if '((int)(1 + tiny) == 1) return (foo);' works well. This depends on the branch being predictable. But returning is inconvenient. I hope if '((int)(1 + tiny) == 1) volatile_variable = 0;' works similarly. This could be in feraiseexcept(FE_INEXACT) or in a more primitive raise_inexact() (the latter is less verbose and easier to optimize). Then if you actually want to return, the code would be something like { raise_inexact(); return (m_pi_2); }. Better, the branch can be avoided using something like `volatile_variable = (int)(1 + tiny);'. Better still, write this in asm and just do `(int)0.5;' (use asm only to avoid the optimizer removing this). Possibly better still, use a purer FP operation since conversion to int can be slow. In the above, we don't really want a special case for z == 0; we need branches to classify this case but should skip the return since returns use branch resources too. The code becomes: if (x != 0 || y != 0) raise_inexact(); /* No comment. */ #if !THIS_CODE_INTENTIONALLY_LEFT_OUT else return (z); /* No comment. */ #endif FE_OVERFLOW: Instead of evaluating huge*huge and returning it, use something like `volatile_variable = huge*huge; return (INFINITY);'. This is more natural than the above, so it takes at most 1 more instruction (assignment to variable with no dependents) and thus loses little even without parallelism. The version written in asm can also avoid the assignment (just evaluate huge*huge) and lose nothing. FE_UNDERFLOW: Instead of evaluating tiny*tiny and returning it, use something like `volatile_variable = tiny*tiny; return (0);'. I hope there is a variation on this that raises underflow at full speed (underflowing cases are very slow on core2 although not on Athlon64; hopefully they are not so slow if the result of tiny*tiny is not used). The last 2 raisings will also fix the i386 bug that huge*huge and tiny*tiny don't actually raise overflow or underflow or return infinity or 0, since they are evaluated in extra exponent range. It takes conversion to double or float to trigger the exception and to give the correct value. When we try to raise exceptions in a parallel code path, we are hoping for related asynchronicities in the setting of the exception flags so that the usual case where the exception flags are not tested soon proceeds at full speed. It is unclear how compilers and CPUs produce the ordering of operations required by the abstract C machines -- I think a strict interpretation of `volatile' would require synchronizing everything for every access to a volatile variable, but that would be too slow and I've never seen compilers doing much synchronization. >> @ } else >> @ rx = log1pf(4*ax / sum_squares(ax-1, ay)) / 4; >> @ @ - if (ax == 1) { >> @ - if (ay==0) >> @ - ry = 0; >> @ - else >> @ - ry = atan2f(2, -ay) / 2; >> @ - } else if (ay < FOUR_SQRT_MIN) { >> @ - if ((int)ay==0) >> @ - ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2; >> @ - } else >> @ + if (ax == 1) >> @ + ry = atan2f(2, ay) / 2; >> @ + else if (ay < FOUR_SQRT_MIN) >> @ + ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2; >> @ + else >> @ ry = atan2f(2*ay, (1-ax)*(1+ax) - ay*ay) / 2; >> @ >> >> Remove the special case for (ax == 1, ay == 0). The general case gives >> the same result. > > I don't think your code works. It should be ry = atan2f(2, -ay) / 2, not ry > = atan2f(2, ay) / 2. Only logically. As I explained, the negation makes no difference to the result, but of course takes longer, so I removed it. > In your tests, you should include cases where x or y is equal or close to 1. > These are important special cases that I think your test code is very > unlikely to hit. These are difficult edge cases for all the arc-trig > functions. Hmm, I only did this carefully for clog(). I happen to have been testing lots of cases ctanh(1 + tiny, tiny') where tiny* is really tiny (denormal) with either sign, but not so many cases ctanh(1, tiny') and no (?) cases of ctanh(1 + tiny, +-0). >> Remove negation of ay for ax == 1. The sign will be copied into the result >> later for all cases, so it doesn't matter in the arg. I didn't check the >> branch cut details for this, but runtime tests passed. > > See above. I might have missed this. But if the sign matters, why do you set ry = +0 for catanh on both sides of 1 + I*(+-0)? Bruce