From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 02:06:31 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4D0D0106564A for ; Sun, 16 Sep 2012 02:06:31 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 07CB78FC12 for ; Sun, 16 Sep 2012 02:06:30 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8G26S1B091225; Sat, 15 Sep 2012 21:06:29 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <50553424.2080902@missouri.edu> Date: Sat, 15 Sep 2012 21:06:28 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <20120814003614.H3692@besplex.bde.org> <50295F5C.6010800@missouri.edu> <20120814072946.S5260@besplex.bde.org> <50297CA5.5010900@missouri.edu> <50297E43.7090309@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> In-Reply-To: 
<20120916041132.D6344@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 02:06:31 -0000

One more thing I would like an opinion on.

In my code I check for |z| being small, and then use the approximations:
casinh(z) = z
cacos(z) = Pi - z
catanh(z) = z

However these approximations are not used in the papers by Hull et al, and the code works just fine if I don't include these in the code.

The only reason I put this code in is because I thought it would go a little faster in the cases that |z| is small. Checking |z| is small takes no time at all.

So what do you think? Should I keep these in the code or not?

Thanks, Stephen From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 04:42:19 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CC31D106564A for ; Sun, 16 Sep 2012 04:42:18 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 4AA558FC08 for ; Sun, 16 Sep 2012 04:42:17 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8G4gGmn001583 for ; Sat, 15 Sep 2012 23:42:17 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505558A8.6040600@missouri.edu> Date: Sat, 15 Sep 2012 23:42:16 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5017111E.6060003@missouri.edu> <50295F5C.6010800@missouri.edu> <20120814072946.S5260@besplex.bde.org> <50297CA5.5010900@missouri.edu> <50297E43.7090309@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> 
<50553424.2080902@missouri.edu> In-Reply-To: <50553424.2080902@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 04:42:19 -0000

Hey guys - I have a piece of code like this:

    if (ax < DBL_EPSILON && ay < DBL_EPSILON)
        if ((int)ax==0 && (int)ay==0) { /* raise inexact */
            if (sy == 0)
                return (cpack(m_pi_2 - x, copysign(ay, -1)));
            return (cpack(m_pi_2 - x, ay));
        }

Is there a good reason I didn't code it like this?

    if (ax < DBL_EPSILON && ay < DBL_EPSILON)
        if ((int)ax==0 && (int)ay==0) /* raise inexact */
            return (cpack(m_pi_2 - x, -y));

I'm trying to remember if I coded it the second way, and one of you told me to code it the first way. Or maybe I came up with the first way myself - maybe I wasn't sure what would happen if y was 0 or -0.

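Whether -y does the right thing on +-0 is easy to check in isolation. A minimal sketch, assuming IEEE 754 doubles; the helper names `cacos_tiny_im` and `cacos_tiny_im_branchy` are made up for this illustration and are not taken from catrig.c:

```c
#include <math.h>

/*
 * Hypothetical helper: the imaginary part that the shorter version
 * would return for tiny |z|.
 */
static double
cacos_tiny_im(double y)
{
	return (-y);	/* flips the sign bit, also for +0 and -0 */
}

/* The longer version's logic, written out for comparison. */
static double
cacos_tiny_im_branchy(double y)
{
	double ay = fabs(y);

	if (signbit(y) == 0)
		return (copysign(ay, -1));
	return (ay);
}
```

Under that assumption the two helpers agree on every non-NaN input, including both zeros: negating +0 gives -0 and negating -0 gives +0.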
Thanks, Stephen From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 05:14:54 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id BC6BA1065672 for ; Sun, 16 Sep 2012 05:14:54 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by mx1.freebsd.org (Postfix) with ESMTP id 34BF58FC08 for ; Sun, 16 Sep 2012 05:14:53 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8G5EoHD024205 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 16 Sep 2012 15:14:52 +1000 Date: Sun, 16 Sep 2012 15:14:50 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <50553424.2080902@missouri.edu> Message-ID: <20120916134730.Y957@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <20120814072946.S5260@besplex.bde.org> <50297CA5.5010900@missouri.edu> <50297E43.7090309@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> 
MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 05:14:55 -0000

On Sat, 15 Sep 2012, Stephen Montgomery-Smith wrote:

> One more thing I would like an opinion on.
>
> In my code I check for |z| being small, and then use the approximations:
> casinh(z) = z
> cacos(z) = Pi - z

Actually Pi/2 - z.

> catanh(z) = z
>
> However these approximations are not used in the papers by Hull et al, and
> the code works just fine if I don't include these in the code.

Probably a bug in the papers. casinh(z) = formula(z) would probably spuriously underflow for small z. Avoiding underflow in the formula would probably reduce to returning z with special code to raise inexact. A formula like casinh(z) = z - z**3/6 would raise inexact as a side effect, at least for the same small z that it underflows, but would also raise underflow and possibly denormal. A formula like z * (1 - z**2/6) would avoid underflow in more cases but would probably be slower and less accurate when both are valid.

cacos(z) = Pi/2 - z - z**3/6 should be parenthesized as Pi/2 - (z + z**3/6) for accuracy. This gives the same underflow problem for the parenthesized part. You should actually use more like Pi/2 than Pi/2 - z. See below.

The corresponding real functions in fdlibm of course use the approximations, with a threshold higher than needed to avoid underflow, so as to get a free optimization when the approximation applies.

Similarly for any function represented by a power series about 0:

    f(z) = f(0) + f^(n)(0) * z**n/n! + o(z**n)

where n >= 1 is the order of the first nonvanishing derivative at 0. When f(0) = 0, the first nonzero term would underflow for small z.
When the first term is a nonzero constant, it won't underflow, but higher terms would, and higher terms should be added to each other first for accuracy. For expansion about z0 != 0, (z - z0) won't underflow, but the first term involving it might.

C didn't support FP exceptions when the paper was written, but fdlibm did.

I just noticed minor bugs and pessimizations in your code for Pi/2 - z:

from cacos():
% if (ax < DBL_EPSILON && ay < DBL_EPSILON)
%     if ((int)ax==0 && (int)ay==0) { /* raise inexact */
%         if (sy == 0)
%             return (cpack(m_pi_2 - x, copysign(ay, -1)));
%         return (cpack(m_pi_2 - x, ay));
%     }

(1) The real part of the result, m_pi_2, is inexact even when z = 0, so inexact should be raised in all cases and the tricky extra code to avoid setting it when z = 0 is just a bug.

(2) At least if ax is a little smaller than DBL_EPSILON and the rounding mode is to nearest, m_pi_2 - x is just m_pi_2. I think subtracting x raises underflow, but inexact is already raised for x != 0 in another way.

(3) The other way is slower, so subtracting x should be preferred.

(4) The corresponding fdlibm code for real acos() essentially adds a constant to Pi/2 instead of x. It is 'return pio2_hi + pio2_lo;' where pio2_lo is volatile so that the addition is hopefully done at run time. This gives subtle differences in the result in nonstandard rounding modes. Mostly we don't support nonstandard rounding modes, but this method is better for them. Your method is sensitive to the sign of x, but should not be. With perfect rounding in all modes, the result should be Pi/2 rounded according to the mode, and not depend on the sign of x. I don't know if the fdlibm constants are magic enough for this to work in all modes.
Normally, a 'hi' term is the result rounded to nearest in the ambient precision, and the 'lo' term is the residual (rounded to nearest...), but here we want the final rounding to depend on the mode and it isn't clear that this can be expressed with a pair of constants each rounded in a single mode.

(5) fdlibm real acos() uses a threshold of DBL_EPSILON / 32 (2**-57) for returning pio2_hi + pio2_lo, I think just because it isn't clear where the exact threshold for this approximation being valid is. The general formula works (doesn't underflow) for DBL_EPSILON / 32 <= |x| < 0.5, since it is a rational approximation written in a form that doesn't involve any terms smaller than Const*x**2. Raising x to a higher power requires more care. I happen to have rewritten this approximation in the float case to use a polynomial written more efficiently using higher powers, and just noticed that I wasn't careful enough. I have an 11th power, and in float precision the threshold is 2**-26 and raising 2**-26 to just the 5th power underflows in float precision.

Complex acos() still has to avoid underflow in the code following the above when only one of ax and ay is small, so perhaps a special case for this isn't actually optimal.

Bruce From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 08:23:22 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C51F8106564A for ; Sun, 16 Sep 2012 08:23:22 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail30.syd.optusnet.com.au (mail30.syd.optusnet.com.au [211.29.133.193]) by mx1.freebsd.org (Postfix) with ESMTP id 3CA538FC0A for ; Sun, 16 Sep 2012 08:23:21 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail30.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8G8N64n000482 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 16 Sep 2012 18:23:08 +1000 Date: Sun, 16 Sep 2012 18:23:06 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <505558A8.6040600@missouri.edu> Message-ID: <20120916174306.H1527@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <50297CA5.5010900@missouri.edu> <50297E43.7090309@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <505558A8.6040600@missouri.edu> MIME-Version: 1.0 
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 08:23:23 -0000

On Sat, 15 Sep 2012, Stephen Montgomery-Smith wrote:

> Hey guys - I have a piece of code like this:
>
> if (ax < DBL_EPSILON && ay < DBL_EPSILON)
>     if ((int)ax==0 && (int)ay==0) { /* raise inexact */
>         if (sy == 0)
>             return (cpack(m_pi_2 - x, copysign(ay, -1)));
>         return (cpack(m_pi_2 - x, ay));
>     }
>
> Is there a good reason I didn't code it like this?
>
> if (ax < DBL_EPSILON && ay < DBL_EPSILON)
>     if ((int)ax==0 && (int)ay==0) /* raise inexact */
>         return (cpack(m_pi_2 - x, -y));
>
>
> I'm trying to remember if I coded it the second way, and one of you told me
> to code it the first way. Or maybe I came up with the first way myself -
> maybe I wasn't sure what would happen if y was 0 or -0.

I can only think of [fear of] -y not working right on +-0.

Combined with previous optimizations and fixes, this gives:

    if (ax < DBL_EPSILON && ay < DBL_EPSILON)
        return (cpack(m_pi_2 + tiny, -y)); /* PI/2 with inexact...*/

cacos(0 + I*NaN) and several cases for catanh() should similarly add to m_pi_2 to raise inexact when they return a part with an inexact PI/2. Otherwise, catrig*.c is remarkably careful about raising inexact.

Refinement: be more careful with the rounding direction (as in fdlibm?):

(1) make sure that m_pi_2 is PI/2 rounded down for the above use (but round to nearest for other uses). Or maybe, if rounding to nearest happens to round up, use m_pi_2 - tiny instead of m_pi_2 + tiny so that the runtime rounding goes in the right direction in hopefully all rounding modes.
(2) add (or subtract) more than `tiny' to m_pi_2 if necessary to bump it to the correct side of the infinite-precision PI/2, so that the runtime rounding goes in the right direction. I'm not sure if this is necessary or even possible.

Copying the values of PI/2 from the real functions should give both of these, to the same extent that it gives them for the real functions. The spelling of the variables should be copied too. The latter is pio2_hi + pio2_lo. Using pio2_lo instead of `tiny' may be unnecessary and pessimal. pio2_lo is declared volatile so that the addition is done at runtime here, but it is also used in code where it doesn't need to be volatile. The real functions don't have a `tiny' variable, and just re-use the general pio2_lo to get inexact here. So it looks like (2) is unnecessary, with the real functions using pio2_lo just because it is good enough.

Note that when you need to control the rounding direction or just have a hi+lo decomposition, it is critical that the constant for the hi part have a particular value in binary. When it is declared in decimal, the decimal value should be rounded to match the desired binary value, so its higher digits will be quite different from the ones of the infinite-precision full value, even when the hi value is the best approximation to the full value (and doesn't have bits in it killed for technical reasons).

I noticed this in the opposite direction when I calculated the decimal and binary values to put in the constant tables recently. Normally I round in binary and then print the rounded value in decimal. This looked strange for m_e, so I switched to printing a value with it rounded in decimal, with the binary rounding only in a comment. The strangeness is largest when there are many extra guard digits in the decimal value, like you had originally. It is unclear whether these digits should match the infinite-precision value or the expected rounded binary value.

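The point about decimal spellings of a binary constant can be seen with PI/2 itself. A small sketch, assuming IEEE 754 doubles and a compiler that converts decimal constants with correct rounding; the variable and function names are invented for this example, and only the value of pio2_hi comes from fdlibm:

```c
/* PI/2 written with the digits of the infinite-precision value. */
static const double pio2_true_digits = 1.57079632679489661923132169163975;

/* fdlibm's pio2_hi: a decimal expansion of the already-rounded binary value. */
static const double pio2_hi_decimal = 1.57079632679489655800e+00;

/* The same double written exactly, as a C99 hexadecimal constant. */
static const double pio2_hi_hex = 0x1.921fb54442d18p+0;

/* Returns nonzero if all three spellings denote the same double. */
static int
pio2_decimal_forms_agree(void)
{
	return (pio2_true_digits == pio2_hi_hex &&
	    pio2_hi_decimal == pio2_hi_hex);
}
```

The two decimal spellings part ways after the 16th significant digit (...4896619... versus ...4896558...) yet should denote the same double, so guard digits beyond that say nothing about the binary value unless a convention is fixed.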
Bruce From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 15:13:46 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DECFE106564A for ; Sun, 16 Sep 2012 15:13:46 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 984BE8FC0C for ; Sun, 16 Sep 2012 15:13:46 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8GFDida042643; Sun, 16 Sep 2012 10:13:45 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5055ECA8.2080008@missouri.edu> Date: Sun, 16 Sep 2012 10:13:44 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <20120814072946.S5260@besplex.bde.org> <50297CA5.5010900@missouri.edu> <50297E43.7090309@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> In-Reply-To: 
<20120916134730.Y957@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 15:13:47 -0000

On 09/16/2012 12:14 AM, Bruce Evans wrote:
> On Sat, 15 Sep 2012, Stephen Montgomery-Smith wrote:
>
>> One more thing I would like an opinion on.
>>
>> In my code I check for |z| being small, and then use the approximations:
>> casinh(z) = z
>> cacos(z) = Pi - z
>
> Actually Pi/2 - z.
>
>> catanh(z) = z
>>
>> However these approximations are not used in the papers by Hull et al,
>> and the code works just fine if I don't include these in the code.
>
> Probably a bug in the papers.

It is not a bug in the papers. The algorithms they provide really do work when |z| is small. In fact, you have to deal separately with the cases |x| is small and |y| is small (z=x+I*y), so dealing with both of them being small is not any additional problem.

And now I see your other post, that using PI/2 is problematic especially when rounding is not to nearest. (Then the problem of rounding PI/2 properly is relegated to the acos function, and so it is someone else's problem.)

So all things being said and done, I am going to remove the use of these approximations. (And also, my comments describing them had a silly mistake in them as well.)

From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 15:20:21 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 43827106574B for ; Sun, 16 Sep 2012 15:20:21 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id F17F08FC17 for ; Sun, 16 Sep 2012 15:20:20 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8GFKJ0S043146 for ; Sun, 16 Sep 2012 10:20:20 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5055EE33.2090400@missouri.edu> Date: Sun, 16 Sep 2012 10:20:19 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5017111E.6060003@missouri.edu> <50297CA5.5010900@missouri.edu> <50297E43.7090309@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> In-Reply-To: 
<5055ECA8.2080008@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 15:20:21 -0000

A style question: do you mind this

    if (sy==0) ry = copysign(ry, -1);
    if (A < 1) A = 1;

or do you prefer

    if (sy==0)
        ry = copysign(ry, -1);
    if (A < 1)
        A = 1;

From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 16:51:29 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 93594106566B for ; Sun, 16 Sep 2012 16:51:29 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail30.syd.optusnet.com.au (mail30.syd.optusnet.com.au [211.29.133.193]) by mx1.freebsd.org (Postfix) with ESMTP id 219218FC16 for ; Sun, 16 Sep 2012 16:51:28 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail30.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8GGpO9J023821 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 17 Sep 2012 02:51:25 +1000 Date: Mon, 17 Sep 2012 02:51:24 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <5055ECA8.2080008@missouri.edu> Message-ID: <20120917022614.R2943@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <50297E43.7090309@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu>
<20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 16:51:29 -0000 On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote: > On 09/16/2012 12:14 AM, Bruce Evans wrote: >> On Sat, 15 Sep 2012, Stephen Montgomery-Smith wrote: >> >>> One more thing I would like an opinion on. >>> >>> In my code I check for |z| being small, and then use the approximations: >>> casinh(z) = z >>> cacos(z) = Pi - z >> >> Actually Pi/2 - z. >> >>> catanh(z) = z >>> >>> However these approximations are not used in the papers by Hull et al, >>> and the code works just fine if I don't include these in the code. >> >> Probably a bug in the papers. > > It is not a bug in the papers. The algorithms they provide really do work > when |z| is small. In fact, you have to deal separately with the cases |x| > is small and |y| is small (z=x+I*y), so dealing with both of them being small > is not any additional problem. > > And now I see your other post, that using PI/2 is problematic especially when > rounding is not to nearest. 
(Then the problem of rounding PI/2 properly is > relegated to the acos function, and so it is someone else's problem.) > > So all things being said and done, I am going to remove the use of these > approximations. I don't like that. It will be much slower on almost 1/4 of arg space. The only reason to consider not doing it is that the args that it applies to are not very likely, and optimizing for them may pessimize the usual case. I just found a related optimization for atan2(). For x > 0 and |y|/x < 2**-(MANT_DIG+afew), atan2(y, x) is evaluated as essentially sign(y) * atan(|y|/x). But in this case, its value is simply y/x with inexact. Again the optimization applies to almost 1/4 of arg space. It gains more than the normal overhead of an atan() call by avoiding secondary underflows when y/x underflows. Bruce From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 17:12:15 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 67E6A106564A for ; Sun, 16 Sep 2012 17:12:15 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by mx1.freebsd.org (Postfix) with ESMTP id E908D8FC0C for ; Sun, 16 Sep 2012 17:12:14 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8GHC7nQ019965 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 17 Sep 2012 03:12:07 +1000 Date: Mon, 17 Sep 2012 03:12:07 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <5055EE33.2090400@missouri.edu> Message-ID: <20120917025148.X2943@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> 
<20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <5055EE33.2090400@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 17:12:15 -0000

On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote:

> A style question: do you mind this
>
> if (sy==0) ry = copysign(ry, -1);
> if (A < 1) A = 1;
>
> or do you prefer
>
> if (sy==0)
>     ry = copysign(ry, -1);
> if (A < 1)
>     A = 1;

Multiple statements per line are large style bugs, as are missing spaces around == operators (I might agree only to omitting spaces around most multiplication operators and some addition operators).

Apart from being less readable, multiple statements per line break debugging using line-based debuggers.

BTW, copysign() is builtin in gcc-4.2 and not broken by a macro in <math.h>. Otherwise it would be very slow.

BTW2, fdlibm avoids using copysign() internally, but often sets sign bits by a direct bit access which does the equivalent of what copysign() does semantically. This can be slow, since at best it takes a read-modify-write of the target, with all 3 steps in this non-parallelizable. Another not so good way to set sign bits is to use an array with entries +-1 and do `ry *= array[sy];' Branchy code for setting or clearing the sign bit may be better than either of these methods, at least if the branches are predictable. If the builtin is very smart, then it will treat the copysign() call as a hint and select the best alternative, and it can do this more easily than it can rewrite manually optimized sequences for setting sign bits. I think it is not very smart. Bruce From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 18:26:50 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 75BC01065820 for ; Sun, 16 Sep 2012 18:26:50 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 2EFF38FC18 for ; Sun, 16 Sep 2012 18:26:49 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8GIQl1D055245; Sun, 16 Sep 2012 13:26:48 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505619E7.8080804@missouri.edu> Date: Sun, 16 Sep 2012 13:26:47 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> 
<5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <5055EE33.2090400@missouri.edu> <20120917025148.X2943@besplex.bde.org> In-Reply-To: <20120917025148.X2943@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 18:26:50 -0000 On 09/16/2012 12:12 PM, Bruce Evans wrote: > On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote: > >> A style question: do you mind this >> >> if (sy==0) ry = copysign(ry, -1); >> if (A < 1) A = 1; >> >> or do you prefer >> >> if (sy==0) >> ry = copysign(ry, -1); >> if (A < 1) >> A = 1; > > Multiple statements per line are large style bugs, as are missing spaces > around == operators (I might agree only to omitting spaces around most > multiplication operators and some addition operators). > > Apart from being less readable, multiple statements per line break > debugging > using line-based debuggers. > > BTW, copysign() is builtin in gcc-4.2 and not broken by a macro in > . > Otherwise it would be very slow. 
I changed it to: if (sy==0) ry = -ry; I happen to know that ry is always positive. From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 19:01:40 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8AC1F106566C for ; Sun, 16 Sep 2012 19:01:40 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 486608FC17 for ; Sun, 16 Sep 2012 19:01:40 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8GJ1c7N057483; Sun, 16 Sep 2012 14:01:39 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <50562213.9020400@missouri.edu> Date: Sun, 16 Sep 2012 14:01:39 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <50297E43.7090309@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> 
<5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> In-Reply-To: <20120917022614.R2943@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 19:01:40 -0000 On 09/16/2012 11:51 AM, Bruce Evans wrote: > > I don't like that. It will be much slower on almost 1/4 of arg space. > The only reason to consider not doing it is that the args that it > applies to are not very likely, and optimizing for them may pessimize > the usual case. The pessimization when |z| is not small is tiny. It takes no time at all to check that |z| is small. On the other hand let me go through the code and see what happens when |x| is small or |y| is small. There are actually specific formulas that work well in these two cases, and they are probably not that much slower than the formulas I decided to remove. And when you chase through all the logic and "if" statements, you may find that you didn't use up a whole bunch of time for these very special cases of |z| small - most of the extra time merely being the decisions invoked by the "if" statements. > I just found a related optimization for atan2(). For x > 0 and > |y|/x < 2**-(MANT_DIG+afew), atan2(y, x) is evaluated as essentially > sign(y) * atan(|y|/x). But in this case, its value is simply y/x > with inexact. Again the optimization applies to almost 1/4 of arg > space. It gains more than the normal overhead of an atan() call by > avoiding secondary underflows when y/x underflows. You see, that is exactly where I don't want to do special optimization in my code. 
In my opinion, it is the atan function itself that should realize that |y|/x is small, and hence it is that function that simply returns |y|/x. Or if you want to implement it at a higher level, atan2 should make this realization, and simply return y/x. Similarly, I would expect log1p(x) to simply return x (inexactly) for x small. And if the compiler is really good, I would hope that the two codes: log1p(x); (fabs(x) < DBL_EPSILON) ? x + set_tiny() : log1p(x); would be equivalent. (But I am rather sure that gcc isn't that good.) Furthermore, casinh etc. are not commonly used functions. Putting huge amounts of effort into looking at special cases to speed it up a little somehow feels wrong to me. In fact, if the programmer knows that he will want casinh evaluated very fast, then he should be motivated enough to try out using z in the case when |z| is small, and see if that really speeds things up. Stephen From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 19:53:27 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 73817106564A for ; Sun, 16 Sep 2012 19:53:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id E9EDC8FC0A for ; Sun, 16 Sep 2012 19:53:26 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8GJrIne020374 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 17 Sep 2012 05:53:19 +1000 Date: Mon, 17 Sep 2012 05:53:18 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20120917022614.R2943@besplex.bde.org> Message-ID: <20120917041848.F3504@besplex.bde.org> References: <5017111E.6060003@missouri.edu> 
<20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Stephen Montgomery-Smith , freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 19:53:27 -0000 On Mon, 17 Sep 2012, Bruce Evans wrote: > On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote: > >> On 09/16/2012 12:14 AM, Bruce Evans wrote: >>> On Sat, 15 Sep 2012, Stephen Montgomery-Smith wrote: >>> >>>> One more thing I would like an opinion on. >>>> >>>> In my code I check for |z| being small, and then use the approximations: >>>> casinh(z) = z >>>> cacos(z) = Pi - z >>> >>> Actually Pi/2 - z. >>> >>>> catanh(z) = z >>>> >>>> However these approximations are not used in the papers by Hull et al, >>>> and the code works just fine if I don't include these in the code. >>> >>> Probably a bug in the papers. 
>> >> It is not a bug in the papers. The algorithms they provide really do work >> when |z| is small. In fact, you have to deal separately with the cases |x| >> is small and |y| is small (z=x+I*y), so dealing with both of them being >> small is not any additional problem. >> >> And now I see your other post, that using PI/2 is problematic especially >> when rounding is not to nearest. (Then the problem of rounding PI/2 >> properly is relegated to the acos function, and so it is someone else's >> problem.) >> >> So all things being said and done, I am going to remove the use of these >> approximations. > > I don't like that. It will be much slower on almost 1/4 of arg space. > The only reason to consider not doing it is that the args that it > applies to are not very likely, and optimizing for them may pessimize > the usual case. It gives the expected pessimizations, and unexpected accuracy improvements and unimprovements. On amd64: % 9,10c9,10 % < rcacos:max_er = 0x6947ecac 3.2900, avg_er = 0.317, #>=1:0.5 = 30489:303638 % < 5.47 real 5.47 user 0.00 sys % --- % > rcacos:max_er = 0x6947ecac 3.2900, avg_er = 0.317, #>=1:0.5 = 30489:268862 % > 5.87 real 5.86 user 0.00 sys Only float functions were updated for this test, and results are only shown for float functions (comparing them with double functions). '<' in the diff is for an old result and '>' for a new result. The above shows: - accuracy improvement. Apparently the thresholds were too large. - slowdown of 0.39 seconds. The test program mainly calls cacos() and cacosf(), but has some overheads. Say 1.47 seconds for the overheads and 2 seconds for each of the functions. 0.39/2 is almost 20%. % 21,22c21,22 % < rcacosh:max_er = 0x51e70742 2.5595, avg_er = 0.257, #>=1:0.5 = 25766:3286888 % < 5.81 real 5.79 user 0.00 sys % --- % > rcacosh:max_er = 0x51e70742 2.5595, avg_er = 0.258, #>=1:0.5 = 26034:3313256 % > 6.06 real 6.05 user 0.00 sys Similar slowdown, but now the old version is more accurate. 
Apparently the general code doesn't reduce to simply Pi/2. % 34c34 % < 5.98 real 5.98 user 0.00 sys % --- % > 6.30 real 6.28 user 0.00 sys This is for rcasin. Similar slowdown, but no change in values. % 45,46c45,46 % < rcasinh:max_er = 0x51e70742 2.5595, avg_er = 0.257, #>=1:0.5 = 25766:3286888 % < 5.57 real 5.56 user 0.00 sys % --- % > rcasinh:max_er = 0x51e70742 2.5595, avg_er = 0.258, #>=1:0.5 = 26034:3313256 % > 5.82 real 5.81 user 0.00 sys Like rcacosh (lose speed and accuracy). % 57,58c57,58 % < rcatan:max_er = 0x51d7c47a 2.5576, avg_er = 0.295, #>=1:0.5 = 77670:443246 % < 3.69 real 3.68 user 0.00 sys % --- % > rcatan:max_er = 0x51d7c47a 2.5576, avg_er = 0.296, #>=1:0.5 = 77874:469678 % > 3.64 real 3.64 user 0.00 sys No slowdown, but accuracy loss. % 69,70c69,70 % < rcatanh:max_er = 0x5304b263 2.5943, avg_er = 0.201, #>=1:0.5 = 185298:1337156 % < 3.88 real 3.86 user 0.00 sys % --- % > rcatanh:max_er = 0x5304b263 2.5943, avg_er = 0.203, #>=1:0.5 = 204986:1370276 % > 3.84 real 3.83 user 0.00 sys Like rcatan, but the accuracy loss is smaller. % [... unrelated functions] % [... imaginary parts have similar behaviour (various symmetries)] On i386: % 9,10c9,10 % < rcacos:max_er = 0x4517ee94 2.1592, avg_er = 0.315, #>=1:0.5 = 4607:246852 % < 8.18 real 7.48 user 0.02 sys % --- % > rcacos:max_er = 0x4517ee94 2.1592, avg_er = 0.314, #>=1:0.5 = 4607:212076 % > 8.48 real 7.82 user 0.03 sys % ... % 201,202c201,202 % < icacosh:max_er = 0x4517ee94 2.1592, avg_er = 0.315, #>=1:0.5 = 4607:246852 % < 8.62 real 8.50 user 0.01 sys % --- % > icacosh:max_er = 0x4517ee94 2.1592, avg_er = 0.314, #>=1:0.5 = 4607:212076 % > 9.88 real 9.05 user 0.03 sys Similar slowdowns (only look at the user time), but only changes in accuracy for these two. Now the accuracy is never reduced. I think you just have to reduce the threshold a little in the old version to get this improvement. 
Indeed, the following works well for me (only edited the float version): @ --- /home/stephen/public_html/catrigf.c 2012-09-16 15:14:05.000000000 +0000 @ +++ catrigf.c 2012-09-16 19:23:18.559723000 +0000 @ @@ -165,4 +165,9 @@ @ } @ @ + /* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */ @ + if (ax < 2048 * FLT_EPSILON && ay < 2048 *FLT_EPSILON) @ + if ((int)ax==0 && (int)ay==0) @ + return (z); @ + @ do_hard_work(ax, ay, &rx, &B_is_usable, &B, &sqrt_A2my2, &new_y); @ if (B_is_usable) @ @@ -200,5 +205,6 @@ @ if (isinf(y)) @ return (cpackf(x+x, -y)); @ - if (x == 0) return (cpackf(m_pi_2, y+y)); @ + if (x == 0) @ + return (cpackf(m_pi_2+tiny, y+y)); @ return (cpackf(x+0.0L+(y+0), x+0.0L+(y+0))); @ } Also fix NaN cases. @ @@ -214,4 +220,8 @@ @ } @ @ + /* XXX the number for ay is related to sqrt(6 * FLT_EPSILON). */ @ + if (ax < FLT_EPSILON / 8 && ay < 2048 * FLT_EPSILON) @ + return (cpackf(m_pi_2 + tiny, -y)); @ + @ do_hard_work(ay, ax, &ry, &B_is_usable, &B, &sqrt_A2mx2, &new_x); @ if (B_is_usable) { The thresholds should be asymmetric since the real part uses the approximation Pi/2 + O(x) while the imaginary part uses the approximation -y + O(y**3). If we used the approximation Pi/2 - x for the real part as before, then the thresholds could be more symmetric and more cases could be handled here. But then the expression with m_pi_2 would be (m_pi_2 - x) again, so it wouldn't necessarily set inexact, and the special code for setting inexact would be needed again. The old thresholds were too conservative. I'm being sloppy with the xy product terms and nonstandard rounding modes. The magic numbers were whatever minimised the number of incorrectly rounded cases in my tests. sqrt(6 * FLT_EPSILON) is obviously not conservative enough. The divisor of 8 gives 3 guard digits which would fix some cases (iff the cases go through here). The magic 2048 gives about 4.5 guard "digits". A couple more guard digits make little difference, but 1 fewer gives observably more errors. 
@ @@ -313,5 +323,6 @@ @ return (cpackf(copysignf(0, x), y+y)); @ if (isinf(y)) @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2, y))); @ + return (cpackf(copysignf(0, x), @ + copysignf(m_pi_2 + tiny, y))); @ if (x == 0) @ return (cpackf(x, y+y)); @ @@ -320,9 +331,16 @@ @ @ if (isinf(x) || isinf(y)) @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2, y))); @ + return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y))); @ @ if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) @ if ((int)(1+tiny)==1) @ - return (cpackf(copysignf(real_part_reciprocal(ax, ay), x), copysignf(m_pi_2, y))); @ + return (cpackf( @ + copysignf(real_part_reciprocal(ax, ay), x), @ + copysignf(m_pi_2 + tiny, y))); @ + @ + /* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */ @ + if (ax < 2048 * FLT_EPSILON && ay < 2048 * FLT_EPSILON) @ + if ((int)ax==0 && (int)ay==0) @ + return (z); @ @ if (ax == 1 && ay < FLT_EPSILON) { Bruce From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 20:29:30 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D4EDC106564A for ; Sun, 16 Sep 2012 20:29:30 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by mx1.freebsd.org (Postfix) with ESMTP id 632E68FC08 for ; Sun, 16 Sep 2012 20:29:29 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8GKTK4M019234 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 17 Sep 2012 06:29:21 +1000 Date: Mon, 17 Sep 2012 06:29:20 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <50562213.9020400@missouri.edu> Message-ID: <20120917060116.G3825@besplex.bde.org> References: <5017111E.6060003@missouri.edu> 
<502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 20:29:30 -0000 On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote: > On 09/16/2012 11:51 AM, Bruce Evans wrote: >> >> I don't like that. It will be much slower on almost 1/4 of arg space. >> The only reason to consider not doing it is that the args that it >> applies to are not very likely, and optimizing for them may pessimize >> the usual case. > > The pessimization when |z| is not small is tiny. It takes no time at all to > check that |z| is small. Not necessarily on out-of-order machines (most x86). The CPU executes multiple paths speculatively and concurrently. If it does more on an unused path, then it might do less on the used path. 
It may mispredict the branch on the size of |z| and thus misguess which path to do more on. (I don't know many details of this. For example, does it do anything at all on paths predicted to be not taken?) Losses from this are usually described as branch mispredictions. They might cost 20 (50? 100?) cycles after taking about 2 cycles to actually check |z| (2 cycles pipelined but more like + 8 in real time, and it is the latter time that you lose by backing out). The only sure way to avoid branch mispredictions is to not have any, and catrig is too complicated for that. > On the other hand let me go through the code and see what happens when |x| is > small or |y| is small. There are actually specific formulas that work well > in these two cases, and they are probably not that much slower than the > formulas I decided to remove. And when you chase through all the logic and > "if" statements, you may find that you didn't use up a whole bunch of time > for these very special cases of |z| small - most of the extra time merely > being the decisions invoked by the "if" statements. But all general cases end up going through an extern function like acos() or atan2(), and just calling another function is a significant overhead. When |z| is small, the arg(s) to the other function will probably be a special case for it (e.g., acos(small)). The other function should optimize this and not take as long as an average call. However, since it is special, it may cause branch mispredictions for other uses of the function. >> I just found a related optimization for atan2(). For x > 0 and >> |y|/x < 2**-(MANT_DIG+afew), atan2(y, x) is evaluated as essentially >> sign(y) * atan(|y|/x). But in this case, its value is simply y/x >> with inexact. Again the optimization applies to almost 1/4 of arg >> space. It gains more than the normal overhead of an atan() call by >> avoiding secondary underflows when y/x underflows. 
> > You see, that is exactly where I don't want to do special optimization in my > code. In my opinion, it is the tan function itself that should realize that > |y|/x is small, and hence it is that function that simply return |y|/x. Or > if you want to implement it at a higher level, atan2 should make this > realization, and simply return y/x. I'm thinking of going the other way and using atan(y/x) instead of atan2() :-). This is safe iff we know that y/x is not very special. > Similarly, I would expect log1p(x) to simply return x (inexactly) for x > small. And if the compiler is really good, I would hope that the two codes: > log1p(x); > (fabs(x) < DBL_EPSILON) ? x + set_tiny() : log1p(x); > would be equivalent. (But I am rather sure that gcc isn't that good.) > > Furthermore, casinh etc are not commonly used functions. Putting huge > amounts of effort looking at special cases to speed it up a little somehow > feels wrong to me. In fact, if the programmer knows that he will be wanting > casinh, and evaluated very fast, then he should be motivated enough to try > out using z in the case when |z| is small, and see if that really speeds > things up. True. Now I mainly want it to be fast so that I can test more cases. 
Bruce From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 20:49:36 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 16064106574A for ; Sun, 16 Sep 2012 20:49:36 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id C490A8FC08 for ; Sun, 16 Sep 2012 20:49:35 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8GKnXtC064408; Sun, 16 Sep 2012 15:49:34 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <50563B5E.3090301@missouri.edu> Date: Sun, 16 Sep 2012 15:49:34 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> In-Reply-To: 
<20120917060116.G3825@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 20:49:36 -0000 On 09/16/2012 03:29 PM, Bruce Evans wrote: > I'm thinking of going the other way and using atan(y/x) instead of atan2() > :-). This is safe iff we know that y/x is not very special. This was, in fact, how it was presented in the original paper. The Boost libraries also used atan instead of atan2. In fact, when I first heard of the "atan2" function (perhaps way back when PL/1 was a programming language), I naively thought that atan(x) was implemented as atan2(1,x). From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 20:53:45 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F3447106566C for ; Sun, 16 Sep 2012 20:53:44 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id BC1708FC08 for ; Sun, 16 Sep 2012 20:53:44 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8GKrhpE064673; Sun, 16 Sep 2012 15:53:43 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <50563C57.60806@missouri.edu> Date: Sun, 16 Sep 2012 15:53:43 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> 
<502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> In-Reply-To: <20120917060116.G3825@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 20:53:45 -0000 On 09/16/2012 03:29 PM, Bruce Evans wrote: > On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote: > >> On 09/16/2012 11:51 AM, Bruce Evans wrote: >>> >>> I don't like that. It will be much slower on almost 1/4 of arg space. >>> The only reason to consider not doing it is that the args that it >>> applies to are not very likely, and optimizing for them may pessimize >>> the usual case. >> >> The pessimization when |z| is not small is tiny. It takes no time at >> all to check that |z| is small. 
> > Not necessarily on out-of-order machines (most x86). The CPU executes > multiple paths speculatively and concurrently. If it does more on an > unused path, then it might do less on the used path. It may mispredict > the branch on the size of |z| and thus misguess which path to do more > on. (I don't know many details of this. For example, does it do > anything at all on paths predicted to be not taken?) Losses from this > are usually described as branch mispredictions. They might cost 20 > (50? 100?) cycles after taking 2 about cycles to actually check |z| > (2 cycles pipelined but more like + 8 in real time, > and it is the latter time that you lose by backing out). > > The only sure way to avoid branch mispredictions is to not have any, > and catrig is too complicated for that. Yes, but I did a time test. And in my case the test was almost always failing. > >> On the other hand let me go through the code and see what happens when >> |x| is small or |y| is small. There are actually specific formulas >> that work well in these two cases, and they are probably not that much >> slower than the formulas I decided to remove. And when you chase >> through all the logic and "if" statements, you may find that you >> didn't use up a whole bunch of time for these very special cases of >> |z| small - most of the extra time merely being the decisions invoked >> by the "if" statements. > > But all general cases end up going through an extern function like > acos() or atan2(), and just calling another function is a significant > overhead. When |z| is small, the arg(s) to the other function will > probably be an special case for it (e.g., acos(small)). The other > function should optimize this and not take as long as an average call. > However, since it is special, it may cause branch mispredictions for > other uses of the function. I understand what you are saying. 
I guess it just seems to me that the "proper" way to do it is to make the C compiler really awesome and do this for you. (Doesn't the Intel compiler try to embed functions inline if it knows it will speed things up?) >> Furthermore, casinh etc are not commonly used functions. Putting huge >> amounts of effort looking at special cases to speed it up a little >> somehow feels wrong to me. In fact, if the programmer knows that he >> will be wanting casinh, and evaluated very fast, then he should be >> motivated enough to try out using z in the case when |z| is small, and >> see if that really speeds things up. Well, if casinh goes 20% slower, you're not going to be testing too many fewer cases. > True. Now I mainly want it to be fast so that I can test more cases. I understand. But putting those special cases into casinh offends my sense of taste. From owner-freebsd-numerics@FreeBSD.ORG Sun Sep 16 21:00:19 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9A7E7106564A for ; Sun, 16 Sep 2012 21:00:19 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id F33068FC0A for ; Sun, 16 Sep 2012 21:00:06 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8GL05St065122; Sun, 16 Sep 2012 16:00:05 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <50563DD5.4060303@missouri.edu> Date: Sun, 16 Sep 2012 16:00:05 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <20120814201105.T934@besplex.bde.org> <502A780B.2010106@missouri.edu> <20120815223631.N1751@besplex.bde.org> 
<502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <20120917041848.F3504@besplex.bde.org> In-Reply-To: <20120917041848.F3504@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2012 21:00:19 -0000 On 09/16/2012 02:53 PM, Bruce Evans wrote: > On Mon, 17 Sep 2012, Bruce Evans wrote: > >> On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote: >> >>> On 09/16/2012 12:14 AM, Bruce Evans wrote: >>>> On Sat, 15 Sep 2012, Stephen Montgomery-Smith wrote: >>>> >>>>> One more thing I would like an opinion on. >>>>> >>>>> In my code I check for |z| being small, and then use the >>>>> approximations: >>>>> casinh(z) = z >>>>> cacos(z) = Pi - z >>>> >>>> Actually Pi/2 - z. >>>> >>>>> catanh(z) = z >>> So all things being said and done, I am going to remove the use of >>> these approximations. 
> > It gives the expected pessimizations, and unexpected accuracy improvements > and unimprovements. On amd64: I got unexpected accuracy improvements as well! I thought it might just be a coincidence, so I ignored it. > @ @@ -313,5 +323,6 @@ > @ return (cpackf(copysignf(0, x), y+y)); > @ if (isinf(y)) > @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2, y))); > @ + return (cpackf(copysignf(0, x), > @ + copysignf(m_pi_2 + tiny, y))); > @ if (x == 0) > @ return (cpackf(x, y+y)); > @ @@ -320,9 +331,16 @@ > @ @ if (isinf(x) || isinf(y)) > @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2, y))); > @ + return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y))); > @ @ if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) > @ if ((int)(1+tiny)==1) > @ - return (cpackf(copysignf(real_part_reciprocal(ax, ay), > x), copysignf(m_pi_2, y))); > @ + return (cpackf( > @ + copysignf(real_part_reciprocal(ax, ay), x), > @ + copysignf(m_pi_2 + tiny, y))); > @ + > @ + /* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */ > @ + if (ax < 2048 * FLT_EPSILON && ay < 2048 * FLT_EPSILON) > @ + if ((int)ax==0 && (int)ay==0) > @ + return (z); > @ @ if (ax == 1 && ay < FLT_EPSILON) { I implemented all the m_pi_2 + tiny changes. Let me still ponder the |z| being small issue. Or you can put that code back in when it is committed. 
From owner-freebsd-numerics@FreeBSD.ORG Mon Sep 17 17:15:57 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2D5DE1065670 for ; Mon, 17 Sep 2012 17:15:57 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id A52638FC17 for ; Mon, 17 Sep 2012 17:15:56 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8HH70PM016952 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 18 Sep 2012 03:15:53 +1000 Date: Tue, 18 Sep 2012 03:07:00 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <50563C57.60806@missouri.edu> Message-ID: <20120918012459.V5094@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> MIME-Version: 1.0 
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 17:15:57 -0000 On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote: > On 09/16/2012 03:29 PM, Bruce Evans wrote: >> On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote: >> ... >> The only sure way to avoid branch mispredictions is to not have any, >> and catrig is too complicated for that. > > Yes, but I did a time test. And in my case the test was almost always > failing. I test different data, with an over-emphasis on exceptional cases :-). >>> On the other hand let me go through the code and see what happens when >>> |x| is small or |y| is small. There are actually specific formulas >>> that work well in these two cases, and they are probably not that much >>> slower than the formulas I decided to remove. And when you chase I checked a few cases and didn't see any problems, but noticed some more things that could be handled by general code, giving the following minor optimizations (only done for float precision). >>> through all the logic and "if" statements, you may find that you >>> didn't use up a whole bunch of time for these very special cases of >>> |z| small - most of the extra time merely being the decisions invoked >>> by the "if" statements. Branch prediction is working very well, but I would prefer not to stress it unnecessarily. The data in my tests is also too uniformly ordered to stress the branch prediction. 
@ --- catrigf.c~ 2012-09-17 02:05:43.000000000 +0000 @ +++ catrigf.c 2012-09-17 15:21:59.560420000 +0000 @ @@ -157,12 +157,19 @@ @ } @ @ - if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) @ - if (isinf(x) || isinf(y) || (int)(1+tiny)==1) { @ - if (signbit(x) == 0) @ - w = clog_for_large_values(z) + m_ln2; @ - else @ - w = clog_for_large_values(-z) + m_ln2; @ - return (cpackf(copysignf(crealf(w), x), copysignf(cimagf(w), y))); @ - } @ + if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) { @ + /* clog...() will raise inexact unless x or y is infinite */ @ + if (signbit(x) == 0) @ + w = clog_for_large_values(z) + m_ln2; @ + else @ + w = clog_for_large_values(-z) + m_ln2; @ + return (cpackf(copysignf(crealf(w), x), copysignf(cimagf(w), y))); @ + } Trust the general code (clog()) to raise inexact appropriately. A previous version of this raised inexact by adding `tiny' to w in the correct order. realf(w) is large or infinite, so the expression (realf(w) + tiny + m_ln2) has the same value as (realf(w) + m_ln2) and raises inexact iff realf(w) != +Inf. But this addition is unnecessary. @ + @ +#if 0 @ + /* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */ @ + if (ax < 2048 * FLT_EPSILON && ay < 2048 * FLT_EPSILON) @ + if ((int)ax==0 && (int)ay==0) @ + return (z); @ +#endif Previous optimization turned off for debugging. @ @ do_hard_work(ax, ay, &rx, &B_is_usable, &B, &sqrt_A2my2, &new_y); @ @@ -205,13 +212,19 @@ @ } @ @ - if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) @ - if (isinf(x) || isinf(y) || (int)(1+tiny)==1) { @ - w = clog_for_large_values(z); @ - rx = fabsf(cimagf(w)); @ - ry = crealf(w) + m_ln2; @ - if (sy == 0) @ - ry = -ry; @ - return (cpackf(rx, ry)); @ - } @ + if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) { @ + /* clog...() will raise inexact unless x or y is infinite */ @ + w = clog_for_large_values(z); @ + rx = fabsf(cimagf(w)); @ + ry = crealf(w) + m_ln2; @ + if (sy == 0) @ + ry = -ry; @ + return (cpackf(rx, ry)); @ + } As above. 
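The "related to sqrt(6 * FLT_EPSILON)" remark in the XXX comments of these hunks is not derived anywhere in the thread. One plausible reconstruction, offered only as a sketch, comes from the leading Taylor terms, with \varepsilon standing for FLT_EPSILON = 2^{-23}:

```latex
\operatorname{asinh} z = z - \frac{z^{3}}{6} + O(z^{5})
\quad\Longrightarrow\quad
\left|\frac{\operatorname{asinh} z - z}{z}\right| \approx \frac{|z|^{2}}{6} < \varepsilon
\iff |z| < \sqrt{6\varepsilon} \approx 8.5\times 10^{-4},
\qquad
\operatorname{atanh} z = z + \frac{z^{3}}{3} + O(z^{5})
\quad\Longrightarrow\quad
|z| < \sqrt{3\varepsilon} \approx 6.0\times 10^{-4}.
```

Under that reading, the componentwise test against 2048 * FLT_EPSILON = 2^{-12} (about 2.4e-4, so |z| below about 3.5e-4) sits on the conservative side of both bounds.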
@ + @ +#if 0 @ + /* XXX the number for ay is related to sqrt(6 * FLT_EPSILON). */ @ + if (ax < FLT_EPSILON / 8 && ay < 2048 * FLT_EPSILON) @ + return (cpackf(m_pi_2 + tiny, -y)); @ +#endif Not quite the previous optimization turned off for debugging. It now raises inexact unconditionally by adding tiny to m_pi_2. This seems to actually be a minor pessimization, but I prefer it since it takes less code. The version using (int)(1+tiny) has the advantage that its result is not normally used, while the above is often used; the above does an extra operation in the often-used path. @ @ do_hard_work(ay, ax, &ry, &B_is_usable, &B, &sqrt_A2mx2, &new_x); @ @@ -321,30 +334,28 @@ @ } @ @ - if (isinf(x) || isinf(y)) @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y))); @ + /* Raise inexact unless z == 0; return for z == 0 as a side effect. */ @ + if ((x == 0 && y == 0) || (int)(1 + tiny) != 1) @ + return (z); Larger optimizations only done for catanhf(): First, the above removes the special code for handling infinities. These will be handled by the "large" case later. Second, it raises inexact up-front except in the one remaining case (z == 0) where the result is exact (all the other exact cases involve NaNs. Note that cases involving Infs return m_pi_2 for the imaginary part, so they are never exact). This patch doesn't show the removal of the code for raising inexact in sum_squares(). cacos*() and casin*() should benefit even more from an up-front raising of inexact, since do_hard_work() has 7 magic statements to raise inexact where sum_squares has only 1. @ @ if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) @ - if ((int)(1+tiny)==1) @ - return (cpackf(copysignf(real_part_reciprocal(ax, ay), x), copysignf(m_pi_2, y))); @ + return (cpackf(copysignf(real_part_reciprocal(ax, ay), x), copysignf(m_pi_2, y))); Depend on inexact being raised up-front. There are no magic expressions (int)(1+tiny) left except the new up-front one. There are still not-so-magic expressions (m_pi_2 + tiny). 
BTW, most or all of the recent fixes to use the latter expressions don't have a comment about raising inexact in catrig.c, while most or all older expressions for setting inexact do have such a comment. A previous version of this optimization raised inexact by adding tiny to m_pi_2. @ + @ + /* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */ @ + if (ax < 2048 * FLT_EPSILON && ay < 2048 * FLT_EPSILON) @ + return (z); Previous optimization not turned off for debugging. It is simpler now that it can depend on inexact being raised up-front. @ @ if (ax == 1 && ay < FLT_EPSILON) { @ - if ((int)ay==0) { @ - if ( ilogbf(ay) > FLT_MIN_EXP) @ - rx = - logf(ay/2) / 2; @ - else @ - rx = - (logf(ay) - m_ln2) / 2; @ - } @ + if (ilogbf(ay) > FLT_MIN_EXP) @ + rx = - logf(ay/2) / 2; @ + else @ + rx = - (logf(ay) - m_ln2) / 2; Depend on inexact being raised up-front. A previous version of this optimization depended instead on logf() raising inexact appropriately (since the arg is never 1, the result is always inexact). @ } else @ rx = log1pf(4*ax / sum_squares(ax-1, ay)) / 4; @ @ - if (ax == 1) { @ - if (ay==0) @ - ry = 0; @ - else @ - ry = atan2f(2, -ay) / 2; @ - } else if (ay < FOUR_SQRT_MIN) { @ - if ((int)ay==0) @ - ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2; @ - } else @ + if (ax == 1) @ + ry = atan2f(2, ay) / 2; @ + else if (ay < FOUR_SQRT_MIN) @ + ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2; @ + else @ ry = atan2f(2*ay, (1-ax)*(1+ax) - ay*ay) / 2; @ Remove the special case for (ax == 1, ay == 0). The general case gives the same result. The correctness of this probably depends on the sign considerations for the next change, and pointed to that change (here it seems to use +0 when y == 0 but -ay otherwise). Remove negation of ay for ax == 1. The sign will be copied into the result later for all cases, so it doesn't matter in the arg. I didn't check the branch cut details for this, but runtime tests passed. Since the sign doesn't matter, we could pass y instead of ay. 
I don't understand the threshold of FOUR_SQRT_MIN. ay*ay starts underflowing at SQRT_MIN. FOUR_SQRT_MIN seems to work, and has efficiency advantages. But large multiples of FOUR_SQRT_MIN also seem to work, and have larger efficiency advantages ... I now understand what the threshold should be. You have filtered out ax == 1. This makes 1 - ax*ax at least ~2*EPSILON, so ay*ay can be dropped if ay is less than sqrt(2*EPSILON*EPSILON) * 2**-GUARD_DIGITS = EPSILON * 2**-5 say. SQRT_MIN is way smaller than that, so FOUR_SQRT_MIN works too. We should use a larger threshold for efficiency, or avoid the special case for ax == 1. Testing shows that this analysis is off by a factor of about sqrt(EPSILON), since a threshold of EPSILON * 2**7 is optimal. The optimization made no difference to speed; it is just an optimization for understanding. Maybe the special case for ax == 1 can be avoided, or folded together with the same special case for evaluation of the real part. This special case is similar to the one in clog(), but easier. Further optimization: in sum_squares(), y is always ay >= 0, so there is no need to apply fabs*() to it. I think the compiler does this optimization. It can see that y == ay via the inline. BTW, do_hard_work() is usually not inlined, so the compiler wouldn't be able to do such optimizations. However, declaring it as __always_inline didn't improve the speed. 
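The conservative bound in the analysis above (drop ay*ay once ay is at or below EPSILON * 2**-5, given ax != 1) can be checked directly in float arithmetic. This sketch only tests exact absorption of ay*ay; it says nothing about the larger, empirically optimal threshold mentioned above. The function name ay2_is_absorbed is hypothetical.

```c
#include <float.h>

/*
 * Return 1 if subtracting ay*ay leaves (1-ax)*(1+ax) unchanged in float
 * arithmetic.  The volatile temporaries force each intermediate to be
 * rounded to float even on targets that evaluate in extra precision.
 */
static int
ay2_is_absorbed(float ax, float ay)
{
	volatile float d = (1 - ax) * (1 + ax);
	volatile float ay2 = ay * ay;
	volatile float r = d - ay2;

	return (r == d);
}
```

The floats adjacent to 1 (1 - FLT_EPSILON/2 and 1 + FLT_EPSILON) give the smallest possible |(1-ax)*(1+ax)|, so they are the interesting inputs for ax.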
Bruce From owner-freebsd-numerics@FreeBSD.ORG Mon Sep 17 22:50:28 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C951F106564A for ; Mon, 17 Sep 2012 22:50:28 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 57FDC8FC0A for ; Mon, 17 Sep 2012 22:50:28 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8HMoQkw083746; Mon, 17 Sep 2012 17:50:26 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5057A932.3000603@missouri.edu> Date: Mon, 17 Sep 2012 17:50:26 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <502C0CF8.8040003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> In-Reply-To: 
<20120918012459.V5094@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 22:50:28 -0000 OK, I am struggling a bit with the latest suggestions. First, I have completely removed all the code related to when |z| is small. I have just lost it all. So I didn't perform any changes related to that code. If you want me to put it back with appropriate "#if 0", can you email those code segments back to me? On 09/17/2012 12:07 PM, Bruce Evans wrote: > @ @@ -321,30 +334,28 @@ > @ } > @ @ - if (isinf(x) || isinf(y)) > @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y))); > @ + /* Raise inexact unless z == 0; return for z == 0 as a side > effect. */ > @ + if ((x == 0 && y == 0) || (int)(1 + tiny) != 1) > @ + return (z); I'm not too sure where this code is meant to be. It looks like it should be part of testing |z| small, but it seems to be placed where |z| is large. When |z| is large, z=0 will never happen. > cacos*() and casin*() should benefit even more from an up-front raising > of inexact, since do_hard_work() has 7 magic statements to raise inexact > where sum_squares has only 1. Where is the code that raises inexact up-front? > There are no magic expressions (int)(1+tiny) left except the new up-front > one. There are still not-so- magic expressions (m_pi_2 + tiny). BTW, > most or all of the recent fixes to use the latter expressions don't > have a comment about raising inexact in catrig.c, while most or all > older expressions for setting inexact do have such a comment. I put the comments in. > Previous optimization not turned off for debugging. 
It is simpler now > that it can depend on inexact being raised up-front. Ditto. Which code turns on inexact up front? > @ } else > @ rx = log1pf(4*ax / sum_squares(ax-1, ay)) / 4; > @ @ - if (ax == 1) { > @ - if (ay==0) > @ - ry = 0; > @ - else > @ - ry = atan2f(2, -ay) / 2; > @ - } else if (ay < FOUR_SQRT_MIN) { > @ - if ((int)ay==0) > @ - ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2; > @ - } else > @ + if (ax == 1) > @ + ry = atan2f(2, ay) / 2; > @ + else if (ay < FOUR_SQRT_MIN) > @ + ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2; > @ + else > @ ry = atan2f(2*ay, (1-ax)*(1+ax) - ay*ay) / 2; > @ > > Remove the special case for (ax == 1, ay == 0). The general case gives > the same result. I don't think your code works. It should be ry = atan2f(2, -ay) / 2, not ry = atan2f(2, ay) / 2. In your tests, you should include cases where x or y is equal or close to 1. These are important special cases that I think your test code is very unlikely to hit. These are difficult edge cases for all the arc-trig functions. > Remove negation of ay for ax == 1. The sign will be copied into the result > later for all cases, so it doesn't matter in the arg. I didn't check the > branch cut details for this, but runtime tests passed. See above. > ... I now understand what the threshold should be. You have > filtered out ax == 1. This makes 1 - ax*ax at least ~2*EPSILON, so > ay*ay can be dropped if ay is less than sqrt(2*EPSILON*EPSILON) * > 2**-GUARD_DIGITS = EPSILON * 2**-5 say. SQRT_MIN is way smaller > than that, so FOUR_SQRT_MIN works too. We should use a larger > threshold for efficiency, or avoid the special case for ax == 1. > Testing shows that this analysis is off by a factor of about > sqrt(EPSILON), since a threshold of EPSILON * 2**7 is optimal. > The optimization made no difference to speed; it is just an > optimization for understanding. Maybe the special case for ax == 1 > can be avoided, or folded together with the same special case for > evaluation of the real part. 
This special case is similar to the > one in clog(), but easier. This was one of the clever ideas in the paper by Hull et al, which I only understood recently. Their code was closer to your approach, I think. Let me think about what you wrote some more. > > Further optimization: in sum_squares(), y is always ay >= 0, so there > is no need to apply fabs*() to it. I think the compiler does this > optimization. It can see that y == ay via the inline. Well spotted. From owner-freebsd-numerics@FreeBSD.ORG Mon Sep 17 22:59:52 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5BB3A10657E6 for ; Mon, 17 Sep 2012 22:59:50 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 7934F8FC18 for ; Mon, 17 Sep 2012 22:59:48 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8HMxlQ8084614 for ; Mon, 17 Sep 2012 17:59:47 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5057AB63.7040606@missouri.edu> Date: Mon, 17 Sep 2012 17:59:47 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5017111E.6060003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> 
<20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> In-Reply-To: <5057A932.3000603@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 22:59:52 -0000 On 09/17/2012 05:50 PM, Stephen Montgomery-Smith wrote: > In your tests, you should include cases where x or y is equal or close > to 1. These are important special cases that I think your test code is > very unlikely to hit. These are difficult edge cases for all the > arc-trig functions. And just to be sure, x or y is equal or close to -1 as well. 
From owner-freebsd-numerics@FreeBSD.ORG Tue Sep 18 04:02:21 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 935B41065672 for ; Tue, 18 Sep 2012 04:02:21 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 3335E8FC12 for ; Tue, 18 Sep 2012 04:02:20 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8I42JSt005318 for ; Mon, 17 Sep 2012 23:02:19 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5057F24B.7020605@missouri.edu> Date: Mon, 17 Sep 2012 23:02:19 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5017111E.6060003@missouri.edu> <20120906221028.O1542@besplex.bde.org> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> 
In-Reply-To: <5057A932.3000603@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 04:02:21 -0000 On 09/17/2012 05:50 PM, Stephen Montgomery-Smith wrote: >> cacos*() and casin*() should benefit even more from an up-front raising >> of inexact, since do_hard_work() has 7 magic statements to raise inexact >> where sum_squares has only 1. > > Where is the code that raises inexact up-front? I don't see why having code upfront will make it much more efficient. Out of these 7 magic statements, at most two of them will be called. But I could put something like if ((x == 0 && y == 0) || (x == 0 && y == 1) || (int)(1+tiny) == 1) { ........ at the beginning of do_hard_work and catanh. >> ... I now understand what the threshold should be. You have >> filtered out ax == 1. This makes 1 - ax*ax at least ~2*EPSILON, so >> ay*ay can be dropped if ay is less than sqrt(2*EPSILON*EPSILON) * >> 2**-GUARD_DIGITS = EPSILON * 2**-5 say. SQRT_MIN is way smaller >> than that, so FOUR_SQRT_MIN works too. We should use a larger >> threshold for efficiency, or avoid the special case for ax == 1. >> Testing shows that this analysis is off by a factor of about >> sqrt(EPSILON), since a threshold of EPSILON * 2**7 is optimal. >> The optimization made no difference to speed; it is just an >> optimization for understanding. Maybe the special case for ax == 1 >> can be avoided, or folded together with the same special case for >> evaluation of the real part. This special case is similar to the >> one in clog(), but easier. OK, I think I made changes more or less according to your suggestions. 
In the case A < A_crossover, a threshold like DBL_EPSILON*DBL_EPSILON/128 is required. I think the one you set is too large. It is important that sqrt(x) + x/2 is sqrt(x). (Again I don't think your tests would pick this up, because you need to do a lot of tests where y is close to or equal to 1.) From owner-freebsd-numerics@FreeBSD.ORG Tue Sep 18 06:19:26 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0E491065673 for ; Tue, 18 Sep 2012 06:19:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au [211.29.132.187]) by mx1.freebsd.org (Postfix) with ESMTP id 451178FC16 for ; Tue, 18 Sep 2012 06:19:25 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8I6JMOn031723 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 18 Sep 2012 16:19:23 +1000 Date: Tue, 18 Sep 2012 16:19:22 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <5057A932.3000603@missouri.edu> Message-ID: <20120918150551.Y820@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> 
<50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 06:19:26 -0000 On Mon, 17 Sep 2012, Stephen Montgomery-Smith wrote: > OK, I am struggling a bit with the latest suggestions. > > First, I have completely removed all the code related to when |z| is small. > I have just lost it all. So I didn't perform any changes related to that > code. If you want me to put it back with appropriate "#if 0", can you email > those code segments back to me? I have not :-). It is also quoted in the mail archives. Will send it in my next patch. > On 09/17/2012 12:07 PM, Bruce Evans wrote: > >> @ @@ -321,30 +334,28 @@ >> @ } >> @ @ - if (isinf(x) || isinf(y)) >> @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y))); >> @ + /* Raise inexact unless z == 0; return for z == 0 as a side >> effect. */ >> @ + if ((x == 0 && y == 0) || (int)(1 + tiny) != 1) >> @ + return (z); > > I'm not too sure where this code is meant to be. It looks like it should be > part of testing |z| small, but it seems to be placed where |z| is large. > When |z| is large, z=0 will never happen. As its comment says, this raises inexact [up front] unless z == 0, and [so that the test is not optimized away] returns [z] for z == 0 as a side effect. This is for any z that has not been previously classified (mainly ones with NaNs). 
Its operation is:
- z == 0: find x == 0 and y == 0 and return z
- z != 0: find !(x == 0 and y == 0); evaluate (int)(1 + tiny) != 1 and find it to be false while raising inexact; don't return z, but continue with inexact set.

>> cacos*() and casin*() should benefit even more from an up-front raising
>> of inexact, since do_hard_work() has 7 magic statements to raise inexact
>> where sum_squares has only 1.
>
> Where is the code that raises inexact up-front?

As quoted above.

Later I tried removing all the 7 magic statements in do_hard_work(), without adding code like the above. This made very little difference. OTOH, the above code costs a cycle or 2, and removing the additions in all magic expressions (m_pi_2 + tiny) gave a small improvement. I think I can explain this, and it shows that we should be using fenv (optimized) and not "optimizing" using magic context-sensitive expressions. The point is that the code that sets inexact can run in parallel so that the main path can run faster because it doesn't involve an operation like (m_pi_2 + tiny).

Good ways for raising exceptions:

FE_INEXACT: Your `if ((int)(1 + tiny) == 1) return (foo);' works well. This depends on the branch being predictable. But returning is inconvenient. I hope `if ((int)(1 + tiny) == 1) volatile_variable = 0;' works similarly. This could be in feraiseexcept(FE_INEXACT) or in a more primitive raise_inexact() (the latter is less verbose and easier to optimize). Then if you actually want to return, the code would be something like { raise_inexact(); return (m_pi_2); }. Better, the branch can be avoided using something like `volatile_variable = (int)(1 + tiny);'. Better still, write this in asm and just do `(int)0.5;' (use asm only to avoid the optimizer removing this). Possibly better still, use a purer FP operation since conversion to int can be slow.

In the above, we don't really want a special case for z == 0; we need branches to classify this case but should skip the return since returns use branch resources too. The code becomes:

	if (x != 0 || y != 0)
		raise_inexact();	/* No comment. */
#if !THIS_CODE_INTENTIONALLY_LEFT_OUT
	else
		return (z);		/* No comment. */
#endif

FE_OVERFLOW: Instead of evaluating huge*huge and returning it, use something like `volatile_variable = huge*huge; return (INFINITY);'. This is more natural than the above, so it takes at most 1 more instruction (assignment to variable with no dependents) and thus loses little even without parallelism. The version written in asm can also avoid the assignment (just evaluate huge*huge) and lose nothing.

FE_UNDERFLOW: Instead of evaluating tiny*tiny and returning it, use something like `volatile_variable = tiny*tiny; return (0);'. I hope there is a variation on this that raises underflow at full speed (underflowing cases are very slow on core2 although not on Athlon64; hopefully they are not so slow if the result of tiny*tiny is not used).

The last 2 raisings will also fix the i386 bug that huge*huge and tiny*tiny don't actually raise overflow or underflow or return infinity or 0, since they are evaluated in extra exponent range. It takes conversion to double or float to trigger the exception and to give the correct value.

When we try to raise exceptions in a parallel code path, we are hoping for related asynchronicities in the setting of the exception flags so that the usual case where the exception flags are not tested soon proceeds at full speed. It is unclear how compilers and CPUs produce the ordering of operations required by the abstract C machines -- I think a strict interpretation of `volatile' would require synchronizing everything for every access to a volatile variable, but that would be too slow and I've never seen compilers doing much synchronization.
>> @ } else >> @ rx = log1pf(4*ax / sum_squares(ax-1, ay)) / 4; >> @ @ - if (ax == 1) { >> @ - if (ay==0) >> @ - ry = 0; >> @ - else >> @ - ry = atan2f(2, -ay) / 2; >> @ - } else if (ay < FOUR_SQRT_MIN) { >> @ - if ((int)ay==0) >> @ - ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2; >> @ - } else >> @ + if (ax == 1) >> @ + ry = atan2f(2, ay) / 2; >> @ + else if (ay < FOUR_SQRT_MIN) >> @ + ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2; >> @ + else >> @ ry = atan2f(2*ay, (1-ax)*(1+ax) - ay*ay) / 2; >> @ >> >> Remove the special case for (ax == 1, ay == 0). The general case gives >> the same result. > > I don't think your code works. It should be ry = atan2f(2, -ay) / 2, not ry > = atan2f(2, ay) / 2. Only logically. As I explained, the negation makes no difference to the result, but of course takes longer, so I removed it. > In your tests, you should include cases where x or y is equal or close to 1. > These are important special cases that I think your test code is very > unlikely to hit. These are difficult edge cases for all the arc-trig > functions. Hmm, I only did this carefully for clog(). I happen to have been testing lots of cases ctanh(1 + tiny, tiny') where tiny* is really tiny (denormal) with either sign, but not so many cases ctanh(1, tiny') and no (?) cases of ctanh(1 + tiny, +-0). >> Remove negation of ay for ax == 1. The sign will be copied into the result >> later for all cases, so it doesn't matter in the arg. I didn't check the >> branch cut details for this, but runtime tests passed. > > See above. I might have missed this. But if the sign matters, why do you set ry = +0 for catanh on both sides of 1 + I*(+-0)? 
Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue Sep 18 06:41:57 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 85CA6106566B for ; Tue, 18 Sep 2012 06:41:57 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au [211.29.132.194]) by mx1.freebsd.org (Postfix) with ESMTP id 128AC8FC12 for ; Tue, 18 Sep 2012 06:41:56 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8I6frVm020384 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 18 Sep 2012 16:41:54 +1000 Date: Tue, 18 Sep 2012 16:41:53 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <5057F24B.7020605@missouri.edu> Message-ID: <20120918162105.U991@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> MIME-Version: 1.0 
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 06:41:57 -0000 On Mon, 17 Sep 2012, Stephen Montgomery-Smith wrote: > On 09/17/2012 05:50 PM, Stephen Montgomery-Smith wrote: > >>> cacos*() and casin*() should benefit even more from an up-front raising >>> of inexact, since do_hard_work() has 7 magic statements to raise inexact >>> where sum_squares has only 1. >> >> Where is the code that raises inexact up-front? > > I don't see why having code upfront will make it much more efficient. Out of > these 7 magic statements, at most two of them will be called. 7 instead of 1 is more complex, and uses more branch prediction resources. > But I could put something like > > if ((x == 0 && y == 0) || (x == 0 && y == 1) || (int)(1+tiny) == 1) { > ........ > at the beginning of do_hard_work and catanh. I put without (x == 0 && y == 1) in catanh(). (x == 0 && y == 1) in it is a bug, since catanh(I) = I*Pi/2 with inexact. However, I seemed to have missed (x == 1 && y == 0) -> catanh(1) = +Inf without inexact. do_hard_work() is too late for this, since the following earlier cases also need it: - large x or y (neither infinite) - small x and y (not both 0, except for acosh(0) = Pi/2 with inexact, etc.) (the lost optimization). > OK, I think I made changes more or less according to your suggestions. > > In the case A < A_crossover, a threshold like DBL_EPSILON*DBL_EPSILON/128 is > required. I think the one you set is too large. It is important that > sqrt(x) + x/2 is sqrt(x). (Again I don't think your tests would pick this > up, because you need to do a lot of tests where y is close to or equal to 1.) 
Well, there were 2**12 of them with y = 1+denormal, with 7 different denormals, but none with y = 1. Will test some more. (I'm testing denormals with a few 1's in their lower bits since experience shows that values with 0's in their lower bits are too special. For example, ax*ax is exact if enough lower bits in ax are 0.) Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue Sep 18 14:15:49 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AD6ED106566B for ; Tue, 18 Sep 2012 14:15:49 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au [211.29.132.194]) by mx1.freebsd.org (Postfix) with ESMTP id 09B938FC19 for ; Tue, 18 Sep 2012 14:15:48 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8IEFeHX012910 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 19 Sep 2012 00:15:46 +1000 Date: Wed, 19 Sep 2012 00:15:40 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20120918162105.U991@besplex.bde.org> Message-ID: <20120918232850.N2144@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> 
<5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Stephen Montgomery-Smith , freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 14:15:50 -0000 On Tue, 18 Sep 2012, Bruce Evans wrote: > On Mon, 17 Sep 2012, Stephen Montgomery-Smith wrote: > ... >> But I could put something like >> >> if ((x == 0 && y == 0) || (x == 0 && y == 1) || (int)(1+tiny) == 1) { >> ........ >> at the beginning of do_hard_work and catanh. > > I put without (x == 0 && y == 1) in catanh(). (x == 0 && y == 1) in it > is a bug, since catanh(I) = I*Pi/2 with inexact. However, I seemed to > have missed (x == 1 && y == 0) -> catanh(1) = +Inf without inexact. I also broke cases with infinities... >> In the case A < A_crossover, a threshold like DBL_EPSILON*DBL_EPSILON/128 >> is required. I think the one you set is too large. It is important that >> sqrt(x) + x/2 is sqrt(x). (Again I don't think your tests would pick this >> up, because you need to do a lot of tests where y is close to or equal to >> 1.) > > Well, there were 2**12 of them with y = 1+denormal, with 7 different > denormals, but none with y = 1. Will test some more. (I'm testing ... and many cases with ax or ay precisely 1, due to not testing these. 
Fixing these and finding a few more simplifications and optimizations gives: @ diff -u2 catrig.c~ catrig.c @ --- catrig.c~ 2012-09-18 03:42:32.000000000 +0000 @ +++ catrig.c 2012-09-18 11:53:28.017331000 +0000 @ @@ -261,4 +261,6 @@ @ @ /* @ + * casinh(z) = z + O(|z|^3) as z -> 0 @ + * @ * casinh(z) = sign(x)*clog(sign(x)*z) + O(1/|z|^2) as z -> infinity @ * The above formula works for the imaginary part as well, because Part of restoring your old optimization -- fix the comments. @ @@ -297,4 +299,5 @@ @ @ if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) { @ + /* clog...() will raise inexact unless x or y is infinite. */ @ if (signbit(x) == 0) @ w = clog_for_large_values(z) + m_ln2; Further minimal changes for the double precision case -- try to document all magic for setting inexact. @ @@ -304,4 +307,8 @@ @ } @ @ + if (ax < DBL_EPSILON && ay < DBL_EPSILON) @ + if ((int)ax==0 && (int)ay==0) /* raise inexact */ @ + return (z); @ + @ do_hard_work(ax, ay, &rx, &B_is_usable, &B, &sqrt_A2my2, &new_y); @ if (B_is_usable) Your old optimization. Not done as completely as in float precision. @ @@ -328,4 +335,6 @@ @ * close to 1. @ * @ + * cacos(z) = PI/2 - z + O(|z|^3) as z -> 0 @ + * @ * cacos(z) = -sign(y)*I*clog(z) + O(1/|z|^2) as z -> infinity @ * The above formula works for the real part as well, because @ @@ -355,6 +364,6 @@ @ if (isinf(y)) @ return (cpack(x+x, -y)); @ - /* cacos(0 + I*NaN) = PI/2 + I*NaN */ @ - if (x == 0) return (cpack(m_pi_2 + tiny, y+y)); /* raise inexact */ @ + /* cacos(0 + I*NaN) = PI/2 + I*NaN with inexact */ @ + if (x == 0) return (cpack(m_pi_2 + tiny, y+y)); @ /* @ * All other cases involving NaN return NaN + I*NaN. Comments about exceptions raised should be together with comments about values returned, at least if we can't attach them closely to the magic that raises them. @ @@ -366,4 +375,5 @@ @ @ if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) { @ + /* clog...() will raise inexact unless x or y is infinite. 
*/
@ w = clog_for_large_values(z);
@ rx = fabs(cimag(w));
@ @@ -374,4 +384,7 @@
@ }
@
@ + if (ax < DBL_EPSILON && ay < DBL_EPSILON)
@ + return (cpack(m_pi_2 + tiny - x, -y)); /* raise inexact */
@ +
@ do_hard_work(ay, ax, &ry, &B_is_usable, &B, &sqrt_A2mx2, &new_x);
@ if (B_is_usable) {

Your old optimization, updated to raise inexact by adding tiny. Not updated to avoid subtracting x -- see the float precision code for that. The fixed comment above goes with this subtraction -- without it, the approximation would be Pi/2 + O(z).

@ @@ -517,4 +530,6 @@
@ * + I * atan2(2*y, (1-x)*(1+x)-y*y) / 2
@ *
@ + * catanh(z) = z + O(|z|^3) as z -> 0
@ + *
@ * catanh(z) = 1/z + sign(y)*I*PI/2 + O(1/|z|^3) as z -> infinity
@ * The above formula works for the real part as well, because
@ @@ -536,7 +551,7 @@
@ if (isinf(x))
@ return (cpack(copysign(0, x), y+y));
@ - /* catanh(NaN + I*+-Inf) = sign(NaN)0 + I*+-PI/2 */
@ + /* catanh(NaN + I*+-Inf) = sign(NaN)0 + I*+-PI/2 with inexact */
@ if (isinf(y))
@ - return (cpack(copysign(0, x), copysign(m_pi_2 + tiny, y))); /* raise inexact */
@ + return (cpack(copysign(0, x), copysign(m_pi_2 + tiny, y)));
@ /* catanh(+-0 + I*NaN) = +-0 + I*NaN */
@ if (x == 0)
@ @@ -550,4 +565,5 @@
@ }
@
@ + /* XXX should improve following comments. */
@ /* If x or y is inf, then catanh(x + I*y) = 0 + I*sign(y)*PI/2 */
@ if (isinf(x) || isinf(y))

Here there was no space for commenting about the exceptions. The sign of the 0 is not documented, but there is no space for that either. It is in the code as a copysign(). So is the sign for PI/2, but that is in the comment too.

@ @@ -557,6 +573,10 @@
@ return (cpack(copysign(real_part_reciprocal(ax, ay), x), copysign(m_pi_2 + tiny, y))); /* raise inexact */
@
@ + if (ax < DBL_EPSILON && ay < DBL_EPSILON)
@ + if ((int)ax==0 && (int)ay==0) /* raise inexact */
@ + return (z);
@ +

Your old optimization. It also improves accuracy significantly -- see the float precision comment.
@ if (ax == 1 && ay < DBL_EPSILON) { @ - if ((int)ay==0) { /* raise inexact */ @ + if (1) { /* inexact will be raised by log() */ @ /* @ * If ay == 0, divide-by-zero will be (correctly) I didn't re-indent this. @ diff -u2 catrigf.c~ catrigf.c @ --- catrigf.c~ 2012-09-18 03:42:35.000000000 +0000 @ +++ catrigf.c 2012-09-18 13:23:20.972740000 +0000 @ @@ -165,4 +165,9 @@ @ } @ @ + /* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */ @ + if (ax < 2048 * FLT_EPSILON && ay < 2048 * FLT_EPSILON) @ + if ((int)ax==0 && (int)ay==0) @ + return (z); @ + @ do_hard_work(ax, ay, &rx, &B_is_usable, &B, &sqrt_A2my2, &new_y); @ if (B_is_usable) Old optimization refined. @ @@ -213,4 +218,8 @@ @ } @ @ + /* XXX the number for ay is related to sqrt(6 * FLT_EPSILON). */ @ + if (ax < FLT_EPSILON / 8 && ay < 2048 * FLT_EPSILON) @ + return (cpackf(m_pi_2 + tiny, -y)); @ + @ do_hard_work(ay, ax, &ry, &B_is_usable, &B, &sqrt_A2mx2, &new_x); @ if (B_is_usable) { Old optimization refined. @ @@ -277,7 +286,5 @@ @ { @ if (y < SQRT_MIN) @ - if ((int)y==0) @ - return (x*x); @ - @ + return (x*x); @ return (x*x + y*y); @ } Depend on the up-front setting of inexact here in sum_squares(). @ @@ -288,4 +295,6 @@ @ int ex, ey; @ @ + if (isinf(x) || isinf(y)) @ + return (0); @ if (y == 0) return (1/x); @ if (x == 0) return (x/y/y); Handle special case for infinities here in real_part_reciprocal() instead of the general code. @ @@ -319,29 +328,60 @@ @ } @ @ - if (isinf(x) || isinf(y)) @ - return (cpackf(copysignf(0, x), copysignf(m_pi_2 + tiny, y))); Move this into real_part_reciprocal(), so that the classification of infinities is only done if x or y is large. This is a minor optimization. My previous version removed this, but was broken since real_part_reciprocal() somehow doesn't naturally return 0 for infinities. It does rather slow scaling steps. @ + /* @ + * Handle the annoying special case +-1 + I*+-0, and collaterally @ + * handle the not-so-special case y == 0. 
C99 specifies that @ + * catanh(+-1 + I*+-0) = +-Inf + I*+-0 instead of the limiting @ + * value +-Inf + I*+-PI/2 since it wants y == 0 to give the same @ + * result as the real atanh() (at least for y == +0). The special @ + * behaviour for +-1 + I*+-0 begins with classifying it to avoid @ + * raising inexact for it. Make the classification as simple and @ + * short as possible (except for this comment about it) and ensure @ + * identical results by calling the real atanh() for all non-NaN x @ + * when y == 0. This turns out to be significantly more accurate. @ + * @ + * TODO: move this before the NaN classification and let atanh() @ + * handle NaN x too. Make a similar special case for x == 0 to @ + * improve accuracy; this takes no extra lines of code since it @ + * removes the need to handle x == 0 under the NaN classification. @ + */ @ + if (y == 0) @ + return (cpackf(atanh(x), y)); See the comment. @ + @ + /* Raise inexact unless z == 0; return for z == 0 as a side effect. */ @ + if ((x == 0 && y == 0) || (int)(1 + tiny) != 1) @ + return (z); z == 0 is the only remaining case that shouldn't raise inexact. @ @ if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) @ - return (cpackf(copysignf(real_part_reciprocal(ax, ay), x), copysignf(m_pi_2 + tiny, y))); @ + return (cpackf(copysignf(real_part_reciprocal(ax, ay), x), copysignf(m_pi_2, y))); Inexact was raised up-front. @ @ - if (ax == 1 && ay < FLT_EPSILON) { @ - if ((int)ay==0) { @ - if ( ilogbf(ay) > FLT_MIN_EXP) @ - rx = - logf(ay/2) / 2; @ - else @ - rx = - (logf(ay) - m_ln2) / 2; @ - } @ - } else Inexact was raised up front, and will also be raised by logf(). ilogbf() is rather slow, though it is now a builtin. There is no need to use it here: - the condition can be written as (ay > 2 * FLT_MIN_EXP) - the expression with m_ln2 is accurate enough, so there is no need for the condition. I thought I tested this assertion and found no difference at all in accuracy, but now I can't see why it is true. 
This case is fundamentally quite accurate -- within about 1 ulp for the logf() part -- and subtracting m_ln2 will only lose about 0.5 ulps of accuracy (since |logf(FLT_EPSILON)| dominates m_ln2), so it is not near the worst case for accuracy, but the loss of accuracy is not null.

@ + /* XXX the numbers are related to sqrt(6 * FLT_EPSILON). */
@ + if (ax < 2048 * FLT_EPSILON && ay < 2048 * FLT_EPSILON)
@ + return (z);

Old optimization. Not just an optimization -- see below about accuracy.

@ +
@ + if (ax == 1 && ay < FLT_EPSILON)
@ + rx = - (logf(ay) - m_ln2) / 2;

Above with the extra code for accuracy removed.

@ + else
@ + /*
@ + * If we didn't handle y == 0 earlier, the following for
@ + * y == 0 would reduce to log1pf(4*ax/(ax-1)**2)) / 4.
@ + * This is significantly less accurate than the expression
@ + * log1pf(ax+ax+(ax*ax)*x/(1-ax)) / 2 used by atanhf() for
@ + * ax < 0.5, though not much less accurate than the expr
@ + * log1pf(ax+ax/(1-ax)) / 2 used by atanhf() for 0.5 <=
@ + * ax <= 1. Can we do better with ay mixed in?
@ + *
@ + * This is also significantly less accurate than the
@ + * expression (z) used above when ax < 2048 * FLT_EPSILON
@ + * and y == 0. Presumably similarly when y is small but
@ + * nonzero. This explains why the above optimization also
@ + * improves accuracy.
@ + */
@ rx = log1pf(4*ax / sum_squares(ax-1, ay)) / 4;

See the comment.

@
@ - if (ax == 1) {
@ - if (ay == 0)
@ - ry = 0;
@ - else
@ - ry = atan2f(2, -ay) / 2;
@ - } else if (ay < FOUR_SQRT_MIN) {
@ - if ((int)ay==0)
@ - ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2;
@ - } else
@ + if (ax == 1)
@ + ry = atan2(2, -ay) / 2;
@ + else if (ay < FLT_EPSILON * 128)
@ + ry = atan2f(2*ay, (1-ax)*(1+ax)) / 2;
@ + else
@ ry = atan2f(2*ay, (1-ax)*(1+ax) - ay*ay) / 2;
@

This is the part that I completely broke before. Now it does:
- special case for ay == 0 moved above
- don't remove the minus sign in -ay
- use up-front setting of inexact
- the expanded threshold still works for me.
@ diff -u2 catrigl.c~ catrigl.c @ --- catrigl.c~ 2012-09-18 03:42:37.000000000 +0000 @ +++ catrigl.c 2012-09-18 11:50:35.362160000 +0000 @ @@ -180,4 +180,8 @@ @ } @ @ + if (ax < LDBL_EPSILON && ay < LDBL_EPSILON) @ + if ((int)ax==0 && (int)ay==0) @ + return (z); @ + @ do_hard_work(ax, ay, &rx, &B_is_usable, &B, &sqrt_A2my2, &new_y); @ if (B_is_usable) @ @@ -228,4 +232,7 @@ @ } @ @ + if (ax < LDBL_EPSILON && ay < LDBL_EPSILON) @ + return (cpackl(m_pi_2 + tiny - x, -y)); @ + @ do_hard_work(ay, ax, &ry, &B_is_usable, &B, &sqrt_A2mx2, &new_x); @ if (B_is_usable) { @ @@ -340,4 +347,8 @@ @ return (cpackl(copysignl(real_part_reciprocal(ax, ay), x), copysignl(m_pi_2 + tiny, y))); @ @ + if (ax < LDBL_EPSILON && ay < LDBL_EPSILON) @ + if ((int)ax==0 && (int)ay==0) @ + return (z); @ + @ if (ax == 1 && ay < LDBL_EPSILON) { @ if ((int)ay==0) { catrigl.c only has changes to restore the old optimizations. Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue Sep 18 15:19:16 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 94BAC106566C for ; Tue, 18 Sep 2012 15:19:16 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id 2285B8FC12 for ; Tue, 18 Sep 2012 15:19:15 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8IFJC2E008798 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 19 Sep 2012 01:19:14 +1000 Date: Wed, 19 Sep 2012 01:19:12 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20120918232850.N2144@besplex.bde.org> Message-ID: <20120919010613.T2493@besplex.bde.org> References: <5017111E.6060003@missouri.edu> 
<20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Stephen Montgomery-Smith , freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 15:19:16 -0000 On Wed, 19 Sep 2012, I wrote: > @ + /* > @ + * Handle the annoying special case +-1 + I*+-0, and collaterally > @ + * handle the not-so-special case y == 0. C99 specifies that > @ + * catanh(+-1 + I*+-0) = +-Inf + I*+-0 instead of the limiting > @ + * value +-Inf + I*+-PI/2 since it wants y == 0 to give the same > @ + * result as the real atanh() (at least for y == +0). The special > @ + * behaviour for +-1 + I*+-0 begins with classifying it to avoid > @ + * raising inexact for it. 
Make the classification as simple and > @ + * short as possible (except for this comment about it) and ensure > @ + * identical results by calling the real atanh() for all non-NaN x > @ + * when y == 0. This turns out to be significantly more accurate. > @ + * > @ + * TODO: move this before the NaN classification and let atanh() > @ + * handle NaN x too. Make a similar special case for x == 0 to > @ + * improve accuracy; this takes no extra lines of code since it > @ + * removes the need to handle x == 0 under the NaN classification. > @ + */ > @ + if (y == 0) > @ + return (cpackf(atanh(x), y)); > > See the comment. Duh, this has to be under (y == 0 && ax <= 1) so that the real function actually applies. > @ @ - if (ax == 1 && ay < FLT_EPSILON) { > @ - if ((int)ay==0) { > @ - if ( ilogbf(ay) > FLT_MIN_EXP) > @ - rx = - logf(ay/2) / 2; > @ - else > @ - rx = - (logf(ay) - m_ln2) / 2; > @ - } > @ - } else > > Inexact was raised up front, and will also be raised by logf(). > > ilogbf() is rather slow, though it is now a builtin. There is no need > to use it here: > - the condition can be written as (ay > 2 * FLT_MIN_EXP) > - the expression with m_ln2 is accurate enough, so there is no need for > the condition. I thought I tested this assertion and found no difference > at all in accuracy, but now I can't see why it is true. This case is > fundamentally quite accurate -- within about 1 ulp for the logf() part -- > and subtracting m_ln2 will only lose about 0.5 ulps pf accuracy (since > |logf(FLT_EPSILON)| dominates m_ln2), so it is not near the worse case > for accuracy, but the loss of accuracy is not null. Now tested. The increase in inaccuracy is only from ~0.7 ulps to ~0.9 ulps. This is acceptable. > @ ... > @ + > @ + if (ax == 1 && ay < FLT_EPSILON) > @ + rx = - (logf(ay) - m_ln2) / 2; However, the outer FLT_EPSILON threshold for the above is too conservative. 
It can be increased to FLT_EPSILON**2 without expanding the error to above 0.7 ulps, provided this optimization is not used -- with the expanded threshold, this optimization expands the error by another 0.2 ulps, to ~1.1 ulps instead of to ~0.9 ulps. These errors are still in the noise compared with the worst case error of ~2.6 ulps, but it is good to keep errors below 1 ulp if this is easy.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG Wed Sep 19 03:48:42 2012
Return-Path:
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E67F31065670 for ; Wed, 19 Sep 2012 03:48:42 +0000 (UTC) (envelope-from stephen@missouri.edu)
Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 9F59D8FC1C for ; Wed, 19 Sep 2012 03:48:42 +0000 (UTC)
Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8J3mYxL036898; Tue, 18 Sep 2012 22:48:35 -0500 (CDT) (envelope-from stephen@missouri.edu)
Message-ID: <50594092.6000302@missouri.edu>
Date: Tue, 18 Sep 2012 22:48:34 -0500
From: Stephen Montgomery-Smith
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0
MIME-Version: 1.0
To: Bruce Evans
References: <5017111E.6060003@missouri.edu> <5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu>
<20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <20120918150551.Y820@besplex.bde.org> In-Reply-To: <20120918150551.Y820@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Sep 2012 03:48:43 -0000 On 09/18/2012 01:19 AM, Bruce Evans wrote: > On Mon, 17 Sep 2012, Stephen Montgomery-Smith wrote: >> I don't think your code works. It should be ry = atan2f(2, -ay) / 2, >> not ry = atan2f(2, ay) / 2. > > Only logically. As I explained, the negation makes no difference to the > result, but of course takes longer, so I removed it. No, they give different results. atan2(y,x) = Pi - atan2(y,-x) if y is positive. 
From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 01:41:06 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3F611065673 for ; Fri, 21 Sep 2012 01:41:06 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 6F3B68FC17 for ; Fri, 21 Sep 2012 01:41:06 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8L1eww2078803; Thu, 20 Sep 2012 20:40:59 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505BC5AA.2030604@missouri.edu> Date: Thu, 20 Sep 2012 20:40:58 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> In-Reply-To: <20120918232850.N2144@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 01:41:06 -0000

On 09/18/2012 09:15 AM, Bruce Evans wrote:
> @ if (ax == 1 && ay < DBL_EPSILON) {
> @ - if ((int)ay==0) { /* raise inexact */
> @ + if (1) { /* inexact will be raised by log() */
> @ /*
> @ * If ay == 0, divide-by-zero will be (correctly)
>
> I didn't re-indent this.

I have put back the old optimizations in catrig.c. The only change I have made so far is that I did re-indent this.

From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 02:50:04 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A685D106566C for ; Fri, 21 Sep 2012 02:50:04 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 616288FC14 for ; Fri, 21 Sep 2012 02:50:04 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8L2o2gY084873; Thu, 20 Sep 2012 21:50:02 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505BD5DA.1070302@missouri.edu> Date: Thu, 20 Sep 2012 21:50:02 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <504FF726.9060001@missouri.edu> <20120912191556.F1078@besplex.bde.org>
<20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> In-Reply-To: <20120918232850.N2144@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 02:50:04 -0000

What I did was to make constants called SQRT_6_EPSILON, etc, and then make your suggested optimizations to float also to double and long double.

I also wrote my own atanhl function so that your inexact optimizations could be applied to long double as well as double and float.
From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 03:06:31 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 632C0106566C for ; Fri, 21 Sep 2012 03:06:31 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 1D27B8FC16 for ; Fri, 21 Sep 2012 03:06:29 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8L36SZH086228; Thu, 20 Sep 2012 22:06:29 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505BD9B4.8020801@missouri.edu> Date: Thu, 20 Sep 2012 22:06:28 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> In-Reply-To: 
<20120919010613.T2493@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 03:06:31 -0000 I also added inexact optimizations for casinh and cacos. From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 07:23:39 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 800CE1065728 for ; Fri, 21 Sep 2012 07:23:39 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail12.syd.optusnet.com.au (mail12.syd.optusnet.com.au [211.29.132.193]) by mx1.freebsd.org (Postfix) with ESMTP id 0D2B78FC1C for ; Fri, 21 Sep 2012 07:23:38 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail12.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8L7NTek006237 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 21 Sep 2012 17:23:31 +1000 Date: Fri, 21 Sep 2012 17:23:29 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <505BD5DA.1070302@missouri.edu> Message-ID: <20120921161532.R945@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <20120912225847.J1771@besplex.bde.org> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> 
<5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <505BD5DA.1070302@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 07:23:39 -0000

On Thu, 20 Sep 2012, Stephen Montgomery-Smith wrote:

> What I did was to make constants called SQRT_6_EPSILON, etc, and then make
> your suggested optimizations to float also to double and long double.

I'm just using SQRT_EPSILON here in all cases.

Partial diffs between your new version of catrig.c and my unmerged version.

50,51c52
< SQRT_3_EPSILON = sqrt(3*DBL_EPSILON),
< SQRT_6_EPSILON = sqrt(6*DBL_EPSILON),
---
> SQRT_EPSILON = 0x1p-27, /* <= sqrt(DBL_EPSILON) */

Your version depends on sqrt(N*DBL_EPSILON) being a constant enough expression for it to be evaluated at compile time. A fairly large optimization.

e_atanh.c uses 2**-28 here. This is significantly smaller than any of the above. It has a large safety margin. But not after dividing the above by 8.

e_atanhf.c uses the same 2**-28 here. This is nonsense. Properly translating the 2**-28 to float precision would have given about 2**-14. Exhaustive testing shows that 2**-13 gives the same results.
sqrtf(FLT_EPSILON) is much larger (2**-11.5). That has a negative safety margin -- exhaustive testing shows that 2**-12 loses a little bit of accuracy compared with 2**-13.

For catanh*(), we have to bound both x and y, and should have a larger safety margin for both. Non-exhaustive testing shows that 2**-12 works OK in float precision. My previous values had a negative safety margin. In double precision, sqrt(DBL_EPSILON) is not an integer power of 2, and the above gives an additional safety margin by rounding down to an integer power of 2.

304,309c313,315
< ...
< if (ax < SQRT_6_EPSILON/8 && ay < SQRT_6_EPSILON/8)
< return (z);
---
> if (ax < SQRT_EPSILON && ay < SQRT_EPSILON)
> if ((int)ax == 0 && (int)ay == 0) /* raise inexact */
> return (z);

The divisions by 8 give a larger safety margin than my version.

384,389c390,407
< if (ax < DBL_EPSILON/8 && ay < SQRT_6_EPSILON/8)
< return (cpack(m_pi_2, -y));
---
> if (ax < SQRT_EPSILON && ay < SQRT_EPSILON)
> if ((int)ax == 0 && (int)ay == 0)
> return (cpack(pio2_hi - (x - pio2_lo), -y));

I restored your z term in the approximation so that I could use the same threshold for x and y. This is more accurate and covers more cases. The approximation is now _better_ than the corresponding one in acos*() -- they should be using the extra term too. This has other subtleties involving rounding of Pi/2 -- see later mail.

580,581c596,598
< if (ax < SQRT_3_EPSILON/8 && ay < SQRT_3_EPSILON/8)
< return (z);
---
> if (ax < SQRT_EPSILON && ay < SQRT_EPSILON)
> if ((int)ax == 0 && (int)ay == 0) /* raise inexact */
> return (z);

> I also wrote my own atanhl function so that your inexact optimizations could
> be applied to long double as well as double and float.

Hmm, I didn't notice that atanhl() was missing. I found that atanh[f] uses an inaccurate approximation for small |x|, so returning atanh*() early for y == 0 and |x| <= 1 breaks not only optimality of the above approximation for small |z|, but also its accuracy.
I made a similar real function call to atan() for x == 0 (only implemented in float precision, and the equivalent for cacos and casinh() not tried). Now atanl() is not missing, and atan*(x) is not inaccurate for small x, so calling this early only breaks the optimality of the above. To preserve the optimality, I had to put most of the new special cases later in the function instead of earlier as planned. This makes them less good for avoiding special settings of inexact. Setting inexact early is also bad for optimality, so I no longer try to do it. See the next mail. Bruce From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 08:25:13 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 32125106566B for ; Fri, 21 Sep 2012 08:25:13 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id A56878FC0C for ; Fri, 21 Sep 2012 08:25:11 +0000 (UTC) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8L8P4KH031489 for ; Fri, 21 Sep 2012 18:25:04 +1000 Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8L8OtMR014157 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 21 Sep 2012 18:24:56 +1000 Date: Fri, 21 Sep 2012 18:24:55 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <505BD9B4.8020801@missouri.edu> Message-ID: <20120921172402.W945@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> 
<20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 08:25:13 -0000

On Thu, 20 Sep 2012, Stephen Montgomery-Smith wrote:

> I also added inexact optimizations for casinh and cacos.

I couldn't get this to give the hoped-for optimization and dropped it even for catanh. You may still prefer it because it is simpler. But later I found intricacies for returning the correct value of Pi/2 which make the inexact optimizations even less useful:

The real functions are careful to a fault to return Pi/2 correctly rounded in all rounding modes. They don't return a constant Pi/2, but evaluate Pi/2 at runtime using pio2_hi + pio2_lo, where pio2_hi is (or should be) Pi/2 rounded _down_ and pio2_lo is an approximation to the residual and is volatile enough for the addition to be done at runtime.
The following shortcuts lose this care:
- similarly, but with pio2_hi = Pi/2 rounded up. Now pio2_hi + pio2_lo is 1 ulp too high when the rounding mode is either up or towards plus infinity. Rounding of Pi/2 to nearest may go either way. fdlibm code seems to be careful to round it down in all cases. In FreeBSD libm, at least e_acosf.c is careful to round down when the natural rounding is up, but invtrig.c is not careful -- it apparently uses natural rounding, which happens to be up for ld80 and down for ld128, or vice versa.
- similarly, but with pio2_hi rounded to nearest and 'tiny' used instead of pio2_lo. Using 'tiny' requires pio2_hi to be nearest and only works in some rounding modes.
- similarly, but with just m_pi_2 = Pi/2 rounded to nearest. Now there is no runtime evaluation, so the result cannot depend on the rounding mode and inexact must be set in some other way.

A quick test of most functions in all rounding modes shows that non-default modes work quite well except for the most complicated and/or heavily optimized functions when they are written in C (the totally failing ones are sin/cos/tan/exp*/pow/hypot but not log* (except for log*(1)) or most inverse functions. Optimizations in sin/cos/tan/exp2 require rounding to nearest). My tests weren't non-quick enough to detect any 1-ulp errors for Pi/2, and only showed that the errors mostly don't blow up for inverse functions.

In view of this, I'd like to keep doing the Pi/2 intricacies. Partial diffs for catrig.c:

% 48a49,50
% > #define pio2_hi m_pi_2 /* works because m_pi_2 rounded down */
% > pio2_lo = 6.1232339957367660e-17, /* 0x3C91A626, 0x33145C07 */
% 384,389c390,407
% 335c341
% < * cacos(z) = PI/2 - z + O(|z|^3) as z -> 0
% ---
% > * cacos(z) = PI/2 - z + O(z^3) as z -> 0

This should be PI/2 + O(z) when only the constant term is used, but I restored use of the z term. Start changing O(|n|) to O(n). The absolute value should be implicit.

% 384,389c390,407
% < ...
% < if (ax < DBL_EPSILON/8 && ay < SQRT_6_EPSILON/8)
% < return (cpack(m_pi_2, -y));
% ---
% > if (ax < SQRT_EPSILON && ay < SQRT_EPSILON)
% > /*
% > * This is quite subtle. The expression for PI/2 - x
% > * is cloned from e_acos.c, where it is apparently over-
% > * designed to work in all rounding modes. It requires
% > * pio2_hi to be rounded down even when rounding to
% > * nearest would be more accurate. We can't add `tiny'
% > * to pio2_hi as usual to raise inexact, since this would
% > * break the fussy rounding in some non-default modes.
% > * So we use the same method to raise inexact as for the
% > * approximation 'z'. e_acos.c uses the even subtler
% > * method of depending on inexactness in a higher-degree
% > * approximation. That is not practical here, since if
% > * we used the x**3 term then we would need an extra
% > * case to avoid spurious underflow.
% > */
% > if ((int)ax == 0 && (int)ay == 0)
% > return (cpack(pio2_hi - (x - pio2_lo), -y));

Despite being too verbose (BTW, don't commit my essays :-), the comment neglects to point out that with the expression written in this form, inexact must be set separately since (x - pio2_lo) might be a value (e.g., 0) that doesn't give inexactness when subtracted.

All other returns of Pi/2 are simpler than this, and should return pio2_hi + pio2_lo. The constants should be spelled like this, and not using M_PI or m_pi_2; this is especially important in long double precision since then the constants are declared/defined with this spelling in extern constant tables in invtrig.c to centralize the complications for defining them for all combinations of ld80/ld128/i386. So my patch for this is simplest for long double precision -- there it uses invtrig.h and doesn't worry about the known bug that pio2_hi is incorrectly rounded in some cases.

With these intricacies, there is less to be gained by setting inexact up front.
Adding pio2_lo sequentially is slightly slower than an up-front setting in parallel, but when both are done the up-front setting just adds overhead on average.

Some of the optimizations could be done more globally:
- an option to not support nonstandard rounding modes for Pi/2. This seems to require pio2_lo to be a static const, unlike in invtrig.*. Make this non-volatile. The compiler will then evaluate pio2_hi + pio2_lo at compile time.
- an option to not support careful setting of inexact. The above gives it for Pi/2. Settings of it using (1 + tiny) == 1 would work similarly -- make `tiny' a static nonvolatile const.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 11:34:30 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 320ED106564A for ; Fri, 21 Sep 2012 11:34:30 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by mx1.freebsd.org (Postfix) with ESMTP id A67BF8FC0C for ; Fri, 21 Sep 2012 11:34:29 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8LBYNp4011985 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 21 Sep 2012 21:34:26 +1000 Date: Fri, 21 Sep 2012 21:34:18 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20120921172402.W945@besplex.bde.org> Message-ID: <20120921212525.W1732@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu>
<5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Stephen Montgomery-Smith , freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 11:34:30 -0000 On Fri, 21 Sep 2012, I wrote: > On Thu, 20 Sep 2012, Stephen Montgomery-Smith wrote: > >> I also added inexact optimizations for casinh and cacos. > > I couldn't get this to give the hoped-for optimization and dropped > it even for catanh. You may still prefer it because it is simpler. It is giving the hoped-for optimizations now... > But later I found intricacies for returning the correct value of > Pi/2 which make the inexact optimizations even less useful: ... for the real parts of cacosf(), casin*f(), but not for the real part of cacoshf(). I tested mainly the latter and catanhf() before, and the change is still giving a small pessimization for cacoshf(). (I haven't tested the new version for catanhf() yet, and won't test in so much detail in other precisions). I think this is because for cacosh*() alone, inexact is set in more cases while calculating Pi/2. 
Bruce From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 14:07:14 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 05923106564A for ; Fri, 21 Sep 2012 14:07:14 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id B4A028FC0A for ; Fri, 21 Sep 2012 14:07:13 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8LE7CbE036295; Fri, 21 Sep 2012 09:07:12 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505C7490.90600@missouri.edu> Date: Fri, 21 Sep 2012 09:07:12 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> In-Reply-To: <20120921212525.W1732@besplex.bde.org> 
Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 14:07:14 -0000 I will keep the inexact raising up-front (mostly because I forgot how I did it earlier). I will still use atanh(fl), and rely on someone else to fix it. (If it is inexact near 0, it is only a few ULP, and that is good enough for me.) I'll go ahead and see about the pio2h and pio2l. From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 19:05:14 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C4E40106566B for ; Fri, 21 Sep 2012 19:05:14 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by mx1.freebsd.org (Postfix) with ESMTP id 4B3068FC0C for ; Fri, 21 Sep 2012 19:05:14 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8LJ5APD011591 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 22 Sep 2012 05:05:12 +1000 Date: Sat, 22 Sep 2012 05:05:10 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <505C7490.90600@missouri.edu> Message-ID: <20120922042112.E3044@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> 
<5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 19:05:15 -0000

On Fri, 21 Sep 2012, Stephen Montgomery-Smith wrote:

> I will keep the inexact raising up-front (mostly because I forgot how I did
> it earlier).

It is working very well. I have minor cleanups to it using a 1-line raise_inexact() macro.

> I will still use atanh(fl), and rely on someone else to fix it. (If it is
> inexact near 0, it is only a few ULP, and that is good enough for me.)

I see you simplified the SQRT_[36]_EPSILON thresholds and fixed their initialization by removing them (the function calls in the static initializers didn't compile). I spent more time on them and found the best values in practice (take the sqrt's accurately and divide them by 2 or 4 instead of your 8), but they are painful to initialize, especially for long doubles.
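One way to sidestep the initialization problem is to spell the thresholds as hexadecimal floating constants rounded down to a power of 2, in the style of the SQRT_EPSILON = 0x1p-27 constant quoted earlier in this thread. The sketch below is an illustration only: the float value 0x1p-12F and the helper names are assumptions, not values or functions taken from catrig.c.

```c
#include <float.h>

/*
 * A call such as sqrt(DBL_EPSILON) is not a constant expression in ISO C,
 * so it cannot portably initialize a static object.  Hex constants can.
 */
static const double SQRT_EPSILON = 0x1p-27;	/* <= sqrt(DBL_EPSILON) */
static const float SQRT_EPSILONF = 0x1p-12F;	/* assumed; <= sqrtf(FLT_EPSILON) */

/* Hypothetical predicate for the small-|z| shortcut in double precision. */
int
is_small_z(double ax, double ay)
{
	return (ax < SQRT_EPSILON && ay < SQRT_EPSILON);
}

/*
 * Nonzero if the first dropped term of asinh(x) = x - x^3/6 + ... is
 * below half an epsilon relative to x, so that returning x is enough.
 */
int
asinh_linear_is_enough(double x)
{
	return (x * x / 6 < DBL_EPSILON / 2);
}
```

The same pattern would extend to long double with an L-suffixed constant, at the cost of one value per format (ld80, ld128), which is presumably part of why the initialization is described as painful.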
A few points of more general interest turned up while debugging this:
- atan()'s series is alternating, while the others are not. Alternation causes more cancelation errors.
- FOO_EPSILON only applies to 1 side of an addition. E.g., it applies to 1.0 + x where x > 0; for 1.0 + x where x < 0 the size of the corresponding epsilon is half as much. Non-alternation means that the FOO_EPSILON side applies.
- the general approximation in cacos(z) and casin(z) is quite good for small z, so larger thresholds for using the special approximations don't affect accuracy much. However, the general approximation in catanh(z) is not so good for small z, so using a larger threshold for the special approximation affects accuracy significantly. I found the approximate best point to switch the approximations.
- some combination of the previous 3 points means that the switching point is about twice as large relative to the SQRT_N_EPSILON threshold for catan() as for the others (divide by 2 instead of 4).

> I'll go ahead and see about the pio2h and pio2l.

Please wait for my patch for this. It has all the details for all precisions including pretty-printing the declarations. Or you can do a nearly-global substitution of m_pi_2 by pio2_hi + pio2_lo.

My other mostly-complete changes:
- avoid all the scalb() and related calls. This makes do_hard_work() a bit faster and simpler and real_value_reciprocal() much faster and a bit more complex.
- make real_value_reciprocal() handle signs (everything is automatic except for x = inf), and avoid a copysign() after it
- a few improvements in comments

My other unfinished changes:
- figure out if the up-front things in catanh() are best placed there.
- decide whether to handle pure real and pure imaginary args specially like I do for both in catanhf(). This interacts with the previous point.
- decide whether my old change to remove unnecessary accuracy for the case where ax == 1, ay < FLT_EPSILON in catanh() is correct (you didn't accept it, and maybe other accuracy changes make it extra accuracy more interesting). Bruce From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 19:25:10 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 24E111065672 for ; Fri, 21 Sep 2012 19:25:10 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id D29378FC0C for ; Fri, 21 Sep 2012 19:25:09 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8LJP89D058321; Fri, 21 Sep 2012 14:25:08 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505CBF14.70908@missouri.edu> Date: Fri, 21 Sep 2012 14:25:08 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> 
<505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> In-Reply-To: <20120922042112.E3044@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 19:25:10 -0000

On 09/21/2012 02:05 PM, Bruce Evans wrote:
> On Fri, 21 Sep 2012, Stephen Montgomery-Smith wrote:

> - decide whether my old change to remove unnecessary accuracy for the
> case where ax == 1, ay < FLT_EPSILON in catanh() is correct (you
> didn't accept it, and maybe other accuracy changes make it extra
> accuracy more interesting).

Or maybe I missed it.

I did put the pio2_hi etc. stuff in before this email telling me to
hold off.

I assume you still want the pio2_hi etc. stuff in catanh.  There it is
still m_pi_2.  I was thinking
    pio2_hi + (tiny + pio2_lo)
or maybe declaring pio2_lo as volatile and using
    pio2_hi + pio2_lo

This last week I was very busy and I had to put this project off a
while.  But now I think things are slowing down again.
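
The second option above (a volatile pio2_lo) can be sketched as follows. This is an illustration under assumptions, not the code from the patch: the constants are the usual fdlibm split of pi/2 into a double-precision head and tail, and the helper names pio2() and pio2_minus() are hypothetical. Making the tail volatile is meant to stop the compiler from folding pio2_hi + pio2_lo into a single constant, so the addition happens at run time in the current rounding mode and raises inexact.

```c
#include <float.h>

/* Head: pi/2 rounded to double.  Tail: the (positive) remainder. */
static const double
pio2_hi = 1.57079632679489655800e+00;	/* 0x3FF921FB, 0x54442D18 */

/* volatile, so that the sum below is not evaluated at compile time */
static const volatile double
pio2_lo = 6.12323399573676603587e-17;	/* 0x3C91A626, 0x33145C07 */

/* Hypothetical helper: pi/2, rounded according to the current mode. */
static double
pio2(void)
{
	return (pio2_hi + pio2_lo);
}

/* Hypothetical helper: pi/2 - x for small x, folding in the tail first. */
static double
pio2_minus(double x)
{
	return (pio2_hi - (x - pio2_lo));
}
```

In round-to-nearest the tail is below half an ulp of pio2_hi, so pio2() returns pio2_hi itself; the tail only changes the result in directed rounding modes or when it is combined with a small x as in pio2_minus().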
From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 19:33:48 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4FD78106564A for ; Fri, 21 Sep 2012 19:33:48 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 0977A8FC14 for ; Fri, 21 Sep 2012 19:33:47 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8LJXkMG058878 for ; Fri, 21 Sep 2012 14:33:47 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505CC11A.5030502@missouri.edu> Date: Fri, 21 Sep 2012 14:33:46 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5017111E.6060003@missouri.edu> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> 
In-Reply-To: <505CBF14.70908@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 19:33:48 -0000 On 09/21/2012 02:25 PM, Stephen Montgomery-Smith wrote: > On 09/21/2012 02:05 PM, Bruce Evans wrote: >> On Fri, 21 Sep 2012, Stephen Montgomery-Smith wrote: > >> - decide whether my old change to remove unnecessary accuracy for the >> case where ax == 1, ay < FLT_EPSILON in catanh() is correct (you >> didn't accept it, and maybe other accuracy changes make it extra >> accuracy more interesting). > > Or maybe I missed it. When you send me changes to catrigf.c, I translate it to catrig.c (the double version), and then convert it back to catrigf.c. So sometimes I miss things. 
From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 22:16:03 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E55CF106566C for ; Fri, 21 Sep 2012 22:16:03 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail09.syd.optusnet.com.au (mail09.syd.optusnet.com.au [211.29.132.190]) by mx1.freebsd.org (Postfix) with ESMTP id 5AD2A8FC0C for ; Fri, 21 Sep 2012 22:16:02 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8LMFrJ7010417 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 22 Sep 2012 08:15:55 +1000 Date: Sat, 22 Sep 2012 08:15:53 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <505CBF14.70908@missouri.edu> Message-ID: <20120922080942.U3613@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> MIME-Version: 1.0 
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 22:16:04 -0000

On Fri, 21 Sep 2012, Stephen Montgomery-Smith wrote:

> I did put the pio2_hi etc. stuff in before this email telling me to
> hold off.
>
> I assume you still want the pio2_hi etc. stuff in catanh.  There it is
> still m_pi_2.  I was thinking
>     pio2_hi + (tiny + pio2_lo)
> or maybe declaring pio2_lo as volatile and using
>     pio2_hi + pio2_lo

I have the latter.  m_pi_2 (better pio2) could be #defined as
(pio2_hi + pio2_lo), but I want to avoid this obfuscation.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 23:14:08 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 396BF106566B for ; Fri, 21 Sep 2012 23:14:08 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail28.syd.optusnet.com.au (mail28.syd.optusnet.com.au [211.29.133.169]) by mx1.freebsd.org (Postfix) with ESMTP id 278C38FC08 for ; Fri, 21 Sep 2012 23:14:06 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail28.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8LNDuYQ025170 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 22 Sep 2012 09:13:57 +1000 Date: Sat, 22 Sep 2012 09:13:56 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <505CC11A.5030502@missouri.edu> Message-ID: <20120922081607.F3613@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <50538E28.6050400@missouri.edu>
<20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="0-46617504-1348269236=:3613" Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 23:14:08 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. 
--0-46617504-1348269236=:3613
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Fri, 21 Sep 2012, Stephen Montgomery-Smith wrote:

> On 09/21/2012 02:25 PM, Stephen Montgomery-Smith wrote:
>> On 09/21/2012 02:05 PM, Bruce Evans wrote:
>>> On Fri, 21 Sep 2012, Stephen Montgomery-Smith wrote:
>>
>>> - decide whether my old change to remove unnecessary accuracy for the
>>> case where ax == 1, ay < FLT_EPSILON in catanh() is correct (you
>>> didn't accept it, and maybe other accuracy changes make it extra
>>> accuracy more interesting).
>>
>> Or maybe I missed it.
>
> When you send me changes to catrigf.c, I translate it to catrig.c (the double
> version), and then convert it back to catrigf.c.  So sometimes I miss things.

This time I merged everything into catrig.c and even ran the conversion
scripts to check this.  I keep forgetting to add or remove f suffixes
when manually converting.

Patches tomorrow.  Well, the main new one now, for all 3 files since
part of it has lots of magic numbers which are not handled by the
conversion scripts.

% diff -u2 catrig.c~ catrig.c
% --- catrig.c~ 2012-09-21 15:51:00.000000000 +0000
% +++ catrig.c 2012-09-21 21:40:34.521926000 +0000
% @@ -206,6 +217,6 @@
%          */
%         *B_is_usable = 0;
% -       *sqrt_A2my2 = scalbn(A, DBL_MANT_DIG);
% -       *new_y = scalbn(y, DBL_MANT_DIG);
% +       *sqrt_A2my2 = A * (2 / DBL_EPSILON);
% +       *new_y= y * (2 / DBL_EPSILON);
%         return;
%     }
% @@ -244,6 +255,7 @@
%              * scaling should avoid any underflow problems.
%              */
% -           *sqrt_A2my2 = scalbn(x, 2*DBL_MANT_DIG) * y / sqrt((y+1)*(y-1));
% -           *new_y = scalbn(y, 2*DBL_MANT_DIG);
% +           *sqrt_A2my2 = x * (4/DBL_EPSILON/DBL_EPSILON) * y /
% +               sqrt((y+1)*(y-1));
% +           *new_y = y * (4/DBL_EPSILON/DBL_EPSILON);
%         } else /* if (y < 1) */ {
%             /*

It's easy to eliminate these scalbn()s, since the values are constant.
scalbn() is a builtin in gcc-4.2 but not in gcc-3.3, and in 4.2 the
builtin just calls the extern function.
Here the constant values could be calculated at compile time, but gcc doesn't do this. I think clang does. The conversion script handles this fine. % @@ -501,29 +519,40 @@ % /* % * real_part_reciprocal(x, y) = Re(1/(x+I*y)) = x/(x*x + y*y). % - * Assumes x and y are positive or zero, and one of x and y is larger than % + * Assumes x and y are not NaN, and one of x and y is larger than % * RECIP_EPSILON. We avoid unwarranted underflow. It is important to not use The old version was passed positive x and y, but didn't depend on this. The caller then had to fix up the sign. This version is passed x and y with their original signs. The sign is handled automatically by expressions in the function, and the caller doesn't fix it up. % * the code creal(1/z), because the imaginary part may produce an unwanted % * underflow. % + * This is only called in a context where inexact is always raised before % + * the call, so no effort is made to avoid or force inexact. % */ % inline static double % real_part_reciprocal(double x, double y) % { % + double scale; % + uint32_t hx, hy; % + int32_t ix, iy; % + % /* % * This code is inspired by the C99 document n1124.pdf, Section G.5.1, % * example 2. % */ % - int ex, ey; % - % - if (isinf(x) || isinf(y)) % - return (0); % - if (y == 0) return (1/x); % - if (x == 0) return (x/y/y); % - ex = ilogb(x); % - ey = ilogb(y); % - if (ex - ey >= DBL_MANT_DIG) return (1/x); % - if (ey - ex >= DBL_MANT_DIG) return (x/y/y); % - x = scalbn(x, -ex); % - y = scalbn(y, -ex); % - return scalbn(x/(x*x + y*y), -ex); The conversion to not use scalbn() is fairly direct and routine, but also fairly magic. % + GET_HIGH_WORD(hx, x); % + ix = hx & 0x7ff00000; % + GET_HIGH_WORD(hy, y); % + iy = hy & 0x7ff00000; ilogb() is a builtin to much the same extent as scalbn() IIRC -- mostly it isn't. 
By working with the raw exponent, we avoid complications from the
following design bugs in ilogb():
- ilogb(0) returns FP_ILOGB0, so the above needs special cases for
  x == 0 and y == 0
- ilogb(+-Inf) returns INT_MAX, so the above needs to handle infs earlier
  than is optimal.

With the raw exponents, you can just subtract them and most things work.
Denormals cause problems with this subtraction in some contexts, and
ilogb() has to do a lot of work to find their exponent, and scalbn() has
to do a lot of work to shift their mantissa (compared with just adding
to the exponent for a normal).  Here we handle them fairly subtly without
any extra code: when one arg is denormal, the absolute value of the other
exceeds RECIP_EPSILON, so there is a large exponent difference, and the
special cases for large exponent differences handle this case
automatically.  The case of y infinite but x finite is handled similarly.

% +#define BIAS (DBL_MAX_EXP - 1)
% +/* XXX more guard digits are useful iff there is extra precision. */

Without extra precision, a cutoff with fewer guard digits somehow gives
better accuracy than one with more.  (The old cutoffs in terms of
exponent bits give ~DBL_MANT_DIG/2 active bits and ~DBL_MANT_DIG/2 guard
bits.)

% +#define CUTOFF (DBL_MANT_DIG / 2 + 1) /* just half or 1 guard digit */
% +   if (ix - iy >= CUTOFF << 20 || isinf(x))
% +       return (1/x); /* +-Inf -> +-0 is special */

Constants are shifted to avoid shifting the exponent bits in ix and iy
back and forth.

The special cases for infinities have been reduced to this one here.
The sign used to be handled by copysign(0, x) when x is +-Inf.  Now the
common 1/x return is used.
% + if (iy - ix >= CUTOFF << 20) % + return (x/y/y); /* should avoid double div, but hard */ % + if (ix <= (BIAS + DBL_MAX_EXP / 2 - CUTOFF) << 20) % + return (x/(x*x + y*y)); % + scale = 0; % + SET_HIGH_WORD(scale, 0x7ff00000 - ix); /* 2**(1-ilogb(x)) */ % + x *= scale; % + y *= scale; % + return (x/(x*x + y*y) * scale); % } % % @@ -577,13 +606,22 @@ % % if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) % - return (cpack(copysign(real_part_reciprocal(ax, ay), x), copysign(m_pi_2, y))); % + return (cpack(real_part_reciprocal(x, y), copysign(pio2_hi + pio2_lo, y))); Handle the sign in the function. Unrelated details that I couldn't edit out without breaking the patch hunk: % % - if (ax < SQRT_EPSILON && ay < SQRT_EPSILON) % + if (ax < SQRT_3_EPSILON/2 && ay < SQRT_3_EPSILON/2) { % + /* % + * z = 0 was filtered out above. All other cases must raise % + * inexact, but this is the only only that needs to do it % + * explicitly. % + */ % + raise_inexact(); % return (z); % + } It is an optimization to only raise inexact here. The early return for y == 0 && ax <= 1 worked well, but not the early raising of inexact. I also reduce the x == 0 case to atanf() early. That leaves no z == 0 case here, so we raise inexact unconditionally. % % if (ax == 1 && ay < DBL_EPSILON) { % +#if 0 /* this only improves accuracy in an already relative accurate case */ % if (ay > 2*DBL_MIN) % rx = - log(ay/2) / 2; % else % +#endif This was the change that you might have missed. % rx = - (log(ay) - m_ln2) / 2; % } else Everything for the other files is routine except for the magic numbers in real_part_reciprocal related to the packing of the bits. I prefer to leave those as magic. A full macroization of them would have to macroize the accesses GET_HIGH_WORD() etc. 
% diff -u2 catrigf.c~ catrigf.c % --- catrigf.c~ 2012-09-21 15:51:16.000000000 +0000 % +++ catrigf.c 2012-09-21 21:34:41.140231000 +0000 % @@ -108,6 +109,6 @@ % if (y < FOUR_SQRT_MIN) { % *B_is_usable = 0; % - *sqrt_A2my2 = scalbnf(A, FLT_MANT_DIG); % - *new_y = scalbnf(y, FLT_MANT_DIG); % + *sqrt_A2my2 = A * (2 / FLT_EPSILON); % + *new_y= y * (2 / FLT_EPSILON); % return; % } % @@ -124,6 +125,7 @@ % *sqrt_A2my2 = sqrtf(Amy*(A+y)); % } else if (y > 1) { % - *sqrt_A2my2 = scalbnf(x, 2*FLT_MANT_DIG) * y / sqrtf((y+1)*(y-1)); % - *new_y = scalbnf(y, 2*FLT_MANT_DIG); % + *sqrt_A2my2 = x * (4/FLT_EPSILON/FLT_EPSILON) * y / % + sqrtf((y+1)*(y-1)); % + *new_y = y * (4/FLT_EPSILON/FLT_EPSILON); % } else { % *sqrt_A2my2 = sqrtf((1-y)*(1+y)); % @@ -293,17 +299,24 @@ % real_part_reciprocal(float x, float y) % { % - int ex, ey; % - % - if (isinf(x) || isinf(y)) % - return (0); % - if (y == 0) return (1/x); % - if (x == 0) return (x/y/y); % - ex = ilogbf(x); % - ey = ilogbf(y); % - if (ex - ey >= FLT_MANT_DIG) return (1/x); % - if (ey - ex >= FLT_MANT_DIG) return (x/y/y); % - x = scalbnf(x, -ex); % - y = scalbnf(y, -ex); % - return scalbnf(x/(x*x + y*y), -ex); % + float scale; % + uint32_t hx, hy; % + int32_t ix, iy; % + % + GET_FLOAT_WORD(hx, x); % + ix = hx & 0x7f800000; % + GET_FLOAT_WORD(hy, y); % + iy = hy & 0x7f800000; % +#define BIAS (FLT_MAX_EXP - 1) % +#define CUTOFF (FLT_MANT_DIG / 2 + 1) % + if (ix - iy >= CUTOFF << 23 || isinf(x)) % + return (1/x); % + if (iy - ix >= CUTOFF << 23) % + return (x/y/y); % + if (ix <= (BIAS + FLT_MAX_EXP / 2 - CUTOFF) << 23) % + return (x/(x*x + y*y)); % + SET_FLOAT_WORD(scale, 0x7f800000 - ix); % + x *= scale; % + y *= scale; % + return (x/(x*x + y*y) * scale); % } % % @@ -335,13 +348,17 @@ % % if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) % - return (cpackf(copysignf(real_part_reciprocal(ax, ay), x), copysignf(m_pi_2, y))); % + return (cpackf(real_part_reciprocal(x, y), copysignf(pio2_hi + pio2_lo, y))); % % - if (ax < SQRT_EPSILON && 
ay < SQRT_EPSILON) % + if (ax < SQRT_3_EPSILON/2 && ay < SQRT_3_EPSILON/2) { % + raise_inexact(); % return (z); % + } % % if (ax == 1 && ay < FLT_EPSILON) { % +#if 0 % if (ay > 2*FLT_MIN) % rx = - logf(ay/2) / 2; % else % +#endif % rx = - (logf(ay) - m_ln2) / 2; % } else % diff -u2 catrigl.c~ catrigl.c % --- catrigl.c~ 2012-09-21 16:22:40.000000000 +0000 % +++ catrigl.c 2012-09-21 21:17:46.962698000 +0000 % @@ -122,6 +124,6 @@ % if (y < FOUR_SQRT_MIN) { % *B_is_usable = 0; % - *sqrt_A2my2 = scalbnl(A, LDBL_MANT_DIG); % - *new_y = scalbnl(y, LDBL_MANT_DIG); % + *sqrt_A2my2 = A * (2 / LDBL_EPSILON); % + *new_y= y * (2 / LDBL_EPSILON); % return; % } % @@ -138,6 +140,7 @@ % *sqrt_A2my2 = sqrtl(Amy*(A+y)); % } else if (y > 1) { % - *sqrt_A2my2 = scalbnl(x, 2*LDBL_MANT_DIG) * y / sqrtl((y+1)*(y-1)); % - *new_y = scalbnl(y, 2*LDBL_MANT_DIG); % + *sqrt_A2my2 = x * (4/LDBL_EPSILON/LDBL_EPSILON) * y / % + sqrtl((y+1)*(y-1)); % + *new_y = y * (4/LDBL_EPSILON/LDBL_EPSILON); % } else { % *sqrt_A2my2 = sqrtl((1-y)*(1+y)); % @@ -307,17 +314,24 @@ % real_part_reciprocal(long double x, long double y) % { % - int ex, ey; % - % - if (isinf(x) || isinf(y)) % - return (0); % - if (y == 0) return (1/x); % - if (x == 0) return (x/y/y); % - ex = ilogbl(x); % - ey = ilogbl(y); % - if (ex - ey >= LDBL_MANT_DIG) return (1/x); % - if (ey - ex >= LDBL_MANT_DIG) return (x/y/y); % - x = scalbnl(x, -ex); % - y = scalbnl(y, -ex); % - return scalbnl(x/(x*x + y*y), -ex); % + long double scale; % + uint16_t hx, hy; % + int16_t ix, iy; % + % + GET_LDBL_EXPSIGN(hx, x); % + ix = hx & 0x7fff; % + GET_LDBL_EXPSIGN(hy, y); % + iy = hy & 0x7fff; % +#define BIAS (LDBL_MAX_EXP - 1) % +#define CUTOFF (LDBL_MANT_DIG / 2 + 1) % + if (ix - iy >= CUTOFF || isinf(x)) % + return (1/x); % + if (iy - ix >= CUTOFF) % + return (x/y/y); % + if (ix <= BIAS + LDBL_MAX_EXP / 2 - CUTOFF) % + return (x/(x*x + y*y)); % + SET_LDBL_EXPSIGN(scale, 0x7fff - ix); % + x *= scale; % + y *= scale; % + return (x/(x*x + y*y) * scale); % 
} % % @@ -349,13 +363,17 @@ % % if (ax > RECIP_EPSILON || ay > RECIP_EPSILON) % - return (cpackl(copysignl(real_part_reciprocal(ax, ay), x), copysignl(m_pi_2, y))); % + return (cpackl(real_part_reciprocal(x, y), copysignl(pio2_hi + pio2_lo, y))); % % - if (ax < SQRT_EPSILON && ay < SQRT_EPSILON) % + if (ax < SQRT_3_EPSILON/2 && ay < SQRT_3_EPSILON/2) { % + raise_inexact(); % return (z); % + } % % if (ax == 1 && ay < LDBL_EPSILON) { % +#if 0 % if (ay > 2*LDBL_MIN) % rx = - logl(ay/2) / 2; % else % +#endif % rx = - (logl(ay) - m_ln2) / 2; % } else The patch is also attached. Bruce --0-46617504-1348269236=:3613 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="catrig.diff" Content-Transfer-Encoding: BASE64 Content-ID: <20120922091356.U3613@besplex.bde.org> Content-Description: Content-Disposition: attachment; filename="catrig.diff" ZGlmZiAtdTIgY2F0cmlnLmN+IGNhdHJpZy5jDQotLS0gY2F0cmlnLmN+CTIw MTItMDktMjEgMTU6NTE6MDAuMDAwMDAwMDAwICswMDAwDQorKysgY2F0cmln LmMJMjAxMi0wOS0yMSAyMTo0MDozNC41MjE5MjYwMDAgKzAwMDANCkBAIC0z NSw0ICszNSw1IEBADQogI3VuZGVmIGlzbmFuDQogI2RlZmluZSBpc25hbih4 KQkoKHgpICE9ICh4KSkNCisjZGVmaW5lCXJhaXNlX2luZXhhY3QoKQlkbyB7 IHZvbGF0aWxlIGludCBqdW5rID0gMSArIHRpbnk7IH0gd2hpbGUoMCkNCiAj dW5kZWYgc2lnbmJpdA0KICNkZWZpbmUgc2lnbmJpdCh4KQkoX19idWlsdGlu X3NpZ25iaXQoeCkpDQpAQCAtNDYsMTIgKzQ3LDIyIEBADQogbV9lID0JCQky LjcxODI4MTgyODQ1OTA0NTJlMCwJLyogIDB4MTViZjBhOGIxNDU3NjkuMHAt NTEgKi8NCiBtX2xuMiA9CQkJNi45MzE0NzE4MDU1OTk0NTMxZS0xLAkvKiAg MHgxNjJlNDJmZWZhMzllZi4wcC01MyAqLw0KLW1fcGlfMiA9CQkxLjU3MDc5 NjMyNjc5NDg5NjZlMCwJLyogIDB4MTkyMWZiNTQ0NDJkMTguMHAtNTIgKi8N Ci1waW8yX2hpID0JCTEuNTcwNzk2MzI2Nzk0ODk2NTU4MDBlKzAwLAkvKiAw eDNGRjkyMUZCLCAweDU0NDQyRDE4ICovDQotcGlvMl9sbyA9CQk2LjEyMzIz Mzk5NTczNjc2NjAzNTg3ZS0xNywJLyogMHgzQzkxQTYyNiwgMHgzMzE0NUMw NyAqLw0KKy8qDQorICogV2Ugbm8gbG9uZ2VyIHVzZSBNX1BJXzIgb3IgbV9w aV8yLiAgSW4gZmxvYXQgcHJlY2lzaW9uLCByb3VuZGluZyB0bw0KKyAqIG5l YXJlc3Qgb2YgUEkvMiBoYXBwZW5zIHRvIHJvdW5kIHVwLCBidXQgd2Ugd2Fu dCByb3VuZGluZyBkb3duIHNvDQorICogdGhhdCB0aGUgZXhwcmVzc2lvbnMg 
Zm9yIGFwcHJveGltYXRpbmcgUEkvMiBhbmQgKFBJLzIgLSB6KSB3b3JrIGlu IGFsbA0KKyAqIHJvdW5kaW5nIG1vZGVzLiAgVGhpcyBpcyBub3QgdmVyeSBp bXBvcnRhbnQsIGJ1dCBpdCBpcyBuZWNlc3NhcnkgZm9yDQorICogdGhlIHNh bWUgcXVhbGl0eSBvZiBpbXBsZW1lbnRhdGlvbiB0aGF0IGZkbGlibSBoYWQg aW4gMTk5MiBhbmQgdGhhdA0KKyAqIHJlYWwgZnVuY3Rpb25zIG1vc3RseSBz dGlsbCBoYXZlLiAgVGhpcyBpcyBrbm93biB0byBiZSBicm9rZW4gb25seSBp bg0KKyAqIGxkODAgYWNvc2woKSB2aWEgaW52dHJpZy5jIGFuZCBpbiBzb21l IGludmFsaWQgb3B0aW1pemF0aW9ucyBpbiBjb2RlDQorICogdW5kZXIgZGV2 ZWxvcG1lbnQsIGFuZCBub3cgaW4gYWxsIGZ1bmN0aW9ucyBpbiBjYXRyaWds LmMgdmlhIGludnRyaWcuYy4NCisgKi8NCitwaW8yX2hpID0JCTEuNTcwNzk2 MzI2Nzk0ODk2NmUwLAkvKiAgMHgxOTIxZmI1NDQ0MmQxOC4wcC01MiAqLw0K IFJFQ0lQX0VQU0lMT04gPQkJMS9EQkxfRVBTSUxPTiwNCi1TUVJUX0VQU0lM T04gPQkJMHgxcC0yNywJCS8qIDw9IHNxcnQoREJMX0VQU0lMT04pICovIA0K K1NRUlRfM19FUFNJTE9OID0JMi41ODA5NTY4Mjc5NTE3ODQ5ZS04LAkvKiAg MHgxYmI2N2FlODU4NGNhYS4wcC03OCAqLw0KK1NRUlRfNl9FUFNJTE9OID0J My42NTAwMjQxNDk5ODg4NTcxZS04LAkvKiAgMHgxMzk4OGUxNDA5MjEyZS4w cC03NyAqLw0KIFNRUlRfTUlOID0JCTB4MXAtNTExOwkvKiA+PSBzcXJ0KERC TF9NSU4pICovDQogDQogc3RhdGljIGNvbnN0IHZvbGF0aWxlIGRvdWJsZQ0K K3BpbzJfbG8gPQkJNi4xMjMyMzM5OTU3MzY3NjU5ZS0xNywJLyogIDB4MTFh NjI2MzMxNDVjMDcuMHAtMTA2ICovDQogdGlueSA9CQkJMHgxcC0xMDAwOw0K IA0KQEAgLTIwNiw2ICsyMTcsNiBAQA0KIAkJICovDQogCQkqQl9pc191c2Fi bGUgPSAwOw0KLQkJKnNxcnRfQTJteTIgPSBzY2FsYm4oQSwgREJMX01BTlRf RElHKTsNCi0JCSpuZXdfeSA9IHNjYWxibih5LCBEQkxfTUFOVF9ESUcpOw0K KwkJKnNxcnRfQTJteTIgPSBBICogKDIgLyBEQkxfRVBTSUxPTik7DQorCQkq bmV3X3k9IHkgKiAoMiAvIERCTF9FUFNJTE9OKTsNCiAJCXJldHVybjsNCiAJ fQ0KQEAgLTI0NCw2ICsyNTUsNyBAQA0KIAkJCSAqIHNjYWxpbmcgc2hvdWxk IGF2b2lkIGFueSB1bmRlcmZsb3cgcHJvYmxlbXMuDQogCQkJICovDQotCQkJ KnNxcnRfQTJteTIgPSBzY2FsYm4oeCwgMipEQkxfTUFOVF9ESUcpICogeSAv IHNxcnQoKHkrMSkqKHktMSkpOw0KLQkJCSpuZXdfeSA9IHNjYWxibih5LCAy KkRCTF9NQU5UX0RJRyk7DQorCQkJKnNxcnRfQTJteTIgPSB4ICogKDQvREJM X0VQU0lMT04vREJMX0VQU0lMT04pICogeSAvDQorCQkJICAgIHNxcnQoKHkr MSkqKHktMSkpOw0KKwkJCSpuZXdfeSA9IHkgKiAoNC9EQkxfRVBTSUxPTi9E 
QkxfRVBTSUxPTik7DQogCQl9IGVsc2UgLyogaWYgKHkgPCAxKSAqLyB7DQog CQkJLyoNCkBAIC0zMDMsOSArMzE1LDEyIEBADQogCX0NCiANCi0JLyogcmFp c2UgaW5leGFjdCBpZiB6ICE9IDAuICovDQotCWlmICgoeCA9PSAwICYmIHkg PT0gMCkgfHwgKGludCkoMSArIHRpbnkpICE9IDEpDQorCS8qIEF2b2lkIHNw dXJpb3VzbHkgcmFpc2luZyBpbmV4YWN0IGZvciB6ID0gMC4gKi8NCisJaWYg KHggPT0gMCAmJiB5ID09IDApDQogCQlyZXR1cm4gKHopOw0KIA0KLQlpZiAo YXggPCBTUVJUX0VQU0lMT04gJiYgYXkgPCBTUVJUX0VQU0lMT04pDQorCS8q IEFsbCByZW1haW5pbmcgY2FzZXMgYXJlIGluZXhhY3QuICovDQorCXJhaXNl X2luZXhhY3QoKTsNCisNCisJaWYgKGF4IDwgU1FSVF82X0VQU0lMT04vNCAm JiBheSA8IFNRUlRfNl9FUFNJTE9OLzQpDQogCQlyZXR1cm4gKHopOw0KIA0K QEAgLTM2NCw1ICszNzksNSBAQA0KIAkJCXJldHVybiAoY3BhY2soeCt4LCAt eSkpOw0KIAkJLyogY2Fjb3MoMCArIEkqTmFOKSA9IFBJLzIgKyBJKk5hTiB3 aXRoIGluZXhhY3QgKi8NCi0JCWlmICh4ID09IDApIHJldHVybiAoY3BhY2so bV9waV8yICsgdGlueSwgeSt5KSk7DQorCQlpZiAoeCA9PSAwKSByZXR1cm4g KGNwYWNrKHBpbzJfaGkgKyBwaW8yX2xvLCB5K3kpKTsNCiAJCS8qDQogCQkg KiBBbGwgb3RoZXIgY2FzZXMgaW52b2x2aW5nIE5hTiByZXR1cm4gTmFOICsg SSpOYU4uDQpAQCAtMzgzLDkgKzM5OCwxMiBAQA0KIAl9DQogDQotCS8qIHJh aXNlIGluZXhhY3QgaWYgeiAhPSAxLiAqLw0KLQlpZiAoKHggPT0gMSAmJiB5 ID09IDApIHx8IChpbnQpKDEgKyB0aW55KSAhPSAxKQ0KKwkvKiBBdm9pZCBz cHVyaW91c2x5IHJhaXNpbmcgaW5leGFjdCBmb3IgeiA9IDEuICovDQorCWlm ICh4ID09IDEgJiYgeSA9PSAwKQ0KIAkJcmV0dXJuIChjcGFjaygwLCAteSkp Ow0KIA0KLQlpZiAoYXggPCBTUVJUX0VQU0lMT04gJiYgYXkgPCBTUVJUX0VQ U0lMT04pDQorCS8qIEFsbCByZW1haW5pbmcgY2FzZXMgYXJlIGluZXhhY3Qu ICovDQorCXJhaXNlX2luZXhhY3QoKTsNCisNCisJaWYgKGF4IDwgU1FSVF82 X0VQU0lMT04vNCAmJiBheSA8IFNRUlRfNl9FUFNJTE9OLzQpDQogCQlyZXR1 cm4gKGNwYWNrKHBpbzJfaGkgLSAoeCAtIHBpbzJfbG8pLCAteSkpOw0KIA0K QEAgLTUwMSwyOSArNTE5LDQwIEBADQogLyoNCiAgKiByZWFsX3BhcnRfcmVj aXByb2NhbCh4LCB5KSA9IFJlKDEvKHgrSSp5KSkgPSB4Lyh4KnggKyB5Knkp Lg0KLSAqIEFzc3VtZXMgeCBhbmQgeSBhcmUgcG9zaXRpdmUgb3IgemVybywg YW5kIG9uZSBvZiB4IGFuZCB5IGlzIGxhcmdlciB0aGFuDQorICogQXNzdW1l cyB4IGFuZCB5IGFyZSBub3QgTmFOLCBhbmQgb25lIG9mIHggYW5kIHkgaXMg bGFyZ2VyIHRoYW4NCiAgKiBSRUNJUF9FUFNJTE9OLiAgV2UgYXZvaWQgdW53 
YXJyYW50ZWQgdW5kZXJmbG93LiAgSXQgaXMgaW1wb3J0YW50IHRvIG5vdCB1 c2UNCiAgKiB0aGUgY29kZSBjcmVhbCgxL3opLCBiZWNhdXNlIHRoZSBpbWFn aW5hcnkgcGFydCBtYXkgcHJvZHVjZSBhbiB1bndhbnRlZA0KICAqIHVuZGVy Zmxvdy4NCisgKiBUaGlzIGlzIG9ubHkgY2FsbGVkIGluIGEgY29udGV4dCB3 aGVyZSBpbmV4YWN0IGlzIGFsd2F5cyByYWlzZWQgYmVmb3JlDQorICogdGhl IGNhbGwsIHNvIG5vIGVmZm9ydCBpcyBtYWRlIHRvIGF2b2lkIG9yIGZvcmNl IGluZXhhY3QuDQogICovDQogaW5saW5lIHN0YXRpYyBkb3VibGUNCiByZWFs X3BhcnRfcmVjaXByb2NhbChkb3VibGUgeCwgZG91YmxlIHkpDQogew0KKwlk b3VibGUgc2NhbGU7DQorCXVpbnQzMl90IGh4LCBoeTsNCisJaW50MzJfdCBp eCwgaXk7DQorDQogCS8qDQogCSAqIFRoaXMgY29kZSBpcyBpbnNwaXJlZCBi eSB0aGUgQzk5IGRvY3VtZW50IG4xMTI0LnBkZiwgU2VjdGlvbiBHLjUuMSwN CiAJICogZXhhbXBsZSAyLg0KIAkgKi8NCi0JaW50IGV4LCBleTsNCi0NCi0J aWYgKGlzaW5mKHgpIHx8IGlzaW5mKHkpKQ0KLQkJcmV0dXJuICgwKTsNCi0J aWYgKHkgPT0gMCkgcmV0dXJuICgxL3gpOw0KLQlpZiAoeCA9PSAwKSByZXR1 cm4gKHgveS95KTsNCi0JZXggPSBpbG9nYih4KTsNCi0JZXkgPSBpbG9nYih5 KTsNCi0JaWYgKGV4IC0gZXkgPj0gREJMX01BTlRfRElHKSByZXR1cm4gKDEv eCk7DQotCWlmIChleSAtIGV4ID49IERCTF9NQU5UX0RJRykgcmV0dXJuICh4 L3kveSk7DQotCXggPSBzY2FsYm4oeCwgLWV4KTsNCi0JeSA9IHNjYWxibih5 LCAtZXgpOw0KLQlyZXR1cm4gc2NhbGJuKHgvKHgqeCArIHkqeSksIC1leCk7 DQorCUdFVF9ISUdIX1dPUkQoaHgsIHgpOw0KKwlpeCA9IGh4ICYgMHg3ZmYw MDAwMDsNCisJR0VUX0hJR0hfV09SRChoeSwgeSk7DQorCWl5ID0gaHkgJiAw eDdmZjAwMDAwOw0KKyNkZWZpbmUJQklBUwkoREJMX01BWF9FWFAgLSAxKQ0K Ky8qIFhYWCBtb3JlIGd1YXJkIGRpZ2l0cyBhcmUgdXNlZnVsIGlmZiB0aGVy ZSBpcyBleHRyYSBwcmVjaXNpb24uICovDQorI2RlZmluZQlDVVRPRkYJKERC TF9NQU5UX0RJRyAvIDIgKyAxKQkvKiBqdXN0IGhhbGYgb3IgMSBndWFyZCBk aWdpdCAqLw0KKwlpZiAoaXggLSBpeSA+PSBDVVRPRkYgPDwgMjAgfHwgaXNp bmYoeCkpDQorCQlyZXR1cm4gKDEveCk7CQkvKiArLUluZiAtPiArLTAgaXMg c3BlY2lhbCAqLw0KKwlpZiAoaXkgLSBpeCA+PSBDVVRPRkYgPDwgMjApDQor CQlyZXR1cm4gKHgveS95KTsJCS8qIHNob3VsZCBhdm9pZCBkb3VibGUgZGl2 LCBidXQgaGFyZCAqLw0KKwlpZiAoaXggPD0gKEJJQVMgKyBEQkxfTUFYX0VY UCAvIDIgLSBDVVRPRkYpIDw8IDIwKQ0KKwkJcmV0dXJuICh4Lyh4KnggKyB5 KnkpKTsNCisJc2NhbGUgPSAwOw0KKwlTRVRfSElHSF9XT1JEKHNjYWxlLCAw 
eDdmZjAwMDAwIC0gaXgpOwkvKiAyKiooMS1pbG9nYih4KSkgKi8NCisJeCAq PSBzY2FsZTsNCisJeSAqPSBzY2FsZTsNCisJcmV0dXJuICh4Lyh4KnggKyB5 KnkpICogc2NhbGUpOw0KIH0NCiANCkBAIC01NTQsNyArNTgzLDcgQEANCiAJ CXJldHVybiAoY3BhY2soYXRhbmgoeCksIHkpKTsgDQogDQotCS8qIHJhaXNl IGluZXhhY3QgaWYgeiAhPSAwLiAqLw0KLQlpZiAoKHggPT0gMCAmJiB5ID09 IDApIHx8IChpbnQpKDEgKyB0aW55KSAhPSAxKQ0KLQkJcmV0dXJuICh6KTsN CisJLyogVG8gZW5zdXJlIHRoZSBzYW1lIGFjY3VyYWN5IGFzIGF0YW4oKSwg YW5kIHRvIGZpbHRlciBvdXQgeiA9IDAuICovDQorCWlmICh4ID09IDApDQor CQlyZXR1cm4gKGNwYWNrKHgsIGF0YW4oeSkpKTsNCiANCiAJaWYgKGlzbmFu KHgpIHx8IGlzbmFuKHkpKSB7DQpAQCAtNTY0LDUgKzU5Myw1IEBADQogCQkv KiBjYXRhbmgoTmFOICsgSSorLUluZikgPSBzaWduKE5hTikwICsgSSorLVBJ LzIgKi8NCiAJCWlmIChpc2luZih5KSkNCi0JCQlyZXR1cm4gKGNwYWNrKGNv cHlzaWduKDAsIHgpLCBjb3B5c2lnbihtX3BpXzIsIHkpKSk7DQorCQkJcmV0 dXJuIChjcGFjayhjb3B5c2lnbigwLCB4KSwgY29weXNpZ24ocGlvMl9oaSAr IHBpbzJfbG8sIHkpKSk7DQogCQkvKiBjYXRhbmgoKy0wICsgSSpOYU4pID0g Ky0wICsgSSpOYU4gKi8NCiAJCWlmICh4ID09IDApDQpAQCAtNTc3LDEzICs2 MDYsMjIgQEANCiANCiAJaWYgKGF4ID4gUkVDSVBfRVBTSUxPTiB8fCBheSA+ IFJFQ0lQX0VQU0lMT04pDQotCQlyZXR1cm4gKGNwYWNrKGNvcHlzaWduKHJl YWxfcGFydF9yZWNpcHJvY2FsKGF4LCBheSksIHgpLCBjb3B5c2lnbihtX3Bp XzIsIHkpKSk7DQorCQlyZXR1cm4gKGNwYWNrKHJlYWxfcGFydF9yZWNpcHJv Y2FsKHgsIHkpLCBjb3B5c2lnbihwaW8yX2hpICsgcGlvMl9sbywgeSkpKTsN CiANCi0JaWYgKGF4IDwgU1FSVF9FUFNJTE9OICYmIGF5IDwgU1FSVF9FUFNJ TE9OKQ0KKwlpZiAoYXggPCBTUVJUXzNfRVBTSUxPTi8yICYmIGF5IDwgU1FS VF8zX0VQU0lMT04vMikgew0KKwkJLyoNCisJCSAqIHogPSAwIHdhcyBmaWx0 ZXJlZCBvdXQgYWJvdmUuICBBbGwgb3RoZXIgY2FzZXMgbXVzdCByYWlzZQ0K KwkJICogaW5leGFjdCwgYnV0IHRoaXMgaXMgdGhlIG9ubHkgb25seSB0aGF0 IG5lZWRzIHRvIGRvIGl0DQorCQkgKiBleHBsaWNpdGx5Lg0KKwkJICovDQor CQlyYWlzZV9pbmV4YWN0KCk7DQogCQlyZXR1cm4gKHopOw0KKwl9DQogDQog CWlmIChheCA9PSAxICYmIGF5IDwgREJMX0VQU0lMT04pIHsNCisjaWYgMCAv KiB0aGlzIG9ubHkgaW1wcm92ZXMgYWNjdXJhY3kgaW4gYW4gYWxyZWFkeSBy ZWxhdGl2ZSBhY2N1cmF0ZSBjYXNlICovDQogCQlpZiAoYXkgPiAyKkRCTF9N SU4pDQogCQkJcnggPSAtIGxvZyhheS8yKSAvIDI7DQogCQllbHNlDQorI2Vu 
ZGlmDQogCQkJcnggPSAtIChsb2coYXkpIC0gbV9sbjIpIC8gMjsNCiAJfSBl bHNlDQpAQCAtNTkyLDUgKzYzMCw1IEBADQogCWlmIChheCA9PSAxKQ0KIAkJ cnkgPSBhdGFuMigyLCAtYXkpIC8gMjsNCi0JZWxzZSBpZiAoYXkgPCBGT1VS X1NRUlRfTUlOKQ0KKwllbHNlIGlmIChheSA8IERCTF9FUFNJTE9OKQ0KIAkJ cnkgPSBhdGFuMigyKmF5LCAoMS1heCkqKDErYXgpKSAvIDI7DQogCWVsc2UN CmRpZmYgLXUyIGNhdHJpZ2YuY34gY2F0cmlnZi5jDQotLS0gY2F0cmlnZi5j fgkyMDEyLTA5LTIxIDE1OjUxOjE2LjAwMDAwMDAwMCArMDAwMA0KKysrIGNh dHJpZ2YuYwkyMDEyLTA5LTIxIDIxOjM0OjQxLjE0MDIzMTAwMCArMDAwMA0K QEAgLTQ1LDQgKzQ1LDUgQEANCiAjdW5kZWYgaXNuYW4NCiAjZGVmaW5lIGlz bmFuKHgpCSgoeCkgIT0gKHgpKQ0KKyNkZWZpbmUJcmFpc2VfaW5leGFjdCgp CWRvIHsgdm9sYXRpbGUgaW50IGp1bmsgPSAxICsgdGlueTsgfSB3aGlsZSgw KQ0KICN1bmRlZiBzaWduYml0DQogI2RlZmluZSBzaWduYml0KHgpCShfX2J1 aWx0aW5fc2lnbmJpdGYoeCkpDQpAQCAtNTUsMTIgKzU2LDEyIEBADQogbV9l ID0JCQkyLjcxODI4MTgyODVlMCwJCS8qICAweGFkZjg1NC4wcC0yMiAqLw0K IG1fbG4yID0JCQk2LjkzMTQ3MTgwNTZlLTEsCS8qICAweGIxNzIxOC4wcC0y NCAqLw0KLW1fcGlfMiA9CQkxLjU3MDc5NjMyNjhlMCwJCS8qICAweGM5MGZk Yi4wcC0yMyAqLw0KLXBpbzJfaGkgPQkJMS41NzA3OTYyNTEzZSswMCwJLyog MHgzZmM5MGZkYSAqLw0KLXBpbzJfbG8gPQkJNy41NDk3ODk0MTU5ZS0wOCwJ LyogMHgzM2EyMjE2OCAqLw0KK3BpbzJfaGkgPQkJMS41NzA3OTYyNTEzZTAs CQkvKiAgMHhjOTBmZGEuMHAtMjMgKi8NCiBSRUNJUF9FUFNJTE9OID0JCTEv RkxUX0VQU0lMT04sDQotU1FSVF9FUFNJTE9OID0JCTIwNDggKiBGTFRfRVBT SUxPTiwNCitTUVJUXzNfRVBTSUxPTiA9CTUuOTgwMTk5NTY3M2UtNCwJLyog IDB4OWNjNDcxLjBwLTM0ICovDQorU1FSVF82X0VQU0lMT04gPQk4LjQ1NzI3 OTMzMzhlLTQsCS8qICAweGRkYjNkNy4wcC0zNCAqLw0KIFNRUlRfTUlOID0J CTB4MXAtNjM7DQogDQogc3RhdGljIGNvbnN0IHZvbGF0aWxlIGZsb2F0DQor cGlvMl9sbyA9CQk3LjU0OTc4OTk1NDllLTgsCS8qICAweGEyMjE2OS4wcC00 NyAqLw0KIHRpbnkgPQkJCTB4MXAtMTAwOw0KIA0KQEAgLTEwOCw2ICsxMDks NiBAQA0KIAlpZiAoeSA8IEZPVVJfU1FSVF9NSU4pIHsNCiAJCSpCX2lzX3Vz YWJsZSA9IDA7DQotCQkqc3FydF9BMm15MiA9IHNjYWxibmYoQSwgRkxUX01B TlRfRElHKTsNCi0JCSpuZXdfeSA9IHNjYWxibmYoeSwgRkxUX01BTlRfRElH KTsNCisJCSpzcXJ0X0EybXkyID0gQSAqICgyIC8gRkxUX0VQU0lMT04pOw0K KwkJKm5ld195PSB5ICogKDIgLyBGTFRfRVBTSUxPTik7DQogCQlyZXR1cm47 
DQogCX0NCkBAIC0xMjQsNiArMTI1LDcgQEANCiAJCQkqc3FydF9BMm15MiA9 IHNxcnRmKEFteSooQSt5KSk7DQogCQl9IGVsc2UgaWYgKHkgPiAxKSB7DQot CQkJKnNxcnRfQTJteTIgPSBzY2FsYm5mKHgsIDIqRkxUX01BTlRfRElHKSAq IHkgLyBzcXJ0ZigoeSsxKSooeS0xKSk7DQotCQkJKm5ld195ID0gc2NhbGJu Zih5LCAyKkZMVF9NQU5UX0RJRyk7DQorCQkJKnNxcnRfQTJteTIgPSB4ICog KDQvRkxUX0VQU0lMT04vRkxUX0VQU0lMT04pICogeSAvDQorCQkJICAgIHNx cnRmKCh5KzEpKih5LTEpKTsNCisJCQkqbmV3X3kgPSB5ICogKDQvRkxUX0VQ U0lMT04vRkxUX0VQU0lMT04pOw0KIAkJfSBlbHNlIHsNCiAJCQkqc3FydF9B Mm15MiA9IHNxcnRmKCgxLXkpKigxK3kpKTsNCkBAIC0xNjEsOCArMTYzLDEw IEBADQogCX0NCiANCi0JaWYgKCh4ID09IDAgJiYgeSA9PSAwKSB8fCAoaW50 KSgxICsgdGlueSkgIT0gMSkNCisJaWYgKHggPT0gMCAmJiB5ID09IDApDQog CQlyZXR1cm4gKHopOw0KIA0KLQlpZiAoYXggPCBTUVJUX0VQU0lMT04gJiYg YXkgPCBTUVJUX0VQU0lMT04pDQorCXJhaXNlX2luZXhhY3QoKTsNCisNCisJ aWYgKGF4IDwgU1FSVF82X0VQU0lMT04vNCAmJiBheSA8IFNRUlRfNl9FUFNJ TE9OLzQpDQogCQlyZXR1cm4gKHopOw0KIA0KQEAgLTIwMiw1ICsyMDYsNSBA QA0KIAkJaWYgKGlzaW5mKHkpKQ0KIAkJCXJldHVybiAoY3BhY2tmKHgreCwg LXkpKTsNCi0JCWlmICh4ID09IDApIHJldHVybiAoY3BhY2tmKG1fcGlfMiAr IHRpbnksIHkreSkpOw0KKwkJaWYgKHggPT0gMCkgcmV0dXJuIChjcGFja2Yo cGlvMl9oaSArIHBpbzJfbG8sIHkreSkpOw0KIAkJcmV0dXJuIChjcGFja2Yo eCswLjBMKyh5KzApLCB4KzAuMEwrKHkrMCkpKTsNCiAJfQ0KQEAgLTIxNSw4 ICsyMTksMTAgQEANCiAJfQ0KIA0KLQlpZiAoKHggPT0gMSAmJiB5ID09IDAp IHx8IChpbnQpKDEgKyB0aW55KSAhPSAxKQ0KKwlpZiAoeCA9PSAxICYmIHkg PT0gMCkNCiAJCXJldHVybiAoY3BhY2tmKDAsIC15KSk7DQogDQotCWlmIChh eCA8IFNRUlRfRVBTSUxPTiAmJiBheSA8IFNRUlRfRVBTSUxPTikNCisJcmFp c2VfaW5leGFjdCgpOw0KKw0KKwlpZiAoYXggPCBTUVJUXzZfRVBTSUxPTi80 ICYmIGF5IDwgU1FSVF82X0VQU0lMT04vNCkNCiAJCXJldHVybiAoY3BhY2tm KHBpbzJfaGkgLSAoeCAtIHBpbzJfbG8pLCAteSkpOw0KIA0KQEAgLTI5Mywx NyArMjk5LDI0IEBADQogcmVhbF9wYXJ0X3JlY2lwcm9jYWwoZmxvYXQgeCwg ZmxvYXQgeSkNCiB7DQotCWludCBleCwgZXk7DQotDQotCWlmIChpc2luZih4 KSB8fCBpc2luZih5KSkNCi0JCXJldHVybiAoMCk7DQotCWlmICh5ID09IDAp IHJldHVybiAoMS94KTsNCi0JaWYgKHggPT0gMCkgcmV0dXJuICh4L3kveSk7 DQotCWV4ID0gaWxvZ2JmKHgpOw0KLQlleSA9IGlsb2diZih5KTsNCi0JaWYg 
KGV4IC0gZXkgPj0gRkxUX01BTlRfRElHKSByZXR1cm4gKDEveCk7DQotCWlm IChleSAtIGV4ID49IEZMVF9NQU5UX0RJRykgcmV0dXJuICh4L3kveSk7DQot CXggPSBzY2FsYm5mKHgsIC1leCk7DQotCXkgPSBzY2FsYm5mKHksIC1leCk7 DQotCXJldHVybiBzY2FsYm5mKHgvKHgqeCArIHkqeSksIC1leCk7DQorCWZs b2F0IHNjYWxlOw0KKwl1aW50MzJfdCBoeCwgaHk7DQorCWludDMyX3QgaXgs IGl5Ow0KKw0KKwlHRVRfRkxPQVRfV09SRChoeCwgeCk7DQorCWl4ID0gaHgg JiAweDdmODAwMDAwOw0KKwlHRVRfRkxPQVRfV09SRChoeSwgeSk7DQorCWl5 ID0gaHkgJiAweDdmODAwMDAwOw0KKyNkZWZpbmUJQklBUwkoRkxUX01BWF9F WFAgLSAxKQ0KKyNkZWZpbmUJQ1VUT0ZGCShGTFRfTUFOVF9ESUcgLyAyICsg MSkNCisJaWYgKGl4IC0gaXkgPj0gQ1VUT0ZGIDw8IDIzIHx8IGlzaW5mKHgp KQ0KKwkJcmV0dXJuICgxL3gpOw0KKwlpZiAoaXkgLSBpeCA+PSBDVVRPRkYg PDwgMjMpDQorCQlyZXR1cm4gKHgveS95KTsNCisJaWYgKGl4IDw9IChCSUFT ICsgRkxUX01BWF9FWFAgLyAyIC0gQ1VUT0ZGKSA8PCAyMykNCisJCXJldHVy biAoeC8oeCp4ICsgeSp5KSk7DQorCVNFVF9GTE9BVF9XT1JEKHNjYWxlLCAw eDdmODAwMDAwIC0gaXgpOw0KKwl4ICo9IHNjYWxlOw0KKwl5ICo9IHNjYWxl Ow0KKwlyZXR1cm4gKHgvKHgqeCArIHkqeSkgKiBzY2FsZSk7DQogfQ0KIA0K QEAgLTMyMSw2ICszMzQsNiBAQA0KIAkJcmV0dXJuIChjcGFja2YoYXRhbmhm KHgpLCB5KSk7IA0KIA0KLQlpZiAoKHggPT0gMCAmJiB5ID09IDApIHx8IChp bnQpKDEgKyB0aW55KSAhPSAxKQ0KLQkJcmV0dXJuICh6KTsNCisJaWYgKHgg PT0gMCkNCisJCXJldHVybiAoY3BhY2tmKHgsIGF0YW5mKHkpKSk7DQogDQog CWlmIChpc25hbih4KSB8fCBpc25hbih5KSkgew0KQEAgLTMyOCw1ICszNDEs NSBAQA0KIAkJCXJldHVybiAoY3BhY2tmKGNvcHlzaWduZigwLCB4KSwgeSt5 KSk7DQogCQlpZiAoaXNpbmYoeSkpDQotCQkJcmV0dXJuIChjcGFja2YoY29w eXNpZ25mKDAsIHgpLCBjb3B5c2lnbmYobV9waV8yLCB5KSkpOw0KKwkJCXJl dHVybiAoY3BhY2tmKGNvcHlzaWduZigwLCB4KSwgY29weXNpZ25mKHBpbzJf aGkgKyBwaW8yX2xvLCB5KSkpOw0KIAkJaWYgKHggPT0gMCkNCiAJCQlyZXR1 cm4gKGNwYWNrZih4LCB5K3kpKTsNCkBAIC0zMzUsMTMgKzM0OCwxNyBAQA0K IA0KIAlpZiAoYXggPiBSRUNJUF9FUFNJTE9OIHx8IGF5ID4gUkVDSVBfRVBT SUxPTikNCi0JCXJldHVybiAoY3BhY2tmKGNvcHlzaWduZihyZWFsX3BhcnRf cmVjaXByb2NhbChheCwgYXkpLCB4KSwgY29weXNpZ25mKG1fcGlfMiwgeSkp KTsNCisJCXJldHVybiAoY3BhY2tmKHJlYWxfcGFydF9yZWNpcHJvY2FsKHgs IHkpLCBjb3B5c2lnbmYocGlvMl9oaSArIHBpbzJfbG8sIHkpKSk7DQogDQot 
CWlmIChheCA8IFNRUlRfRVBTSUxPTiAmJiBheSA8IFNRUlRfRVBTSUxPTikN CisJaWYgKGF4IDwgU1FSVF8zX0VQU0lMT04vMiAmJiBheSA8IFNRUlRfM19F UFNJTE9OLzIpIHsNCisJCXJhaXNlX2luZXhhY3QoKTsNCiAJCXJldHVybiAo eik7DQorCX0NCiANCiAJaWYgKGF4ID09IDEgJiYgYXkgPCBGTFRfRVBTSUxP Tikgew0KKyNpZiAwDQogCQlpZiAoYXkgPiAyKkZMVF9NSU4pDQogCQkJcngg PSAtIGxvZ2YoYXkvMikgLyAyOw0KIAkJZWxzZQ0KKyNlbmRpZg0KIAkJCXJ4 ID0gLSAobG9nZihheSkgLSBtX2xuMikgLyAyOw0KIAl9IGVsc2UNCkBAIC0z NTAsNSArMzY3LDUgQEANCiAJaWYgKGF4ID09IDEpDQogCQlyeSA9IGF0YW4y ZigyLCAtYXkpIC8gMjsNCi0JZWxzZSBpZiAoYXkgPCBGT1VSX1NRUlRfTUlO KQ0KKwllbHNlIGlmIChheSA8IEZMVF9FUFNJTE9OKQ0KIAkJcnkgPSBhdGFu MmYoMipheSwgKDEtYXgpKigxK2F4KSkgLyAyOw0KIAllbHNlDQpkaWZmIC11 MiBjYXRyaWdsLmN+IGNhdHJpZ2wuYw0KLS0tIGNhdHJpZ2wuY34JMjAxMi0w OS0yMSAxNjoyMjo0MC4wMDAwMDAwMDAgKzAwMDANCisrKyBjYXRyaWdsLmMJ MjAxMi0wOS0yMSAyMToxNzo0Ni45NjI2OTgwMDAgKzAwMDANCkBAIC0zOCw1 ICszOCw0IEBADQogI2luY2x1ZGUgPGZsb2F0Lmg+DQogDQotI2luY2x1ZGUg ImZwbWF0aC5oIg0KICNpbmNsdWRlICJpbnZ0cmlnLmgiDQogI2luY2x1ZGUg Im1hdGguaCINCkBAIC00Nyw0ICs0Niw1IEBADQogI3VuZGVmIGlzbmFuDQog I2RlZmluZSBpc25hbih4KQkoKHgpICE9ICh4KSkNCisjZGVmaW5lCXJhaXNl X2luZXhhY3QoKQlkbyB7IHZvbGF0aWxlIGludCBqdW5rID0gMSArIHRpbnk7 IH0gd2hpbGUoMCkNCiAjdW5kZWYgc2lnbmJpdA0KICNkZWZpbmUgc2lnbmJp dCh4KQkoX19idWlsdGluX3NpZ25iaXRsKHgpKSANCkBAIC01Niw1ICs1Niw0 IEBADQogUVVBUlRFUl9TUVJUX01BWCA9CTB4MXA4MTg5TCwNCiBSRUNJUF9F UFNJTE9OID0JCTEvTERCTF9FUFNJTE9OLA0KLVNRUlRfRVBTSUxPTiA9CQkx RS0xMEwsDQogU1FSVF9NSU4gPQkJMHgxcC04MTkxTDsNCiANCkBAIC02Miwx NCArNjEsMTcgQEANCiBzdGF0aWMgY29uc3QgdW5pb24gSUVFRWwyYml0cw0K IHVtX2UgPQkJTEQ4MEMoMHhhZGY4NTQ1OGEyYmI0YTliLCAgMSwgMCwgMi43 MTgyODE4Mjg0NTkwNDUyMzUzNmUwTCksDQotdW1fbG4yID0JTEQ4MEMoMHhi MTcyMTdmN2QxY2Y3OWFjLCAtMSwgMCwgNi45MzE0NzE4MDU1OTk0NTMwOTQx N2UtMUwpLA0KLXVtX3BpXzIgPQlMRDgwQygweGM5MGZkYWEyMjE2OGMyMzUs ICAwLCAwLCAxLjU3MDc5NjMyNjc5NDg5NjYxOTIzZTBMKTsNCit1bV9sbjIg PQlMRDgwQygweGIxNzIxN2Y3ZDFjZjc5YWMsIC0xLCAwLCA2LjkzMTQ3MTgw NTU5OTQ1MzA5NDE3ZS0xTCk7DQogI2RlZmluZQkJbV9lCXVtX2UuZQ0KICNk 
ZWZpbmUJCW1fbG4yCXVtX2xuMi5lDQotI2RlZmluZQkJbV9waV8yCXVtX3Bp XzIuZQ0KK3N0YXRpYyBjb25zdCBsb25nIGRvdWJsZQ0KKy8qIFRoZSBuZXh0 IDIgbGl0ZXJhbHMgZm9yIG5vbi1pMzg2LiAgTWlzcm91bmRpbmcgdGhlbSBv biBpMzg2IGlzIGhhcm1sZXNzLiAqLw0KK1NRUlRfM19FUFNJTE9OID0gNS43 MDMxNjI3MzQzNTc1ODkxNTMxMGUtMTAsCS8qICAweDljYzQ3MGEwNDkwOTcz ZTguMHAtOTQgKi8NCitTUVJUXzZfRVBTSUxPTiA9IDguMDY1NDkwMDg3MzQ5 MzI3NzE2NjRlLTEwOwkvKiAgMHhkZGIzZDc0MmMyNjU1MzllLjBwLTk0ICov DQogI2VsaWYgTERCTF9NQU5UX0RJRyA9PSAxMTMNCiBzdGF0aWMgY29uc3Qg bG9uZyBkb3VibGUNCiBtX2UgPQkJMi43MTgyODE4Mjg0NTkwNDUyMzUzNjAy ODc0NzEzNTI2NjI1MGUwTCwJLyogMHgxNWJmMGE4YjE0NTc2OTUzNTVmYjhh YzQwNGU3YS4wcC0xMTEgKi8NCiBtX2xuMiA9CQk2LjkzMTQ3MTgwNTU5OTQ1 MzA5NDE3MjMyMTIxNDU4MTc2NTY4ZS0xTCwJLyogMHgxNjJlNDJmZWZhMzll ZjM1NzkzYzc2NzMwMDdlNi4wcC0xMTMgKi8NCi1tX3BpXzIgPQkxLjU3MDc5 NjMyNjc5NDg5NjYxOTIzMTMyMTY5MTYzOTc1MTQ0ZTBMOwkvKiAweDE5MjFm YjU0NDQyZDE4NDY5ODk4Y2M1MTcwMWI4LjBwLTExMiAqLw0KK1NRUlRfM19F UFNJTE9OID0gMi40MDM3MDMzNTc5Nzk0NTQ5MDk3NTMzNjcyNzE5OTg3ODEy NGUtMTcsCS8qICAweDFiYjY3YWU4NTg0Y2FhNzNiMjU3NDJkNzA3OGI4LjBw LTE2OCAqLw0KK1NRUlRfNl9FUFNJTE9OID0gMy4zOTkzNDk4ODg3NzYyOTU4 NzIzOTA4MjU4NjIyMzMwMDM5MWUtMTc7CS8qICAweDEzOTg4ZTE0MDkyMTJl N2QwMzIxOTE0MzIxYTU1LjBwLTE2NyAqLw0KICNlbHNlDQogI2Vycm9yICJV bnN1cHBvcnRlZCBsb25nIGRvdWJsZSBmb3JtYXQiDQpAQCAtMTIyLDYgKzEy NCw2IEBADQogCWlmICh5IDwgRk9VUl9TUVJUX01JTikgew0KIAkJKkJfaXNf dXNhYmxlID0gMDsNCi0JCSpzcXJ0X0EybXkyID0gc2NhbGJubChBLCBMREJM X01BTlRfRElHKTsNCi0JCSpuZXdfeSA9IHNjYWxibmwoeSwgTERCTF9NQU5U X0RJRyk7DQorCQkqc3FydF9BMm15MiA9IEEgKiAoMiAvIExEQkxfRVBTSUxP Tik7DQorCQkqbmV3X3k9IHkgKiAoMiAvIExEQkxfRVBTSUxPTik7DQogCQly ZXR1cm47DQogCX0NCkBAIC0xMzgsNiArMTQwLDcgQEANCiAJCQkqc3FydF9B Mm15MiA9IHNxcnRsKEFteSooQSt5KSk7DQogCQl9IGVsc2UgaWYgKHkgPiAx KSB7DQotCQkJKnNxcnRfQTJteTIgPSBzY2FsYm5sKHgsIDIqTERCTF9NQU5U X0RJRykgKiB5IC8gc3FydGwoKHkrMSkqKHktMSkpOw0KLQkJCSpuZXdfeSA9 IHNjYWxibmwoeSwgMipMREJMX01BTlRfRElHKTsNCisJCQkqc3FydF9BMm15 MiA9IHggKiAoNC9MREJMX0VQU0lMT04vTERCTF9FUFNJTE9OKSAqIHkgLw0K 
KwkJCSAgICBzcXJ0bCgoeSsxKSooeS0xKSk7DQorCQkJKm5ld195ID0geSAq ICg0L0xEQkxfRVBTSUxPTi9MREJMX0VQU0lMT04pOw0KIAkJfSBlbHNlIHsN CiAJCQkqc3FydF9BMm15MiA9IHNxcnRsKCgxLXkpKigxK3kpKTsNCkBAIC0x NzUsOCArMTc4LDEwIEBADQogCX0NCiANCi0JaWYgKCh4ID09IDAgJiYgeSA9 PSAwKSB8fCAoaW50KSgxICsgdGlueSkgIT0gMSkNCisJaWYgKHggPT0gMCAm JiB5ID09IDApDQogCQlyZXR1cm4gKHopOw0KIA0KLQlpZiAoYXggPCBTUVJU X0VQU0lMT04gJiYgYXkgPCBTUVJUX0VQU0lMT04pDQorCXJhaXNlX2luZXhh Y3QoKTsNCisNCisJaWYgKGF4IDwgU1FSVF82X0VQU0lMT04vNCAmJiBheSA8 IFNRUlRfNl9FUFNJTE9OLzQpDQogCQlyZXR1cm4gKHopOw0KIA0KQEAgLTIx Niw1ICsyMjEsNSBAQA0KIAkJaWYgKGlzaW5mKHkpKQ0KIAkJCXJldHVybiAo Y3BhY2tsKHgreCwgLXkpKTsNCi0JCWlmICh4ID09IDApIHJldHVybiAoY3Bh Y2tsKG1fcGlfMiArIHRpbnksIHkreSkpOw0KKwkJaWYgKHggPT0gMCkgcmV0 dXJuIChjcGFja2wocGlvMl9oaSArIHBpbzJfbG8sIHkreSkpOw0KIAkJcmV0 dXJuIChjcGFja2woeCswLjBMKyh5KzApLCB4KzAuMEwrKHkrMCkpKTsNCiAJ fQ0KQEAgLTIyOSw4ICsyMzQsMTAgQEANCiAJfQ0KIA0KLQlpZiAoKHggPT0g MSAmJiB5ID09IDApIHx8IChpbnQpKDEgKyB0aW55KSAhPSAxKQ0KKwlpZiAo eCA9PSAxICYmIHkgPT0gMCkNCiAJCXJldHVybiAoY3BhY2tsKDAsIC15KSk7 DQogDQotCWlmIChheCA8IFNRUlRfRVBTSUxPTiAmJiBheSA8IFNRUlRfRVBT SUxPTikNCisJcmFpc2VfaW5leGFjdCgpOw0KKw0KKwlpZiAoYXggPCBTUVJU XzZfRVBTSUxPTi80ICYmIGF5IDwgU1FSVF82X0VQU0lMT04vNCkNCiAJCXJl dHVybiAoY3BhY2tsKHBpbzJfaGkgLSAoeCAtIHBpbzJfbG8pLCAteSkpOw0K IA0KQEAgLTMwNywxNyArMzE0LDI0IEBADQogcmVhbF9wYXJ0X3JlY2lwcm9j YWwobG9uZyBkb3VibGUgeCwgbG9uZyBkb3VibGUgeSkNCiB7DQotCWludCBl eCwgZXk7DQotDQotCWlmIChpc2luZih4KSB8fCBpc2luZih5KSkNCi0JCXJl dHVybiAoMCk7DQotCWlmICh5ID09IDApIHJldHVybiAoMS94KTsNCi0JaWYg KHggPT0gMCkgcmV0dXJuICh4L3kveSk7DQotCWV4ID0gaWxvZ2JsKHgpOw0K LQlleSA9IGlsb2dibCh5KTsNCi0JaWYgKGV4IC0gZXkgPj0gTERCTF9NQU5U X0RJRykgcmV0dXJuICgxL3gpOw0KLQlpZiAoZXkgLSBleCA+PSBMREJMX01B TlRfRElHKSByZXR1cm4gKHgveS95KTsNCi0JeCA9IHNjYWxibmwoeCwgLWV4 KTsNCi0JeSA9IHNjYWxibmwoeSwgLWV4KTsNCi0JcmV0dXJuIHNjYWxibmwo eC8oeCp4ICsgeSp5KSwgLWV4KTsNCisJbG9uZyBkb3VibGUgc2NhbGU7DQor CXVpbnQxNl90IGh4LCBoeTsNCisJaW50MTZfdCBpeCwgaXk7DQorDQorCUdF 
VF9MREJMX0VYUFNJR04oaHgsIHgpOw0KKwlpeCA9IGh4ICYgMHg3ZmZmOw0K KwlHRVRfTERCTF9FWFBTSUdOKGh5LCB5KTsNCisJaXkgPSBoeSAmIDB4N2Zm ZjsNCisjZGVmaW5lCUJJQVMJKExEQkxfTUFYX0VYUCAtIDEpDQorI2RlZmlu ZQlDVVRPRkYJKExEQkxfTUFOVF9ESUcgLyAyICsgMSkNCisJaWYgKGl4IC0g aXkgPj0gQ1VUT0ZGIHx8IGlzaW5mKHgpKQ0KKwkJcmV0dXJuICgxL3gpOw0K KwlpZiAoaXkgLSBpeCA+PSBDVVRPRkYpDQorCQlyZXR1cm4gKHgveS95KTsN CisJaWYgKGl4IDw9IEJJQVMgKyBMREJMX01BWF9FWFAgLyAyIC0gQ1VUT0ZG KQ0KKwkJcmV0dXJuICh4Lyh4KnggKyB5KnkpKTsNCisJU0VUX0xEQkxfRVhQ U0lHTihzY2FsZSwgMHg3ZmZmIC0gaXgpOw0KKwl4ICo9IHNjYWxlOw0KKwl5 ICo9IHNjYWxlOw0KKwlyZXR1cm4gKHgvKHgqeCArIHkqeSkgKiBzY2FsZSk7 DQogfQ0KIA0KQEAgLTMzMyw4ICszNDcsOCBAQA0KIA0KIAlpZiAoeSA9PSAw ICYmIGF4IDw9IDEpDQotCQlyZXR1cm4gKGNwYWNrbChhdGFuaGwoeCksIHkp KTsgDQorCQlyZXR1cm4gKGNwYWNrbChhdGFuaCh4KSwgeSkpOyAJLyogWFhY IG5lZWQgYXRhbmhsKCkgKi8NCiANCi0JaWYgKCh4ID09IDAgJiYgeSA9PSAw KSB8fCAoaW50KSgxICsgdGlueSkgIT0gMSkNCi0JCXJldHVybiAoeik7DQor CWlmICh4ID09IDApDQorCQlyZXR1cm4gKGNwYWNrbCh4LCBhdGFubCh5KSkp Ow0KIA0KIAlpZiAoaXNuYW4oeCkgfHwgaXNuYW4oeSkpIHsNCkBAIC0zNDIs NSArMzU2LDUgQEANCiAJCQlyZXR1cm4gKGNwYWNrbChjb3B5c2lnbmwoMCwg eCksIHkreSkpOw0KIAkJaWYgKGlzaW5mKHkpKQ0KLQkJCXJldHVybiAoY3Bh Y2tsKGNvcHlzaWdubCgwLCB4KSwgY29weXNpZ25sKG1fcGlfMiwgeSkpKTsN CisJCQlyZXR1cm4gKGNwYWNrbChjb3B5c2lnbmwoMCwgeCksIGNvcHlzaWdu bChwaW8yX2hpICsgcGlvMl9sbywgeSkpKTsNCiAJCWlmICh4ID09IDApDQog CQkJcmV0dXJuIChjcGFja2woeCwgeSt5KSk7DQpAQCAtMzQ5LDEzICszNjMs MTcgQEANCiANCiAJaWYgKGF4ID4gUkVDSVBfRVBTSUxPTiB8fCBheSA+IFJF Q0lQX0VQU0lMT04pDQotCQlyZXR1cm4gKGNwYWNrbChjb3B5c2lnbmwocmVh bF9wYXJ0X3JlY2lwcm9jYWwoYXgsIGF5KSwgeCksIGNvcHlzaWdubChtX3Bp XzIsIHkpKSk7DQorCQlyZXR1cm4gKGNwYWNrbChyZWFsX3BhcnRfcmVjaXBy b2NhbCh4LCB5KSwgY29weXNpZ25sKHBpbzJfaGkgKyBwaW8yX2xvLCB5KSkp Ow0KIA0KLQlpZiAoYXggPCBTUVJUX0VQU0lMT04gJiYgYXkgPCBTUVJUX0VQ U0lMT04pDQorCWlmIChheCA8IFNRUlRfM19FUFNJTE9OLzIgJiYgYXkgPCBT UVJUXzNfRVBTSUxPTi8yKSB7DQorCQlyYWlzZV9pbmV4YWN0KCk7DQogCQly ZXR1cm4gKHopOw0KKwl9DQogDQogCWlmIChheCA9PSAxICYmIGF5IDwgTERC 
TF9FUFNJTE9OKSB7DQorI2lmIDANCiAJCWlmIChheSA+IDIqTERCTF9NSU4p DQogCQkJcnggPSAtIGxvZ2woYXkvMikgLyAyOw0KIAkJZWxzZQ0KKyNlbmRp Zg0KIAkJCXJ4ID0gLSAobG9nbChheSkgLSBtX2xuMikgLyAyOw0KIAl9IGVs c2UNCkBAIC0zNjQsNSArMzgyLDUgQEANCiAJaWYgKGF4ID09IDEpDQogCQly eSA9IGF0YW4ybCgyLCAtYXkpIC8gMjsNCi0JZWxzZSBpZiAoYXkgPCBGT1VS X1NRUlRfTUlOKQ0KKwllbHNlIGlmIChheSA8IExEQkxfRVBTSUxPTikNCiAJ CXJ5ID0gYXRhbjJsKDIqYXksICgxLWF4KSooMStheCkpIC8gMjsNCiAJZWxz ZQ0K --0-46617504-1348269236=:3613-- From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 23:18:58 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BD1DA106566B for ; Fri, 21 Sep 2012 23:18:58 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail12.syd.optusnet.com.au (mail12.syd.optusnet.com.au [211.29.132.193]) by mx1.freebsd.org (Postfix) with ESMTP id 30EA18FC0A for ; Fri, 21 Sep 2012 23:18:57 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail12.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8LNIs3a029618 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 22 Sep 2012 09:18:56 +1000 Date: Sat, 22 Sep 2012 09:18:54 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20120922081607.F3613@besplex.bde.org> Message-ID: <20120922091625.Y3828@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> 
<5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Stephen Montgomery-Smith , freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 23:18:58 -0000 On Sat, 22 Sep 2012, Bruce Evans wrote: > ... > Patches tomorrow. Well, the main new one now, for all 3 files since > part of it has lots of magic numbers which are not handled by the > conversion scripts. > ... > The patch is also attached. The attachment was larger than intended. It had my complete patch set for catrig*.c. 
Bruce From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 01:11:21 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 90DDD106564A for ; Sat, 22 Sep 2012 01:11:21 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 492C98FC0C for ; Sat, 22 Sep 2012 01:11:20 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8M1BJoF092260; Fri, 21 Sep 2012 20:11:19 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505D1037.8010202@missouri.edu> Date: Fri, 21 Sep 2012 20:11:19 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! 
de.org> In-Reply-To: <20120922091625.Y3828@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 01:11:21 -0000 On 09/21/2012 06:18 PM, Bruce Evans wrote: > On Sat, 22 Sep 2012, Bruce Evans wrote: > >> ... >> Patches tomorrow. Well, the main new one now, for all 3 files since >> part of it has lots of magic numbers which are not handled by the >> conversion scripts. >> ... >> The patch is also attached. > > The attachment was larger than intended. It had my complete patch set > for catrig*.c. > > Bruce > > Will there be another complete patch set tomorrow, or did you just send it today? From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 04:28:51 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D81D1106564A for ; Sat, 22 Sep 2012 04:28:51 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by mx1.freebsd.org (Postfix) with ESMTP id 673338FC08 for ; Sat, 22 Sep 2012 04:28:51 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8M4SmGk030895 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 22 Sep 2012 14:28:49 +1000 Date: Sat, 22 Sep 2012 14:28:48 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <505D1037.8010202@missouri.edu> Message-ID: 
<20120922142349.X4599@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.bde.org> <505D1037.8010202@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 04:28:52 -0000 On Fri, 21 Sep 2012, Stephen Montgomery-Smith wrote: > On 09/21/2012 06:18 PM, Bruce Evans wrote: >> On Sat, 22 Sep 2012, Bruce Evans wrote: >> >>> ... >>> Patches tomorrow. Well, the main new one now, for all 3 files since >>> part of it has lots of magic numbers which are not handled by the >>> conversion scripts. >>> ... >>> The patch is also attached. >> >> The attachment was larger than intended. It had my complete patch set >> for catrig*.c. > > Will there be another complete patch set tomorrow, or did you just send it > today? I sent it all and won't change much more for a while.
I might describe it more tomorrow. Already made a small change: always use float for `tiny' (it is now only used in raise_inexact), and in raise_inexact assign (1 + tiny) to volatile float instead of volatile int. Bruce From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 05:26:26 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3FBE2106566B for ; Sat, 22 Sep 2012 05:26:26 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id DE17D8FC08 for ; Sat, 22 Sep 2012 05:26:25 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8M5QIgb058997; Sat, 22 Sep 2012 00:26:18 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505D4BFA.5050401@missouri.edu> Date: Sat, 22 Sep 2012 00:26:18 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> 
<505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.bde.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> In-Reply-To: <20120922142349.X4599@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 05:26:26 -0000 On 09/21/2012 11:28 PM, Bruce Evans wrote: > On Fri, 21 Sep 2012, Stephen Montgomery-Smith wrote: > >> On 09/21/2012 06:18 PM, Bruce Evans wrote: >>> On Sat, 22 Sep 2012, Bruce Evans wrote: >>> >>>> ... >>>> Patches tomorrow. Well, the main new one now, for all 3 files since >>>> part of it has lots of magic numbers which are not handled by the >>>> conversion scripts. >>>> ... >>>> The patch is also attached. >>> >>> The attachment was larger than intended. It had my complete patch set >>> for catrig*.c. >> >> Will there be another complete patch set tomorrow, or did you just >> send it today? > > I sent it all and won't change much more for a while. I might describe it > more tomorrow. The only change I made was to change atanh to atanhl in catrigl.c, seeing that I had written one for myself.
From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 05:41:25 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 76156106566C for ; Sat, 22 Sep 2012 05:41:25 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 2FF768FC08 for ; Sat, 22 Sep 2012 05:41:24 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8M5fOnK060485 for ; Sat, 22 Sep 2012 00:41:24 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505D4F84.90005@missouri.edu> Date: Sat, 22 Sep 2012 00:41:24 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5017111E.6060003@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! 
de.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> <505D4BFA.5050401@missouri.edu> In-Reply-To: <505D4BFA.5050401@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 05:41:25 -0000 On 09/22/2012 12:26 AM, Stephen Montgomery-Smith wrote: > On 09/21/2012 11:28 PM, Bruce Evans wrote: >> On Fri, 21 Sep 2012, Stephen Montgomery-Smith wrote: >> >>> On 09/21/2012 06:18 PM, Bruce Evans wrote: >>>> On Sat, 22 Sep 2012, Bruce Evans wrote: >>>> >>>>> ... >>>>> Patches tomorrow. Well, the main new one now, for all 3 files since >>>>> part of it has lots of magic numbers which are not handled by the >>>>> conversion scripts. >>>>> ... >>>>> The patch is also attached. >>>> >>>> The attachment was larger than intended. It had my complete patch set >>>> for catrig*.c. >>> >>> Will there be another complete patch set tomorrow, or did you just >>> send it today? >> >> I sent it all and won't change much more for a while. I might >> describe it >> more tomorrow. > > The only change I made was to change atanh to atanhl in catrigl.c, > seeing that I had written one for myself. I am finding some errors with catrigl.c in real_part_reciprocal. I don't know how SET_LDBL_EXPSIGN is meant to work. 
But I needed to add the extra statement: + scale = 1; SET_LDBL_EXPSIGN(scale, 0x7fff - ix); From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 18:05:47 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 18F9F106566B for ; Sat, 22 Sep 2012 18:05:47 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail26.syd.optusnet.com.au (mail26.syd.optusnet.com.au [211.29.133.167]) by mx1.freebsd.org (Postfix) with ESMTP id 9BD178FC08 for ; Sat, 22 Sep 2012 18:05:45 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail26.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8MI5bMd009676 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 23 Sep 2012 04:05:38 +1000 Date: Sun, 23 Sep 2012 04:05:37 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <505D4F84.90005@missouri.edu> Message-ID: <20120923030719.E1209@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! 
de.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> <505D4BFA.5050401@missouri.edu> <505D4F84.90005@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 18:05:47 -0000 On Sat, 22 Sep 2012, Stephen Montgomery-Smith wrote: > I am finding some errors with catrigl.c in real_part_reciprocal. I don't > know how SET_LDBL_EXPSIGN is meant to work. But I needed to add the extra > statement: > > + scale = 1; > SET_LDBL_EXPSIGN(scale, 0x7fff - ix); Good fix. I forgot that the normalization bit is not part of the exponent for ld80. So setting only the exponent bits gives a pseudo-zero (zero normalized mantissa and nonzero exponent). I think pseudo-zeros are treated as zeros on i387. Your fix works by setting the normalization bit. On i387, scale = 1 gives some exponent and sign that won't be used, and a mantissa of 0x8000000000000000ULL. SET_LDBL_EXPSIGN() keeps this mantissa and overrides the exponent and sign to (0, whatever). I don't understand why my tests didn't discover this bug. They only cover the exponent range of doubles, but that is plenty to reach the buggy code. In logl() I spent a lot of time optimizing settings of long doubles as bits, and ended up using just SET_LDBL_EXPSIGN() to modify a normal value that didn't need special setting. Alternative algorithms that created a special normal value first or set all the mantissa bits as bits were slower. The access macros for setting the mantissa bits weren't even committed. Many long double functions use direct bit-field accesses instead. This is unportable and tends to be slower.
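To make the pseudo-zero concrete, here is a minimal sketch that models the ld80 fields in portable C. The struct ld80_model and the helpers model_set_expsign() and model_value() are hypothetical names invented for this illustration; they are not the _fpmath.h macros. The sketch only evaluates the field arithmetic for normal exponents and says nothing about what real hardware does with such an encoding.

```c
#include <math.h>
#include <stdint.h>

#define	MODEL_BIAS	16383	/* ld80 exponent bias */

/* Hypothetical stand-in for the ld80 layout; not the real union. */
struct ld80_model {
	uint64_t man;		/* 64-bit mantissa; bit 63 is the explicit normalization bit */
	uint16_t expsign;	/* sign in bit 15, biased exponent in bits 0-14 */
};

/* Models an expsign-only store: the mantissa is left exactly as it was. */
static void
model_set_expsign(struct ld80_model *v, uint16_t expsign)
{
	v->expsign = expsign;
}

/*
 * Value implied by the fields for a normal exponent:
 * (-1)**sign * (man / 2**63) * 2**(exp - bias).
 * Exponent fields 0 and 0x7fff (denormals, Inf, NaN) are not modelled.
 */
static double
model_value(struct ld80_model v)
{
	int e = (v.expsign & 0x7fff) - MODEL_BIAS;
	double m = ldexp((double)v.man, -63);

	return ((v.expsign & 0x8000) ? -ldexp(m, e) : ldexp(m, e));
}
```

In this model, starting from man = 0 (the scale = 0 case) leaves the implied value at 0 whatever exponent is stored, while starting from man = 1ULL << 63 (the scale = 1 case) and storing 0x7fff - ix for an x with ilogb(x) = 600 gives 2**-599.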
Here is the method used in ld80/s_expl.c for setting 2**k: @ /* Prepare scale factors. */ @ v.xbits.man = 1ULL << 63; This is the non-implicit normalization bit for ld80. ld128 has implicit normalization so it uses 0 here. The macros in _fpmath.h for handling the normalization bit are poor, and the normalization is known for ld80, so this just hard-codes the value. scale = 0 or scale = 1 here tends to be slower, since it asks to set the sign and exponent bits too. I used it in catrig to reduce unportabilities (there is only the expsign access, and there is a macro for that). Compilers may be able to optimize away the extra setting of the sign and exponent bits by noticing that they will be overwritten soon, and when they don't it turns out that setting things twice is often the best method for confusing compilers into generating optimal memory accesses, since optimal often doesn't equal least number. @ if (k >= LDBL_MIN_EXP) { @ v.xbits.expsign = BIAS + k; @ twopk = v.e; @ } else { @ v.xbits.expsign = BIAS + k + 10000; @ twopkp10000 = v.e; @ } This has complications to avoid setting unrepresentable exponent bits for infinities and denormals. In catrig, these complications are not at runtime (the original exponent is large so negating it doesn't ask for an infinity; negating it might ask for a denormal so 1 is added to the negation of it to produce the new exponent, and this cannot ask for an infinity either). I don't like the direct bit-field accesses in the above although I wrote them. Efficiency tests show that these particular bit-field accesses are optimized well enough on amd64 and i386. 
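[Editorial sketch, not part of the original message.] The pattern described above (start from a known normal value, then override only the exponent field) can be illustrated portably for doubles. This is a minimal sketch assuming IEEE 754 binary64; `pow2_from_bits` is a hypothetical helper, not an msun function, and it uses `memcpy` instead of the msun access macros. Since binary64 has an implicit leading bit, it cannot reproduce the ld80 pseudo-zero; it only shows why the mantissa must be in a known state before the exponent is written.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from msun): return 2**k as a double by writing
 * the exponent field directly.  Assumes IEEE 754 binary64 and a normal
 * result, i.e. -1022 <= k <= 1023.
 */
static double
pow2_from_bits(int k)
{
	double scale = 1;	/* known sign and mantissa before the override */
	uint64_t bits;

	memcpy(&bits, &scale, sizeof(bits));
	bits &= ~((uint64_t)0x7ff << 52);	/* clear the 11 exponent bits */
	bits |= (uint64_t)(k + 1023) << 52;	/* biased exponent for 2**k */
	memcpy(&scale, &bits, sizeof(scale));
	return (scale);
}
```

If `scale` were left uninitialized, the mantissa bits that survive the override would be arbitrary, which is the double-precision analogue of the catrigl.c bug discussed above.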
Bruce From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 20:09:13 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2538E106564A for ; Sat, 22 Sep 2012 20:09:13 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au [211.29.132.189]) by mx1.freebsd.org (Postfix) with ESMTP id 802C08FC0A for ; Sat, 22 Sep 2012 20:09:11 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8MK92hu011974 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 23 Sep 2012 06:09:04 +1000 Date: Sun, 23 Sep 2012 06:09:02 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20120922142349.X4599@besplex.bde.org> Message-ID: <20120923044814.S1465@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! 
de.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Stephen Montgomery-Smith , freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 20:09:13 -0000 On Sat, 22 Sep 2012, Bruce Evans wrote: > On Fri, 21 Sep 2012, Stephen Montgomery-Smith wrote: > >> On 09/21/2012 06:18 PM, Bruce Evans wrote: >>> ... >>> The attachment was larger than intended. It had my complete patch set >>> for catrig*.c. >> >> Will there be another complete patch set tomorrow, or did you just send it >> today? > > I sent it all and won't change much more for a while. I might describe it > more tomorrow. Already made a small change: always use float for `tiny' > (it is now only used in raise_inexact), and in raise_inexact assign > (1 + tiny) to volatile float instead of volatile int. Just 1 detail in the old patch needs more description. First a new patch to finish merging recent changes: % diff -u2 catrig.c~ catrig.c % --- catrig.c~ 2012-09-22 04:49:51.000000000 +0000 % +++ catrig.c 2012-09-22 18:41:34.779454000 +0000 % @@ -35,5 +35,5 @@ % #undef isnan % #define isnan(x) ((x) != (x)) % -#define raise_inexact() do { volatile int junk = 1 + tiny; } while(0) % +#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0) % #undef signbit % #define signbit(x) (__builtin_signbit(x)) No reason to convert it to int. (Not quite similarly for the (int)(1 + tiny). (float)(1 + tiny) == 1 would have failed due to compiler bugfeatures unless tiny is double_t or larger, since the bugfeatures elide the cast. 
There would have to have been an assignment to a volatile FP variable, directly as here or via STRICT_ASSIGN (whose purpose is to avoid going through the volatile variable when this is unnecessary). The cast to int is really needed when we have a small x and want to set inexact iff x != 0. Perhaps 'if (x != 0) raise_inexact();' is a more efficient way to do that too, as well as being unobfuscated.) % @@ -48,12 +48,12 @@ % m_ln2 = 6.9314718055994531e-1, /* 0x162e42fefa39ef.0p-53 */ % /* % - * We no longer use M_PI_2 or m_pi_2. In float precision, rounding to % + * We no longer use M_PI_2 or m_pi_2. In some precisions (although not % + * in double precision where this comment is attached), rounding to % * nearest of PI/2 happens to round up, but we want rounding down so % * that the expressions for approximating PI/2 and (PI/2 - z) work in all % - * rounding modes. This is not very important, but it is necessary for % - * the same quality of implementation that fdlibm had in 1992 and that % - * real functions mostly still have. This is known to be broken only in % - * ld80 acosl() via invtrig.c and in some invalid optimizations in code % - * under development, and now in all functions in catrigl.c via invtrig.c. % + * rounding modes. This is not very important, but the real inverse trig % + * functions always took great care to do it, and all inverse trig % + * functions are close working right in all rounding modes for their % + * other approximations (unlike the non-inverse ones). % */ % pio2_hi = 1.5707963267948966e0, /* 0x1921fb54442d18.0p-52 */ Tone down this comment a bit. You might want to remove it. % @@ -64,6 +64,7 @@ % % static const volatile double % -pio2_lo = 6.1232339957367659e-17, /* 0x11a62633145c07.0p-106 */ % -tiny = 0x1p-1000; % +pio2_lo = 6.1232339957367659e-17; /* 0x11a62633145c07.0p-106 */ % +static const volatile float % +tiny = 0x1p-100; % % static double complex clog_for_large_values(double complex z); `tiny' is now always float. 
It was just wasteful for it to be larger. % @@ -550,5 +551,5 @@ % if (ix <= (BIAS + DBL_MAX_EXP / 2 - CUTOFF) << 20) % return (x/(x*x + y*y)); % - scale = 0; % + scale = 1; % SET_HIGH_WORD(scale, 0x7ff00000 - ix); /* 2**(1-ilogb(x)) */ % x *= scale; scale = 0 makes no sense for doubles either. For floats, the mantissa is part of the high word, so no separate initialization is needed, and none is used. I broke the long double case by copying the float code and not initializing the mantissa bits at all (scale = 0 would have given pseudo-zero, but uninitialized scale gives almost anything). % @@ -618,12 +619,7 @@ % } % % - if (ax == 1 && ay < DBL_EPSILON) { % -#if 0 /* this only improves accuracy in an already relative accurate case */ % - if (ay > 2*DBL_MIN) % - rx = - log(ay/2) / 2; % - else % -#endif % - rx = - (log(ay) - m_ln2) / 2; % - } else % + if (ax == 1 && ay < DBL_EPSILON) % + rx = - (log(ay) - m_ln2) / 2; % + else % rx = log1p(4*ax / sum_squares(ax-1, ay)) / 4; % I think this can be removed. I explained the details of this a week or 2 ago. Here log(ay) is large compared with m_ln2, so there is an extra error of less than half an ulp for adding m_ln2. The error for log(ay) is < 1 ulp, so the total error is < 1.5 ulps (in practice, < 1.2 ulps). Since other parts of catanh() have errors of 2-3 ulps, we shouldn't care about going above 1.2 ulps here. I now understand catanh() well enough to see how to make its errors < 1 ulp using not much more than clog() needs to do the same things: - use an extra-precision log() and log1p() - evaluate |z-1|**2 accurately (already done in clog()) - divide accurately by the accurate |z-1|**2. I peeked at the Intel ia64 math library atanh() and it reminded me that Newton's method is good for extra-precision division, and that I already use this method in an unfinished naive implementation of gamma(). (The Intel ia64 math library is insanely complicated, efficient, accurate and large.
It takes about 30K of asm code for each of atanhf(), atanh() and atanhl(), each with optimizations specialized for the precision including a specialized inline log1p). (The naive implementation of gamma() uses the functional equation to shift the arg to a large one so that the asymptotic formula is accurate. This takes lots of divisions to convert the result for the shifted arg to the result for the unshifted arg, and each division must be very accurate for the final result to be even moderately accurate. Not a good method, since even 1 non-extra-precision division is slow. But I was interested in seeing how far this method could be pushed. It was barely good enough for lgammaf() near its first negative zero, when all intermediate calculations were done in sesqui-double precision.) (The Intel ia64 math library is of course insanely complicated, etc., for *gamma*(). Instead of 30K of asm per function, it takes 220K for lgammal() and significantly less for lower precisions. It even uses large asm for the wrapper functions (pre-C90 support which we axed long ago). It doesn't do any complex functions, at least in the 2005 glibc version. Altogether, in the glibc 2005 version, Intel *gamma*.S takes 630K, which is slightly larger than all of msun/src in FreeBSD, and we do some complex functions.)
% diff -u2 catrigf.c~ catrigf.c % --- catrigf.c~ 2012-09-22 04:49:51.000000000 +0000 % +++ catrigf.c 2012-09-22 00:38:55.503733000 +0000 % @@ -45,5 +45,5 @@ % #undef isnan % #define isnan(x) ((x) != (x)) % -#define raise_inexact() do { volatile int junk = 1 + tiny; } while(0) % +#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0) % #undef signbit % #define signbit(x) (__builtin_signbitf(x)) % diff -u2 catrigl.c~ catrigl.c % --- catrigl.c~ 2012-09-22 05:42:13.000000000 +0000 % +++ catrigl.c 2012-09-22 18:23:27.597349000 +0000 % @@ -46,5 +46,5 @@ % #undef isnan % #define isnan(x) ((x) != (x)) % -#define raise_inexact() do { volatile int junk = 1 + tiny; } while(0) % +#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0) % #undef signbit % #define signbit(x) (__builtin_signbitl(x)) % @@ -78,5 +78,5 @@ % #endif % % -static const volatile long double % +static const volatile float % tiny = 0x1p-10000L; % That's all the new changes. Now from the old patch: @ diff -u2 catrig.c~ catrig.c @ --- catrig.c~ 2012-09-21 15:51:00.000000000 +0000 @ +++ catrig.c 2012-09-22 18:41:34.779454000 +0000 @ @@ -577,20 +607,24 @@ @ ... @ if (ax == 1) @ ry = atan2(2, -ay) / 2; @ - else if (ay < FOUR_SQRT_MIN) @ + else if (ay < DBL_EPSILON) @ ry = atan2(2*ay, (1-ax)*(1+ax)) / 2; @ else You accepted this without comment. My calculation is that since ax != 1, |1-ax*ax| is at least 2*DBL_EPSILON; ay < DBL_EPSILON makes ay*ay < DBL_EPSILON**2, so it is insignificant. This threshold might be off by a small factor. SQRT_MIN makes some sense as a threshold below which ay*ay would underflow. FOUR_SQRT_MIN makes less sense (I think it was just a nearby handy constant). Both need the estimate on |1-ax*ax| to show that a gradually underflowing ay*ay can be dropped since it is insignificant. I think we would prefer to always evaluate the full |z-1|**2, but can't do it because we want to avoid spurious underflow.
The complications in catrig seem to be just as large for avoiding overflow and underflow as for getting enough accuracy. I now understand how to make the float case significantly more efficient than the double case: calculate everything in extra precision and exponent range, and depend on the extra exponent range preventing underflow and overflow, so that everything can be simpler and faster. More accuracy occurs even more automatically. But this would be too much work for the unimportant float case. The double case is more interesting, but optimizations for it using long double are only possible on arches that have long doubles larger than doubles, and are only optimizations on arches that have efficient long doubles. The Intel ia64 math library of course has complications to do this. It generally uses extra precision in double precision routines, with algorithms specialized for this, and then has to work harder in long double precision and use different algorithms since no extra precision is available.
Bruce From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 20:54:15 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0ADC6106564A for ; Sat, 22 Sep 2012 20:54:15 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id B82398FC08 for ; Sat, 22 Sep 2012 20:54:14 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8MKsDLf047053; Sat, 22 Sep 2012 15:54:13 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505E2575.6030302@missouri.edu> Date: Sat, 22 Sep 2012 15:54:13 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! 
de.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> <20120923044814.S1465@besplex.bde.org> In-Reply-To: <20120923044814.S1465@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 20:54:15 -0000 On 09/22/2012 03:09 PM, Bruce Evans wrote: > % +static const volatile float > % tiny = 0x1p-10000L; I assume you meant to also change tiny to 0x1p-100. > % > > That's all the new changes. Now from the old patch: > > @ diff -u2 catrig.c~ catrig.c > @ --- catrig.c~ 2012-09-21 15:51:00.000000000 +0000 > @ +++ catrig.c 2012-09-22 18:41:34.779454000 +0000 > @ @@ -577,20 +607,24 @@ > @ ... > @ if (ax == 1) > @ ry = atan2(2, -ay) / 2; > @ - else if (ay < FOUR_SQRT_MIN) > @ + else if (ay < DBL_EPSILON) > @ ry = atan2(2*ay, (1-ax)*(1+ax)) / 2; > @ else > > You accepted this without comment. My calculation is that since ax > != 1, |1-ax*ax| is at lease 2*DBL_EPSILON; ay < DBL_EPSILON makes > ay*ay < DBL_EPSILON**2, so it is insignificant. This threshold might > be off by a small factor. Yes, I think I wasn't paying attention. But I agree with you. 
From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 21:04:15 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DAA111065670 for ; Sat, 22 Sep 2012 21:04:15 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 941E68FC08 for ; Sat, 22 Sep 2012 21:04:15 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8ML4EBG048011 for ; Sat, 22 Sep 2012 16:04:14 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505E27CE.3060107@missouri.edu> Date: Sat, 22 Sep 2012 16:04:14 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5017111E.6060003@missouri.edu> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! 
de.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> <20120923044814.S1465@besplex.bde.org> <505E2575.6030302@missouri.edu> In-Reply-To: <505E2575.6030302@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 21:04:16 -0000 1. Your recent optimizations seem to have given an overall 3% time saving in my timing tests. That's pretty good in my opinion. 2. In my accuracy tests for casin(h), I have never seen the double or long double have an error greater than 4 ULP. But for the float case I have seen 4.15 ULP. 3. I saw that you have ideas on making catanh have an error less than 1 ULP. Just saying that I saw those comments, although I didn't read them very carefully. 
From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 21:12:41 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 424A7106566C for ; Sat, 22 Sep 2012 21:12:41 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id EFA798FC0C for ; Sat, 22 Sep 2012 21:12:40 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8MLCdiF048573 for ; Sat, 22 Sep 2012 16:12:40 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505E29C8.6030305@missouri.edu> Date: Sat, 22 Sep 2012 16:12:40 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5017111E.6060003@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! 
de.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> <20120923044814.S1465@besplex.bde.org> <505E2575.6030302@missouri.edu> <505E27CE.3060107@missouri.edu> In-Reply-To: <505E27CE.3060107@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 21:12:41 -0000 Here is a little cleaning in the code. /* To ensure the same accuracy as atan(), and to filter out z = 0. */ if (x == 0) return (cpack(x, atan(y))); if (isnan(x) || isnan(y)) { /* catanh(+-Inf + I*NaN) = +-0 + I*NaN */ if (isinf(x)) return (cpack(copysign(0, x), y+y)); /* catanh(NaN + I*+-Inf) = sign(NaN)0 + I*+-PI/2 */ if (isinf(y)) return (cpack(copysign(0, x), copysign(pio2_hi + pio2_lo, y))); - /* catanh(+-0 + I*NaN) = +-0 + I*NaN */ - if (x == 0) - return (cpack(x, y+y)); From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 21:17:44 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 894881065670 for ; Sat, 22 Sep 2012 21:17:44 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by mx1.freebsd.org (Postfix) with ESMTP id F2DE88FC08 for ; Sat, 22 Sep 2012 21:17:43 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8MLHZjr000737 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 23 Sep 2012 07:17:36 +1000 Date: Sun, 23 Sep 2012 07:17:35 +1000 (EST) From: Bruce 
Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <505E2575.6030302@missouri.edu> Message-ID: <20120923071717.G1963@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! de.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> <20120923044814.S1465@besplex.bde.org> <505E2575.6030302@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org, Bruce Evans Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 21:17:44 -0000 On Sat, 22 Sep 2012, Stephen Montgomery-Smith wrote: > On 09/22/2012 03:09 PM, Bruce Evans wrote: > >> % +static const volatile float >> % tiny = 0x1p-10000L; > > I assume you meant to also change tiny to 0x1p-100. Right. Oops. 
Bruce From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 21:47:38 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id CC78A106566B for ; Sat, 22 Sep 2012 21:47:38 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au [211.29.132.182]) by mx1.freebsd.org (Postfix) with ESMTP id 5BF5F8FC14 for ; Sat, 22 Sep 2012 21:47:37 +0000 (UTC) Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8MLlTQc012562 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 23 Sep 2012 07:47:30 +1000 Date: Sun, 23 Sep 2012 07:47:29 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <505E27CE.3060107@missouri.edu> Message-ID: <20120923073807.K2059@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! 
de.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> <20120923044814.S1465@besplex.bde.org> <505E2575.6030302@missouri.edu> <505E27CE.3060107@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 21:47:38 -0000 On Sat, 22 Sep 2012, Stephen Montgomery-Smith wrote: > 1. Your recent optimizations seem to have given an overall 3% time saving in > my timing tests. That's pretty good in my opinion. Hopefully more for large and small args :-). > 2. In my accuracy tests for casin(h), I have never seen the double or long > double have an error greater than 4 ULP. But for the float case I have seen > 4.15 ULP. I haven't seen any larger than 3.4. What is the worst case you found? Errors found for float precision tend to be because the density of bad cases is higher so it is easier to test more of them accidentally. I did do some non-random testing for all float cases in narrow strips about x or y = 0 or 1, but not for all combinations of this with all functions. 
Bruce From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 22:25:43 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AA6781065673 for ; Sat, 22 Sep 2012 22:25:43 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 64DE78FC17 for ; Sat, 22 Sep 2012 22:25:42 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q8MMPf0f026359; Sat, 22 Sep 2012 17:25:41 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <505E3AE6.2010006@missouri.edu> Date: Sat, 22 Sep 2012 17:25:42 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20120827 Thunderbird/15.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! 
de.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> <20120923044814.S1465@besplex.bde.org> <505E2575.6030302@missouri.edu> <505E27CE.3060107@missouri.edu> <20120923073807.K2059@besplex.bde.org> In-Reply-To: <20120923073807.K2059@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 22:25:43 -0000 On 09/22/2012 04:47 PM, Bruce Evans wrote: > On Sat, 22 Sep 2012, Stephen Montgomery-Smith wrote: > >> 2. In my accuracy tests for casin(h), I have never seen the double or >> long double have an error greater than 4 ULP. But for the float case >> I have seen 4.15 ULP. > > I haven't seen any larger than 3.4. What is the worst case you found? > Errors found for float precision tend to be because the density of bad > cases is higher so it is easier to test more of them accidentally. I > did do some non-random testing for all float cases in narrow strips > about x or y = 0 or 1, but not for all combinations of this with all > functions. Here are some examples for float. In all these outputs: The first entry is the "count". The second entry is the function. The third and fourth entries are the real and imaginary part of the error in ULP. The fifth and sixth entries are the real and imaginary part of the input. The seventh and eighth and ninth and tenth entries are the real part and imaginary part of the answers from the float/double respectively (printed to few enough decimal places that you cannot tell they are different.) 
2365614 acos 3.75621 0.86681 1.0338860750198364258 -0.090228326618671417236 0.246582 0.361712 0.246582 0.361712
3087248 acos 3.56538 0.1165 2.3730618953704833984 0.26976472139358520508 0.124496 -1.51821 0.124496 -1.51821
5973027 asinh 3.61544 0.513 0.10977014899253845215 0.48254761099815368652 0.124712 0.499309 0.124712 0.499309
6558511 acosh 3.57286 0.419525 -0.29658588767051696777 -0.11975207924842834473 0.124975 -1.8695 0.124975 -1.8695
9998127 acos 3.51324 1.09793 1.0892471075057983398 -0.12541522085666656494 0.247452 0.491951 0.247452 0.491951
14879751 asinh 3.5643 1.83067 -0.11303693056106567383 0.4351412653923034668 -0.124994 0.446448 -0.124994 0.446448
19510082 asin 3.61922 0.0103899 0.46096378564834594727 -0.01612871512770652771 0.478995 -0.0181731 0.478995 -0.0181731

I can send more examples on request. I'm not seeing a real pattern here.

From owner-freebsd-numerics@FreeBSD.ORG Sat Sep 22 23:46:39 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 81FA4106566B for ; Sat, 22 Sep 2012 23:46:39 +0000 (UTC) (envelope-from m.e.sanliturk@gmail.com) Received: from mail-oa0-f54.google.com (mail-oa0-f54.google.com [209.85.219.54]) by mx1.freebsd.org (Postfix) with ESMTP id 411F88FC08 for ; Sat, 22 Sep 2012 23:46:39 +0000 (UTC) Received: by oagm1 with SMTP id m1so5660913oag.13 for ; Sat, 22 Sep 2012 16:46:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=q/YQ5GthtkHc/xsyR9m2stOY8y5DyQZ/TZMBQ9Dl7/A=; b=HaJN94nG3lGUavCLGK0blD+udHyMfuTMBj7IFletfZN8K6qhXIiCXMLJw97BtJSdvW NpEhviS6sCP25lNgziNq3xCOj0TSkFCHDEb8aGxS6EswoBT8sdCT2bnZxxnKlfUcQrqJ P4KMBsmaN0aeWdj+0WdROyIgFXPcRwmA73rDRRYyejsCjrlEow9Ilv5XOswJy/DZxvoC nzC/priGtuS/5Onj2txEvbCPgk3+Tj0dvHQNzd1ZGCx/KlujEVX+WnJhdBcm7jDLND8G
zJ8yBbjLYEFtFhCBRTyM95tdywo0Whq6uaNoTIgZ2eYn7KO0P32bKeGbwsxqZN0AnyCU U6Xw== MIME-Version: 1.0 Received: by 10.182.76.194 with SMTP id m2mr6921147obw.27.1348357598258; Sat, 22 Sep 2012 16:46:38 -0700 (PDT) Received: by 10.182.141.66 with HTTP; Sat, 22 Sep 2012 16:46:38 -0700 (PDT) Date: Sat, 22 Sep 2012 16:46:38 -0700 Message-ID: From: Mehmet Erol Sanliturk To: freebsd-numerics@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Book names about Computer Approximations X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 23:46:39 -0000

Dear All,

I want to buy some books about computer approximations to functions such as elementary functions, distributions, etc., like the books

http://www.amazon.com/Approximations-Digital-Computers-Cecil-Hastings/dp/B000Q5GBG6/ref=sr_1_1?s=books&ie=UTF8&qid=1348357087&sr=1-1

http://www.amazon.com/Computer-Approximations-John-Fraser-Hart/dp/0882756427

http://www.amazon.com/Elementary-Functions-Implementation-Jean-Michel-Muller/dp/0817643729/ref=pd_sim_sbs_b_2

http://www.amazon.com/Elementary-Functions-Prentice-Hall-computational-mathematics/dp/0138220646/ref=pd_sim_sbs_b_3

I have searched for "review of computer approximation books", but I could not find any useful source.

I am not near a library or a bookseller where I could look at sample copies (it is not even possible to find such books in Turkey; they have to be ordered).

If you have time, would you please suggest names, links, or ISBN numbers, whichever is convenient for you, of books that I can get from publishers, especially recently published ones whose contents can be used to develop good quality procedures.

Thank you very much.

Mehmet Erol Sanliturk