From owner-freebsd-numerics@FreeBSD.ORG Fri Sep 21 08:25:13 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 32125106566B for ; Fri, 21 Sep 2012 08:25:13 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id A56878FC0C for ; Fri, 21 Sep 2012 08:25:11 +0000 (UTC) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8L8P4KH031489 for ; Fri, 21 Sep 2012 18:25:04 +1000 Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au (c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q8L8OtMR014157 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 21 Sep 2012 18:24:56 +1000 Date: Fri, 21 Sep 2012 18:24:55 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <505BD9B4.8020801@missouri.edu> Message-ID: <20120921172402.W945@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 08:25:13 -0000 On Thu, 20 Sep 2012, Stephen Montgomery-Smith wrote: > I also added inexact optimizations for casinh and cacos. I couldn't get this to give the hoped-for optimization and dropped it even for catanh. You may still prefer it because it is simpler. But later I found intricacies for returning the correct value of Pi/2 which make the inexact optimizations even less useful: The real functions are careful to a fault to return Pi/2 correctly rounded in all rounding modes. They don't use return a constant Pi/2, but evaluate Pi/2 at runtime using pio2_hi + pio2_lo, where pio2_hi is (or should be) Pi/2 rounded _down_ and pio2_lo is an approximation to the residual and is volatile enough for the addition to be done at runtime. The following shortcuts lose this care: - similarly, but with pio2_hi = Pi/2 rounded up. Now pio2_hi + pio2_lo is 1 ulp too high when the rounding mode is either up or towards plus infinity. Rounding of Pi/2 to nearest may go either way. fdlibm code seems to be careful to round it down in all cases. In in FreeBSD libm, at least e_acosf.c is careful to round down when the natural rounding is up, but at invtrig.c is not careful -- it apparently uses natural rounding, which happens to be up for ld80 and down for ld128, or vice versa. - similary, but with pio2_hi rounded to nearest and 'tiny' used instead of pio2_lo. Using 'tiny' requires pio2_hi to be nearest and only works in some rounding modes. - similarly, but with just m_pi_2 = Pi/2 rounded to nearest. Now there is no runtime evaluation, so the result cannot depend on the rounding mode and inexact must be set in some other way. A quick test of most functions in all rounding mode shows that non-default modes work quite well except for the most complicated and/or heavily optimized functions when they are written in C (the totally failing ones are sin/cos/tan/exp*/pow/hypot but not log* (except for log*(1)) or most inverse functions. Optimizations in sin/cos/tan/exp2 require rounding to nearest). My tests weren't non-quick enough to detect any 1-ulp errors for Pi/2, and only showed that the errors mostly don't blow up for inverse functions. In view of this, I'd like to keep doing the Pi/2 intricacies. Partial diffs for catrig.c: % 48a49,50 % > #define pio2_hi m_pi_2 /* works because m_pi_2 rounded down */ % > pio2_lo = 6.1232339957367660e-17, /* 0x3C91A626, 0x33145C07 */ % 384,389c390,407 % 335c341 % < * cacos(z) = PI/2 - z + O(|z|^3) as z -> 0 % --- % > * cacos(z) = PI/2 - z + O(z^3) as z -> 0 This should be PI/2 + O(z) when only the constant term is used, but I restored use of the z term. Start changing O(|n|) to O(n). The absolute value should be implicit. % 384,389c390,407 % < ... % < if (ax < DBL_EPSILON/8 && ay < SQRT_6_EPSILON/8) % < return (cpack(m_pi_2, -y)); % --- % > if (ax < SQRT_EPSILON && ay < SQRT_EPSILON) % > /* % > * This is quite subtle. The expression for PI/2 - x % > * is cloned from e_acos.c, where it is apparently over- % > * designed to work in all rounding modes. It requires % > * pio2_hi to be rounded down even when rounding to % > * nearest would be more accurate. We can't add `tiny' % > * to pio2_hi as usual to raise inexact, since this would % > * break the fussy rounding in some non-default modes. % > * So we use the same method to raise inexact as for the % > * approximation 'z'. e_acos.c uses the even subtler % > * method of depending on inexactness in a higher-degree % > * approximation. That is not practical here, since if % > * we used the x**3 term then we would need an extra % > * case to avoid spurious underflow. % > */ % > if ((int)ax == 0 && (int)ay == 0) % > return (cpack(pio2_hi - (x - pio2_lo), -y)); Despite being too verbose (BTW, don't commit my essays :-), the comment neglects to point out that with the expression written in this form, inexact must be set separately since (x - pio2_lo) might be a value (e.g., 0) that doesn't give inexactness when subtracted. All other returns of Pi/2 are simpler than this, and should return pio2_hi + pio2_lo. The constants should be spelled like this, and not using M_PI or m_pi_2; this is especially important in long double precision since then the constants are declared/defined with this spelling in extern constant tables in invtrig.c to centralize the complications for defining them them for all combinations of ld80/ ld128/i386. So my patch for this is simplest for long double precision -- there it uses invtrig.h and doesn't worry about the known bug that pio2_hi is incorrectly rounded in some cases. With these intricacies, there is less to be gained by setting inexact up front. Adding pio2_lo sequentially is slightly slower than an up-front setting in parallel, but when both are done the up-front setting just adds overhead on average. Some of the optimizations could be done more globably: - an option to not support nonstandard rounding modes for Pi/2. This seems to require pio2_lo to be a static const, unlike in invtrig.*. Make this non-volatile. The compiler will then evaluate pio2_hi + pio2_lo at compile time. - an option to not support careful setting of inexact. The above gives it for Pi/2. Settings of it using (1 + tiny) == 1 would work similarly -- make `tiny' a static nonvolatile const. Bruce