Date: Fri, 21 Sep 2012 18:24:55 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Stephen Montgomery-Smith <stephen@missouri.edu> Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions Message-ID: <20120921172402.W945@besplex.bde.org> In-Reply-To: <505BD9B4.8020801@missouri.edu> References: <5017111E.6060003@missouri.edu> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 20 Sep 2012, Stephen Montgomery-Smith wrote: > I also added inexact optimizations for casinh and cacos. I couldn't get this to give the hoped-for optimization and dropped it even for catanh. You may still prefer it because it is simpler. But later I found intricacies for returning the correct value of Pi/2 which make the inexact optimizations even less useful: The real functions are careful to a fault to return Pi/2 correctly rounded in all rounding modes. They don't use return a constant Pi/2, but evaluate Pi/2 at runtime using pio2_hi + pio2_lo, where pio2_hi is (or should be) Pi/2 rounded _down_ and pio2_lo is an approximation to the residual and is volatile enough for the addition to be done at runtime. The following shortcuts lose this care: - similarly, but with pio2_hi = Pi/2 rounded up. Now pio2_hi + pio2_lo is 1 ulp too high when the rounding mode is either up or towards plus infinity. Rounding of Pi/2 to nearest may go either way. fdlibm code seems to be careful to round it down in all cases. In in FreeBSD libm, at least e_acosf.c is careful to round down when the natural rounding is up, but at invtrig.c is not careful -- it apparently uses natural rounding, which happens to be up for ld80 and down for ld128, or vice versa. - similary, but with pio2_hi rounded to nearest and 'tiny' used instead of pio2_lo. Using 'tiny' requires pio2_hi to be nearest and only works in some rounding modes. - similarly, but with just m_pi_2 = Pi/2 rounded to nearest. Now there is no runtime evaluation, so the result cannot depend on the rounding mode and inexact must be set in some other way. A quick test of most functions in all rounding mode shows that non-default modes work quite well except for the most complicated and/or heavily optimized functions when they are written in C (the totally failing ones are sin/cos/tan/exp*/pow/hypot but not log* (except for log*(1)) or most inverse functions. Optimizations in sin/cos/tan/exp2 require rounding to nearest). My tests weren't non-quick enough to detect any 1-ulp errors for Pi/2, and only showed that the errors mostly don't blow up for inverse functions. In view of this, I'd like to keep doing the Pi/2 intricacies. Partial diffs for catrig.c: % 48a49,50 % > #define pio2_hi m_pi_2 /* works because m_pi_2 rounded down */ % > pio2_lo = 6.1232339957367660e-17, /* 0x3C91A626, 0x33145C07 */ % 384,389c390,407 % 335c341 % < * cacos(z) = PI/2 - z + O(|z|^3) as z -> 0 % --- % > * cacos(z) = PI/2 - z + O(z^3) as z -> 0 This should be PI/2 + O(z) when only the constant term is used, but I restored use of the z term. Start changing O(|n|) to O(n). The absolute value should be implicit. % 384,389c390,407 % < ... % < if (ax < DBL_EPSILON/8 && ay < SQRT_6_EPSILON/8) % < return (cpack(m_pi_2, -y)); % --- % > if (ax < SQRT_EPSILON && ay < SQRT_EPSILON) % > /* % > * This is quite subtle. The expression for PI/2 - x % > * is cloned from e_acos.c, where it is apparently over- % > * designed to work in all rounding modes. It requires % > * pio2_hi to be rounded down even when rounding to % > * nearest would be more accurate. We can't add `tiny' % > * to pio2_hi as usual to raise inexact, since this would % > * break the fussy rounding in some non-default modes. % > * So we use the same method to raise inexact as for the % > * approximation 'z'. e_acos.c uses the even subtler % > * method of depending on inexactness in a higher-degree % > * approximation. That is not practical here, since if % > * we used the x**3 term then we would need an extra % > * case to avoid spurious underflow. % > */ % > if ((int)ax == 0 && (int)ay == 0) % > return (cpack(pio2_hi - (x - pio2_lo), -y)); Despite being too verbose (BTW, don't commit my essays :-), the comment neglects to point out that with the expression written in this form, inexact must be set separately since (x - pio2_lo) might be a value (e.g., 0) that doesn't give inexactness when subtracted. All other returns of Pi/2 are simpler than this, and should return pio2_hi + pio2_lo. The constants should be spelled like this, and not using M_PI or m_pi_2; this is especially important in long double precision since then the constants are declared/defined with this spelling in extern constant tables in invtrig.c to centralize the complications for defining them them for all combinations of ld80/ ld128/i386. So my patch for this is simplest for long double precision -- there it uses invtrig.h and doesn't worry about the known bug that pio2_hi is incorrectly rounded in some cases. With these intricacies, there is less to be gained by setting inexact up front. Adding pio2_lo sequentially is slightly slower than an up-front setting in parallel, but when both are done the up-front setting just adds overhead on average. Some of the optimizations could be done more globably: - an option to not support nonstandard rounding modes for Pi/2. This seems to require pio2_lo to be a static const, unlike in invtrig.*. Make this non-volatile. The compiler will then evaluate pio2_hi + pio2_lo at compile time. - an option to not support careful setting of inexact. The above gives it for Pi/2. Settings of it using (1 + tiny) == 1 would work similarly -- make `tiny' a static nonvolatile const. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120921172402.W945>