Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 21 Sep 2012 18:24:55 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Stephen Montgomery-Smith <stephen@missouri.edu>
Cc:        freebsd-numerics@FreeBSD.org
Subject:   Re: Complex arg-trig functions
Message-ID:  <20120921172402.W945@besplex.bde.org>
In-Reply-To: <505BD9B4.8020801@missouri.edu>
References:  <5017111E.6060003@missouri.edu> <50511B40.3070009@missouri.edu> <20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu> <20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu> <20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu> <20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu> <20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 20 Sep 2012, Stephen Montgomery-Smith wrote:

> I also added inexact optimizations for casinh and cacos.

I couldn't get this to give the hoped-for optimization and dropped
it even for catanh.  You may still prefer it because it is simpler.

But later I found intricacies for returning the correct value of
Pi/2 which make the inexact optimizations even less useful:

The real functions are careful to a fault to return Pi/2 correctly
rounded in all rounding modes.  They don't use return a constant Pi/2,
but evaluate Pi/2 at runtime using pio2_hi + pio2_lo, where pio2_hi
is (or should be) Pi/2 rounded _down_ and pio2_lo is an approximation
to the residual and is volatile enough for the addition to be done at
runtime.  The following shortcuts lose this care:

- similarly, but with pio2_hi = Pi/2 rounded up.  Now pio2_hi + pio2_lo
   is 1 ulp too high when the rounding mode is either up or towards
   plus infinity.  Rounding of Pi/2 to nearest may go either way.
   fdlibm code seems to be careful to round it down in all cases.  In
   in FreeBSD libm, at least e_acosf.c is careful to round down when
   the natural rounding is up, but at invtrig.c is not careful -- it
   apparently uses natural rounding, which happens to be up for ld80
   and down for ld128, or vice versa.
- similary, but with pio2_hi rounded to nearest and 'tiny' used instead
   of pio2_lo.  Using 'tiny' requires pio2_hi to be nearest and only
   works in some rounding modes.
- similarly, but with just m_pi_2 = Pi/2 rounded to nearest.  Now there
   is no runtime evaluation, so the result cannot depend on the rounding
   mode and inexact must be set in some other way.

A quick test of most functions in all rounding mode shows that non-default
modes work quite well except for the most complicated and/or heavily
optimized functions when they are written in C (the totally failing ones
are sin/cos/tan/exp*/pow/hypot but not log* (except for log*(1)) or most
inverse functions.  Optimizations in sin/cos/tan/exp2 require rounding
to nearest).  My tests weren't non-quick enough to detect any 1-ulp
errors for Pi/2, and only showed that the errors mostly don't blow up for
inverse functions.

In view of this, I'd like to keep doing the Pi/2 intricacies.
Partial diffs for catrig.c:

% 48a49,50
% > #define	pio2_hi		m_pi_2		/* works because m_pi_2 rounded down */
% > pio2_lo =		6.1232339957367660e-17,	/* 0x3C91A626, 0x33145C07 */
% 384,389c390,407
% 335c341
% <  * cacos(z) = PI/2 - z + O(|z|^3)   as z -> 0
% ---
% >  * cacos(z) = PI/2 - z + O(z^3)   as z -> 0

This should be PI/2 + O(z) when only the constant term is used, but I
restored use of the z term.

Start changing O(|n|) to O(n).  The absolute value should be implicit.

% 384,389c390,407
% < ...
% < 	if (ax < DBL_EPSILON/8 && ay < SQRT_6_EPSILON/8)
% < 		return (cpack(m_pi_2, -y));
% ---
% > 	if (ax < SQRT_EPSILON && ay < SQRT_EPSILON)
% > 		/*
% > 		 * This is quite subtle.  The expression for PI/2 - x
% > 		 * is cloned from e_acos.c, where it is apparently over-
% > 		 * designed to work in all rounding modes.  It requires
% > 		 * pio2_hi to be rounded down even when rounding to
% > 		 * nearest would be more accurate.  We can't add `tiny'
% > 		 * to pio2_hi as usual to raise inexact, since this would
% > 		 * break the fussy rounding in some non-default modes.
% > 		 * So we use the same method to raise inexact as for the
% > 		 * approximation 'z'.  e_acos.c uses the even subtler
% > 		 * method of depending on inexactness in a higher-degree
% > 		 * approximation.  That is not practical here, since if
% > 		 * we used the x**3 term then we would need an extra
% > 		 * case to avoid spurious underflow.
% > 		 */
% > 		if ((int)ax == 0 && (int)ay == 0)
% > 			return (cpack(pio2_hi - (x - pio2_lo), -y));

Despite being too verbose (BTW, don't commit my essays :-), the comment
neglects to point out that with the expression written in this form,
inexact must be set separately since (x - pio2_lo) might be a value
(e.g., 0) that doesn't give inexactness when subtracted.

All other returns of Pi/2 are simpler than this, and should return
pio2_hi + pio2_lo.  The constants should be spelled like this, and
not using M_PI or m_pi_2; this is especially important in long double
precision since then the constants are declared/defined with this
spelling in extern constant tables in invtrig.c to centralize the
complications for defining them them for all combinations of ld80/
ld128/i386.  So my patch for this is simplest for long double
precision -- there it uses invtrig.h and doesn't worry about the
known bug that pio2_hi is incorrectly rounded in some cases.

With these intricacies, there is less to be gained by setting inexact
up front.  Adding pio2_lo sequentially is slightly slower than an
up-front setting in parallel, but when both are done the up-front
setting just adds overhead on average.  Some of the optimizations
could be done more globably:
- an option to not support nonstandard rounding modes for Pi/2.  This
   seems to require pio2_lo to be a static const, unlike in invtrig.*.
   Make this non-volatile.  The compiler will then evaluate
   pio2_hi + pio2_lo at compile time.
- an option to not support careful setting of inexact.  The above
   gives it for Pi/2.  Settings of it using (1 + tiny) == 1 would
   work similarly -- make `tiny' a static nonvolatile const.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120921172402.W945>