From owner-freebsd-numerics@FreeBSD.ORG  Fri Sep 21 08:25:13 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 32125106566B
	for <freebsd-numerics@FreeBSD.org>;
	Fri, 21 Sep 2012 08:25:13 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from fallbackmx07.syd.optusnet.com.au
	(fallbackmx07.syd.optusnet.com.au [211.29.132.9])
	by mx1.freebsd.org (Postfix) with ESMTP id A56878FC0C
	for <freebsd-numerics@FreeBSD.org>;
	Fri, 21 Sep 2012 08:25:11 +0000 (UTC)
Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au
	[211.29.132.186])
	by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8L8P4KH031489
	for <freebsd-numerics@FreeBSD.org>; Fri, 21 Sep 2012 18:25:04 +1000
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8L8OtMR014157
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 21 Sep 2012 18:24:56 +1000
Date: Fri, 21 Sep 2012 18:24:55 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith <stephen@missouri.edu>
In-Reply-To: <505BD9B4.8020801@missouri.edu>
Message-ID: <20120921172402.W945@besplex.bde.org>
References: <5017111E.6060003@missouri.edu> <50511B40.3070009@missouri.edu>
	<20120913204808.T1964@besplex.bde.org> <5051F59C.6000603@missouri.edu>
	<20120914014208.I2862@besplex.bde.org> <50526050.2070303@missouri.edu>
	<20120914212403.H1983@besplex.bde.org> <50538E28.6050400@missouri.edu>
	<20120915231032.C2669@besplex.bde.org> <50548E15.3010405@missouri.edu>
	<5054C027.2040008@missouri.edu> <5054C200.7090307@missouri.edu>
	<20120916041132.D6344@besplex.bde.org> <50553424.2080902@missouri.edu>
	<20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu>
	<20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu>
	<20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu>
	<20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu>
	<5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org>
	<20120918232850.N2144@besplex.bde.org>
	<20120919010613.T2493@besplex.bde.org>
	<505BD9B4.8020801@missouri.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-numerics@FreeBSD.org
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 08:25:13 -0000

On Thu, 20 Sep 2012, Stephen Montgomery-Smith wrote:

> I also added inexact optimizations for casinh and cacos.

I couldn't get this to give the hoped-for optimization and dropped
it even for catanh.  You may still prefer it because it is simpler.

But later I found intricacies for returning the correct value of
Pi/2 which make the inexact optimizations even less useful:

The real functions are careful to a fault to return Pi/2 correctly
rounded in all rounding modes.  They don't use return a constant Pi/2,
but evaluate Pi/2 at runtime using pio2_hi + pio2_lo, where pio2_hi
is (or should be) Pi/2 rounded _down_ and pio2_lo is an approximation
to the residual and is volatile enough for the addition to be done at
runtime.  The following shortcuts lose this care:

- similarly, but with pio2_hi = Pi/2 rounded up.  Now pio2_hi + pio2_lo
   is 1 ulp too high when the rounding mode is either up or towards
   plus infinity.  Rounding of Pi/2 to nearest may go either way.
   fdlibm code seems to be careful to round it down in all cases.  In
   in FreeBSD libm, at least e_acosf.c is careful to round down when
   the natural rounding is up, but at invtrig.c is not careful -- it
   apparently uses natural rounding, which happens to be up for ld80
   and down for ld128, or vice versa.
- similary, but with pio2_hi rounded to nearest and 'tiny' used instead
   of pio2_lo.  Using 'tiny' requires pio2_hi to be nearest and only
   works in some rounding modes.
- similarly, but with just m_pi_2 = Pi/2 rounded to nearest.  Now there
   is no runtime evaluation, so the result cannot depend on the rounding
   mode and inexact must be set in some other way.

A quick test of most functions in all rounding mode shows that non-default
modes work quite well except for the most complicated and/or heavily
optimized functions when they are written in C (the totally failing ones
are sin/cos/tan/exp*/pow/hypot but not log* (except for log*(1)) or most
inverse functions.  Optimizations in sin/cos/tan/exp2 require rounding
to nearest).  My tests weren't non-quick enough to detect any 1-ulp
errors for Pi/2, and only showed that the errors mostly don't blow up for
inverse functions.

In view of this, I'd like to keep doing the Pi/2 intricacies.
Partial diffs for catrig.c:

% 48a49,50
% > #define	pio2_hi		m_pi_2		/* works because m_pi_2 rounded down */
% > pio2_lo =		6.1232339957367660e-17,	/* 0x3C91A626, 0x33145C07 */
% 384,389c390,407
% 335c341
% <  * cacos(z) = PI/2 - z + O(|z|^3)   as z -> 0
% ---
% >  * cacos(z) = PI/2 - z + O(z^3)   as z -> 0

This should be PI/2 + O(z) when only the constant term is used, but I
restored use of the z term.

Start changing O(|n|) to O(n).  The absolute value should be implicit.

% 384,389c390,407
% < ...
% < 	if (ax < DBL_EPSILON/8 && ay < SQRT_6_EPSILON/8)
% < 		return (cpack(m_pi_2, -y));
% ---
% > 	if (ax < SQRT_EPSILON && ay < SQRT_EPSILON)
% > 		/*
% > 		 * This is quite subtle.  The expression for PI/2 - x
% > 		 * is cloned from e_acos.c, where it is apparently over-
% > 		 * designed to work in all rounding modes.  It requires
% > 		 * pio2_hi to be rounded down even when rounding to
% > 		 * nearest would be more accurate.  We can't add `tiny'
% > 		 * to pio2_hi as usual to raise inexact, since this would
% > 		 * break the fussy rounding in some non-default modes.
% > 		 * So we use the same method to raise inexact as for the
% > 		 * approximation 'z'.  e_acos.c uses the even subtler
% > 		 * method of depending on inexactness in a higher-degree
% > 		 * approximation.  That is not practical here, since if
% > 		 * we used the x**3 term then we would need an extra
% > 		 * case to avoid spurious underflow.
% > 		 */
% > 		if ((int)ax == 0 && (int)ay == 0)
% > 			return (cpack(pio2_hi - (x - pio2_lo), -y));

Despite being too verbose (BTW, don't commit my essays :-), the comment
neglects to point out that with the expression written in this form,
inexact must be set separately since (x - pio2_lo) might be a value
(e.g., 0) that doesn't give inexactness when subtracted.

All other returns of Pi/2 are simpler than this, and should return
pio2_hi + pio2_lo.  The constants should be spelled like this, and
not using M_PI or m_pi_2; this is especially important in long double
precision since then the constants are declared/defined with this
spelling in extern constant tables in invtrig.c to centralize the
complications for defining them them for all combinations of ld80/
ld128/i386.  So my patch for this is simplest for long double
precision -- there it uses invtrig.h and doesn't worry about the
known bug that pio2_hi is incorrectly rounded in some cases.

With these intricacies, there is less to be gained by setting inexact
up front.  Adding pio2_lo sequentially is slightly slower than an
up-front setting in parallel, but when both are done the up-front
setting just adds overhead on average.  Some of the optimizations
could be done more globably:
- an option to not support nonstandard rounding modes for Pi/2.  This
   seems to require pio2_lo to be a static const, unlike in invtrig.*.
   Make this non-volatile.  The compiler will then evaluate
   pio2_hi + pio2_lo at compile time.
- an option to not support careful setting of inexact.  The above
   gives it for Pi/2.  Settings of it using (1 + tiny) == 1 would
   work similarly -- make `tiny' a static nonvolatile const.

Bruce