From owner-freebsd-numerics@FreeBSD.ORG  Sun Aug  5 10:06:02 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0F7DF106564A
	for <freebsd-numerics@FreeBSD.org>;
	Sun,  5 Aug 2012 10:06:02 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail26.syd.optusnet.com.au (mail26.syd.optusnet.com.au
	[211.29.133.167])
	by mx1.freebsd.org (Postfix) with ESMTP id 682C88FC1D
	for <freebsd-numerics@FreeBSD.org>;
	Sun,  5 Aug 2012 10:05:59 +0000 (UTC)
Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246])
	by mail26.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q75A5ruA001658
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 5 Aug 2012 20:05:54 +1000
Date: Sun, 5 Aug 2012 20:05:53 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith <stephen@missouri.edu>
In-Reply-To: <501D9C36.2040207@missouri.edu>
Message-ID: <20120805175106.X3574@besplex.bde.org>
References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu>
	<20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu>
	<20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-numerics@FreeBSD.org, Bruce Evans <brde@optusnet.com.au>
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 05 Aug 2012 10:06:02 -0000

On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote:

> On 08/04/2012 03:45 PM, Bruce Evans wrote:
>>>
>>>> I made good progress with clog[fl]().  The accuracy problems near 0
>>>> are long solved and proved to be solved.
>
> I am starting to go through your code.
>
> What is the purpose of the macros norm and spadd?

It is to provide easy-to-use maximally efficient primitives for
multi-precision arithmetic.  Full multi-precision is too slow, so old
fdlibm code uses subsets of it tailored to the context.  This is about
as efficient as possible, but every use looks different so it is hard
to read, and context-sensitive tricks are hard to write too.  norm()
and spadd() are the only primitives that I have found generally useful
and efficient.

I used to have a primitive for hi+lo splitting, but found it not general
enough to use.  For splitting, you often have a suitable subset of the
bits in a register and don't want to use a general interface that would
have to copy all the bits from another variable.  Splitting is
inherently inefficient, and my primitive originally focused on putting
MD code in it to optimize it.  I eventually found ways to avoid it or
to optimize the MI code, and the MI code turned out to be quite good
but is still not optimized well by compilers.

Here are my full versions of these interfaces.  Documenting them makes
them unreadable :-).  The following changes are needed:
- make everything type-generic.  Already done for the cut-down versions
   for clog*().  I don't know how to put the debugging code in type-generic
   macros.
- deverbosify the comments.
- due to the compiler bugfeatures whose description forms the main part
   of the comments, type-generic macros won't actually work if the arg
   type is float or double and float differs from float_t or double
   differs from double_t.  These non-type-generic functions tend to
   cause compile-time warnings when there is a mismatch.  How can this
   even be described without incrementing the verbosity instead of
   descreasing it?

% /*-
%  * "Normalize" the terms in the infinite-precision expression (*a + *b)
%  * for the sum of 2 doubles so that *b is as small as possible relative
%  * to *a.  (The resulting *a is the value of the expression in double
%  * precision and the resulting *b is the rounding error.)  |*a| must
%  * be >= |*b|, and exponent overflow or underflow must not occur.  This
%  * uses a Theorem of Dekker.  See Knuth (1981) 4.2.2 Theorem C.
%  *
%  * The doubles must actually be double_t's (spelled as __double_t for
%  * namespace reasons) to avoid the following problems with plain
%  * doubles:
%  *
%  * - If expressions are evaluated in extra precision, then Theorem C
%  *   doesn't apply unless the extra precision is discarded after every
%  *   addition and subtraction.
%  * - C specifies that casts and assignments discard any extra precision.
%  *   However, this is very broken in many compilers.
%  * - If we work around the compiler bugs using STRICT_ASSIGN(), then we
%  *   get not only ugly code but also very inefficient code and
%  *   unnecessarily inaccurate code.  The conversions forced by the
%  *   strict assignments are not just inefficient (since they involve
%  *   extra operations); they are usually very inefficient since they
%  *   give extra dependencies.  Discarding the extra precision that we
%  *   got for "free" then reduces the maximum precision "free"ly
%  *   available in final values.
%  * - If we don't work around the compiler bugs, then we get code that
%  *   usually works and is efficient and accurate as possible (all 3!),
%  *   but this is very fragile.  It depends on the compiler bugs being
%  *   so perfect that extra precision is never lost before it is needed
%  *   and callers not having complicated code or even function calls that
%  *   would force loss of the extra precision.
%  *
%  * Using double_t's throughout works essentially by producing the same
%  * code as the "usually works" fragile case, but intentionally.
%  * Callers must pass us double_t's, and should also use double_t's for
%  * almost all their variables (and thus intermediate values), to get
%  * consistent extra precision and to avoid conversions.  If the
%  * values cannot live throughout in registers, then spilling them will
%  * cause much the same inefficiencies as conversions, but the spills
%  * should now preserve extra precision even in the presence of the
%  * normal compiler bugs in this area, so the final result remains as
%  * accurate as possible.
%  */
% static inline void
% onorm(__double_t *a, __double_t *b)
% {
% 	__double_t w;
% 
% #ifdef DEBUG
% 	volatile __double_t ia, ib, r, vw;
% 
% 	ia = *a;
% 	ib = *b;
% 	assert(ia == 0 || fabs(ia) >= fabs(ib));
% #endif
% 	w = *a + *b;
% 	*b = (*a - w) + *b;
% 	*a = w;
% 
% #ifdef DEBUG
% 	assert((long double)ia + ib == (long double)*a + *b);
% 	vw = ia + ib;
% 	r = ia - vw;
% 	r += ib;
% 	assert(vw == *a && r == *b);
% #endif
% }
% 
% /*
%  * We now use the inline version again.  This macro version has been
%  * reduced to a wrapper that hides the messy detail of taking the addresses
%  * (`a' and b must be lvalues, and the macro obfuscates this but keeps the
%  * API stable).  Perhaps taking the addresses should be put back in the
%  * API.  The results should be identical, but the inline version version
%  * is a cycle or 2 faster in more cases than it is a cycle slower.
%  * Apparently, more assignments help the register allocator.
%  */
% #define	norm(a, b)	onorm(&(a), &(b))
% /* Maybe back to macro version. */
% #define	Norm(a, b) do {		\
% 	__double_t w;		\
% 				\
% 	w = (a) + (b);		\
% 	(b) = ((a) - w) + (b);	\
% 	(a) = w;		\
% } while (0)
% 
% /*
%  * Set x += c, where x is represented in extra precision as a + b.
%  * x must be sufficiently normalized and sufficiently larger than c,
%  * and the result is then sufficiently normalized.
%  *
%  * The details of ordering are that |a| must be >= |c| (so that (a, c)
%  * can be normalized without extra work to swap 'a' with c).  The details of
%  * the normalization are that b must be small relative to the normalized a.
%  * Normalization of (a, c) makes the normalized c tiny relative to the
%  * normalized a, so b remains small relative to 'a' in the result.  However,
%  * b need not ever be tiny relative to a.  For example, b might be about
%  * 2**20 times smaller than 'a' to give about 20 extra bits of precision.
%  * That is usually enough, and adding c (which by normalization is about
%  * 2**53 times smaller than a) cannot change b significantly.  However,
%  * cancellation of 'a' with c in normalization of (a, c) may reduce a
%  * significantly relative to b.  The caller must ensure that significant
%  * cancellation doesn't occur, either by having c of the same sign as 'a',
%  * or by having |c| a few percent smaller than |a|.  Pre-normalization of
%  * (a, b) may help.
%  *
%  * This is is a variant of an algorithm of Kahan (see Knuth (1981) 4.2.2
%  * exercise 19).  We gain considerable efficiency by requiring the terms to
%  * be sufficiently normalized and sufficiently increasing.
%  * 
%  * `sp' means that the added term is in "single precision", i.e., not
%  * in multiple precision.
%  */
% #define	spadd(a, b, c) do {	\
% 	__double_t __tmp;	\
% 				\
% 	__tmp = (c);		\
% 	onorm(&__tmp, &(a));	\
% 	(b) += (a);		\
% 	(a) = __tmp;		\
% } while (0)
% 
% /* norm() for float precision. */
% static inline void
% onormf(__float_t *a, __float_t *b)
% {
% 	__float_t w;
% 
% #ifdef DEBUG
% 	volatile __float_t ia, ib, r, vw;
% 
% 	ia = *a;
% 	ib = *b;
% 	assert(ia == 0 || fabsf(ia) >= fabsf(ib));
% #endif
% 	w = *a + *b;
% 	*b = (*a - w) + *b;
% 	*a = w;
% 
% #ifdef DEBUG
% 	assert((long double)ia + ib == (long double)*a + *b);
% 	vw = ia + ib;
% 	r = ia - vw;
% 	r += ib;
% 	assert(vw == *a && r == *b);
% #endif
% }
% 
% #define	normf(a, b)	onormf(&(a), &(b))
% 
% /* spadd() for float precision. */
% #define	spaddf(a, b, c) do {	\
% 	__float_t __tmp;	\
% 				\
% 	__tmp = (c);		\
% 	onormf(&__tmp, &(a));	\
% 	(b) += (a);		\
% 	(a) = __tmp;		\
% } while (0)
% 
% /*
%  * norm() for long double precision.  This is simpler since there is no
%  * extra precision to worry about (or if there is, then we have larger
%  * problems), but without extra precision the last 2 assertions in the
%  * debugging code are even weaker.
%  */
% static inline void
% onorml(long double *a, long double *b)
% {
% 	long double w;
% 
% #ifdef DEBUG
% 	volatile long double ia, ib, r, vw;
% 
% 	ia = *a;
% 	ib = *b;
% 	assert(ia == 0 || fabsl(ia) >= fabsl(ib));
% #endif
% 	w = *a + *b;
% 	*b = (*a - w) + *b;
% 	*a = w;
% 
% #ifdef DEBUG
% 	assert(ia + ib == *a + *b);
% 	vw = ia + ib;
% 	r = ia - vw;
% 	r += ib;
% 	assert(vw == *a && r == *b);
% #endif
% }
% 
% #define	norml(a, b)	onorml(&(a), &(b))
% 
% /* spadd() for long double precision. */
% #define	spaddl(a, b, c) do {	\
% 	long double __tmp;	\
% 				\
% 	__tmp = (c);		\
% 	onorml(&__tmp, &(a));	\
% 	(b) += (a);		\
% 	(a) = __tmp;		\
% } while (0)

<end of full norm(), etc.>

> Also, the proof you provided (which I haven't gone through in detail yet) - 
> do you intend to put them as comments in the code?

No.  It is far too long.

> For example, when you write:
>
> %	EXTRACT_WORDS(hx, lx, ax);
> %	bits = ((uint64_t)hx << 32) | lx;
> %	bits += 0x04000000;
> %	hx = bits >> 32;
> %	lx = bits & 0xf8000000;
> %	INSERT_WORDS(xh, hx, lx);
> %	xl = ax - xh;
>
> it would be very helpful to include a comment like
>
> %	/*
> 	 * xh is ax rounded to nearest 26 bit floating point.
> 	 * xl is such that x = xh + xl.
> 	 * Because xh is rounded and not truncated to 26 bits,
> 	 * only the top 26 bits (not 27=53-26) of the mantissa of
> 	 * xl can be non-zero.
> 	 */
>
> because while that can be deduced from reading the code, it is by no means 
> obvious.

Again, far too long.  This is much more routine than the above.  Splitting
is done without comment all over old fdlibm code.  Only the point about
the last bit is not routine.

> By the way, that rounding to the nearest 26 bits is a very smart idea.

But it is the natural way.  It is just easier and more efficient (saves 1
integer instruction) to lop off the low bits, so lopping should be used
if it works (I need to try it for float and ld80 precisions where it
should work because MANT_DIG is even).

e_hypot.c has hi+lo decomposition that is not as delicate as the above:

%     /* medium size a and b */
% 	w = a-b;
% 	if (w>b) {
% 	    t1 = 0;
% 	    SET_HIGH_WORD(t1,ha);
% 	    t2 = a-t1;
% 	    w  = sqrt(t1*t1-(b*(-b)-t2*(a+t1)));
% 	} else {
% 	    a  = a+a;
% 	    y1 = 0;
% 	    SET_HIGH_WORD(y1,hb);

This lops off the low 32 bits (y1 = b lopped).  The previous 2 lines
can be written as INSERT_WORDS(y1, hb, 0), but old fdlibm code likes
to write it as above.  Both methods have serious efficiency problems
on modern CPUs unless the compiler optimizes them, since they asks for
partial object accesses (2 * 32-bit words instead of 1 64-bit access).
Old fdlibm code accidentally reduces these problems by asking for a
full initialization of y.

% 	    y2 = b - y1;

Remaining 32 bits of b.

% 	    t1 = 0;
% 	    SET_HIGH_WORD(t1,ha+0x00100000);
% 	    t2 = a - t1;

Similarly for a, except a was doubled just above, so ha is out of date
and we fix it up by adding 1 to its exponent bit-field.  This is a
good example of a context-sensitive splitting.  (However, better code
maintain ha next to a and let the compiler optimize away the change
to ha in paths where the change is not used.)

% 	    w  = sqrt(t1*y1-(w*(-w)-(t1*y2+t2*b)));
% 	}

The arg for sqrt() must be a*a + b*b rearranged for accurcy.  Since
w*w is a 32x32 -> 53 bit operation, this is obviously not a full
doubled precision operation.

I noticed a full doubled precision multiplication most recently in
das's s_fma.c:

% /*
%  * Compute a*b exactly, returning the exact result in a struct dd.  We assume
%  * that both a and b are normalized, so no underflow or overflow will occur.
%  * The current rounding mode must be round-to-nearest.
%  */

I think the requirement on the rounding mode is only so that the splitting
rounds to nearest.

% static inline struct dd
% dd_mul(double a, double b)
% {
% 	static const double split = 0x1p27 + 1.0;
% 	struct dd ret;
% 	double ha, hb, la, lb, p, q;
% 
% 	p = a * split;
% 	ha = a - p;
% 	ha += p;

I prefer splitting by addition.  This method seems to be just slower
(1 extra multiplication and giving 1 extra dependence than for ha =
ha + redux - redux).

% 	la = a - ha;

This and the additive method are fragile, and the above is broken on
i386 with rounding precision FP_PE.  clogl() uses the additive method,
but clog() and clogf() use more direct methods to avoid problems with
extra precision.

The assignments to ha are also missing STRICT_ASSIGN().

Corresponding float precision code would be broken on i386 with the
default rounding precision FP_PD.

We currently assume than the rounding precision on i386 is never changed
from its default of FP_PD, except when calling long double functions it
should always be changed to FP_PE, so that code like above code works.
We are more careful in float functions.  My tests can be configured to
test non-default precisions.  This only shows a few double functions that
don't work with extra precision.  But many float functions don't work in
only float precision, because they have been optimized to depend on
double precision.

% 
% 	p = b * split;
% 	hb = b - p;
% 	hb += p;
% 	lb = b - hb;
% 
% 	p = ha * hb;
% 	q = ha * lb + la * hb;

I don't see why this is always exact.  For squaring, it is exact since
it is 2*ha*la.

% 
% 	ret.hi = p + q;
% 	ret.lo = p - ret.hi + q + la * lb;

I don't see why the final addition is exact.  Except for that,
(ret.hi, ret.lo) is norm(p, q), which is exact by Knuth Theorem C.
Note that Theorem C applies because |p| >= |q|.  norm() in the
above depends on this too.  norm() in clog*() uses a the larger
code justified by a nearby Theorem (B?).  s_fma.c also has a
function dd_add() which uses the larger code.  I copied the
larger code in clog*() from there.  This is an example of why
the smaller norm() is a good primitive -- the larger one does
more so it is less primitive, and things should be organized so
that the larger one is rarely needed.

% 	return (ret);
% }

Doubled precision addition is harder than multiplication, and I
couldn't find a general enough version in s_fma.c to copy.  There
is a related problem with naming all of these functions.  norm()
isn't a very descriptive name, but it is the best that I could
think of.  spadd() is closer to sesquied precision than singled
or doubled (it gives doubled + singled -> doubled).  The above
dd_mul() is also only sesquied and closer to singled than is spadd()
(it gives singled x singled -> doubled).  dd_add() is just the
larger norm(), so it gives singled + singled -> doubled.  s_fma.c
gets by without anything like spadd() mainly because it only returns
a singled result.  After a dp_add() or 2, it adds the 2 components
and then has to do a delicate rounding step, but no addition of
doubled precision results.

There are some more delicate and less routine roundings (not for
splitting) at a critical bit in the middle.  Some of these deserve and
have comments.  See s_cbrt.c (my verbose comment), s_exp2.c (no
comment for redux magic; s_exp2l.c might say more) and e_rem_pio2.c
(redux magic obtained from s_exp2.c).  After seeing a few of these
0x1.MpN redux constants, using them becomes routine except for the
problems with extra precision.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Sun Aug  5 18:48:56 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id CAE281065670
	for <freebsd-numerics@freebsd.org>;
	Sun,  5 Aug 2012 18:48:56 +0000 (UTC)
	(envelope-from stephen@missouri.edu)
Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu
	[128.206.184.213])
	by mx1.freebsd.org (Postfix) with ESMTP id 90DE78FC12
	for <freebsd-numerics@freebsd.org>;
	Sun,  5 Aug 2012 18:48:56 +0000 (UTC)
Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213])
	by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id
	q75ImrIv006552; Sun, 5 Aug 2012 13:48:54 -0500 (CDT)
	(envelope-from stephen@missouri.edu)
Message-ID: <501EC015.3000808@missouri.edu>
Date: Sun, 05 Aug 2012 13:48:53 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:14.0) Gecko/20120714 Thunderbird/14.0
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu>
	<20120804165555.X1231@besplex.bde.org>
	<501D51D7.1020101@missouri.edu>
	<20120805030609.R3101@besplex.bde.org>
	<501D9C36.2040207@missouri.edu>
	<20120805175106.X3574@besplex.bde.org>
In-Reply-To: <20120805175106.X3574@besplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-numerics@freebsd.org
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 05 Aug 2012 18:48:56 -0000

On 08/05/2012 05:05 AM, Bruce Evans wrote:
> On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote:

>>
>> What is the purpose of the macros norm and spadd?
>
> [Answer snipped for brevity]

Thank you.  Your answers are very helpful.  I had no idea what norm and 
spadd did, and now just even glancing through your answers tells me 
exactly what they do (even if I haven't yet fully understood why they work).

If you don't want lengthy comments in the code, perhaps you could simply 
include the following link in a comment:
http://lists.freebsd.org/pipermail/freebsd-numerics/2012-August/000020.html

Let me say that in my proposed code for casinh, etc, that I do have 
essay length comments.
http://people.freebsd.org/~stephen/catrig.c

I plan to respect your style (keep out very lengthy comments), but I 
hope whomever commits my code will respect my style and keep the lengthy 
comments.


From owner-freebsd-numerics@FreeBSD.ORG  Sun Aug  5 19:19:55 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 770D8106566B
	for <freebsd-numerics@freebsd.org>;
	Sun,  5 Aug 2012 19:19:55 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 505A58FC14
	for <freebsd-numerics@freebsd.org>;
	Sun,  5 Aug 2012 19:19:55 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id
	q75JJsPd050391; Sun, 5 Aug 2012 12:19:54 -0700 (PDT)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id
	q75JJsVv050390; Sun, 5 Aug 2012 12:19:54 -0700 (PDT)
	(envelope-from sgk)
Date: Sun, 5 Aug 2012 12:19:54 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Stephen Montgomery-Smith <stephen@missouri.edu>
Message-ID: <20120805191954.GA50379@troutmask.apl.washington.edu>
References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu>
	<20120804165555.X1231@besplex.bde.org>
	<501D51D7.1020101@missouri.edu>
	<20120805030609.R3101@besplex.bde.org>
	<501D9C36.2040207@missouri.edu>
	<20120805175106.X3574@besplex.bde.org>
	<501EC015.3000808@missouri.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <501EC015.3000808@missouri.edu>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-numerics@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 05 Aug 2012 19:19:55 -0000

On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote:
> On 08/05/2012 05:05 AM, Bruce Evans wrote:
> >On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote:
> 
> >>
> >>What is the purpose of the macros norm and spadd?
> >
> >[Answer snipped for brevity]
> 
> Thank you.  Your answers are very helpful.  I had no idea what norm and 
> spadd did, and now just even glancing through your answers tells me 
> exactly what they do (even if I haven't yet fully understood why they work).
> 
> If you don't want lengthy comments in the code, perhaps you could simply 
> include the following link in a comment:
> http://lists.freebsd.org/pipermail/freebsd-numerics/2012-August/000020.html
> 
> Let me say that in my proposed code for casinh, etc, that I do have 
> essay length comments.
> http://people.freebsd.org/~stephen/catrig.c
> 
> I plan to respect your style (keep out very lengthy comments), but I 
> hope whomever commits my code will respect my style and keep the lengthy 
> comments.

Essay long comments probably belong in the man page if the essay is
important to the details of implementation.  Bruce may disagree with
me.

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Sun Aug  5 19:40:57 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E7ED4106566B
	for <freebsd-numerics@freebsd.org>;
	Sun,  5 Aug 2012 19:40:57 +0000 (UTC)
	(envelope-from stephen@missouri.edu)
Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu
	[128.206.184.213])
	by mx1.freebsd.org (Postfix) with ESMTP id 90AEF8FC1A
	for <freebsd-numerics@freebsd.org>;
	Sun,  5 Aug 2012 19:40:57 +0000 (UTC)
Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213])
	by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id
	q75JeuRj009888; Sun, 5 Aug 2012 14:40:56 -0500 (CDT)
	(envelope-from stephen@missouri.edu)
Message-ID: <501ECC48.40001@missouri.edu>
Date: Sun, 05 Aug 2012 14:40:56 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:14.0) Gecko/20120714 Thunderbird/14.0
MIME-Version: 1.0
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu>
	<20120804165555.X1231@besplex.bde.org>
	<501D51D7.1020101@missouri.edu>
	<20120805030609.R3101@besplex.bde.org>
	<501D9C36.2040207@missouri.edu>
	<20120805175106.X3574@besplex.bde.org>
	<501EC015.3000808@missouri.edu>
	<20120805191954.GA50379@troutmask.apl.washington.edu>
In-Reply-To: <20120805191954.GA50379@troutmask.apl.washington.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-numerics@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 05 Aug 2012 19:40:58 -0000

On 08/05/2012 02:19 PM, Steve Kargl wrote:
> On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote:

>> Let me say that in my proposed code for casinh, etc, that I do have
>> essay length comments.
>> http://people.freebsd.org/~stephen/catrig.c
>
> Essay long comments probably belong in the man page if the essay is
> important to the details of implementation.  Bruce may disagree with
> me.

Except that the comments cannot really be understood without the code, 
and vice versa.

For example, there are places where I compute things like X+Y, and 
computing Y would cause an underflow, but I happen to know that Y is so 
much smaller than X that I only need to compute X.  But the reason why I 
can be sure that Y is so much smaller than X is quite subtle - you have 
to read the code that generates X and the parts that make up Y to see 
why this is true.

Look in my code for the phrase "This cannot happen" and look at the 
comment this phrase appears in, and the comment below it.  It simply 
doesn't belong in a man page.  But the arguments are of the nature of a 
mathematical proof, and if the arguments aren't presented somewhere, in 
my opinion the code would be incomprehensible.

You could argue that the reader of the code could simply read the paper 
(*) I got the algorithm from.  But I made significant deviations from 
the algorithm, not least because that paper used exception handling to 
handle underflows and overflows, whereas I detect them before they 
happen.  Also, I defined the function, which I called
f(a,b) = 0.5 * (hypot(a,b) - b) = 0.5*a*a/(hypot(a,b) + b)
which was implicitly used several times in that paper, but never 
presented as a single concept.

In my opinion the algorithm is not obvious, but it is also necessary, 
because numeric instability can occur in rather awkward ways.  It isn't 
immediately obvious how to work around them.  I thought the ideas in the 
paper were brilliant.

I feel very strongly that my essay long comments should stay inside the 
code.  I think I do a good job explaining a rather subtle algorithm, and 
I would hope that a reader could completely understand the algorithm by 
reading my code + comments.

(*) "Implementing the complex arcsine and arccosine functions using 
exception handling" by T. E. Hull, Thomas F. Fairgrieve, and Ping Tak 
Peter Tang, published in ACM Transactions on Mathematical Software, 
Volume 23 Issue 3, 1997, Pages 299-335, 
http://dl.acm.org/citation.cfm?id=275324

From owner-freebsd-numerics@FreeBSD.ORG  Tue Aug  7 20:57:43 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5FB581065670
	for <freebsd-numerics@freebsd.org>;
	Tue,  7 Aug 2012 20:57:43 +0000 (UTC)
	(envelope-from peter@rulingia.com)
Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au
	[122.100.2.194])
	by mx1.freebsd.org (Postfix) with ESMTP id C587C8FC17
	for <freebsd-numerics@freebsd.org>;
	Tue,  7 Aug 2012 20:57:41 +0000 (UTC)
Received: from server.rulingia.com (c220-239-247-45.belrs5.nsw.optusnet.com.au
	[220.239.247.45])
	by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q77KveRu048932
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <freebsd-numerics@freebsd.org>; Wed, 8 Aug 2012 06:57:40 +1000 (EST)
	(envelope-from peter@rulingia.com)
X-Bogosity: Ham, spamicity=0.000000
Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1])
	by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q77KvPDr010806
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <freebsd-numerics@freebsd.org>; Wed, 8 Aug 2012 06:57:26 +1000 (EST)
	(envelope-from peter@server.rulingia.com)
Received: (from peter@localhost)
	by server.rulingia.com (8.14.5/8.14.5/Submit) id q77KvPA6010805
	for freebsd-numerics@freebsd.org; Wed, 8 Aug 2012 06:57:25 +1000 (EST)
	(envelope-from peter)
Date: Wed, 8 Aug 2012 06:57:25 +1000
From: Peter Jeremy <peter@rulingia.com>
To: freebsd-numerics@freebsd.org
Message-ID: <20120807205725.GA10572@server.rulingia.com>
References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu>
	<20120804165555.X1231@besplex.bde.org>
	<501D51D7.1020101@missouri.edu>
	<20120805030609.R3101@besplex.bde.org>
	<501D9C36.2040207@missouri.edu>
	<20120805175106.X3574@besplex.bde.org>
	<501EC015.3000808@missouri.edu>
	<20120805191954.GA50379@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="qMm9M+Fa2AknHoGS"
Content-Disposition: inline
In-Reply-To: <20120805191954.GA50379@troutmask.apl.washington.edu>
X-PGP-Key: http://www.rulingia.com/keys/peter.pgp
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Aug 2012 20:57:43 -0000


--qMm9M+Fa2AknHoGS
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2012-Aug-05 12:19:54 -0700, Steve Kargl <sgk@troutmask.apl.washington.ed=
u> wrote:
>On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote:
>> I plan to respect your style (keep out very lengthy comments), but I=20
>> hope whomever commits my code will respect my style and keep the lengthy=
=20
>> comments.
>
>Essay long comments probably belong in the man page if the essay is
>important to the details of implementation.  Bruce may disagree with
>me.

I also believe that implementation details belong in the code and not
the man page.  The man page should contain information necessary to
use the function: Domain, range, behaviour with exceptional inputs,
accuracy and similar.  Any "unexpected" behaviours as a side-effect of
the implementation can be mentioned under BUGS.  The user shouldn't
need to know that it's implemented by range-reduction to [0 .. e/7]
and then evaluated with an 11th order truncated Chebyshev polynomial
that's had coefficients tweaked by a demon I summoned one evening.

OTOH, the code itself should contain comments explaining what it is
doing - particularly where it is relying on details of the floating
point implementation or the difference between an operation on real
(or complex) numbers and the same operation on float/double/...
numbers.  It's also useful to document why alternative (particularly
"obviously better" alternative) approaches won't work.

--=20
Peter Jeremy

--qMm9M+Fa2AknHoGS
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlAhgTUACgkQ/opHv/APuIfeuQCfasUDBNGffA4ztejWbAEPLBCc
WeAAniSA2z7iZZFlr47CYHlYonMtEvBk
=4suN
-----END PGP SIGNATURE-----

--qMm9M+Fa2AknHoGS--

From owner-freebsd-numerics@FreeBSD.ORG  Tue Aug  7 21:19:21 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 925A6106564A
	for <freebsd-numerics@freebsd.org>;
	Tue,  7 Aug 2012 21:19:21 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 4C7D28FC17
	for <freebsd-numerics@freebsd.org>;
	Tue,  7 Aug 2012 21:19:21 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id
	q77LJExI061807; Tue, 7 Aug 2012 14:19:14 -0700 (PDT)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id
	q77LJEg0061806; Tue, 7 Aug 2012 14:19:14 -0700 (PDT)
	(envelope-from sgk)
Date: Tue, 7 Aug 2012 14:19:14 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Peter Jeremy <peter@rulingia.com>
Message-ID: <20120807211914.GA58818@troutmask.apl.washington.edu>
References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu>
	<20120804165555.X1231@besplex.bde.org>
	<501D51D7.1020101@missouri.edu>
	<20120805030609.R3101@besplex.bde.org>
	<501D9C36.2040207@missouri.edu>
	<20120805175106.X3574@besplex.bde.org>
	<501EC015.3000808@missouri.edu>
	<20120805191954.GA50379@troutmask.apl.washington.edu>
	<20120807205725.GA10572@server.rulingia.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120807205725.GA10572@server.rulingia.com>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-numerics@freebsd.org
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Aug 2012 21:19:21 -0000

On Wed, Aug 08, 2012 at 06:57:25AM +1000, Peter Jeremy wrote:
> On 2012-Aug-05 12:19:54 -0700, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> >On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote:
> >> I plan to respect your style (keep out very lengthy comments), but I 
> >> hope whomever commits my code will respect my style and keep the lengthy 
> >> comments.
> >
> >Essay long comments probably belong in the man page if the essay is
> >important to the details of implementation.  Bruce may disagree with
> >me.
> 
> I also believe that implementation details belong in the code and not
> the man page.  The man page should contain information necessary to
> use the function: Domain, range, behaviour with exceptional inputs,
> accuracy and similar.  Any "unexpected" behaviours as a side-effect of
> the implementation can be mentioned under BUGS.  The user shouldn't
> need to know that it's implemented by range-reduction to [0 .. e/7]
> and then evaluated with an 11th order truncated Chebyshev polynomial
> that's had coefficients tweaked by a demon I summoned one evening.
> 
> OTOH, the code itself should contain comments explaining what it is
> doing - particularly where it is relying on details of the floating
> point implementation or the difference between an operation on real
> (or complex) numbers and the same operation on float/double/...
> numbers.  It's also useful to document why alternative (particularly
> "obviously better" alternative) approaches won't work.

There can be 2 man pages.  One that provides the user with
the usual information you outline above (e.g., casin.3),
and one that provides implementation details (e.g. casin.7).

OTOH, I'm just happy someone else is taking up the task(s)
of implementing missing C99 libm functionality.  

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Tue Aug  7 23:45:12 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id A8D481065672
	for <freebsd-numerics@freebsd.org>;
	Tue,  7 Aug 2012 23:45:12 +0000 (UTC)
	(envelope-from stephen@missouri.edu)
Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu
	[128.206.184.213])
	by mx1.freebsd.org (Postfix) with ESMTP id 6C5FB8FC15
	for <freebsd-numerics@freebsd.org>;
	Tue,  7 Aug 2012 23:45:12 +0000 (UTC)
Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213])
	by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id
	q77NMuKQ016617
	for <freebsd-numerics@freebsd.org>; Tue, 7 Aug 2012 18:22:57 -0500 (CDT)
	(envelope-from stephen@missouri.edu)
Message-ID: <5021A350.40608@missouri.edu>
Date: Tue, 07 Aug 2012 18:22:56 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:14.0) Gecko/20120714 Thunderbird/14.0
MIME-Version: 1.0
To: freebsd-numerics@freebsd.org
References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu>
	<20120804165555.X1231@besplex.bde.org>
	<501D51D7.1020101@missouri.edu>
	<20120805030609.R3101@besplex.bde.org>
	<501D9C36.2040207@missouri.edu>
	<20120805175106.X3574@besplex.bde.org>
	<501EC015.3000808@missouri.edu>
	<20120805191954.GA50379@troutmask.apl.washington.edu>
	<20120807205725.GA10572@server.rulingia.com>
In-Reply-To: <20120807205725.GA10572@server.rulingia.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 07 Aug 2012 23:45:12 -0000

On 08/07/2012 03:57 PM, Peter Jeremy wrote:
> On 2012-Aug-05 12:19:54 -0700, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>> On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote:
>>> I plan to respect your style (keep out very lengthy comments), but I
>>> hope whomever commits my code will respect my style and keep the lengthy
>>> comments.
>>
>> Essay long comments probably belong in the man page if the essay is
>> important to the details of implementation.  Bruce may disagree with
>> me.
>
> I also believe that implementation details belong in the code and not
> the man page.  The man page should contain information necessary to
> use the function: Domain, range, behaviour with exceptional inputs,
> accuracy and similar.  Any "unexpected" behaviours as a side-effect of
> the implementation can be mentioned under BUGS.  The user shouldn't
> need to know that it's implemented by range-reduction to [0 .. e/7]
> and then evaluated with an 11th order truncated Chebyshev polynomial
> that's had coefficients tweaked by a demon I summoned one evening.
>
> OTOH, the code itself should contain comments explaining what it is
> doing - particularly where it is relying on details of the floating
> point implementation or the difference between an operation on real
> (or complex) numbers and the same operation on float/double/...
> numbers.  It's also useful to document why alternative (particularly
> "obviously better" alternative) approaches won't work.
>

I completely agree.  I followed this philosophy in my code.


From owner-freebsd-numerics@FreeBSD.ORG  Wed Aug  8 00:04:13 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 54164106566C
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 00:04:13 +0000 (UTC)
	(envelope-from peter@rulingia.com)
Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au
	[122.100.2.194])
	by mx1.freebsd.org (Postfix) with ESMTP id DA0088FC0A
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 00:04:12 +0000 (UTC)
Received: from server.rulingia.com (c220-239-247-45.belrs5.nsw.optusnet.com.au
	[220.239.247.45])
	by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q78045DW049406
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <freebsd-numerics@freebsd.org>; Wed, 8 Aug 2012 10:04:06 +1000 (EST)
	(envelope-from peter@rulingia.com)
X-Bogosity: Ham, spamicity=0.000000
Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1])
	by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q7803v7n034688
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <freebsd-numerics@freebsd.org>; Wed, 8 Aug 2012 10:03:57 +1000 (EST)
	(envelope-from peter@server.rulingia.com)
Received: (from peter@localhost)
	by server.rulingia.com (8.14.5/8.14.5/Submit) id q7803vGk034686
	for freebsd-numerics@freebsd.org; Wed, 8 Aug 2012 10:03:57 +1000 (EST)
	(envelope-from peter)
Date: Wed, 8 Aug 2012 10:03:57 +1000
From: Peter Jeremy <peter@rulingia.com>
To: freebsd-numerics@freebsd.org
Message-ID: <20120808000357.GA11128@server.rulingia.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="0OAP2g/MAC+5xKAE"
Content-Disposition: inline
X-PGP-Key: http://www.rulingia.com/keys/peter.pgp
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: cpow(3) implementations.
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Aug 2012 00:04:13 -0000


--0OAP2g/MAC+5xKAE
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

The C99 standard (at least WG14/N1256) is quite liberal as regards
cpow() and specifically allows a conforming implementation to just do:
  cpow(z, w) =3D cexp(w * clog(z))

The downside of this approach is that log() inherently loses precision
by pushing (most of) the exponent bits into the fraction, displacing
original fraction bits.  I've therefore been looking at how to
implement cpow(3) with a precision similar to pow(3).  The following
are some thoughts, together with questions.

In the following:
  w =3D a + I*b
  z =3D c + I*d
  cis(r) =3D cos(r) + I*sin(r)
  t =3D u + I*v =3D clog(c + I*d)
              =3D log(hypot(c, d)) + I*atan2(d, c)

  cpow(z, w) =3D cexp(w * clog(z))
             =3D cpow(c + I*d, a + I*b)
             =3D cexp((a + I*b) * clog(c + I*d))
             =3D cexp((a + I*b) * (u + I*v))
             =3D cexp((a*u - b*v) + I*(a*v + b*u))
             =3D exp(a*u - b*v) * cis(a*v + b*u)

Unfortunately, either or both of (a*u - b*v) and (a*v + b*u) are
potentially subject to catastrophic cancellation.  However:
  exp(a*u - b*v) =3D exp(a*u) / exp(b*v)
                 =3D pow(a, hypot(c, d)) / exp(b*v)
Ignoring overflow/underflow issues, does this avoid the cancellation
issues associated with the subtraction?  (I realise there are still
issues when a =E2=89=A4 0 due to the pow(3) definition).

Since cis() is periodic, it's also possible to range-reduce each term
of (a*v + b*u) independently - but that just fiddles with the issue
since it just adds another layer of precision loss.  Can anyone
suggest an alternative approach?

--=20
Peter Jeremy

--0OAP2g/MAC+5xKAE
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlAhrO0ACgkQ/opHv/APuIc18gCdEa+5IIDC+yucx8uyX/69ZjUk
CsgAn25wsZLxw32By+J2sRTpbOguZoEj
=+T1K
-----END PGP SIGNATURE-----

--0OAP2g/MAC+5xKAE--

From owner-freebsd-numerics@FreeBSD.ORG  Wed Aug  8 01:35:54 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D17EE106564A
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 01:35:54 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id A7B658FC0A
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 01:35:54 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id
	q781Z0Pf019395; Tue, 7 Aug 2012 18:35:00 -0700 (PDT)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id
	q781Yx77019394; Tue, 7 Aug 2012 18:34:59 -0700 (PDT)
	(envelope-from sgk)
Date: Tue, 7 Aug 2012 18:34:59 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Peter Jeremy <peter@rulingia.com>
Message-ID: <20120808013459.GA19280@troutmask.apl.washington.edu>
References: <20120808000357.GA11128@server.rulingia.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120808000357.GA11128@server.rulingia.com>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-numerics@freebsd.org
Subject: Re: cpow(3) implementations.
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Aug 2012 01:35:54 -0000

On Wed, Aug 08, 2012 at 10:03:57AM +1000, Peter Jeremy wrote:
> The C99 standard (at least WG14/N1256) is quite liberal as regards
> cpow() and specifically allows a conforming implementation to just do:
>   cpow(z, w) = cexp(w * clog(z))
> 
> The downside of this approach is that log() inherently loses precision
> by pushing (most of) the exponent bits into the fraction, displacing
> original fraction bits.  I've therefore been looking at how to
> implement cpow(3) with a precision similar to pow(3).  The following
> are some thoughts, together with questions.
> 
> In the following:
>   w = a + I*b
>   z = c + I*d
>   cis(r) = cos(r) + I*sin(r)
>   t = u + I*v = clog(c + I*d)
>               = log(hypot(c, d)) + I*atan2(d, c)
> 
>   cpow(z, w) = cexp(w * clog(z))
>              = cpow(c + I*d, a + I*b)
>              = cexp((a + I*b) * clog(c + I*d))
>              = cexp((a + I*b) * (u + I*v))
>              = cexp((a*u - b*v) + I*(a*v + b*u))
>              = exp(a*u - b*v) * cis(a*v + b*u)
> 
> Unfortunately, either or both of (a*u - b*v) and (a*v + b*u) are
> potentially subject to catastrophic cancellation.  However:
>   exp(a*u - b*v) = exp(a*u) / exp(b*v)
>                  = pow(a, hypot(c, d)) / exp(b*v)
> Ignoring overflow/underflow issues, does this avoid the cancellation
> issues associated with the subtraction?  (I realise there are still
> issues when a ??? 0 due to the pow(3) definition).
> 
> Since cis() is periodic, it's also possible to range-reduce each term
> of (a*v + b*u) independently - but that just fiddles with the issue
> since it just adds another layer of precision loss.  Can anyone
> suggest an alternative approach?

I cheated and peaked at NetBSD's implementation, which is
based on Moshier's cephes implementation.  Your approach
not only has to worry about catastrophic cancellation, but
may need to switch between exp() and expm1() and log() and
log1p() to achieve accuracy near the problematic areas for
these functions.  Anyway, Moshier converts z in polar coordinates
and use pow(), exp(), sin(), and cos().

psuedo-code:

x = creal(z)
y = cimag(z)
r = hypot(x,y)
t = carg(z)

if y == 0
   if x == 0
      return 0
   else
      return pow(r,x)*(cos(t*x)+I*sin(t*x))
else
   return pow(r,x)*exp(-t*y)*(cos(t*x+y)+I*sin(t*x+y))

I don't if this is any better.  

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Wed Aug  8 02:58:41 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7C675106566C
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 02:58:41 +0000 (UTC)
	(envelope-from stephen@missouri.edu)
Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu
	[128.206.184.213])
	by mx1.freebsd.org (Postfix) with ESMTP id 2A6288FC0A
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 02:58:41 +0000 (UTC)
Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213])
	by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id
	q782wdOY030787
	for <freebsd-numerics@freebsd.org>; Tue, 7 Aug 2012 21:58:40 -0500 (CDT)
	(envelope-from stephen@missouri.edu)
Message-ID: <5021D5DF.7010709@missouri.edu>
Date: Tue, 07 Aug 2012 21:58:39 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:14.0) Gecko/20120714 Thunderbird/14.0
MIME-Version: 1.0
To: freebsd-numerics@freebsd.org
References: <20120808000357.GA11128@server.rulingia.com>
In-Reply-To: <20120808000357.GA11128@server.rulingia.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: cpow(3) implementations.
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Aug 2012 02:58:41 -0000

On 08/07/2012 07:03 PM, Peter Jeremy wrote:
> The C99 standard (at least WG14/N1256) is quite liberal as regards
> cpow() and specifically allows a conforming implementation to just do:
>    cpow(z, w) = cexp(w * clog(z))
>
> The downside of this approach is that log() inherently loses precision
> by pushing (most of) the exponent bits into the fraction, displacing
> original fraction bits.  I've therefore been looking at how to
> implement cpow(3) with a precision similar to pow(3).  The following
> are some thoughts, together with questions.
>
> In the following:
>    w = a + I*b
>    z = c + I*d
>    cis(r) = cos(r) + I*sin(r)
>    t = u + I*v = clog(c + I*d)
>                = log(hypot(c, d)) + I*atan2(d, c)
>
>    cpow(z, w) = cexp(w * clog(z))
>               = cpow(c + I*d, a + I*b)
>               = cexp((a + I*b) * clog(c + I*d))
>               = cexp((a + I*b) * (u + I*v))
>               = cexp((a*u - b*v) + I*(a*v + b*u))
>               = exp(a*u - b*v) * cis(a*v + b*u)

I wouldn't regard errors in a*u-b*v as catastrophic cancellation.  This 
is because exp(0) = 1.  So if the error in computing a*u-b*v is approx 
DBL_EPSILON, and a*u-b*v approx zero, even though the relative error in 
computing a*u-b*v is going to large (perhaps even infinite), 
nevertheless the error in exp(a*u-b*v) may still be bounded by 1 or 2 ulp.


More generally, as a mathematician, I would be far more concerned that 
cpow(z,w) return accurate answers when w is real, and especially when w 
is a small integer.  Real life applications of cpow(z,y) when w is not 
real are very few and far between.

I would be pleased if cpow(x,y) made special provisions for when y is a 
small integer, and so, for example, cpow(z,2) was computed as z*z = 
(x+y)*(x-y) + 2x*y*I.

For cpow(z,3), you are going to have a hard time avoiding large relative 
errors when x^2 = 3y^2 (i.e. z is parallel to a cube root of 1). 
Frankly I see that as somewhat unavoidable.

Nevertheless, if you pumped up cpow(z,w) so that when w was a relatively 
small integer, that it broke w into its base 2 expansion, and then 
multiplied lots of terms of the form cpow(z,2^n), each of which was 
computed by n repetitions of cpow(z,2), and then for negative integers 
by taking the reciprocal of the whole thing, and then in all other cases 
simply use cexp(w*clog(z)), I would be very happy.

Other than that, if your cpow produced cpow(-1,0.5) = I + 1e-16, I 
wouldn't be shocked at all, and I would find this kind of error totally 
acceptable.

From owner-freebsd-numerics@FreeBSD.ORG  Wed Aug  8 09:24:58 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 14690106564A
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 09:24:58 +0000 (UTC)
	(envelope-from peter@rulingia.com)
Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au
	[122.100.2.194])
	by mx1.freebsd.org (Postfix) with ESMTP id 9C5878FC08
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 09:24:57 +0000 (UTC)
Received: from server.rulingia.com (c220-239-247-45.belrs5.nsw.optusnet.com.au
	[220.239.247.45])
	by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q789Ot27050752
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <freebsd-numerics@freebsd.org>; Wed, 8 Aug 2012 19:24:56 +1000 (EST)
	(envelope-from peter@rulingia.com)
X-Bogosity: Ham, spamicity=0.000000
Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1])
	by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q789OkvT066394
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <freebsd-numerics@freebsd.org>; Wed, 8 Aug 2012 19:24:46 +1000 (EST)
	(envelope-from peter@server.rulingia.com)
Received: (from peter@localhost)
	by server.rulingia.com (8.14.5/8.14.5/Submit) id q789OjF4066393
	for freebsd-numerics@freebsd.org; Wed, 8 Aug 2012 19:24:45 +1000 (EST)
	(envelope-from peter)
Date: Wed, 8 Aug 2012 19:24:45 +1000
From: Peter Jeremy <peter@rulingia.com>
To: freebsd-numerics@freebsd.org
Message-ID: <20120808092445.GB65056@server.rulingia.com>
References: <20120808000357.GA11128@server.rulingia.com>
	<5021D5DF.7010709@missouri.edu>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="5vNYLRcllDrimb99"
Content-Disposition: inline
In-Reply-To: <5021D5DF.7010709@missouri.edu>
X-PGP-Key: http://www.rulingia.com/keys/peter.pgp
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Re: cpow(3) implementations.
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Aug 2012 09:24:58 -0000


--5vNYLRcllDrimb99
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2012-Aug-07 18:34:59 -0700, Steve Kargl <sgk@troutmask.apl.washington.ed=
u> wrote:
>On Wed, Aug 08, 2012 at 10:03:57AM +1000, Peter Jeremy wrote:
>>   cpow(z, w) =3D exp(a*u - b*v) * cis(a*v + b*u)
=2E..
>>   exp(a*u - b*v) =3D exp(a*u) / exp(b*v)
>>                  =3D pow(a, hypot(c, d)) / exp(b*v)

Oops, that last line is wrong and should be:
   exp(a*u - b*v) =3D pow(hypot(c, d), a) / exp(b*v)

>I cheated and peaked at NetBSD's implementation, which is
>based on Moshier's cephes implementation.

The pseudocode didn't make sense to me (I couldn't see the 2nd
argument anywhere) so I had a peek at both NetBSD and cephes-2.8
and the code is identical in both.  I believe the pseudo code
should be:

cpow(z,w):
x =3D creal(w)
y =3D cimag(w)
r =3D cabs(z)
t =3D carg(z)

if y =3D=3D 0
   if x =3D=3D 0
      return 0
   else
      return pow(r,x)*(cos(t*x)+I*sin(t*x))
else
   return pow(r,x)*exp(-t*y)*(cos(t*x+y*log(r))+I*sin(t*x+y*log(r)))

Note that the final line looks the same as my final line (modulo
algebraic errors).  The rest is just special casing.

On 2012-Aug-07 21:58:39 -0500, Stephen Montgomery-Smith <stephen@missouri.e=
du> wrote:
>I wouldn't regard errors in a*u-b*v as catastrophic cancellation.  This=20
>is because exp(0) =3D 1.  So if the error in computing a*u-b*v is approx=
=20
>DBL_EPSILON, and a*u-b*v approx zero, even though the relative error in=20
>computing a*u-b*v is going to large (perhaps even infinite),=20
>nevertheless the error in exp(a*u-b*v) may still be bounded by 1 or 2 ulp.

OK.

>I would be pleased if cpow(x,y) made special provisions for when y is a=20
>small integer, and so, for example, cpow(z,2) was computed as z*z =3D=20
>(x+y)*(x-y) + 2x*y*I.

Thu square-and-multiply approach makes sense.  Where do you draw the
line on "small"?  At some point the rounding errors will exceed the
trig approach.

>Other than that, if your cpow produced cpow(-1,0.5) =3D I + 1e-16, I=20
>wouldn't be shocked at all, and I would find this kind of error totally=20
>acceptable.

I like the idea of also special-casing "simple" rational powers.  This
offers things like cpow(-1.,0.5) =3D I and cpow(-8.,1./3.) =3D -2

Thank you both for your input.

--=20
Peter Jeremy

--5vNYLRcllDrimb99
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlAiMF0ACgkQ/opHv/APuIebdACfeAT7FOE2T0hKyq4Pe9cAkvuc
xUsAn3rJsudDpsacUnPZbDIAN6wT6/IX
=J5iy
-----END PGP SIGNATURE-----

--5vNYLRcllDrimb99--

From owner-freebsd-numerics@FreeBSD.ORG  Wed Aug  8 16:50:43 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id E8BE11065673
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 16:50:43 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au
	[211.29.132.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 67E4E8FC0C
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 16:50:42 +0000 (UTC)
Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246])
	by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q78GoYvH014335
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 9 Aug 2012 02:50:35 +1000
Date: Thu, 9 Aug 2012 02:50:34 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith <stephen@missouri.edu>
In-Reply-To: <501EC015.3000808@missouri.edu>
Message-ID: <20120809013542.D3915@besplex.bde.org>
References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu>
	<20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu>
	<20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu>
	<20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-numerics@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Aug 2012 16:50:44 -0000

On Sun, 5 Aug 2012, Stephen Montgomery-Smith wrote:

> On 08/05/2012 05:05 AM, Bruce Evans wrote:
>> On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote:
>
>>> 
>>> What is the purpose of the macros norm and spadd?
>> 
>> [Answer snipped for brevity]
>
> Thank you.  Your answers are very helpful.  I had no idea what norm and spadd 
> did, and now just even glancing through your answers tells me exactly what 
> they do (even if I haven't yet fully understood why they work).

I googled Dekker's algorithm and found norm() nearby in the book by
Muller et al "Handbook of Floating Point...".  They call it 2sumF (fast
2sum).  It is Knuth's Theorem B They call Knuth's Theorem B 2sum.
These are even worse names than norm() or das's dd_add(), especially
for C :-).  Dekker had a lot to do with these algorithms too.  Muller
discusses refinements that aren't in Knuth.

In clog(), I switched to the full Dekker algorithm for multiplication
and something more like a Dekker algorithm for addition, and now don't
use spadd() or so many norm()s.  My norm() is normally algorithm 2sumF,
but in clog() I made it have the effect of 2sum (don't depend on the
order of the args).  I changed it back to 2sumF and named this normF(),
and made norm() == 2sumF (do more arithmetic operations to avoid fabs()'s
and branches for swapping the args), and used normF() where possible and
otherwise norm().  Just one of each.

das's dd_mul() seems to be not quite Dekker's algorithm as given in Muller.
It is slower.  Maybe it is actually Dekker's original algorithm:

% /*
%  * Compute a*b exactly, returning the exact result in a struct dd.  We assume
%  * that both a and b are normalized, so no underflow or overflow will occur.
%  * The current rounding mode must be round-to-nearest.
%  */
% static inline struct dd
% dd_mul(double a, double b)
% {
% 	static const double split = 0x1p27 + 1.0;
% 	struct dd ret;
% 	double ha, hb, la, lb, p, q;
% 
% 	p = a * split;
% 	ha = a - p;
% 	ha += p;
% 	la = a - ha;

The splitting part is Veldkamp's (?) algorithm.  I now use it for
efficiency, and had to work around compiler bugfeatures, and partially
work around the fudamental inefficiency of writing it like the above.

Double expressions may be evaluated in extra precision (and are on i386
if someone changes the rounding precision to FP_PE).  To try to get rid
of the extra precision, the above assigns the result of all expressions
to a variable.  C90 implicitly and C99 explicitly require assigments and
casts to discard extra precision.  But discarding extra precision is slow,
and in most code it just reduces accuracy, so gcc on i386 never did this.
This totally breaks the above algorithm.  (FreeBSD uses a default rounding
precision of FP_PD mainly to avoid such breakage for doubles, although not
for floats.)

The right way to fix this is to use double_t for most variables, so that
there is no extra precision.

@ 	static const double_t split = 0x1pNN + 1.0;
@ 	struct dd ret;
@ 	double_t ha, hb, la, lb, p, q;
@ 
@ 	p = a * split;
@ 	ha = a - p;
@ 	ha += p;
@ 	la = a - ha;

Unfortunately, then NN is hard to determine efficiently.  The rounding
precision must be FP_PD or FP_PE, else doubles won't work at all.  On
i386: when it is FP_PD, NN must be 27, the same as now; when it is FP_PE,
NN must be 38 (11 more because of 11 extra bits of precision).  Similarly
for float precision, except NN = 12 only works for the unusual rounding
precision FP_PS; it must be 41 with FP_PD and 52 with FP_PE.

This is too hard to configure, so I use an intermediate way:

@ 	static const double_t split = 0x1p53 + 1;
@ 	struct dd ret;
@ 	double_t ha, hb, la, lb, p, q;
@ 
@ 	p = (double)(a * split);
@ 	ha = (double)(a - p) + p;
@ 	la = a - ha;

The casts are like assignments, and would have no effect on i386 due
to the compiler bugfeatures if the variables had type double.  They
correctly have no effect if there is no extra precision (then double_t
== double).  On i386, double_t is long double, and the cast is now
not null.  This tricks the compiler into generating the correct code.
STRICT_ASSIGN() can also be used to do essentially the same thing,
but the above looks nicer and is more portable.

Style changes in the above:
- write 1 as 1, not as 1.0.  Let the compiler promote it to the type
   of the other operand.
- don't use an assignment for the intermediate step in (a - p + p).
   An assignment is neither necessary nor (with compiler bugfeatures)
   sufficient for discarding extra precision.

Muller's book is careful about making rounding explicit.  E.g., it
writes the splitting part as something like:

@ 	p = round(a * split);
@ 	ha = round(a - p);
@ 	ha = round(ha + p);
@ 	la = round(a - ha);

IIRC, it writes round() even when the result is exact, but that would
not be careful so hopefully it doesn't.  This could be done in C code
too, by casting every operation that must be rounded (to kill extra
precision), except with the compiler bugfeatures this doesn't actually
work.  This is hard to write and harder to read, so I prefer to use
expressions like a - p + p, without any intermediate variables or
casts, and depend on strict left to right evaluation and omit
parentheses too.  Using double_t makes this work, modulo other compiler
bugfeatures.

% 
% 	p = b * split;
% 	hb = b - p;
% 	hb += p;
% 	lb = b - hb;
% 
% 	p = ha * hb;

In Dekker's algorithm as given by Muller, p is a * b (rounded).  This
has the technical advantage that the hi term becomes available earlier.

% 	q = ha * lb + la * hb;
% 
% 	ret.hi = p + q;
% 	ret.lo = p - ret.hi + q + la * lb;

Dekker's algorithm is:

@ 	ret.hi = a * b;
@ 	ret.lo = ha * hb - ret.hi + ha * lb + la * hb + la * lb;

This looks shorter, but actually has the same number of operations (9).
I now see that it has the technical disadvantage that the lo term
becomes available later.  So dd_add() is probably better if you want
the full ret available fastest, while the version from Muller is better
if you want ret.hi available fastest.

Muller's organization is also easier to understand -- you calculate
the rounded product, then a correction term.  It is fairly clear that
(ha * hb - ret.hi) is exact.  Then more terms are added, from high to
low, presumably with enough cancelations to make them exact too.  I
wonder why the term equal to q isn't parenthesized?:

@ 	ret.lo = ha * hb - ret.hi + (q = ha * lb + la * hb) + la * lb;

Parenthesizing this term makes ret.lo available earlier, if there is
some otherwise-unused paralleism.

% 	return (ret);
% }


> If you don't want lengthy comments in the code, perhaps you could simply 
> include the following link in a comment:
> http://lists.freebsd.org/pipermail/freebsd-numerics/2012-August/000020.html

Link to the whole list :-).

> Let me say that in my proposed code for casinh, etc, that I do have essay 
> length comments.
> http://people.freebsd.org/~stephen/catrig.c
>
> I plan to respect your style (keep out very lengthy comments), but I hope 
> whomever commits my code will respect my style and keep the lengthy comments.

I respect style to a fault, but try to get you to change it before
committing much :-).

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Wed Aug  8 17:12:22 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0EE37106566C
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 17:12:22 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail14.syd.optusnet.com.au (mail14.syd.optusnet.com.au
	[211.29.132.195])
	by mx1.freebsd.org (Postfix) with ESMTP id 9675D8FC08
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 17:12:21 +0000 (UTC)
Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246])
	by mail14.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q78HCI2t009761
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 9 Aug 2012 03:12:20 +1000
Date: Thu, 9 Aug 2012 03:12:18 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Peter Jeremy <peter@rulingia.com>
In-Reply-To: <20120807205725.GA10572@server.rulingia.com>
Message-ID: <20120809025220.N4114@besplex.bde.org>
References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu>
	<20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu>
	<20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu>
	<20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu>
	<20120805191954.GA50379@troutmask.apl.washington.edu>
	<20120807205725.GA10572@server.rulingia.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-numerics@freebsd.org
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Aug 2012 17:12:22 -0000

On Wed, 8 Aug 2012, Peter Jeremy wrote:

> On 2012-Aug-05 12:19:54 -0700, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>> On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote:
>>> I plan to respect your style (keep out very lengthy comments), but I
>>> hope whomever commits my code will respect my style and keep the lengthy
>>> comments.
>>
>> Essay long comments probably belong in the man page if the essay is
>> important to the details of implementation.  Bruce may disagree with
>> me.

Of course only details visible to applications belong in the man page.
But I think many details belong in meta-comments in something more
like a readme than in every source file.

> I also believe that implementation details belong in the code and not
> the man page.  The man page should contain information necessary to
> use the function: Domain, range, behaviour with exceptional inputs,
> accuracy and similar.  Any "unexpected" behaviours as a side-effect of
> the implementation can be mentioned under BUGS.

Note that FreeBSD man pages mostly have less than this, and less than
even the non-Annex parts of C99, after removing some out-of date
meta-info from math.3.

> The user shouldn't
> need to know that it's implemented by range-reduction to [0 .. e/7]
> and then evaluated with an 11th order truncated Chebyshev polynomial
> that's had coefficients tweaked by a demon I summoned one evening.

Unless the implementation affects the user-visible behaviour.  For
example, on i386 trig functions are still implemented using the i387
(except in my version), so applications are exposed to the huge errors
near multiples of pi/2 (starting at 1*pi/2) that are normal for the i387.
A note about this would point to the right place to blame :-).

> OTOH, the code itself should contain comments explaining what it is
> doing - particularly where it is relying on details of the floating
> point implementation or the difference between an operation on real
> (or complex) numbers and the same operation on float/double/...
> numbers.

That's too detailed.  Almost every line depends on IEEE754 arithmetic
and on this not being too broken.  Many depend on the arithmetic _not_
being the same as real arithmetic.  An example is almost every line
in Dekker's algorithm.

> It's also useful to document why alternative (particularly
> "obviously better" alternative) approaches won't work.

Again too detailed.  On most lines I consider a couple of routine
alternatives.  For critical lines, I consider and try many more.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Wed Aug  8 18:11:58 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 66DEC1065674
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 18:11:58 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au
	[211.29.133.51])
	by mx1.freebsd.org (Postfix) with ESMTP id EFC7A8FC08
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 18:11:57 +0000 (UTC)
Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246])
	by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q78IBnX6003719
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 9 Aug 2012 04:11:50 +1000
Date: Thu, 9 Aug 2012 04:11:49 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith <stephen@missouri.edu>
In-Reply-To: <5021D5DF.7010709@missouri.edu>
Message-ID: <20120809033123.V4514@besplex.bde.org>
References: <20120808000357.GA11128@server.rulingia.com>
	<5021D5DF.7010709@missouri.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-numerics@freebsd.org
Subject: Re: cpow(3) implementations.
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Aug 2012 18:11:58 -0000

On Tue, 7 Aug 2012, Stephen Montgomery-Smith wrote:

> On 08/07/2012 07:03 PM, Peter Jeremy wrote:
>> The C99 standard (at least WG14/N1256) is quite liberal as regards
>> cpow() and specifically allows a conforming implementation to just do:
>>    cpow(z, w) = cexp(w * clog(z))
>> 
>> The downside of this approach is that log() inherently loses precision
>> by pushing (most of) the exponent bits into the fraction, displacing
>> original fraction bits.  I've therefore been looking at how to
>> implement cpow(3) with a precision similar to pow(3).  The following
>> are some thoughts, together with questions.
>> 
>> In the following:
>>    w = a + I*b
>>    z = c + I*d
>>    cis(r) = cos(r) + I*sin(r)
>>    t = u + I*v = clog(c + I*d)
>>                = log(hypot(c, d)) + I*atan2(d, c)
>> 
>>    cpow(z, w) = cexp(w * clog(z))
>>               = cpow(c + I*d, a + I*b)
>>               = cexp((a + I*b) * clog(c + I*d))
>>               = cexp((a + I*b) * (u + I*v))
>>               = cexp((a*u - b*v) + I*(a*v + b*u))
>>               = exp(a*u - b*v) * cis(a*v + b*u)
>
> I wouldn't regard errors in a*u-b*v as catastrophic cancellation.  This is 
> because exp(0) = 1.  So if the error in computing a*u-b*v is approx 
> DBL_EPSILON, and a*u-b*v approx zero, even though the relative error in 
> computing a*u-b*v is going to large (perhaps even infinite), nevertheless the 
> error in exp(a*u-b*v) may still be bounded by 1 or 2 ulp.

Indeed.  It's sort of the opposite of catastrophic cancelation.  We
might have a*u-b*v off by a factor of 2, with value DBL_EPSILON when
it should be 2*DBL_EPSILON.  That is about 2**52 ulps in double
precision.  But then adding 1 turns it into an error of just 1 ulp.

Even more so for the cos() part of cis().  Now the small term with the
large error gets squared before adding 1.

But for the sin() part of cis(), the final error is the one that the
cancelation gave.  It may be small absolute for cis(), but on
multiplication by large exp() it will become large absolute (though
no larger relative, and always small relative to the cos() part which
is about 1).

> More generally, as a mathematician, I would be far more concerned that 
> cpow(z,w) return accurate answers when w is real, and especially when w is a 
> small integer.  Real life applications of cpow(z,y) when w is not real are 
> very few and far between.

This is easy when z is also real, by using pow().  Then there is the
issue of continuity -- how close is cpow(x + I*eps, y) to pow(x, y)?
Similarly for cpow(x, y + I*eps).

> I would be pleased if cpow(x,y) made special provisions for when y is a small 
> integer, and so, for example, cpow(z,2) was computed as z*z = (x+y)*(x-y) + 
> 2x*y*I.

x^2 - y^2 is easier than x^2 + y^2 - 1 for clog().

> For cpow(z,3), you are going to have a hard time avoiding large relative 
> errors when x^2 = 3y^2 (i.e. z is parallel to a cube root of 1). Frankly I 
> see that as somewhat unavoidable.

x^2 - 3y^2 is still easier than for clog().  But go to z^4, and the real
part is x^4 - 6 x^2 y^2 + y^4, so it's much harder than clog().  Seems
to need quadrupled precision for the sum of the 4th powers, then a trick
to avoid maybe octupled precision :-).

> Nevertheless, if you pumped up cpow(z,w) so that when w was a relatively 
> small integer, that it broke w into its base 2 expansion, and then multiplied 
> lots of terms of the form cpow(z,2^n), each of which was computed by n 
> repetitions of cpow(z,2), and then for negative integers by taking the 
> reciprocal of the whole thing, and then in all other cases simply use 
> cexp(w*clog(z)), I would be very happy.

Is it obvious that this is reasonably continuous?  I suppose it is,
except what "reasonably" is.

> Other than that, if your cpow produced cpow(-1,0.5) = I + 1e-16, I wouldn't 
> be shocked at all, and I would find this kind of error totally acceptable.

cexp(0.5 * I * M_PI) in fact produces I + 0.612e-16.  This is probably
very accurate, since M_PI differs from pi by up to half an ulp.  clog(-1)
can do no better than produce 0.5 * I * M_PI (and it should have no
difficult producing exactly that).

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Wed Aug  8 19:49:09 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id B32D3106566C
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 19:49:09 +0000 (UTC)
	(envelope-from stephen@missouri.edu)
Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu
	[128.206.184.213])
	by mx1.freebsd.org (Postfix) with ESMTP id 89ACF8FC18
	for <freebsd-numerics@freebsd.org>;
	Wed,  8 Aug 2012 19:49:09 +0000 (UTC)
Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213])
	by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id
	q78Jn65A097379; Wed, 8 Aug 2012 14:49:07 -0500 (CDT)
	(envelope-from stephen@missouri.edu)
Message-ID: <5022C2B2.2030803@missouri.edu>
Date: Wed, 08 Aug 2012 14:49:06 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:14.0) Gecko/20120714 Thunderbird/14.0
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
References: <20120808000357.GA11128@server.rulingia.com>
	<5021D5DF.7010709@missouri.edu>
	<20120809033123.V4514@besplex.bde.org>
In-Reply-To: <20120809033123.V4514@besplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-numerics@freebsd.org
Subject: Re: cpow(3) implementations.
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Aug 2012 19:49:09 -0000

On 08/08/2012 01:11 PM, Bruce Evans wrote:
> On Tue, 7 Aug 2012, Stephen Montgomery-Smith wrote:
>
>> On 08/07/2012 07:03 PM, Peter Jeremy wrote:
>>> The C99 standard (at least WG14/N1256) is quite liberal as regards
>>> cpow() and specifically allows a conforming implementation to just do:
>>>    cpow(z, w) = cexp(w * clog(z))
>>>
>>> The downside of this approach is that log() inherently loses precision
>>> by pushing (most of) the exponent bits into the fraction, displacing
>>> original fraction bits.  I've therefore been looking at how to
>>> implement cpow(3) with a precision similar to pow(3).  The following
>>> are some thoughts, together with questions.
>>>
>>> In the following:
>>>    w = a + I*b
>>>    z = c + I*d
>>>    cis(r) = cos(r) + I*sin(r)
>>>    t = u + I*v = clog(c + I*d)
>>>                = log(hypot(c, d)) + I*atan2(d, c)
>>>
>>>    cpow(z, w) = cexp(w * clog(z))
>>>               = cpow(c + I*d, a + I*b)
>>>               = cexp((a + I*b) * clog(c + I*d))
>>>               = cexp((a + I*b) * (u + I*v))
>>>               = cexp((a*u - b*v) + I*(a*v + b*u))
>>>               = exp(a*u - b*v) * cis(a*v + b*u)
>>
>> I wouldn't regard errors in a*u-b*v as catastrophic cancellation.
>> This is because exp(0) = 1.  So if the error in computing a*u-b*v is
>> approx DBL_EPSILON, and a*u-b*v approx zero, even though the relative
>> error in computing a*u-b*v is going to large (perhaps even infinite),
>> nevertheless the error in exp(a*u-b*v) may still be bounded by 1 or 2
>> ulp.
>
> Indeed.  It's sort of the opposite of catastrophic cancelation.  We
> might have a*u-b*v off by a factor of 2, with value DBL_EPSILON when
> it should be 2*DBL_EPSILON.  That is about 2**52 ulps in double
> precision.  But then adding 1 turns it into an error of just 1 ulp.
>
> Even more so for the cos() part of cis().  Now the small term with the
> large error gets squared before adding 1.
>
> But for the sin() part of cis(), the final error is the one that the
> cancelation gave.  It may be small absolute for cis(), but on
> multiplication by large exp() it will become large absolute (though
> no larger relative, and always small relative to the cos() part which
> is about 1).
>
>> More generally, as a mathematician, I would be far more concerned that
>> cpow(z,w) return accurate answers when w is real, and especially when
>> w is a small integer.  Real life applications of cpow(z,y) when w is
>> not real are very few and far between.
>
> This is easy when z is also real, by using pow().  Then there is the
> issue of continuity -- how close is cpow(x + I*eps, y) to pow(x, y)?
> Similarly for cpow(x, y + I*eps).
>
>> I would be pleased if cpow(x,y) made special provisions for when y is
>> a small integer, and so, for example, cpow(z,2) was computed as z*z =
>> (x+y)*(x-y) + 2x*y*I.
>
> x^2 - y^2 is easier than x^2 + y^2 - 1 for clog().
>
>> For cpow(z,3), you are going to have a hard time avoiding large
>> relative errors when x^2 = 3y^2 (i.e. z is parallel to a cube root of
>> 1). Frankly I see that as somewhat unavoidable.
>
> x^2 - 3y^2 is still easier than for clog().  But go to z^4, and the real
> part is x^4 - 6 x^2 y^2 + y^4, so it's much harder than clog().  Seems
> to need quadrupled precision for the sum of the 4th powers, then a trick
> to avoid maybe octupled precision :-).

How about doing the real part of z^4 as
((x-y)*(x+y)-2*x*y)*((x-y)*(x+y)+2*x*y) ?
That is what my suggested process would do.


From owner-freebsd-numerics@FreeBSD.ORG  Thu Aug  9 03:19:14 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 59624106566B
	for <freebsd-numerics@freebsd.org>;
	Thu,  9 Aug 2012 03:19:14 +0000 (UTC)
	(envelope-from stephen@missouri.edu)
Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu
	[128.206.184.213])
	by mx1.freebsd.org (Postfix) with ESMTP id 305858FC12
	for <freebsd-numerics@freebsd.org>;
	Thu,  9 Aug 2012 03:19:13 +0000 (UTC)
Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213])
	by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id
	q793J65p039860
	for <freebsd-numerics@freebsd.org>; Wed, 8 Aug 2012 22:19:06 -0500 (CDT)
	(envelope-from stephen@missouri.edu)
Message-ID: <50232C2A.8040704@missouri.edu>
Date: Wed, 08 Aug 2012 22:19:06 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:14.0) Gecko/20120714 Thunderbird/14.0
MIME-Version: 1.0
To: freebsd-numerics@freebsd.org
References: <20120808000357.GA11128@server.rulingia.com>
	<5021D5DF.7010709@missouri.edu>
	<20120809033123.V4514@besplex.bde.org>
	<5022C2B2.2030803@missouri.edu>
In-Reply-To: <5022C2B2.2030803@missouri.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: cpow(3) implementations.
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Aug 2012 03:19:14 -0000

On 08/08/2012 02:49 PM, Stephen Montgomery-Smith wrote:
> On 08/08/2012 01:11 PM, Bruce Evans wrote:
>> On Tue, 7 Aug 2012, Stephen Montgomery-Smith wrote:
>>
>>> On 08/07/2012 07:03 PM, Peter Jeremy wrote:
>>>> The C99 standard (at least WG14/N1256) is quite liberal as regards
>>>> cpow() and specifically allows a conforming implementation to just do:
>>>>    cpow(z, w) = cexp(w * clog(z))
>>>>
>>>> The downside of this approach is that log() inherently loses precision
>>>> by pushing (most of) the exponent bits into the fraction, displacing
>>>> original fraction bits.  I've therefore been looking at how to
>>>> implement cpow(3) with a precision similar to pow(3).  The following
>>>> are some thoughts, together with questions.
>>>>
>>>> In the following:
>>>>    w = a + I*b
>>>>    z = c + I*d
>>>>    cis(r) = cos(r) + I*sin(r)
>>>>    t = u + I*v = clog(c + I*d)
>>>>                = log(hypot(c, d)) + I*atan2(d, c)
>>>>
>>>>    cpow(z, w) = cexp(w * clog(z))
>>>>               = cpow(c + I*d, a + I*b)
>>>>               = cexp((a + I*b) * clog(c + I*d))
>>>>               = cexp((a + I*b) * (u + I*v))
>>>>               = cexp((a*u - b*v) + I*(a*v + b*u))
>>>>               = exp(a*u - b*v) * cis(a*v + b*u)
>>>
>>> I wouldn't regard errors in a*u-b*v as catastrophic cancellation.
>>> This is because exp(0) = 1.  So if the error in computing a*u-b*v is
>>> approx DBL_EPSILON, and a*u-b*v approx zero, even though the relative
>>> error in computing a*u-b*v is going to large (perhaps even infinite),
>>> nevertheless the error in exp(a*u-b*v) may still be bounded by 1 or 2
>>> ulp.
>>
>> Indeed.  It's sort of the opposite of catastrophic cancelation.  We
>> might have a*u-b*v off by a factor of 2, with value DBL_EPSILON when
>> it should be 2*DBL_EPSILON.  That is about 2**52 ulps in double
>> precision.  But then adding 1 turns it into an error of just 1 ulp.
>>
>> Even more so for the cos() part of cis().  Now the small term with the
>> large error gets squared before adding 1.
>>
>> But for the sin() part of cis(), the final error is the one that the
>> cancelation gave.  It may be small absolute for cis(), but on
>> multiplication by large exp() it will become large absolute (though
>> no larger relative, and always small relative to the cos() part which
>> is about 1).
>>
>>> More generally, as a mathematician, I would be far more concerned that
>>> cpow(z,w) return accurate answers when w is real, and especially when
>>> w is a small integer.  Real life applications of cpow(z,y) when w is
>>> not real are very few and far between.
>>
>> This is easy when z is also real, by using pow().  Then there is the
>> issue of continuity -- how close is cpow(x + I*eps, y) to pow(x, y)?
>> Similarly for cpow(x, y + I*eps).
>>
>>> I would be pleased if cpow(x,y) made special provisions for when y is
>>> a small integer, and so, for example, cpow(z,2) was computed as z*z =
>>> (x+y)*(x-y) + 2x*y*I.
>>
>> x^2 - y^2 is easier than x^2 + y^2 - 1 for clog().
>>
>>> For cpow(z,3), you are going to have a hard time avoiding large
>>> relative errors when x^2 = 3y^2 (i.e. z is parallel to a cube root of
>>> 1). Frankly I see that as somewhat unavoidable.
>>
>> x^2 - 3y^2 is still easier than for clog().  But go to z^4, and the real
>> part is x^4 - 6 x^2 y^2 + y^4, so it's much harder than clog().  Seems
>> to need quadrupled precision for the sum of the 4th powers, then a trick
>> to avoid maybe octupled precision :-).
>
> How about doing the real part of z^4 as
> ((x-y)*(x+y)-2*x*y)*((x-y)*(x+y)+2*x*y) ?
> That is what my suggested process would do.


I tested it.  It performs very poorly.

I wrote a cpow function, just for integers.  cpow(x+I*y,n) does very 
poorly when x/y approx tan(Pi/(2n)), that is, x+I*y is parallel to an 
nth root of -1.  Only the cases n=0,1,2,-1,-2 do well.

As a mathematician, I wouldn't expect any better.  In fact I would be 
pleasantly surprised if it got the case n=2 correct.

Just to point out:

tan(Pi/4) = 1
tan(Pi/8) = sqrt(2) - 1
tan(Pi/16) = (1 + Sqrt(2)) (sqrt(4 - 2 sqrt(2)) - 1)

if a_n = tan(Pi/(2n)), then
a_{n+1} = (sqrt(a_n^2+1) - 1) / a_n


From owner-freebsd-numerics@FreeBSD.ORG  Thu Aug  9 21:34:00 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6C274106564A
	for <freebsd-numerics@freebsd.org>;
	Thu,  9 Aug 2012 21:34:00 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
	[211.29.132.184])
	by mx1.freebsd.org (Postfix) with ESMTP id DE2C08FC08
	for <freebsd-numerics@freebsd.org>;
	Thu,  9 Aug 2012 21:33:59 +0000 (UTC)
Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246])
	by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q79LXuvx028045
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 10 Aug 2012 07:33:57 +1000
Date: Fri, 10 Aug 2012 07:33:55 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith <stephen@missouri.edu>
In-Reply-To: <50232C2A.8040704@missouri.edu>
Message-ID: <20120810072650.E4653@besplex.bde.org>
References: <20120808000357.GA11128@server.rulingia.com>
	<5021D5DF.7010709@missouri.edu>
	<20120809033123.V4514@besplex.bde.org> <5022C2B2.2030803@missouri.edu>
	<50232C2A.8040704@missouri.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-numerics@freebsd.org
Subject: Re: cpow(3) implementations.
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Aug 2012 21:34:00 -0000

On Wed, 8 Aug 2012, Stephen Montgomery-Smith wrote:

> On 08/08/2012 02:49 PM, Stephen Montgomery-Smith wrote:
>> On 08/08/2012 01:11 PM, Bruce Evans wrote:
>>> On Tue, 7 Aug 2012, Stephen Montgomery-Smith wrote:
> ...
>>>> For cpow(z,3), you are going to have a hard time avoiding large
>>>> relative errors when x^2 = 3y^2 (i.e. z is parallel to a cube root of
>>>> 1). Frankly I see that as somewhat unavoidable.
>>> 
>>> x^2 - 3y^2 is still easier than for clog().  But go to z^4, and the real
>>> part is x^4 - 6 x^2 y^2 + y^4, so it's much harder than clog().  Seems
>>> to need quadrupled precision for the sum of the 4th powers, then a trick
>>> to avoid maybe octupled precision :-).
>> 
>> How about doing the real part of z^4 as
>> ((x-y)*(x+y)-2*x*y)*((x-y)*(x+y)+2*x*y) ?
>> That is what my suggested process would do.
>
> I tested it.  It performs very poorly.

x^2 - y^2 in z^2 may have large cancelation, and then squaring again
may have further large cancelation, so the problem doesn't really change

> I wrote a cpow function, just for integers.  cpow(x+I*y,n) does very poorly 
> when x/y approx tan(Pi/(2n)), that is, x+I*y is parallel to an nth root of 
> -1.  Only the cases n=0,1,2,-1,-2 do well.
>
> As a mathematician, I wouldn't expect any better.  In fact I would be 
> pleasantly surprised if it got the case n=2 correct.

I think that is "as a numerical analyst" :-).  The magic of analytic
functions involves delicate cancelations.

Bruce