Date:      Thu, 9 Aug 2012 02:50:34 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Stephen Montgomery-Smith <stephen@missouri.edu>
Cc:        freebsd-numerics@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: Complex arg-trig functions
Message-ID:  <20120809013542.D3915@besplex.bde.org>
In-Reply-To: <501EC015.3000808@missouri.edu>
References:  <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu> <20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu>

On Sun, 5 Aug 2012, Stephen Montgomery-Smith wrote:

> On 08/05/2012 05:05 AM, Bruce Evans wrote:
>> On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote:
>
>>> 
>>> What is the purpose of the macros norm and spadd?
>> 
>> [Answer snipped for brevity]
>
> Thank you.  Your answers are very helpful.  I had no idea what norm and spadd 
> did, and now just even glancing through your answers tells me exactly what 
> they do (even if I haven't yet fully understood why they work).

I googled Dekker's algorithm and found norm() nearby in the book by
Muller et al "Handbook of Floating Point...".  They call it 2sumF (fast
2sum).  It is Knuth's Theorem B They call Knuth's Theorem B 2sum.
These are even worse names than norm() or das's dd_add(), especially
for C :-).  Dekker had a lot to do with these algorithms too.  Muller
discusses refinements that aren't in Knuth.
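
For reference, 2sumF is just the following (a minimal sketch, assuming
round-to-nearest and no extra precision; the struct and the name are
mine, not das's):

@ struct dd { double hi, lo; };
@ 
@ /*
@  * 2sumF (fast 2sum): requires |a| >= |b| (or a == 0).  Then
@  * hi + lo == a + b exactly, with hi == a + b rounded to double.
@  */
@ static inline struct dd
@ fast2sum(double a, double b)
@ {
@ 	struct dd ret;
@ 
@ 	ret.hi = a + b;
@ 	ret.lo = b - (ret.hi - a);
@ 	return (ret);
@ }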

In clog(), I switched to the full Dekker algorithm for multiplication
and something more like a Dekker algorithm for addition, and now don't
use spadd() or so many norm()s.  My norm() is normally algorithm 2sumF,
but in clog() I made it have the effect of 2sum (don't depend on the
order of the args).  I changed it back to 2sumF and named this normF(),
and made norm() == 2sum (do more arithmetic operations to avoid fabs()'s
and branches for swapping the args), and used normF() where possible and
otherwise norm().  Just one of each.
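
For comparison, a sketch of 2sum (Knuth's Theorem B), again assuming
round-to-nearest and no extra precision: it has no precondition on the
args, at the cost of 3 more operations instead of the fabs()'s and a
branchy swap:

@ /* 2sum: hi + lo == a + b exactly, for any ordering of a and b. */
@ static inline struct dd
@ twosum(double a, double b)
@ {
@ 	struct dd ret;
@ 	double ap, bp;
@ 
@ 	ret.hi = a + b;
@ 	ap = ret.hi - b;		/* the part of hi contributed by a */
@ 	bp = ret.hi - ap;		/* the part of hi contributed by b */
@ 	ret.lo = (a - ap) + (b - bp);	/* the rounding error of hi */
@ 	return (ret);
@ }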

das's dd_mul() seems to be not quite Dekker's algorithm as given in Muller.
It is slower.  Maybe it is actually Dekker's original algorithm:

% /*
%  * Compute a*b exactly, returning the exact result in a struct dd.  We assume
%  * that both a and b are normalized, so no underflow or overflow will occur.
%  * The current rounding mode must be round-to-nearest.
%  */
% static inline struct dd
% dd_mul(double a, double b)
% {
% 	static const double split = 0x1p27 + 1.0;
% 	struct dd ret;
% 	double ha, hb, la, lb, p, q;
% 
% 	p = a * split;
% 	ha = a - p;
% 	ha += p;
% 	la = a - ha;

The splitting part is Veldkamp's (?) algorithm.  I now use it for
efficiency, and had to work around compiler bugfeatures, and partially
work around the fundamental inefficiency of writing it like the above.

Double expressions may be evaluated in extra precision (and are on i386
if someone changes the rounding precision to FP_PE).  To try to get rid
of the extra precision, the above assigns the result of all expressions
to a variable.  C90 implicitly and C99 explicitly require assignments and
casts to discard extra precision.  But discarding extra precision is slow,
and in most code it just reduces accuracy, so gcc on i386 never did this.
This totally breaks the above algorithm.  (FreeBSD uses a default rounding
precision of FP_PD mainly to avoid such breakage for doubles, although not
for floats.)
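
The breakage is easy to demonstrate.  A sketch (what it prints depends
on the compiler, the arch and the rounding precision; the volatiles
only stop constant folding):

@ #include <stdio.h>
@ 
@ int
@ main(void)
@ {
@ 	volatile double a = 1, b = 0x1p-60;
@ 	double s;
@ 
@ 	s = a + b;	/* C99: the assignment must round s to double */
@ 	/*
@ 	 * With 53-bit arithmetic, a + b rounds to 1 and the test is
@ 	 * false.  With 64-bit arithmetic (FP_PE) and a compiler that
@ 	 * keeps s in a register, the extra bits survive the assignment.
@ 	 */
@ 	printf("extra precision %s\n", s != 1 ? "kept" : "discarded");
@ 	return (0);
@ }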

The right way to fix this is to use double_t for most variables, so that
there is no extra precision.

@ 	static const double_t split = 0x1pNN + 1.0;
@ 	struct dd ret;
@ 	double_t ha, hb, la, lb, p, q;
@ 
@ 	p = a * split;
@ 	ha = a - p;
@ 	ha += p;
@ 	la = a - ha;

Unfortunately, then NN is hard to determine efficiently.  The rounding
precision must be FP_PD or FP_PE, else doubles won't work at all.  On
i386: when it is FP_PD, NN must be 27, the same as now; when it is FP_PE,
NN must be 38 (11 more because of 11 extra bits of precision).  Similarly
for float precision, except NN = 12 only works for the unusual rounding
precision FP_PS; it must be 41 with FP_PD and 52 with FP_PE.
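
A hypothetical sketch of doing the configuration at runtime (i386-only,
using <ieeefp.h>'s fpgetprec()) shows why it is too hard: split can no
longer be a compile-time constant, and a function call or branch per
split would cost more than the splitting itself:

@ #include <ieeefp.h>
@ #include <math.h>
@ 
@ /* Split constant for doubles under the current rounding precision. */
@ static double_t
@ split_const(void)
@ {
@ 	switch (fpgetprec()) {
@ 	case FP_PD:
@ 		return (0x1p27 + 1);	/* 53-bit arithmetic */
@ 	case FP_PE:
@ 		return (0x1p38 + 1);	/* 64-bit arithmetic */
@ 	default:
@ 		return (0);	/* FP_PS: splitting doubles can't work */
@ 	}
@ }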

This is too hard to configure, so I use an intermediate way:

@ 	static const double_t split = 0x1p53 + 1;
@ 	struct dd ret;
@ 	double_t ha, hb, la, lb, p, q;
@ 
@ 	p = (double)(a * split);
@ 	ha = (double)(a - p) + p;
@ 	la = a - ha;

The casts are like assignments, and would have no effect on i386 due
to the compiler bugfeatures if the variables had type double.  They
correctly have no effect if there is no extra precision (then double_t
== double).  On i386, double_t is long double, and the cast is now
not null.  This tricks the compiler into generating the correct code.
STRICT_ASSIGN() can also be used to do essentially the same thing,
but the above looks nicer and is more portable.
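
For comparison, the STRICT_ASSIGN() spelling of the same two forced
roundings (msun's macro assigns through a volatile temporary when the
type can carry extra precision; being a statement, it needs an extra
variable here, which is part of why the casts look nicer):

@ 	double_t ha, la, p, t;
@ 
@ 	STRICT_ASSIGN(double, p, a * split);
@ 	STRICT_ASSIGN(double, t, a - p);
@ 	ha = t + p;
@ 	la = a - ha;	/* exact, so no forced rounding is needed */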

Style changes in the above:
- write 1 as 1, not as 1.0.  Let the compiler promote it to the type
   of the other operand.
- don't use an assignment for the intermediate step in (a - p + p).
   An assignment is neither necessary nor (with compiler bugfeatures)
   sufficient for discarding extra precision.

Muller's book is careful about making rounding explicit.  E.g., it
writes the splitting part as something like:

@ 	p = round(a * split);
@ 	ha = round(a - p);
@ 	ha = round(ha + p);
@ 	la = round(a - ha);

IIRC, it writes round() even when the result is exact, but that would
not be careful so hopefully it doesn't.  This could be done in C code
too, by casting every operation that must be rounded (to kill extra
precision), except with the compiler bugfeatures this doesn't actually
work.  This is hard to write and harder to read, so I prefer to use
expressions like a - p + p, without any intermediate variables or
casts, and depend on strict left to right evaluation and omit
parentheses too.  Using double_t makes this work, modulo other compiler
bugfeatures.
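
E.g., the whole splitting step in that style (with double_t variables
and a split matching the evaluation precision, as in the "right way"
version above):

@ 	p = a * split;
@ 	ha = a - p + p;		/* strict left to right: (a - p) + p */
@ 	la = a - ha;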

% 
% 	p = b * split;
% 	hb = b - p;
% 	hb += p;
% 	lb = b - hb;
% 
% 	p = ha * hb;

In Dekker's algorithm as given by Muller, p is a * b (rounded).  This
has the technical advantage that the hi term becomes available earlier.

% 	q = ha * lb + la * hb;
% 
% 	ret.hi = p + q;
% 	ret.lo = p - ret.hi + q + la * lb;

Dekker's algorithm is:

@ 	ret.hi = a * b;
@ 	ret.lo = ha * hb - ret.hi + ha * lb + la * hb + la * lb;

This looks shorter, but actually has the same number of operations (9).
I now see that it has the technical disadvantage that the lo term
becomes available later.  So das's dd_mul() is probably better if you want
the full ret available fastest, while the version from Muller is better
if you want ret.hi available fastest.

Muller's organization is also easier to understand -- you calculate
the rounded product, then a correction term.  It is fairly clear that
(ha * hb - ret.hi) is exact.  Then more terms are added, from high to
low, presumably with enough cancellations to make them exact too.  I
wonder why the term equal to q isn't parenthesized:

@ 	ret.lo = ha * hb - ret.hi + (q = ha * lb + la * hb) + la * lb;

Parenthesizing this term makes ret.lo available earlier, if there is
some otherwise-unused parallelism.

% 	return (ret);
% }
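
Putting the pieces together, a sketch of dd_mul() along the above lines,
with the cast-based splitting from my intermediate version and the
multiplication as given by Muller, q parenthesized (my reconstruction,
untested, and still subject to the compiler bugfeatures discussed above):

@ static inline struct dd
@ dd_mul(double a, double b)
@ {
@ 	static const double_t split = 0x1p53 + 1;
@ 	struct dd ret;
@ 	double_t ha, hb, la, lb, p;
@ 
@ 	/* Veldkamp splitting of a and b. */
@ 	p = (double)(a * split);
@ 	ha = (double)(a - p) + p;
@ 	la = a - ha;
@ 	p = (double)(b * split);
@ 	hb = (double)(b - p) + p;
@ 	lb = b - hb;
@ 
@ 	/* Rounded product, then the correction term, high to low. */
@ 	ret.hi = a * b;
@ 	ret.lo = ha * hb - ret.hi + (ha * lb + la * hb) + la * lb;
@ 	return (ret);
@ }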


> If you don't want lengthy comments in the code, perhaps you could simply 
> include the following link in a comment:
> http://lists.freebsd.org/pipermail/freebsd-numerics/2012-August/000020.html

Link to the whole list :-).

> Let me say that in my proposed code for casinh, etc, that I do have essay 
> length comments.
> http://people.freebsd.org/~stephen/catrig.c
>
> I plan to respect your style (keep out very lengthy comments), but I hope 
> whomever commits my code will respect my style and keep the lengthy comments.

I respect style to a fault, but try to get you to change it before
committing much :-).

Bruce


