Date: Thu, 9 Aug 2012 02:50:34 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Stephen Montgomery-Smith <stephen@missouri.edu>
Cc: freebsd-numerics@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject: Re: Complex arg-trig functions
Message-ID: <20120809013542.D3915@besplex.bde.org>
In-Reply-To: <501EC015.3000808@missouri.edu>
References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu>
 <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu>
 <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu>
 <20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu>
On Sun, 5 Aug 2012, Stephen Montgomery-Smith wrote:

> On 08/05/2012 05:05 AM, Bruce Evans wrote:
>> On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote:
>>> What is the purpose of the macros norm and spadd?
>>
>> [Answer snipped for brevity]
>
> Thank you.  Your answers are very helpful.  I had no idea what norm and
> spadd did, and now just even glancing through your answers tells me
> exactly what they do (even if I haven't yet fully understood why they
> work).

I googled Dekker's algorithm and found norm() nearby in the book by
Muller et al, "Handbook of Floating Point...".  They call it 2sumF (fast
2sum), and they call Knuth's Theorem B 2sum.  These are even worse names
than norm() or das's dd_add(), especially for C :-).  Dekker had a lot
to do with these algorithms too.  Muller discusses refinements that
aren't in Knuth.

In clog(), I switched to the full Dekker algorithm for multiplication
and something more like a Dekker algorithm for addition, and now don't
use spadd() or so many norm()s.

My norm() is normally algorithm 2sumF, but in clog() I made it have the
effect of 2sum (don't depend on the order of the args).  I changed it
back to 2sumF and named this normF(), and made norm() == 2sum (do more
arithmetic operations to avoid fabs()'s and branches for swapping the
args), and used normF() where possible and otherwise norm().  Just one
of each.  (I append a minimal sketch of both algorithms below.)

das's dd_mul() seems to be not quite Dekker's algorithm as given in
Muller.  It is slower.  Maybe it is actually Dekker's original algorithm:

% /*
%  * Compute a*b exactly, returning the exact result in a struct dd.  We assume
%  * that both a and b are normalized, so no underflow or overflow will occur.
%  * The current rounding mode must be round-to-nearest.
%  */
% static inline struct dd
% dd_mul(double a, double b)
% {
% 	static const double split = 0x1p27 + 1.0;
% 	struct dd ret;
% 	double ha, hb, la, lb, p, q;
%
% 	p = a * split;
% 	ha = a - p;
% 	ha += p;
% 	la = a - ha;

The splitting part is Veldkamp's (?) algorithm.  I now use it for
efficiency, and had to work around compiler bugfeatures, and partially
work around the fundamental inefficiency of writing it like the above.

Double expressions may be evaluated in extra precision (and are on i386
if someone changes the rounding precision to FP_PE).  To try to get rid
of the extra precision, the above assigns the result of all expressions
to a variable.  C90 implicitly and C99 explicitly require assignments
and casts to discard extra precision.  But discarding extra precision is
slow, and in most code it just reduces accuracy, so gcc on i386 never
did this.  This totally breaks the above algorithm.  (FreeBSD uses a
default rounding precision of FP_PD mainly to avoid such breakage for
doubles, although not for floats.)

The right way to fix this is to use double_t for most variables, so that
there is no extra precision:

@ 	static const double_t split = 0x1pNN + 1.0;
@ 	struct dd ret;
@ 	double_t ha, hb, la, lb, p, q;
@
@ 	p = a * split;
@ 	ha = a - p;
@ 	ha += p;
@ 	la = a - ha;

Unfortunately, NN is then hard to determine efficiently.  The rounding
precision must be FP_PD or FP_PE, else doubles won't work at all.  On
i386: when it is FP_PD, NN must be 27, the same as now; when it is
FP_PE, NN must be 38 (11 more because of 11 extra bits of precision).
Similarly for float precision, except NN = 12 only works for the unusual
rounding precision FP_PS; it must be 41 with FP_PD and 52 with FP_PE.
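Since 2sumF and 2sum appear above only by name, here is the minimal
sketch I promised.  It is my own illustration with the hypothetical
names normF() and norm(), not the actual macros in catrig.c or the
helpers in clog().  Both assume round-to-nearest; double_t keeps extra
precision from getting in the way:

#include <math.h>		/* for double_t */

/*
 * 2sumF (fast 2sum): requires |*hi| >= |*lo|.  3 operations.
 * Replaces *hi, *lo by their rounded sum and the exact rounding error.
 */
static inline void
normF(double_t *hi, double_t *lo)
{
	double_t s;

	s = *hi + *lo;
	*lo = *lo - (s - *hi);
	*hi = s;
}

/*
 * 2sum (Knuth's Theorem B): no ordering requirement and no branches,
 * at the cost of 6 operations.
 */
static inline void
norm(double_t *hi, double_t *lo)
{
	double_t s, t;

	s = *hi + *lo;
	t = s - *hi;
	*lo = (*hi - (s - t)) + (*lo - t);
	*hi = s;
}

normF() is the cheap one but needs its args ordered by magnitude; norm()
pays 3 more operations to drop that requirement, which is the trade-off
described above.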
Configuring NN for all those cases is too hard, so I use an intermediate
way:

@ 	static const double_t split = 0x1p53 + 1;
@ 	struct dd ret;
@ 	double_t ha, hb, la, lb, p, q;
@
@ 	p = (double)(a * split);
@ 	ha = (double)(a - p) + p;
@ 	la = a - ha;

The casts are like assignments, and would have no effect on i386 due to
the compiler bugfeatures if the variables had type double.  They
correctly have no effect if there is no extra precision (then double_t
== double).  On i386, double_t is long double, and the cast is then not
null.  This tricks the compiler into generating the correct code.
STRICT_ASSIGN() can also be used to do essentially the same thing, but
the above looks nicer and is more portable.

Style changes in the above:
- write 1 as 1, not as 1.0.  Let the compiler promote it to the type of
  the other operand.
- don't use an assignment for the intermediate step in (a - p + p).  An
  assignment is neither necessary nor (with compiler bugfeatures)
  sufficient for discarding extra precision.

Muller's book is careful about making rounding explicit.  E.g., it
writes the splitting part as something like:

@ 	p = round(a * split);
@ 	ha = round(a - p);
@ 	ha = round(ha + p);
@ 	la = round(a - ha);

IIRC, it writes round() even when the result is exact, but that would
not be careful, so hopefully it doesn't.  This could be done in C code
too, by casting every operation that must be rounded (to kill extra
precision), except that with the compiler bugfeatures this doesn't
actually work.  This is hard to write and harder to read, so I prefer to
use expressions like a - p + p, without any intermediate variables or
casts, and depend on strict left-to-right evaluation and omit
parentheses too.  Using double_t makes this work, modulo other compiler
bugfeatures.

%
% 	p = b * split;
% 	hb = b - p;
% 	hb += p;
% 	lb = b - hb;
%
% 	p = ha * hb;

In Dekker's algorithm as given by Muller, p is a * b (rounded).  This
has the technical advantage that the hi term becomes available earlier.

% 	q = ha * lb + la * hb;
%
% 	ret.hi = p + q;
% 	ret.lo = p - ret.hi + q + la * lb;

Dekker's algorithm is (a self-contained sketch is appended after the
quoted text below):

@ 	ret.hi = a * b;
@ 	ret.lo = ha * hb - ret.hi + ha * lb + la * hb + la * lb;

This looks shorter, but actually has the same number of operations (9).
I now see that it has the technical disadvantage that the lo term
becomes available later.  So dd_mul() is probably better if you want the
full ret available fastest, while the version from Muller is better if
you want ret.hi available fastest.

Muller's organization is also easier to understand -- you calculate the
rounded product, then a correction term.  It is fairly clear that
(ha * hb - ret.hi) is exact.  Then more terms are added, from high to
low, presumably with enough cancellations to make them exact too.

I wonder why the term equal to q isn't parenthesized?:

@ 	ret.lo = ha * hb - ret.hi + (q = ha * lb + la * hb) + la * lb;

Parenthesizing this term makes ret.lo available earlier, if there is
some otherwise-unused parallelism.

% 	return (ret);
% }

> If you don't want lengthy comments in the code, perhaps you could
> simply include the following link in a comment:
> http://lists.freebsd.org/pipermail/freebsd-numerics/2012-August/000020.html

Link to the whole list :-).

> Let me say that in my proposed code for casinh, etc, that I do have
> essay length comments.
> http://people.freebsd.org/~stephen/catrig.c
>
> I plan to respect your style (keep out very lengthy comments), but I
> hope whomever commits my code will respect my style and keep the
> lengthy comments.
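Here is the self-contained sketch promised above: Dekker's
multiplication assembled in the organization attributed to Muller
(rounded product first, then the correction).  The name dd_mul_sketch()
and the assembly are mine, not das's dd_mul(); it assumes
round-to-nearest, no extra precision (double_t == double, so the usual
split constant 2^27 + 1 applies), and normalized a and b so that nothing
overflows or underflows:

#include <math.h>		/* for double_t */

struct dd {
	double hi;
	double lo;
};

static inline struct dd
dd_mul_sketch(double a, double b)
{
	static const double_t split = 0x1p27 + 1;
	struct dd ret;
	double_t ha, hb, la, lb, p;

	/* Veldkamp splitting of a and b into high and low parts (exact). */
	p = a * split;
	ha = a - p + p;
	la = a - ha;

	p = b * split;
	hb = b - p + p;
	lb = b - hb;

	/* The rounded product is available first ... */
	ret.hi = a * b;
	/*
	 * ... then the correction; ha * hb - ret.hi is exact, and the
	 * remaining terms are added from high to low.
	 */
	ret.lo = ha * hb - ret.hi + ha * lb + la * hb + la * lb;
	return (ret);
}

On i386 with FP_PE, the split constant (and the assumption double_t ==
double) breaks down, which is exactly the configuration problem
described above.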
I respect style to a fault, but try to get you to change it before
committing much :-).

Bruce