Date: Sun, 12 Aug 2012 23:00:29 -0000
From: Bruce Evans <brde@optusnet.com.au>
To: Peter Jeremy <peter@rulingia.com>
Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>,
    John Baldwin <jhb@freebsd.org>, David Chisnall <theraven@freebsd.org>,
    Stephen Montgomery-Smith <stephen@missouri.edu>, Bruce Evans <bde@freebsd.org>,
    Steve Kargl <sgk@troutmask.apl.washington.edu>, David Schultz <das@freebsd.org>,
    Warner Losh <imp@bsdimp.com>
Subject: Re: Use of C99 extra long double math functions after r236148
Message-ID: <20120721133600.R877@besplex.bde.org>
Resent-Message-ID: <20120812230022.GS20453@server.rulingia.com>
In-Reply-To: <20120721003103.GA73662@server.rulingia.com>
References: <20120717084457.U3890@besplex.bde.org> <5004A5C7.1040405@missouri.edu>
    <5004DEA9.1050001@missouri.edu> <20120717040118.GA86840@troutmask.apl.washington.edu>
    <20120717042125.GF66913@server.rulingia.com> <20120717043848.GB87001@troutmask.apl.washington.edu>
    <20120717225328.GA86902@server.rulingia.com> <20120717232740.GA95026@troutmask.apl.washington.edu>
    <20120718001337.GA87817@server.rulingia.com> <20120718123627.D1575@besplex.bde.org>
    <20120721003103.GA73662@server.rulingia.com>
On Sat, 21 Jul 2012, Peter Jeremy wrote:

> Hi Bruce or das@ or Steve,
>
> I have a question on the following code from s_ccosh.c:
> % /*
> %  * cosh(NaN + I NaN)  = d(NaN) + I d(NaN).
> %  *
> %  * cosh(NaN +- I Inf) = d(NaN) + I d(NaN).
> %  * Optionally raises the invalid floating-point exception.
> %  * Choice = raise.
> %  *
> %  * cosh(NaN + I y)    = d(NaN) + I d(NaN).
> %  * Optionally raises the invalid floating-point exception for finite
> %  * nonzero y.  Choice = don't raise (except for signaling NaNs).
> %  */
> % return (cpack((x * x) * (y - y), (x + x) * (y - y)));
>
> x is always NaN so the real part presumably just needs to be quietened
> before returning - ie (x + x) would seem to be sufficient.  Why does
> the code use ((x * x) * (y - y))?

The more complex expression is to handle more cases with the same code
(mainly so as to avoid extra tests and branches for the classification.
For exceptional cases, we don't care much whether the expressions in
the return statements are slower because they are more complex so as to
be general, or simpler and faster but larger because the classification
is finer).

We could try to avoid extra tests and branches for the usual cases by
first doing a coarse classification and then a fine classification of
the exceptional sub-cases.  This would probably be efficient (if the
CPU has reasonable branch prediction), but it would give larger code.

Such expressions are complicated mainly to get the signs and NaN mixing
right in all cases.  Quoting this line again:

> % return (cpack((x * x) * (y - y), (x + x) * (y - y)));
>
> y has no restriction on its value so an arithmetic operation with x is
> a good way to convert it to a NaN.  Wouldn't (y + x) be sufficient?

Remember that cos[h]() is even and sin[h]() is odd.  (x * x) is there
to get an even sign for cos[h](), and (x + x) is there to get an odd
sign for sin[h]().
After undoing this magic, the expression becomes:

> % return (cpack(cosh(x) * (y - y), sinh(x) * (y - y)));

and it remains to explain the (y - y) terms.  (It would probably work
to use this expression.  cosh() and sinh() should get the signs and
NaNs right.  We optimize away these calls because we can.  This depends
on knowing that x is Inf or NaN.  Oops, here x is only NaN, so there is
more to explain:

- in other return statements, x can be either Inf or NaN.  We copy the
  magic expressions for simplicity.

- signs are not very important for NaNs, but I like to make them
  consistent so that my test programs don't spew errors for unimportant
  inconsistencies (it is easier to make the inconsistencies not occur
  than to decide automatically whether they are related to bugs.  das@
  doesn't like spending any time on this detail).

- using the magic expressions gives the same behaviour for NaNs that
  calling the functions would.  For example, cosh(x) naturally wants to
  square x.  It may or may not do this explicitly for NaNs, but it
  should for the same reasons that we should (more so), and in fact it
  does: it returns x*x for both NaNs and Infs.  This gets the sign
  right for Infs, quietens NaNs, and gives a sign for the NaN result in
  the most natural way (whatever the hardware does for x*x).

- here we have the option of quietening NaNs using some other
  expression, perhaps x+y (since we have another arg), or x+x (since we
  don't care about the sign of NaNs).  But this might give gratuitously
  different results, depending on what the hardware does.

- x86 hardware mostly doesn't touch the sign of NaNs when they are
  operated on.  The main exceptions are that fabs() clears the sign bit
  and that negation may toggle the sign bit, depending on how negation
  is implemented by the compiler (the i387 fchs instruction toggles the
  sign, but compilers may instead generate a subtraction operation,
  which doesn't).  SSE gives annoying differences for NaNs, but IIRC
  these don't involve the sign bit for non-mixing operations.
- the quiet bit can give annoying differences.  NaNs could be mixed
  using the expression x+y (if you prefer signs preserved) or x*y (if
  you prefer signs multiplied).  The result should be some function of
  the inputs.  IIRC, IEEE7xx specifies this, but doesn't specify the
  details.  Most hardware does something like looking at the bits of
  the 2 NaNs and returning the bits of the "largest" one.  The main
  differences are whether the sign and quiet bits are included in the
  decision, and if so, how they affect the ordering.  There are
  annoying differences from them sometimes affecting the ordering,
  depending on the hardware and the operand size.  The above operates
  first purely on each of x and y, in order to quieten them before
  mixing:

  - x*y might use the quiet bit in the ordering decision
  - (x*x)*(y-y) kills the quiet bit before mixing, so that it cannot
    affect the ordering decision

  Again, the second expression is what happens naturally if you write
  cosh(x)*sinh(y). )

Back to explaining the (y - y) terms:

> % return (cpack(cosh(x) * (y - y), sinh(x) * (y - y)));

These are to optimize sin(y) and cos(y).  sin() is odd and cos() is
even, but parity considerations no longer apply for y = +-Inf because
the value is then NaN, so its sign is meaningless.  The sign of this
NaN is also unspecified, and we take advantage of this by letting it be
whatever the hardware gives for (y - y).  We also want to be compatible
with what the function call would do, and fdlibm's sin() and cos() in
fact do exactly this, for the usual technical reasons.  Now y can be
NaN, +-Inf or finite, and (y - y) handles all cases.  In sin() and
cos(), this expression handles the NaN and +-Inf cases, so we are
consistent.

> My understanding is that:
> - Addition is generally faster than multiplication

Apart from the technical (parity) reasons, we don't really care about
efficiency in these exceptional cases, though we are doing perhaps
excessive optimizations to avoid the function calls.
Another organization of the function might start by evaluating the 4
sub-functions in all cases, so that the results can be used in all
cases later.  IIRC, this is not done mainly because it isn't clear that
it doesn't give spurious exceptions.  It also gives efficiency in a
natural way for special but non-exceptional cases like pure real and
pure imaginary.

> - Signs are irrelevant for NaN so merging the sign of x doesn't matter.
> - NaN + NaN returns the (quietened?) left-hand NaN

See above.

> - Inf + NaN returns the (quietened?) right-hand NaN

I forget if IEEE7xx requires this, but some C functions near this one
are required to treat a NaN as an indeterminate value which always
combines with Inf to produce Inf.  The sign of a NaN could reasonably
matter here:

  +Inf + +NaN = +Inf since the indeterminate value of +NaN cannot be -Inf
  +Inf + -NaN = NaN  since the indeterminate value of -NaN can be -Inf

> - finite + NaN returns the (quietened?) right-hand NaN
>
> Also, whilst things like ((x + x) * (y - y)) are reasonably efficient

Another point is that using consistent expressions allows common code
for the common parts (like x*x) if that is a good (space) optimization.

> on x86, few (if any) RISC architectures support exceptional conditions
> in hardware.  My understanding is that SPARC would trap back into the
> userland handler (lib/libc/sparc64/fpu) on each operation unless both
> arguments and the result are normalised numbers.  Explicitly fiddling
> with the FPU state would seem faster than multiple traps.

Here it is more the size of the magic expressions that gives slowness.
Almost all of them either start with a NaN or create one.  Even on x86,
there is annoying slowness for overflow and underflow in an MD way.
Exceptional cases for exp*() run much slower on i386(core2) than on
i386(athlon-any(?)) or amd64(core2) or amd64(athlon), because
instructions (if not their results) that overflow and underflow are
handled slowly on core2(i387) but at full speed on all the other
combinations, including core2(SSE).

Bruce