Date: Sun, 12 Aug 2012 23:02:15 -0000 From: Bruce Evans <brde@optusnet.com.au> To: Stephen Montgomery-Smith <stephen@missouri.edu> Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>, John Baldwin <jhb@freebsd.org>, David Chisnall <theraven@freebsd.org>, Bruce Evans <bde@freebsd.org>, Steve Kargl <sgk@troutmask.apl.washington.edu>, David Schultz <das@freebsd.org>, Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com> Subject: Re: Use of C99 extra long double math functions after r236148 Message-ID: <20120724100014.I934@besplex.bde.org> Resent-Message-ID: <20120812230208.GA20453@server.rulingia.com> In-Reply-To: <500D6182.8010003@missouri.edu> References: <5004A5C7.1040405@missouri.edu> <5004DEA9.1050001@missouri.edu> <20120717040118.GA86840@troutmask.apl.washington.edu> <20120717042125.GF66913@server.rulingia.com> <20120717043848.GB87001@troutmask.apl.washington.edu> <20120717225328.GA86902@server.rulingia.com> <20120717232740.GA95026@troutmask.apl.washington.edu> <20120718001337.GA87817@server.rulingia.com> <20120718123627.D1575@besplex.bde.org> <20120722121219.GC73662@server.rulingia.com> <20120722220031.GA7791@server.rulingia.com> <20120723141319.P1189@besplex.bde.org> <500CD98E.9080103@missouri.edu> <500D6182.8010003@missouri.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 23 Jul 2012, Stephen Montgomery-Smith wrote: > On 07/22/2012 11:56 PM, Stephen Montgomery-Smith wrote: >> This is the work I have done to produce casinh, casin, cacos and cacosh. >> The latter two took me a lot more time than I expected. It took me a >> lot of time to try to find the correct branches. > > Once I have cacos, cacosh turns out to be much easier than I thought: > > double complex > cacosh(double complex z) > { > complex double w; > > w = cacos(z); > if (signbit(cimag(w)) == 0) > return cpack(cimag(w),-creal(w)); > else > return cpack(-cimag(w),creal(w)); > } It's easy to fix all (?) the style and efficiency and exceptional case bugs in such a small example: - always use 'double complex', not 'complex double' (although this is backwards compared with the function names -- they have a c prefix and an an [fl] suffix -- it is what C99 uses) - always put spaces after commas in parameter lists - in new functions, always put silly parentheses around return values, to conform to KNF. This is done form most or all complex functions that have been committed (s_ccosh.c, etc). - in old functions, never put silly parentheses around return values, since this is not fdlibm style. (Similarly for some spaces.) - always use fdlibm classification methods, using bits. Never use signbit(). signbit() is currently only used by csqrt(), which sets a bad example. It might be a compiler builtin, but it is apparently not, so it is a slow function call (takes about half as long as a whole function for medium-weight functions like cosf()). This doesn't matter much here, but it is easier to practice using the fdlibm classification methods in a simple context. - consider using copysign() instead of the '-' operator. Now copysign() is the slow extern function for sign handling while '-' is a fast builtin. But it isn't clear that '-' is correct. It probably isn't, since s_ccosh.c uses copysign() a lot. I think the '-' operator works right on +-0, so s_ccosh.c is only using it to get the signs right for NaNs (the same as would happen using a simple formula). All the comples functions are required to have certain relection properties, and I like the reflection to apply to the sign bit of NaNs too. Otherwise the sign of NaN results is unpredicable. das wouldn't care about this of course. What actually happens for the '-' operator applied to a NaN is very MD. i387 has a negation operator (fchs). It and the i387 absolute value operator (fabs) are about the only operators that can change the sign of a NaN. gcc generates an fchs for 'return (-x)'. Thus it is expected that -x toggles the sign of a NaN on i386. However, gcc also does the invalid optimization of rewriting 'return (x + (-x))' to 'return (x - x)'. i387 also has a reverse subtraction operator which may cause problems. Implementing -x as (0 -x) would be invalid for +-0 too, and the reverse subtraction operator makes this bug more likely. Even addition is not commutative for NaNs on some arches where the NaN mixing rules for x+y are bad. (Of course x+y != y+x for all NaNs in FP, but the bad hardware makes it different in bits too.) fabs and fchs also fail to trigger or quieten signaling NaNs. This is consistent with some dubious optimizations in software. Even fdlibm fabs*() just clears the sign bit and doesn't try to trigger signaling NaNs. SSE doesn't have absolute value or negation operators for FP, and these seem to always be implemented by setting and toggling bits, so the behaviour is consistent. There have been the following developments for i387: - gcc-3.3 implements fabs(x) and -x using builtin bit toggling that doesn't use the i387, even with -mi386 -O0. - gcc-4.2 knows that direct hacking on FP like this is bad, or at least that it doesn't understand FP, so it uses the hardware. It now loads the values and uses hardware fabs and fchs on them. Now it is the hardware's fault for not triggering signaling NaNs. This accidentally fixes the case of floats and doubles, since signaling NaNs are triggered by the conversion on loading them. The optimization might go the other way, and change most of the copysign()s in s_ccosh.c to negations. When I worked on ccosh(), I didn't want to know the details and first just sprinkled the copysign() calls until the results were consistent with alternative implementations. Later I checked that some of them were consistent with reflection principles. Now even more than you want to know about NaNs: yesterday I checked whether soft-float on sparc64 had fixed some bugs with NaNs, so that I could delete my code that does extra work to not see these bugs. Unfortunately, the largest bugs are still there: signaling NaNs are never triggered on sparc64 with the default of -mno-hard-quad-float. The emulation just doesn't emulate, so it fails to trigger them in cases not like fabs() where everyone except the buggy emulation agrees that they should trigger. The non-triggering is complete: they aren't quietened, and they don't generate FE_INVALID. Compiling with -mhard-quad-float fixes this. But no one uses this, since it is about 4 times slower. Even without this, long doubles are about 1000 times slower on sparc64 than on x86 (only 300 times slower in cycle counts, but x86 has faster CPU clocks). Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120724100014.I934>