From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 12 23:02:15 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 88B20106566C for ; Sun, 12 Aug 2012 23:02:15 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id 0DFF58FC12 for ; Sun, 12 Aug 2012 23:02:14 +0000 (UTC) Received: from server.rulingia.com (c220-239-249-137.belrs5.nsw.optusnet.com.au [220.239.249.137]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q7CN2Ejo075601 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Mon, 13 Aug 2012 09:02:15 +1000 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q7CN28Xn021111 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Mon, 13 Aug 2012 09:02:08 +1000 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id q7CN284p021110 for freebsd-numerics@freebsd.org; Mon, 13 Aug 2012 09:02:08 +1000 (EST) (envelope-from peter) Resent-From: Peter Jeremy Resent-Date: Mon, 13 Aug 2012 09:02:08 +1000 Resent-Message-ID: <20120812230208.GA20453@server.rulingia.com> Resent-To: freebsd-numerics@freebsd.org Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q6O1QuWR064160 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Tue, 24 Jul 2012 11:26:56 +1000 (EST) (envelope-from brde@optusnet.com.au) Received: from mail30.syd.optusnet.com.au (mail30.syd.optusnet.com.au [211.29.133.193]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q6O1QtZW017601 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 24 Jul 2012 11:26:56 +1000 (EST) (envelope-from brde@optusnet.com.au) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail30.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q6O1QWLP003837 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 24 Jul 2012 11:26:33 +1000 From: Bruce Evans Mail-Followup-To: freebsd-numerics@freebsd.org X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <500D6182.8010003@missouri.edu> Message-ID: <20120724100014.I934@besplex.bde.org> References: <5004A5C7.1040405@missouri.edu> <5004DEA9.1050001@missouri.edu> <20120717040118.GA86840@troutmask.apl.washington.edu> <20120717042125.GF66913@server.rulingia.com> <20120717043848.GB87001@troutmask.apl.washington.edu> <20120717225328.GA86902@server.rulingia.com> <20120717232740.GA95026@troutmask.apl.washington.edu> <20120718001337.GA87817@server.rulingia.com> <20120718123627.D1575@besplex.bde.org> <20120722121219.GC73662@server.rulingia.com> <20120722220031.GA7791@server.rulingia.com> <20120723141319.P1189@besplex.bde.org> <500CD98E.9080103@missouri.edu> <500D6182.8010003@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Mailman-Approved-At: Sun, 12 Aug 2012 23:55:59 +0000 Cc: Diane Bruce , Bruce Evans , John Baldwin , David Chisnall , Bruce Evans , Steve Kargl , David Schultz , Peter Jeremy , Warner Losh Subject: Re: Use of C99 extra long double math functions after r236148 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Date: Sun, 12 Aug 2012 23:02:15 -0000 X-Original-Date: Tue, 24 Jul 2012 11:26:32 +1000 (EST) X-List-Received-Date: Sun, 12 Aug 2012 23:02:15 -0000 On Mon, 23 Jul 2012, Stephen Montgomery-Smith wrote: > On 07/22/2012 11:56 PM, Stephen Montgomery-Smith wrote: >> This is the work I have done to produce casinh, casin, cacos and cacosh. >> The latter two took me a lot more time than I expected. It took me a >> lot of time to try to find the correct branches. > > Once I have cacos, cacosh turns out to be much easier than I thought: > > double complex > cacosh(double complex z) > { > complex double w; > > w = cacos(z); > if (signbit(cimag(w)) == 0) > return cpack(cimag(w),-creal(w)); > else > return cpack(-cimag(w),creal(w)); > } It's easy to fix all (?) the style and efficiency and exceptional case bugs in such a small example: - always use 'double complex', not 'complex double' (although this is backwards compared with the function names -- they have a c prefix and an an [fl] suffix -- it is what C99 uses) - always put spaces after commas in parameter lists - in new functions, always put silly parentheses around return values, to conform to KNF. This is done form most or all complex functions that have been committed (s_ccosh.c, etc). - in old functions, never put silly parentheses around return values, since this is not fdlibm style. (Similarly for some spaces.) - always use fdlibm classification methods, using bits. Never use signbit(). signbit() is currently only used by csqrt(), which sets a bad example. It might be a compiler builtin, but it is apparently not, so it is a slow function call (takes about half as long as a whole function for medium-weight functions like cosf()). This doesn't matter much here, but it is easier to practice using the fdlibm classification methods in a simple context. - consider using copysign() instead of the '-' operator. Now copysign() is the slow extern function for sign handling while '-' is a fast builtin. But it isn't clear that '-' is correct. It probably isn't, since s_ccosh.c uses copysign() a lot. I think the '-' operator works right on +-0, so s_ccosh.c is only using it to get the signs right for NaNs (the same as would happen using a simple formula). All the comples functions are required to have certain relection properties, and I like the reflection to apply to the sign bit of NaNs too. Otherwise the sign of NaN results is unpredicable. das wouldn't care about this of course. What actually happens for the '-' operator applied to a NaN is very MD. i387 has a negation operator (fchs). It and the i387 absolute value operator (fabs) are about the only operators that can change the sign of a NaN. gcc generates an fchs for 'return (-x)'. Thus it is expected that -x toggles the sign of a NaN on i386. However, gcc also does the invalid optimization of rewriting 'return (x + (-x))' to 'return (x - x)'. i387 also has a reverse subtraction operator which may cause problems. Implementing -x as (0 -x) would be invalid for +-0 too, and the reverse subtraction operator makes this bug more likely. Even addition is not commutative for NaNs on some arches where the NaN mixing rules for x+y are bad. (Of course x+y != y+x for all NaNs in FP, but the bad hardware makes it different in bits too.) fabs and fchs also fail to trigger or quieten signaling NaNs. This is consistent with some dubious optimizations in software. Even fdlibm fabs*() just clears the sign bit and doesn't try to trigger signaling NaNs. SSE doesn't have absolute value or negation operators for FP, and these seem to always be implemented by setting and toggling bits, so the behaviour is consistent. There have been the following developments for i387: - gcc-3.3 implements fabs(x) and -x using builtin bit toggling that doesn't use the i387, even with -mi386 -O0. - gcc-4.2 knows that direct hacking on FP like this is bad, or at least that it doesn't understand FP, so it uses the hardware. It now loads the values and uses hardware fabs and fchs on them. Now it is the hardware's fault for not triggering signaling NaNs. This accidentally fixes the case of floats and doubles, since signaling NaNs are triggered by the conversion on loading them. The optimization might go the other way, and change most of the copysign()s in s_ccosh.c to negations. When I worked on ccosh(), I didn't want to know the details and first just sprinkled the copysign() calls until the results were consistent with alternative implementations. Later I checked that some of them were consistent with reflection principles. Now even more than you want to know about NaNs: yesterday I checked whether soft-float on sparc64 had fixed some bugs with NaNs, so that I could delete my code that does extra work to not see these bugs. Unfortunately, the largest bugs are still there: signaling NaNs are never triggered on sparc64 with the default of -mno-hard-quad-float. The emulation just doesn't emulate, so it fails to trigger them in cases not like fabs() where everyone except the buggy emulation agrees that they should trigger. The non-triggering is complete: they aren't quietened, and they don't generate FE_INVALID. Compiling with -mhard-quad-float fixes this. But no one uses this, since it is about 4 times slower. Even without this, long doubles are about 1000 times slower on sparc64 than on x86 (only 300 times slower in cycle counts, but x86 has faster CPU clocks). Bruce