From owner-freebsd-numerics@FreeBSD.ORG  Sun Aug 12 23:02:15 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 88B20106566C
	for <freebsd-numerics@freebsd.org>;
	Sun, 12 Aug 2012 23:02:15 +0000 (UTC)
	(envelope-from peter@rulingia.com)
Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au
	[122.100.2.194])
	by mx1.freebsd.org (Postfix) with ESMTP id 0DFF58FC12
	for <freebsd-numerics@freebsd.org>;
	Sun, 12 Aug 2012 23:02:14 +0000 (UTC)
Received: from server.rulingia.com
	(c220-239-249-137.belrs5.nsw.optusnet.com.au [220.239.249.137])
	by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q7CN2Ejo075601
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <freebsd-numerics@freebsd.org>;
	Mon, 13 Aug 2012 09:02:15 +1000 (EST)
	(envelope-from peter@rulingia.com)
X-Bogosity: Ham, spamicity=0.000000
Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1])
	by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q7CN28Xn021111
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <freebsd-numerics@freebsd.org>;
	Mon, 13 Aug 2012 09:02:08 +1000 (EST)
	(envelope-from peter@server.rulingia.com)
Received: (from peter@localhost)
	by server.rulingia.com (8.14.5/8.14.5/Submit) id q7CN284p021110
	for freebsd-numerics@freebsd.org; Mon, 13 Aug 2012 09:02:08 +1000 (EST)
	(envelope-from peter)
Resent-From: Peter Jeremy <peter@rulingia.com>
Resent-Date: Mon, 13 Aug 2012 09:02:08 +1000
Resent-Message-ID: <20120812230208.GA20453@server.rulingia.com>
Resent-To: freebsd-numerics@freebsd.org
Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au
	[122.100.2.194])
	by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q6O1QuWR064160
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <peter@home.rulingia.com>; Tue, 24 Jul 2012 11:26:56 +1000 (EST)
	(envelope-from brde@optusnet.com.au)
Received: from mail30.syd.optusnet.com.au (mail30.syd.optusnet.com.au
	[211.29.133.193])
	by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q6O1QtZW017601
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <peter@rulingia.com>; Tue, 24 Jul 2012 11:26:56 +1000 (EST)
	(envelope-from brde@optusnet.com.au)
Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246])
	by mail30.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q6O1QWLP003837
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 24 Jul 2012 11:26:33 +1000
From: Bruce Evans <brde@optusnet.com.au>
Mail-Followup-To: freebsd-numerics@freebsd.org
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith <stephen@missouri.edu>
In-Reply-To: <500D6182.8010003@missouri.edu>
Message-ID: <20120724100014.I934@besplex.bde.org>
References: <5004A5C7.1040405@missouri.edu> <5004DEA9.1050001@missouri.edu>
	<20120717040118.GA86840@troutmask.apl.washington.edu>
	<20120717042125.GF66913@server.rulingia.com>
	<20120717043848.GB87001@troutmask.apl.washington.edu>
	<20120717225328.GA86902@server.rulingia.com>
	<20120717232740.GA95026@troutmask.apl.washington.edu>
	<20120718001337.GA87817@server.rulingia.com>
	<20120718123627.D1575@besplex.bde.org>
	<20120722121219.GC73662@server.rulingia.com>
	<20120722220031.GA7791@server.rulingia.com>
	<20120723141319.P1189@besplex.bde.org> <500CD98E.9080103@missouri.edu>
	<500D6182.8010003@missouri.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Sun, 12 Aug 2012 23:55:59 +0000
Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>,
	John Baldwin <jhb@freebsd.org>, David Chisnall <theraven@freebsd.org>,
	Bruce Evans <bde@freebsd.org>,
	Steve Kargl <sgk@troutmask.apl.washington.edu>,
	David Schultz <das@freebsd.org>,
	Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com>
Subject: Re: Use of C99 extra long double math functions after r236148
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
Date: Sun, 12 Aug 2012 23:02:15 -0000
X-Original-Date: Tue, 24 Jul 2012 11:26:32 +1000 (EST)
X-List-Received-Date: Sun, 12 Aug 2012 23:02:15 -0000

On Mon, 23 Jul 2012, Stephen Montgomery-Smith wrote:

> On 07/22/2012 11:56 PM, Stephen Montgomery-Smith wrote:
>> This is the work I have done to produce casinh, casin, cacos and cacosh.
>>   The latter two took me a lot more time than I expected.  It took me a
>> lot of time to try to find the correct branches.
>
> Once I have cacos, cacosh turns out to be much easier than I thought:
>
> double complex
> cacosh(double complex z)
> {
> 	complex double w;
>
> 	w = cacos(z);
> 	if (signbit(cimag(w)) == 0)
> 		return cpack(cimag(w),-creal(w));
> 	else
> 		return cpack(-cimag(w),creal(w));
> }

It's easy to fix all (?) the style and efficiency and exceptional case bugs
in such a small example:

- always use 'double complex', not 'complex double' (although this is
   backwards compared with the function names -- they have a c prefix and
   an an [fl] suffix -- it is what C99 uses)
- always put spaces after commas in parameter lists
- in new functions, always put silly parentheses around return values, to
   conform to KNF.  This is done form most or all complex functions that
   have been committed (s_ccosh.c, etc).
- in old functions, never put silly parentheses around return values, since
   this is not fdlibm style.  (Similarly for some spaces.)

- always use fdlibm classification methods, using bits.  Never use
   signbit().  signbit() is currently only used by csqrt(), which sets a
   bad example.  It might be a compiler builtin, but it is apparently
   not, so it is a slow function call (takes about half as long as a
   whole function for medium-weight functions like cosf()).  This doesn't
   matter much here, but it is easier to practice using the fdlibm
   classification methods in a simple context.

- consider using copysign() instead of the '-' operator.  Now copysign()
   is the slow extern function for sign handling while '-' is a fast builtin.
   But it isn't clear that '-' is correct.  It probably isn't, since
   s_ccosh.c uses copysign() a lot.  I think the '-' operator works right
   on +-0, so s_ccosh.c is only using it to get the signs right for NaNs
   (the same as would happen using a simple formula).  All the comples
   functions are required to have certain relection properties, and I
   like the reflection to apply to the sign bit of NaNs too.  Otherwise
   the sign of NaN results is unpredicable.  das wouldn't care about this
   of course.

   What actually happens for the '-' operator applied to a NaN is very MD.
   i387 has a negation operator (fchs).  It and the i387 absolute value
   operator (fabs) are about the only operators that can change the sign
   of a NaN.  gcc generates an fchs for 'return (-x)'.  Thus it is expected
   that -x toggles the sign of a NaN on i386.  However, gcc also
   does the invalid optimization of rewriting 'return (x + (-x))' to
   'return (x - x)'.  i387 also has a reverse subtraction operator which
   may cause problems.  Implementing -x as (0 -x) would be invalid for
   +-0 too, and the reverse subtraction operator makes this bug more
   likely.  Even addition is not commutative for NaNs on some arches
   where the NaN mixing rules for x+y are bad.  (Of course x+y != y+x
   for all NaNs in FP, but the bad hardware makes it different in bits
   too.)

   fabs and fchs also fail to trigger or quieten signaling NaNs.  This is
   consistent with some dubious optimizations in software.  Even fdlibm
   fabs*() just clears the sign bit and doesn't try to trigger signaling
   NaNs.  SSE doesn't have absolute value or negation operators for FP,
   and these seem to always be implemented by setting and toggling bits,
   so the behaviour is consistent.  There have been the following
   developments for i387:
   - gcc-3.3 implements fabs(x) and -x using builtin bit toggling that
     doesn't use the i387, even with -mi386 -O0.
   - gcc-4.2 knows that direct hacking on FP like this is bad, or at
     least that it doesn't understand FP, so it uses the hardware.  It
     now loads the values and uses hardware fabs and fchs on them.  Now
     it is the hardware's fault for not triggering signaling NaNs.  This
     accidentally fixes the case of floats and doubles, since signaling
     NaNs are triggered by the conversion on loading them.

   The optimization might go the other way, and change most of the
   copysign()s in s_ccosh.c to negations.  When I worked on ccosh(),
   I didn't want to know the details and first just sprinkled the
   copysign() calls until the results were consistent with alternative
   implementations.  Later I checked that some of them were consistent
   with reflection principles.

Now even more than you want to know about NaNs: yesterday I checked
whether soft-float on sparc64 had fixed some bugs with NaNs, so that
I could delete my code that does extra work to not see these bugs.
Unfortunately, the largest bugs are still there: signaling NaNs are
never triggered on sparc64 with the default of -mno-hard-quad-float.
The emulation just doesn't emulate, so it fails to trigger them in
cases not like fabs() where everyone except the buggy emulation agrees
that they should trigger.  The non-triggering is complete: they aren't
quietened, and they don't generate FE_INVALID.  Compiling with
-mhard-quad-float fixes this.  But no one uses this, since it is about
4 times slower.  Even without this, long doubles are about 1000 times
slower on sparc64 than on x86 (only 300 times slower in cycle counts,
but x86 has faster CPU clocks).

Bruce