From owner-freebsd-numerics@freebsd.org  Wed May 17 23:26:09 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B6EC1D71CBA
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 17 May 2017 23:26:09 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 6620514CB
 for <freebsd-numerics@freebsd.org>; Wed, 17 May 2017 23:26:08 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id E87F8429AEB;
 Thu, 18 May 2017 09:25:59 +1000 (AEST)
Date: Thu, 18 May 2017 09:25:58 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
In-Reply-To: <20170517180927.GA54431@troutmask.apl.washington.edu>
Message-ID: <20170518072636.E951@besplex.bde.org>
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=WvBbCZXv c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=Sek2kApjW0X8YfFk6SkA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 May 2017 23:26:09 -0000

On Wed, 17 May 2017, Steve Kargl wrote:

> On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote:
>> On Tue, 16 May 2017, Steve Kargl wrote:
>>
>>> Index: lib/msun/ld128/s_cospil.c
>>> ...
>>> +static const long double
>>> +pihi = 3.14159265358979322702026593105983920e+00L,
>>> +pilo = 1.14423774522196636802434264184180742e-17L;
>>
>> These are not in normal format, and are hard to read.  I can't see if
>> pihi has the correct number of zero bits for exact multiplication.
>
> I don't have access to an ld128 system with suitable
> facilities to allow me to spit out the hex representation.

flame is still in the freebsd cluster, but is almost unusable because
its files are an old copy of actual home directories on freefall.

Utilities for printing hex and FP in good formats are remarkably rare.
pari/gp supports many special math types but can only input or output
ordinarly numbers in decimal.  I use my library routines to print hex
and special FP formats for it.  sh, printf(1) and awk(1) are limited
by native tyoes and have many other defects.

> As such, I've added a comment
>
> /*
> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi.
> */

Why 56?  53 + 56 = 109.

>> These don't have the normal spelling.  fdlibm never uses pihi or pio2hi,
>> or pi_hi.  It often uses pio2_hi and other pio2_*.  My s_sinpi.c uses
>> pi_hi.
>
> fdlibm has no convention.  For example, e_log10.c actually uses
> a mixature of convensions.
>
> static const double
> two54      =  1.80143985094819840000e+16, /* 0x43500000, 0x00000000 */
> ivln10hi   =  4.34294481878168880939e-01, /* 0x3fdbcb7b, 0x15200000 */
> ivln10lo   =  2.50829467116452752298e-11, /* 0x3dbb9438, 0xca9aadd5 */
> log10_2hi  =  3.01029995663611771306e-01, /* 0x3FD34413, 0x509F6000 */
> log10_2lo  =  3.69423907715893078616e-13; /* 0x3D59FEF3, 0x11F12B36 */

My log* (committed only for long double precision) cleans this up a bit
and uses invln10_hi, etc.

My improvements to lgamma()'s sin_pi*() are buggy.  I forgot the original
target of avoiding inexact better:

XX --- /tmp/bak/e_lgamma_r.c	Thu Oct 30 09:20:46 2014
XX +++ e_lgamma_r.c	Thu May 18 07:46:37 2017
XX @@ -157,5 +157,5 @@
XX 
XX  /*
XX - * Compute sin(pi*x) without actually doing the pi*x multiplication.
XX + * Compute sin(pi*x), with the multiplication done after reducing x.
XX   * sin_pi(x) is only called for x < 0 and |x| < 2**(p-1) where p is
XX   * the precision of x.

You added this comment.  It is not very useful.  Its first sentence was
wrong about not doing the multiplication (fixed).  It's second sentence
is redundant.  The caller is nearby, and spells p-1 literally so that
it is easier to see that the magic numbers here match up.  The caller's
code is:

 	    if(ix>=0x43300000) 	/* |x|>=2**52, must be -integer */
 		return one/vzero;
 	    t = sin_pi(x);
 	    if(t==zero) return one/vzero; /* -integer */

This is a not so good optimization optimization.  This function has to be
careful with large values and it would be clearer to do all the checking
itself.  Then the t==zero check will handle the large integer.  My bug
and old code and optimizations are related to this.

XX @@ -165,35 +165,149 @@
XX  {
XX  	volatile double vz;
XX -	double y,z;
XX -	int n;
XX +	double s,y,z;
XX 
XX  	y = -x;
XX 
XX -	vz = y+0x1p52;			/* depend on 0 <= y < 0x1p52 */

The dependencies are for 2 things:
- no overflow when y is very large
- no spurious inexact when y is just 0x1p52.  For y below 0x1p52, the
   addition keeps the integer part and loses the fractional part, so
   it gives inexact if the fractional part is nonzero.  This is just
   what we want for lgamma.  However, for general sinpi() we don't
   want inexact for half-integers, and for general tanpi() we don't
   want inexact for quarter-integers.

XX -	z = vz-0x1p52;			/* rint(y) for the above range */
XX -	if (z == y)
XX -	    return zero;

My initially fix was to remove the above code.  Integers can be classifyed
later.

fdlibm starts with floor() here.  This works for lgamma(), but gives the
same inexact for half- and quarter-integers as the above, so is not
suitable for general used.

XX -
XX -	vz = y+0x1p50;
XX -	GET_LOW_WORD(n,vz);		/* bits for rounded y (units 0.25) */
XX -	z = vz-0x1p50;			/* y rounded to a multiple of 0.25 */
XX -	if (z > y) {
XX -	    z -= 0.25;			/* adjust to round down */
XX -	    n--;

This classifies quarter-integers well.  y is 0 at exact quarter-integers, and
there is no spurious inexact for them.  n tells you which quarter-integer
it was.

XX +	/*
XX +	 * Redundant step to demonstrate a general version.  The usual magic
XX +	 * for rounding needs only no overflow, which this step ensures.
XX +	 */
XX +	if (y >= 0x1p53)
XX +	    return zero;		/* even integer */

The comment is wrong.  I changed from 0x1p5{0,2} to 0x1p53 to get y in the
full range [0, 2) instead of [0, 0.25) or [0, 1), but this gives spurious
inexact for odd integers between 0x1p52 and 0x1p53 as well as for the half-
and quarter-integers.

This can be fixed using special classification near 0x1p52:

- >= 0x1p53: even integer
- in [0x1p52, 0x1p53): integer
- in [0x1p51, 0x1p52): half-integer (that is, integer or integer + 0.5
- in [0x1p50, 0x1p51): quarter-integer
- < 0x1p50: use general method

but it seems better to go back to the general method of adding and
subtracting 0x1p50.  That gives a null rounding step (without inexact)
for all values >= 0x1p50 provided only that overflow doesn't occur.
I changed this because it reduces y a bit too much, and needs something
like the GET of n to classify the quadrant.

XX +
XX +	if (y < 2)
XX +	    goto y_reduced;
XX +	vz = y+0x1p53;
XX +	z = vz-0x1p53;			/* y rounded to a multiple of 2 */
XX +	if (z > y)
XX +	    z -= 2;			/* adjust to round down */
XX +	y = y - z;			/* y mod 2 */
XX +y_reduced:

The problem with this method is fundamental.  Rounding of y to an
integer multiple of 2 in any normal way should cause inexact whenever
y is not such a multiple, but we want to avoid inexact except for
multiples of 0.25.

I think the remainder operation doesn't set inexact, so remainder(y, 2)
would work.  I expected it to be slow, but it is actually faster than
the above on i386!  (remainder is in hardware on x86, but it uses the
i387, so is not quite as good on amd64.)  Code using remainder():

XX 	/*
XX 	 * Redundant step to demonstrate a general version.  The usual magic
XX 	 * for rounding needs only no overflow, which this step ensures.
XX 	 */
XX 	if (y >= 0x1p53)
XX 	    return zero;		/* even integer */
XX 
XX 	if (y < 2)
XX 	    goto y_reduced;
XX #if 0
XX 	vz = y+0x1p53;
XX 	z = vz-0x1p53;			/* y rounded to a multiple of 2 */
XX 	if (z > y)
XX 	    z -= 2;			/* adjust to round down */
XX 	y = y - z;			/* y mod 2 */
XX #else
XX 	y = remainder(y, 2);
XX 	if (y < 0)
XX 		y += 2;
XX #endif
XX y_reduced:

x86 remainder() has a slow loop for large args, so although it would work
up to DBL_MAX, the special case to handle large args is still needed as
an optimization.

If this works, then it also works for conversions from degrees to radians
(first take the remainder mod 360 instead of 2 in the above), and later
scale by pi/180 instead of pi.

Bruce