From owner-freebsd-numerics@freebsd.org Wed May 17 23:26:09 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B6EC1D71CBA for ; Wed, 17 May 2017 23:26:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 6620514CB for ; Wed, 17 May 2017 23:26:08 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id E87F8429AEB; Thu, 18 May 2017 09:25:59 +1000 (AEST) Date: Thu, 18 May 2017 09:25:58 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions In-Reply-To: <20170517180927.GA54431@troutmask.apl.washington.edu> Message-ID: <20170518072636.E951@besplex.bde.org> References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=WvBbCZXv c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=Sek2kApjW0X8YfFk6SkA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 May 2017 23:26:09 -0000 On Wed, 17 May 2017, Steve Kargl wrote: > On Wed, May 17, 2017 at 12:49:45PM +1000, Bruce Evans wrote: >> On Tue, 16 May 2017, Steve Kargl wrote: >> >>> Index: lib/msun/ld128/s_cospil.c >>> ... >>> +static const long double >>> +pihi = 3.14159265358979322702026593105983920e+00L, >>> +pilo = 1.14423774522196636802434264184180742e-17L; >> >> These are not in normal format, and are hard to read. I can't see if >> pihi has the correct number of zero bits for exact multiplication. > > I don't have access to an ld128 system with suitable > facilities to allow me to spit out the hex representation. flame is still in the freebsd cluster, but is almost unusable because its files are an old copy of actual home directories on freefall. Utilities for printing hex and FP in good formats are remarkably rare. pari/gp supports many special math types but can only input or output ordinarly numbers in decimal. I use my library routines to print hex and special FP formats for it. sh, printf(1) and awk(1) are limited by native tyoes and have many other defects. > As such, I've added a comment > > /* > * pi_hi contains the leading 56 bits of a 169 bit approximation for pi. > */ Why 56? 53 + 56 = 109. >> These don't have the normal spelling. fdlibm never uses pihi or pio2hi, >> or pi_hi. It often uses pio2_hi and other pio2_*. My s_sinpi.c uses >> pi_hi. > > fdlibm has no convention. For example, e_log10.c actually uses > a mixature of convensions. > > static const double > two54 = 1.80143985094819840000e+16, /* 0x43500000, 0x00000000 */ > ivln10hi = 4.34294481878168880939e-01, /* 0x3fdbcb7b, 0x15200000 */ > ivln10lo = 2.50829467116452752298e-11, /* 0x3dbb9438, 0xca9aadd5 */ > log10_2hi = 3.01029995663611771306e-01, /* 0x3FD34413, 0x509F6000 */ > log10_2lo = 3.69423907715893078616e-13; /* 0x3D59FEF3, 0x11F12B36 */ My log* (committed only for long double precision) cleans this up a bit and uses invln10_hi, etc. My improvements to lgamma()'s sin_pi*() are buggy. I forgot the original target of avoiding inexact better: XX --- /tmp/bak/e_lgamma_r.c Thu Oct 30 09:20:46 2014 XX +++ e_lgamma_r.c Thu May 18 07:46:37 2017 XX @@ -157,5 +157,5 @@ XX XX /* XX - * Compute sin(pi*x) without actually doing the pi*x multiplication. XX + * Compute sin(pi*x), with the multiplication done after reducing x. XX * sin_pi(x) is only called for x < 0 and |x| < 2**(p-1) where p is XX * the precision of x. You added this comment. It is not very useful. Its first sentence was wrong about not doing the multiplication (fixed). It's second sentence is redundant. The caller is nearby, and spells p-1 literally so that it is easier to see that the magic numbers here match up. The caller's code is: if(ix>=0x43300000) /* |x|>=2**52, must be -integer */ return one/vzero; t = sin_pi(x); if(t==zero) return one/vzero; /* -integer */ This is a not so good optimization optimization. This function has to be careful with large values and it would be clearer to do all the checking itself. Then the t==zero check will handle the large integer. My bug and old code and optimizations are related to this. XX @@ -165,35 +165,149 @@ XX { XX volatile double vz; XX - double y,z; XX - int n; XX + double s,y,z; XX XX y = -x; XX XX - vz = y+0x1p52; /* depend on 0 <= y < 0x1p52 */ The dependencies are for 2 things: - no overflow when y is very large - no spurious inexact when y is just 0x1p52. For y below 0x1p52, the addition keeps the integer part and loses the fractional part, so it gives inexact if the fractional part is nonzero. This is just what we want for lgamma. However, for general sinpi() we don't want inexact for half-integers, and for general tanpi() we don't want inexact for quarter-integers. XX - z = vz-0x1p52; /* rint(y) for the above range */ XX - if (z == y) XX - return zero; My initially fix was to remove the above code. Integers can be classifyed later. fdlibm starts with floor() here. This works for lgamma(), but gives the same inexact for half- and quarter-integers as the above, so is not suitable for general used. XX - XX - vz = y+0x1p50; XX - GET_LOW_WORD(n,vz); /* bits for rounded y (units 0.25) */ XX - z = vz-0x1p50; /* y rounded to a multiple of 0.25 */ XX - if (z > y) { XX - z -= 0.25; /* adjust to round down */ XX - n--; This classifies quarter-integers well. y is 0 at exact quarter-integers, and there is no spurious inexact for them. n tells you which quarter-integer it was. XX + /* XX + * Redundant step to demonstrate a general version. The usual magic XX + * for rounding needs only no overflow, which this step ensures. XX + */ XX + if (y >= 0x1p53) XX + return zero; /* even integer */ The comment is wrong. I changed from 0x1p5{0,2} to 0x1p53 to get y in the full range [0, 2) instead of [0, 0.25) or [0, 1), but this gives spurious inexact for odd integers between 0x1p52 and 0x1p53 as well as for the half- and quarter-integers. This can be fixed using special classification near 0x1p52: - >= 0x1p53: even integer - in [0x1p52, 0x1p53): integer - in [0x1p51, 0x1p52): half-integer (that is, integer or integer + 0.5 - in [0x1p50, 0x1p51): quarter-integer - < 0x1p50: use general method but it seems better to go back to the general method of adding and subtracting 0x1p50. That gives a null rounding step (without inexact) for all values >= 0x1p50 provided only that overflow doesn't occur. I changed this because it reduces y a bit too much, and needs something like the GET of n to classify the quadrant. XX + XX + if (y < 2) XX + goto y_reduced; XX + vz = y+0x1p53; XX + z = vz-0x1p53; /* y rounded to a multiple of 2 */ XX + if (z > y) XX + z -= 2; /* adjust to round down */ XX + y = y - z; /* y mod 2 */ XX +y_reduced: The problem with this method is fundamental. Rounding of y to an integer multiple of 2 in any normal way should cause inexact whenever y is not such a multiple, but we want to avoid inexact except for multiples of 0.25. I think the remainder operation doesn't set inexact, so remainder(y, 2) would work. I expected it to be slow, but it is actually faster than the above on i386! (remainder is in hardware on x86, but it uses the i387, so is not quite as good on amd64.) Code using remainder(): XX /* XX * Redundant step to demonstrate a general version. The usual magic XX * for rounding needs only no overflow, which this step ensures. XX */ XX if (y >= 0x1p53) XX return zero; /* even integer */ XX XX if (y < 2) XX goto y_reduced; XX #if 0 XX vz = y+0x1p53; XX z = vz-0x1p53; /* y rounded to a multiple of 2 */ XX if (z > y) XX z -= 2; /* adjust to round down */ XX y = y - z; /* y mod 2 */ XX #else XX y = remainder(y, 2); XX if (y < 0) XX y += 2; XX #endif XX y_reduced: x86 remainder() has a slow loop for large args, so although it would work up to DBL_MAX, the special case to handle large args is still needed as an optimization. If this works, then it also works for conversions from degrees to radians (first take the remainder mod 360 instead of 2 in the above), and later scale by pi/180 instead of pi. Bruce