Date: Fri, 28 Apr 2017 09:56:58 -0700 From: Steve Kargl <sgk@troutmask.apl.washington.edu> To: Bruce Evans <brde@optusnet.com.au> Cc: freebsd-numerics@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170428165658.GA17560@troutmask.apl.washington.edu> In-Reply-To: <20170428183733.V1497@besplex.bde.org> References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428010122.GA12814@troutmask.apl.washington.edu> <20170428183733.V1497@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Apr 28, 2017 at 07:13:16PM +1000, Bruce Evans wrote: > On Thu, 27 Apr 2017, Steve Kargl wrote: > > > On Thu, Apr 27, 2017 at 04:14:11PM -0700, Steve Kargl wrote: > >> > >> I have attached a new diff to the bugzilla report. The > >> diff is 3090 lines and won't be broadcast the mailing list. > >> > >> This diff includes fixes for a few inconsequential bugs > >> and implements modified Estrin's method for sum a few > >> ploynomials. If you want the previous Horner's method > >> then add -DHORNER to your CFLAGS. > > > > For those curious about testing, here are some numbers > > for the Estrin's method. These were obtained on an AMD > > FX(tm)-8350 @ 4018.34-MHz. The times are in microseconds > > and the number in parentheses are estimated cycles. > > > > | cospi | sinpi | tanpi > > ------------+--------------+--------------+-------------- > > float | 0.0089 (37) | 0.0130 (54) | 0.0194 (80) > > double | 0.0134 (55) | 0.0160 (66) | 0.0249 (102) > > long double | 0.0557 (228) | 0.0698 (287) | 0.0956 (393) > > ------------+--------------+--------------+-------------- > > > > The timing tests are done on the interval [0,0.25] as > > this is the interval where the polynomial approximations > > apply. Limited accuracy testing gives > > These still seem slow. I did a quick test of naive evaluations of > these functions as standard_function(Pi * x) and get times a bit faster > on Haswell, except 2-4 times faster for the long double case, with the > handicaps of using gcc-3.3.3 and i386. Of course, the naive evaluation > is inaccurate, especially near multiples of Pi/2. The slowness comes from handling extra precision arithmetic. For sinpi(x) = x * p(x) where p(x) = s0 + x2 * S(x2) is the polynomial approximation and x2 = x * x. One needs to take care in the ordering on the evaluation where x and x2 are split into high and low parts. See the source code posted in response to your other email. > > x in [0,0.25] | tanpif | tanpi | tanpil > > -----------------+------------+------------+------------- > > MAX ULP | 1.37954760 | 1.37300168 | 1.38800823 > > Just use the naive evaluation to get similar errors in this > range. It is probably faster too. I haven't checked tanpif, but naive evaluation is faster for sinpif(x). But getting the worry answer fast seems to be counter to what one may expect from a math library. Consider the two naive implementations: float bde1(float x) { return (sinf((float)M_PI * x)); } float bde2(float x) { return (sinf((float)(M_PI * x))); } float bde3(float x) { return ((float)sin(M_PI * x)); } x in [0,0.25] | sinpif | bde1(x) | bde2(x) | bde3(x) -----------------+-------------+-------------+--------------+------------ Time usec (cyc) | 0.0130 (54) | 0.0084 (35) | 0.0093 (38) | 0.0109 (45) MAX ULP | 0.80103672 | 1.94017851 | 1.46830785 | 0.5 1.0 < ULP | 0 | 736496 | 47607 | 0 0.9 < ULP <= 1.0 | 0 | 563122 | 101162 | 0 0.8 < ULP <= 0.9 | 1 | 751605 | 386128 | 0 0.7 < ULP <= 0.8 | 727 | 942349 | 753647 | 0 0.6 < ULP <= 0.7 | 19268 | 1169719 | 1123518 | 0 x in [3990,3992] | sinpif | bde1(x) | bde2(x) | bde3(x) -----------------+-------------+-------------+-------------+------------ Time usec (cyc) | 0.0161 (66) | 0.0137 (56) | 0.0144 (59) | 0.0211 (87) MAX ULP | 0.65193975 | 10458803. | 6471461. | 0.5 1.0 < ULP | 0 | 16773117 | 16762878 | 0 0.9 < ULP <= 1.0 | 0 | 0 | 0 | 0 0.8 < ULP <= 0.9 | 0 | 0 | 0 | 0 0.7 < ULP <= 0.8 | 0 | 0 | 0 | 0 0.6 < ULP <= 0.7 | 19268 | 2047 | 2047 | 0 So bde3(x) is best, but if you're going to use sin(M_PI*x) for float sinpif, then use sinpi((double)x) which is faster than bde3 and just as accurate. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170428165658.GA17560>