Date: Fri, 28 Apr 2017 19:13:16 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-numerics@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170428183733.V1497@besplex.bde.org>
In-Reply-To: <20170428010122.GA12814@troutmask.apl.washington.edu>
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428010122.GA12814@troutmask.apl.washington.edu>
On Thu, 27 Apr 2017, Steve Kargl wrote:

> On Thu, Apr 27, 2017 at 04:14:11PM -0700, Steve Kargl wrote:
>>
>> I have attached a new diff to the bugzilla report.  The
>> diff is 3090 lines and won't be broadcast to the mailing list.
>>
>> This diff includes fixes for a few inconsequential bugs
>> and implements a modified Estrin's method for summing a few
>> polynomials.  If you want the previous Horner's method,
>> then add -DHORNER to your CFLAGS.
>
> For those curious about testing, here are some numbers
> for Estrin's method.  These were obtained on an AMD
> FX(tm)-8350 @ 4018.34-MHz.  The times are in microseconds
> and the numbers in parentheses are estimated cycles.
>
>             |    cospi     |    sinpi     |    tanpi
> ------------+--------------+--------------+--------------
> float       | 0.0089 (37)  | 0.0130 (54)  | 0.0194 (80)
> double      | 0.0134 (55)  | 0.0160 (66)  | 0.0249 (102)
> long double | 0.0557 (228) | 0.0698 (287) | 0.0956 (393)
> ------------+--------------+--------------+--------------
>
> The timing tests are done on the interval [0,0.25] as
> this is the interval where the polynomial approximations
> apply.  Limited accuracy testing gives

These still seem slow.  I did a quick test of naive evaluations of
these functions as standard_function(Pi * x) and got times a bit
faster on Haswell (2-4 times faster for the long double case),
despite the handicaps of using gcc-3.3.3 and i386.  Of course, the
naive evaluation is inaccurate, especially near multiples of Pi/2.

> x in [0,0.25]    |   tanpif   |   tanpi    |   tanpil
> -----------------+------------+------------+-------------
>          MAX ULP | 1.37954760 | 1.37300168 | 1.38800823

Just use the naive evaluation to get similar errors in this range.
It is probably faster too.  For tiny x, both reduce to the
approximation Pi*x, and an error of about this size is expected
unless the evaluation is done in extra precision.

> In the interval [0.25,0.5] tanpi[fl] is computed by
> cospi / sinpi.
> The numbers look like
>
> x in [0.25,0.5]  |   tanpif   |   tanpi    |   tanpil
> -----------------+------------+------------+-------------
>          MAX ULP | 1.93529165 | 2.04485533 | 1.95823689

The errors build up only linearly in the number of operations, which
is good.

Note that on i386, with its extended precision, the naive method in
float precision is accurate to nearly 0.5 ulps provided you use
extended precision for Pi, for the multiplication, and for the
function itself.  So sinpif() is only worth having if it can do this
almost as fast as sinf() (about 15 cycles throughput and less than
100 cycles latency (perhaps 50) on modern x86).  sinf() already uses
the extra precision automatically, via a double hack (double is not
very different from float+extended on i386).  I think the accuracy is
enough to extend float precision up to a useful multiple of Pi:
suppose double precision rather than full extended, so only 53 bits
for Pi, which is 29 bits extra; lose 24 to cancellations and 5 are
left, so the accuracy is enough up to about 2**5*Pi.

Bruce
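For comparison of the two summation schemes named in the quoted patch description, here is a minimal C sketch of Horner's method versus textbook (unmodified) Estrin's method for a degree-7 polynomial.  The diff's "modified" Estrin variant is not shown in this message, so this does not reproduce it.  The array exp_taylor7 and all function names are hypothetical, and the coefficients are just the Taylor coefficients of exp, chosen so the result is easy to check; they are not the patch's minimax coefficients.

```c
#include <assert.h>
#include <math.h>

/* Demo coefficients only: first eight Taylor coefficients of exp(z). */
static const double exp_taylor7[8] = {
    1.0, 1.0, 1.0 / 2, 1.0 / 6, 1.0 / 24, 1.0 / 120, 1.0 / 720, 1.0 / 5040
};

/* Horner: p = ((c7*z + c6)*z + ... )*z + c0.
 * Each multiply-add depends on the previous one, so all 7 run serially. */
static double poly7_horner(const double c[8], double z)
{
    double p = c[7];
    for (int i = 6; i >= 0; i--)
        p = p * z + c[i];
    return p;
}

/* Textbook Estrin: evaluate coefficient pairs, then combine with z^2, z^4.
 * z^2, z^4 and the four pairs are independent, so a superscalar CPU can
 * overlap them; the critical path is shorter than Horner's. */
static double poly7_estrin(const double c[8], double z)
{
    double z2 = z * z;
    double z4 = z2 * z2;
    double p01 = c[0] + c[1] * z;
    double p23 = c[2] + c[3] * z;
    double p45 = c[4] + c[5] * z;
    double p67 = c[6] + c[7] * z;
    return (p01 + p23 * z2) + (p45 + p67 * z2) * z4;
}

/* Largest |Horner - Estrin| over n+1 evenly spaced points of [0, 0.25]. */
static double max_diff_on_quarter(const double c[8], int n)
{
    assert(n > 0);
    double worst = 0.0;
    for (int i = 0; i <= n; i++) {
        double z = 0.25 * i / n;
        double d = fabs(poly7_horner(c, z) - poly7_estrin(c, z));
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Both functions compute the same polynomial and differ only in rounding.  This textbook Estrin form does two more multiplications than Horner (9 versus 7) in exchange for independent subexpressions, which is why it can be faster on an out-of-order CPU despite doing slightly more work.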