From owner-freebsd-hackers@freebsd.org Fri Apr 28 20:15:24 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3B16CD549B6; Fri, 28 Apr 2017 20:15:24 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 170831269; Fri, 28 Apr 2017 20:15:24 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v3SKFMnx033229 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 28 Apr 2017 13:15:22 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v3SKFMZ8033228; Fri, 28 Apr 2017 13:15:22 -0700 (PDT) (envelope-from sgk) Date: Fri, 28 Apr 2017 13:15:22 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170428201522.GA32785@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428010122.GA12814@troutmask.apl.washington.edu> <20170428183733.V1497@besplex.bde.org> <20170428165658.GA17560@troutmask.apl.washington.edu> <20170429035131.E3406@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170429035131.E3406@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Apr 2017 20:15:24 -0000 On Sat, Apr 29, 2017 at 04:40:14AM +1000, Bruce Evans wrote: > On Fri, 28 Apr 2017, Steve Kargl wrote: > > > On Fri, Apr 28, 2017 at 07:13:16PM +1000, Bruce Evans wrote: > >> On Thu, 27 Apr 2017, Steve Kargl wrote: > >>> For those curious about testing, here are some numbers > >>> for the Estrin's method. These were obtained on an AMD > >>> FX(tm)-8350 @ 4018.34-MHz. The times are in microseconds > >>> and the number in parentheses are estimated cycles. > >>> > >>> | cospi | sinpi | tanpi > >>> ------------+--------------+--------------+-------------- > >>> float | 0.0089 (37) | 0.0130 (54) | 0.0194 (80) > >>> double | 0.0134 (55) | 0.0160 (66) | 0.0249 (102) > >>> long double | 0.0557 (228) | 0.0698 (287) | 0.0956 (393) > >>> ------------+--------------+--------------+-------------- > >>> > >>> The timing tests are done on the interval [0,0.25] as > >>> this is the interval where the polynomial approximations > >>> apply. Limited accuracy testing gives > >> > >> These still seem slow. I did a quick test of naive evaluations of > >> these functions as standard_function(Pi * x) and get times a bit faster > >> on Haswell, except 2-4 times faster for the long double case, with the > >> handicaps of using gcc-3.3.3 and i386. Of course, the naive evaluation > >> is inaccurate, especially near multiples of Pi/2. > > > > The slowness comes from handling extra precision arithmetic. For > > sinpi(x) = x * p(x) where p(x) = s0 + x2 * S(x2) is the polynomial > > approximation and x2 = x * x. One needs to take care in the ordering > > on the evaluation where x and x2 are split into high and low parts. > > See the source code posted in response to your other email. > > See logl() for how to do this almost as well as possible using the 2sum > primitives. Don't forget to test on i386 with i387, where you will need > lots of slow STRICT_ASSIGN()s or careful use of float_t and double_t > as in my uncomitted logf() and logl()i or more relevantly clog() (cplex.c) > to avoid problems with extra precision (2sum requires this). > > Calculating the polynomial as x * p(x) seems wrong. It starts life as the Remes approximation for p(x) = sin(pi*x) / x. Yes, I also looked into using p(x) = sin(pi*x) / (pi*x). At some point, one must do sinpi(x) = sin(pi*x) = x * p(x). One could do, as is done with sin(x), f(x) = (sin(pi*x) - pi*x)/(pi*x)**3 then sinpi(x) = pi*x + (pi*x)**3 * f(x), but this then adds the complication that pi*x must be computed in extra precision. For float, with the algorithm I have developed, one can simply do everything in double and then cast the result to float. In fact, my __kernel_sinpif was initially written in double with a max ULP less than 0.55. I then looked at what portions of the polynomial could be computed in float without reducing the 0.55. This then reveals what needs to be handle specially. But, what is one to do with double and long double functions? > That is too inaccurate even for sin(x), and probably slower too. I'm not sure why you're claiming that it is too inaccurate. Let's look at sinpi(x) and sin(x) on (roughly) the same interval to remove the different precisions involved in sinpif(x) and sinf(x). This is my sinpi(x) troutmask:kargl[221] ./testd -S -m 0.00 -M 0.25 -n 24 MAX ULP: 0.79851085 Total tested: 16777214 0.7 < ULP <= 0.8: 1075 0.6 < ULP <= 0.7: 22722 This is libm sin(x) troutmask:kargl[224] ./testd -S -m 0.00 -M 0.7853981634 -n 24 MAX ULP: 0.72922199 Total tested: 16777214 0.7 < ULP <= 0.8: 36 0.6 < ULP <= 0.7: 12193 The numbers above really aren't too different, and someone that actually understands numerical analysis might be able to take my code and (marginally) improve the numbers. > > float > > bde1(float x) > > { > > return (sinf((float)M_PI * x)); > > } > > sinpif() with range reduction in float precision and multiplication > by M_PI and __kernel_sin() to finish off would be much faster and > more accurate than doing everything in float precision. On i386/i387, Multiplication by M_PI and __kernel_sin() are double precision intermediates. That works for float. What are you going to do for double and long double? One of the benefits of writing sinpif(x) in float precision is that the exacts same algorithm is used in sinpi(x) and sinpil(x). One of these can be exhaustively tested. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow