Date: Mon, 24 Apr 2017 23:36:43 -0700 From: Steve Kargl <sgk@troutmask.apl.washington.edu> To: Bruce Evans <brde@optusnet.com.au> Cc: freebsd-numerics@freebsd.org, freebsd-standards@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170425063643.GA45906@troutmask.apl.washington.edu> In-Reply-To: <20170425143120.R933@besplex.bde.org> References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170413171248.GA86780@troutmask.apl.washington.edu> <20170424232743.GA54621@troutmask.apl.washington.edu> <20170425143120.R933@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Apr 25, 2017 at 03:13:44PM +1000, Bruce Evans wrote: > On Mon, 24 Apr 2017, Steve Kargl wrote: > > > I have what appears to be working versions of asinpi[fl]. It > > was suggested elsewhere that using an Estrin's method to > > sum the polynomial approximations instead of Horner's method > > may allow modern CPUs to better schedule generated code. > > I have implemented an Estrin's-like method for sinpi[l] > > and cospi[l], and indeed the generated code is faster on > > my Intel core2 duo with only a slight degradation in the > > observed max ULP. I'll post new versions to bugzilla in > > the near future. > > I didn't remember Estrin, but checked Knuth and see that his > method is a subset of what I use routinely. I might have learned > it from Knuth. I tried all methods in Knuth, but they didn't seem > to be any better at first. Later I learned more about scheduling. > Estrin's method now seems routine as a subset of scheduling. I think I came across this in Knuth as well, but I may have picked of the name Estrin from Mueller's book (or google :). > > You already used it in expl: for ld80: > I recall that that is your handy work. > X x2 = x * x; > X x4 = x2 * x2; > > Unlike in Knuth's description of Estrin, we don't try to order the > operations to suit the CPU(s) in the code, but just try to keep them > independent, as required for them to execute independently. Compilers > will rearrange them anyway, sometimes even correctly, but it makes > little difference (on OOE CPUs) to throw all of the operations into > the pipeline and see how fast something comes out). > > X q = x4 * (x2 * (x4 * > X /* > X * XXX the number of terms is no longer good for > X * pairwise grouping of all except B3, and the > X * grouping is no longer from highest down. > X */ > X (x2 * B12 + (x * B11 + B10)) + > X (x2 * (x * B9 + B8) + (x * B7 + B6))) + > X (x * B5 + B4.e)) + x2 * x * B3.e; > > This is arranged to look a bit like Horner's method at first (operations > not written in separate statements), but it is basically Estrin's method > with no extensions to more complicated sub-expressions than Estrin's, > but complications for accuracy: > - I think accuracy for the B3 term is unimportant, but for some reason > (perhaps just the one in the comment) it is grouped separately. Estrin > might use x3 * (B3.e + x * B4.e). Yes, it is most likely an accuracy issue. I found something similar with sinpi/cospi. My k_sinpi.c has #if 0 double x2; /* Horner's method. */ x2 = x * x; sm = x2 * (x2 * (x2 * (x2 * (x2 * (x2 * s7 + s6) + s5) + s4) + s3) + s2) + s1; #else double x2, x4; /* Sort of Estrin's method. */ x2 = x * x; x4 = x2 * x2; sm = (s5 + s6 * x2 + s7 * x4) * x4 * x4 + (s2 + s3 * x2 + s4 * x4) * x2 + s1; #endif The addition of s1 must occur last, or I get greater 1 ULP for a number of values. My timing show on Intel core2 duo /* Horner's methods. */ laptop-kargl:kargl[219] ./testd -S -s -m 0 -M 0.25 -n 22 sinpi time: 0.066 usec per call cospi time: 0.063 usec per call /* Estrin's methods. */ laptop-kargl:kargl[212] ./testd -S -s -m 0 -M 0.25 -n 22 sinpi time: 0.059 usec per call cospi time: 0.055 usec per call Here, -n 22 means to test 2**22 - 1 values in the interval [0,0.25]. For the older functions that were committed a few years ago, I should probably go back to recheck timings and accuracy with and without Estrin's method. -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170425063643.GA45906>