Date: Sat, 29 Apr 2017 08:09:39 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: Bruce Evans <brde@optusnet.com.au>, freebsd-numerics@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170429070036.A4005@besplex.bde.org>
In-Reply-To: <20170428201522.GA32785@troutmask.apl.washington.edu>
References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428010122.GA12814@troutmask.apl.washington.edu> <20170428183733.V1497@besplex.bde.org> <20170428165658.GA17560@troutmask.apl.washington.edu> <20170429035131.E3406@besplex.bde.org> <20170428201522.GA32785@troutmask.apl.washington.edu>
On Fri, 28 Apr 2017, Steve Kargl wrote:

> On Sat, Apr 29, 2017 at 04:40:14AM +1000, Bruce Evans wrote:
>> On Fri, 28 Apr 2017, Steve Kargl wrote:
>>> ...
>>> The slowness comes from handling extra precision arithmetic.  For
>>> sinpi(x) = x * p(x) where p(x) = s0 + x2 * S(x2) is the polynomial
>>> approximation and x2 = x * x.  One needs to take care in the ordering
>>> of the evaluation, where x and x2 are split into high and low parts.
>>> See the source code posted in response to your other email.
>> ...
>> Calculating the polynomial as x * p(x) seems wrong.
>
> It starts life as the Remes approximation for p(x) = sin(pi*x) / x.
> Yes, I also looked into using p(x) = sin(pi*x) / (pi*x).  At some
> point, one must do sinpi(x) = sin(pi*x) = x * p(x).

I forgot that the linear term needs to be multiplied by Pi.

Actually, the problem is only in the notation and the above description.
s0 looks like a constant, but it is actually Pi*x in extra precision,
and no extra precision is needed or done for x2.

__kernel_cospi() seems to be buggy.  It splits x2, but doesn't use
extra precision for calculating x2.

Is extra precision really needed for the x2 term?  It is like the x3
term for sin*(), with the critical difference that the coefficient is
much larger (-1/2 instead of -1/6 before multiplication by pi).  This
requires extra accuracy for cos() too, but not for calculating the
term.  I optimized this by changing a slow fdlibm method to essentially
2sum.  The x2/2 term is inaccurate, but no significant further
inaccuracies occur when adding terms.  You have Pi*Pi*x2/2, which is
more inaccurate.  I guess the extra half an ulp of accuracy here is too
much.  You avoid this half an ulp using extra precision, and seem to
use lots of extra precision instead of the final 2sum step.  Probably
much slower.  Here is my modified method:

	x2 = x*x;	/* hopefully really don't need extra prec */
	(hi, lo) = split(x2);
	resid = S(x2);
	lo = C2_lo * hi + C2_hi * lo + C2_lo * lo;
	hi = C2_hi * hi;
	lo = lo + resid;
#ifdef hopefully_not_needed
	Re-split hi+lo if lo is too large.
#endif
	return 3sum(1, hi, lo);

3sum is in math_private.h and is supposed to be used instead of
magic-looking open code.  Here it would reduce to the same expressions
as in __kernel_sin():

	1+hi = hi_1 + lo_1;		/* by hi-lo decomp. */
	return (lo + lo_1 + hi_1);	/* safely add the lo's and "carry" */

> One could do, as is done with sin(x),
>
>	f(x) = (sin(pi*x) - pi*x)/(pi*x)**3
>
> then sinpi(x) = pi*x + (pi*x)**3 * f(x), but this then adds
> the complication that pi*x must be computed in extra precision.

Isn't that what is done now?  Of course pi*x must be evaluated in extra
precision, but hopefully no more is needed.  The x3 term would be much
harder to do all in extra precision than the x2 term for cospi().

>> That is too inaccurate even for sin(x), and probably slower too.
>
> I'm not sure why you're claiming that it is too inaccurate.

Writing sin(x) as x * (1 + C2*x*x + ...) without extra precision gives
at best an error of half an ulp for the part in parentheses and another
half an ulp for the multiplication.  The final step must be an addition
of a large term with a term at least 2 times smaller so that the error
can be controlled (this depends on 1 of the terms being exact).

> ...
>>> float
>>> bde1(float x)
>>> {
>>>	return (sinf((float)M_PI * x));
>>> }
>>
>> sinpif() with range reduction in float precision and multiplication
>> by M_PI and __kernel_sin() to finish off would be much faster and
>> more accurate than doing everything in float precision.
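[A minimal sketch of that suggestion (reduce in float, but do the
multiplication by M_PI and the sine itself in double), with plain sin()
standing in for the internal __kernel_sin(), a hypothetical name
sinpif_sketch(), and special cases (NaN, Inf, the sign of zero) ignored:

	#include <math.h>

	static float
	sinpif_sketch(float x)
	{
		float r;

		/* Reduce to r in [-1, 1]; sinpi has period 2. */
		r = x - 2.0F * rintf(0.5F * x);
		/* Double M_PI and sin() supply the extra precision. */
		return ((float)sin(M_PI * (double)r));
	}

The point is only that the double intermediates carry enough extra
precision that the final rounding back to float dominates the error.]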
>> On i386/i387,
>
> Multiplication by M_PI and __kernel_sin() are double precision
> intermediates.

That works for float.  That's what I mean by naive code having a chance
of working.  It takes foot-shooting to reduce M_PI to float.
Unfortunately, the API + ABI does reduce (M_PI * x) to float in
sinf(M_PI * x) if sinf() is extern.  But if sinf() is in hardware, the
compiler might not destroy the extra precision before or after using
the hardware.

> What are you going to do for double and long double?  One of the
> benefits of writing sinpif(x) in float precision is that the exact
> same algorithm is used in sinpi(x) and sinpil(x).  One of these can
> be exhaustively tested.

That is about the only benefit of the float functions, so I resist
optimizing them using doubles.  It would be better to use extra
precision automatically, but not force it.

Bruce
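[A sketch of the kind of exhaustive test meant here, assuming a
sinpif() under test and using double sin(M_PI*x) as the reference.
The double reference is only float-accurate, so a serious harness
would compare against a higher-precision sinpi and count ulps rather
than exact matches:

	#include <math.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>

	float sinpif(float);	/* the function under test */

	int
	main(void)
	{
		uint32_t i, top;
		float half = 0.5F, x;
		long mismatches = 0;

		/* Every float in [0, 0.5] is one of the bit patterns 0..top. */
		memcpy(&top, &half, sizeof(top));
		for (i = 0; i <= top; i++) {
			memcpy(&x, &i, sizeof(x));
			if (sinpif(x) != (float)sin(M_PI * (double)x))
				mismatches++;
		}
		printf("%ld mismatches in [0, 0.5]\n", mismatches);
		return (0);
	}

Restricting the walk to [0, 0.5] keeps the reference adequate; the
range reduction and the other half-period would be tested separately.]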
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170429070036.A4005>