Date: Sat, 16 Dec 2017 15:08:52 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-numerics@freebsd.org
Subject: Re: progress on powl.
Message-ID: <20171216135205.D971@besplex.bde.org>
In-Reply-To: <20171216013325.GA27344@troutmask.apl.washington.edu>
References: <20171216013325.GA27344@troutmask.apl.washington.edu>
On Fri, 15 Dec 2017, Steve Kargl wrote:

> As I expect no one else is working on powl, I have
> dusted off my work-in-progress that is now some
> 1+ year in the works.  I've now got a sort of
> working version in very limited testing.
> ...
> I'm still slowly working out some thresholds and a
> few other kinks.  The biggest problem is that
> src/e_pow.c may be the worst-commented code that
> I've had the pleasure to read.  There are 3 polynomial
> approximations.  One is easy to work out.  The other two
> are giant mysteries at the moment.

It seems OK to me.  I only see 2 polys and a simple rational function.  One for log() is obviously needed, and the special one used is documented.  P* for exp() is not documented in e_pow.c and is not obvious, but it is identical to P* for exp() in e_exp.c.  exp()'s power series converges too slowly, so it is reduced to a special rational function related to (exp(x)+1)/(exp(x)-1), which is related to Bernoulli numbers and thus to lots of interesting power series.  The special rational function is basically just this rational function in exp(x), inverted.

It is surprising that P* is accurate enough for pow().  exp() exponentiates errors.  It is apparently enough to calculate y*log2(x) = n + y' accurately.  Since |y'| <= 0.5, the errors don't get exponentiated.  But then we have to evaluate exp(y'*log2) accurately, and I'm surprised if the naive method in the comment is enough.  We should evaluate y' in extra precision and use this for calculating y'*log2, and I think we do.  Then ignoring the error from this is apparently OK.  It should be < 0.5 ulps, and then it is easy to avoid adding more than 1 ulp of error for exp(), but not so easy to avoid adding more than 0.5 ulps.  However, if we just use exp(x) = 1 + x + x*x/2 + ..., then this is easy enough.  We have x in extra precision and should add it to 1 in extra precision, and the other terms are too small to contribute much to the error.  Different care is needed for the rational function.
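The "add x to 1 in extra precision" step can be sketched roughly as follows.  This is my illustration, not code from e_pow.c: the function name, the hi/lo split of the reduced argument, and the truncated series are all assumptions for demonstration.

```c
#include <math.h>

/*
 * Hypothetical sketch of the method described above: exp(r) for small |r|
 * (|r| <= 0.5*log(2)), with r supplied in extra precision as r_hi + r_lo.
 * The series tail t is tiny, so only the 1 + r_hi sum needs care; the
 * small pieces are summed first so they do not lose bits against the
 * leading 1.
 */
static double exp_small(double r_hi, double r_lo)
{
	double r = r_hi + r_lo;
	/* x*x/2 + x^3/6 + ... truncated; enough for a demonstration only */
	double t = r * r * (1.0/2 + r * (1.0/6 + r * (1.0/24 +
	    r * (1.0/120 + r * (1.0/720 + r * (1.0/5040))))));
	return 1.0 + (r_hi + (r_lo + t));
}
```

The summation order is the point: t and r_lo are far below an ulp of 1, so grouping them with r_hi before the final addition keeps the extra precision from being thrown away.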
My clog*(z) has similar problems for the real part.  At the end, it must calculate log*(x) where x has extra precision.  Unfortunately, only the internal kernels of logl() and of my uncommitted versions of log() and logf() support extra precision.  Since I don't want to bloat clog() by open-coding these internals like pow() does for exp()'s internals, I just call the log*() functions.  This blows out the error to up to ~1.8 ulps.

See pow.c in 4.4BSD libm for a more brute-force approach that is probably better.  Parts are the same, down to misspelling "multi" as "muti".  Maybe a watermark.  We still have the extra-precision exp and log functions which were used for this and which we use for tgamma().  Unfortunately, these are only extra for double precision.  This prevents a simple translation of tgamma() too.  The kernels exp__D.c and log__L.c are actually simple, especially the latter, so this is not a good excuse for not translating them.

BTW, you (er, I) have not got around to committing my double and float versions of coshl(), sinhl() and tanhl().  But expl() already has your kernel in a non-external place.  The extra-precision kernels should work better than the open-coded ones in powl().  I now remember that even the double and float internal ones have a couple of bits of extra precision, while your expl() one has 8-10.  I was going to translate the expl() kernel to lower precision, but found that the old internal ones are good enough in both accuracy and efficiency for the hyperbolic functions.  So my hyperbolic functions in double and float precision are almost identical to the ones in long double precision, except for details hidden in the kernels.  I only cleaned up the double and float kernels a little when reorganizing to separate them.  However, I couldn't get the kernels to work as fast as open coding for exp*() itself.  clang now optimizes the old exp*() _very_ well, so it shouldn't have any problems turning the kernels into the same code as the open-coded versions.
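The extra-precision kernels mentioned above (exp__D.c, log__L.c) pass values around as a hi+lo pair of doubles.  The building block for producing such a pair is the classic "fast two-sum"; the sketch below is mine (names and struct are assumptions, not the kernels' actual interface).

```c
/*
 * Illustrative only: a hi+lo pair of doubles carrying a few extra bits of
 * precision, in the style used by extra-precision kernels such as
 * 4.4BSD's exp__D.c and log__L.c.  fast_two_sum() produces the pair
 * exactly, assuming |a| >= |b|.
 */
struct dd { double hi, lo; };

static struct dd fast_two_sum(double a, double b)
{
	struct dd r;
	r.hi = a + b;
	r.lo = b - (r.hi - a);	/* exact: a + b == r.hi + r.lo */
	return r;
}
```

A caller can then round r.hi + r.lo at the very end, so intermediate steps keep the bits that a plain double sum would discard.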
The old exp() kernels only have a couple of bits of extra precision, but this is enough.  i386 now uses the i387 for exp() but not for expf().  This is slow since it has to switch precisions.  It is also not very accurate even with 11 extra bits of precision, since the method is sloppy and loses about 8 of the extras.  This is not significantly better than the 2 extra bits given by the kernels.

Except possibly for exp().  If you have an error of 0.76 ulps, then another 0.25 ulps takes the final error above 1.00.  The expl() kernel shouldn't have this problem, since its error is smaller than the exp() kernel's or any adjustments that pow() could reasonably make to its open-coded version of this.  The float coeffs for P* should be easy to fix, since we already did that if necessary for their copies in expf().

Bruce