From owner-freebsd-numerics@freebsd.org Sat Dec 16 01:33:31 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 96646E931DE for ; Sat, 16 Dec 2017 01:33:31 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7D58D6F2C1 for ; Sat, 16 Dec 2017 01:33:31 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id vBG1XPdB029874 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Fri, 15 Dec 2017 17:33:25 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id vBG1XPt4029873 for freebsd-numerics@freebsd.org; Fri, 15 Dec 2017 17:33:25 -0800 (PST) (envelope-from sgk) Date: Fri, 15 Dec 2017 17:33:25 -0800 From: Steve Kargl To: freebsd-numerics@freebsd.org Subject: progress on powl. Message-ID: <20171216013325.GA27344@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 16 Dec 2017 01:33:31 -0000 All, As expect no one elase is working on powl, I have dusted off my work-in-progress that is now some 1+ year in the works. I've now got a sort of working version in very limited testing. % ./testl -a 1.2345 3.124 libmu = LD80C(0xf7303d48d62e9df4, 0, 1.93115964943387024991e+00L), mpfru = LD80C(0xf7303d48d62e9df5, 0, 1.93115964943387025002e+00L), ulp = 0.73609 I'm still slowing working out some thresholds and few other kinks. The biggest problem is that src/e_pow.c may be the worse commented code that I've had the pleasure to read. There are 3 polynomial approximations. One is easy to work. The other two are gaint mysteries at moment. I'll note that src/e_powf.c uses the same polynomials and it appears that these use naively rounded coefficients. I suspect that these polynomials can be reduced to lower order, but have pursued that yet. -- Steve From owner-freebsd-numerics@freebsd.org Sat Dec 16 04:09:02 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7F8CBE9B746 for ; Sat, 16 Dec 2017 04:09:02 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 316B8758D7 for ; Sat, 16 Dec 2017 04:09:01 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 78395104356D; Sat, 16 Dec 2017 15:08:53 +1100 (AEDT) Date: Sat, 16 Dec 2017 15:08:52 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: progress on powl. In-Reply-To: <20171216013325.GA27344@troutmask.apl.washington.edu> Message-ID: <20171216135205.D971@besplex.bde.org> References: <20171216013325.GA27344@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=rCpuhL93WexzyIwt4mAA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 16 Dec 2017 04:09:02 -0000 On Fri, 15 Dec 2017, Steve Kargl wrote: > As expect no one elase is working on powl, I have > dusted off my work-in-progress that is now some > 1+ year in the works. I've now got a sort of > working version in very limited testing. > ... > I'm still slowing working out some thresholds and > few other kinks. The biggest problem is that > src/e_pow.c may be the worse commented code that > I've had the pleasure to read. There are 3 polynomial > approximations. One is easy to work. The other two > are gaint mysteries at moment. It seems OK to me. I only see 2 polys and a simple rational function. One for log() is obviously needed, and the special one used is documented. P* for exp() is not documented in e_pow.c and is not obvious, but is identical to P*() for exp() in e_exp.c. exp() converges too slowly so is reduced to a special rational function related to (exp(x)+1)/(exp(x)-1) which is related to Bernoulli numbers and thus to lots of interesting power series. The special rational function is basically just this rational function in exp(x) inverted. It is suprising that P* is accurate enough for pow(). exp() exponentiates errors. It is apparently enough to calculate y*log2(x) = n + y' accurately. Since |y'| <= 0.5, the errors don't get exponentiated. But then we have to evaluate exp(y'*log2) accurately, and I'm surprised if the naive method in the comment is enough. We should evaluate y' in extra precision and used this for calculating y'*log2, and I think we do. Then ignoring the error from this is apparently OK. It should be < 0.5 ulps, and then it is easy to avoid adding more than 1 ulp of error for exp() but not so easy to avoid adding more than 0.5 ulps of error. However, if we just use exp(x) = 1 + x + 2*x/2 + ..., then this is easy enough. We have x in extra precision and should add it to 1 in extra precision, and the other terms are too small to contribute much to the error. Different care is needed for the rational function. My clog*(z) has similar problems for the real part. At the end, it must calculate log*(x) where x has extra precision. Unfortunately, only the internal kernels of logl() and of my uncommitted version of log() and logf() support extra precision. Since I don't want to bloat clog() by open-coding these internals like pow() does for exp()'s internals, I just call the log*() functions. This blows out the error to up to ~1.8 ulps. See pow.c in 4.4BSD libm for a more brute force approach that is probably better. Part are the same, down to misspelling "multi" as "muti". Maybe a watermark. We still have extra-precision exp and log functions which were used for this and which w use for tgamma. Unfortunately, these are only extra for double precision. This prevents a simple translation of tgamma() too. The kernels exp__D.c and log__L.c are actually simple, especially the latter, so this is not a good excuse for not translating them. BTW, you (er, I) have not got around to committing my double and float versions of coshl(), sinhl() and tanhl(). But expl() already has your kernel in a non-external place. The extra-precision kernels should work better than the open-coded ones in powl(). I now remember that even the double and float internal ones have a couple of bits of extra precision, while your expl() one has 8-10. I was going to translate the expl() kernel to lower precision, but found that the old internal ones are good enough in both accuracy and efficency for the hyperbolic functions. So my hyperbolic functions in double and float precision are almost identical with the ones in long double precision, except for details hidden in the kernels. I only cleaned up the double and float kernels a little when reorganizing to separate them. However, I couldn't get the kernels to work as fast as open coding for exp*() itself. clang now optimizes the old exp*() _very_ well so it shouldn't have any problems turning the kernels into the same code as the open- coded versions. The old exp() kernels only have a couple of bits of extra precision, but this is enough. i386 now uses the i387 for exp() but not for expf(). This is slow since it has to switch precisions. This is not very accurate even with 11 extra bits of precision, since the method is sloppy giving and loses about 8 of the extras. This is not significantly better than the 2 extra bits given by the kernels. Except possibly for exp(). If you have an error of 0.76 ulps than another 0.25 ulps takes the final error above 1.00. The expl() kernel shouldn't have this problem since its error is smaller than the exp() kernel or any adjustments that pow() could reasonably make to its open-coded version of this. The float coeffs for P* should be easy to fix since we already did that if necessary for their copies in expf(). Bruce From owner-freebsd-numerics@freebsd.org Sat Dec 16 06:15:12 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 55E58E9EC34 for ; Sat, 16 Dec 2017 06:15:12 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id CB9AB797A8 for ; Sat, 16 Dec 2017 06:15:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 27535D6329A; Sat, 16 Dec 2017 17:15:02 +1100 (AEDT) Date: Sat, 16 Dec 2017 17:15:01 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Steve Kargl , freebsd-numerics@freebsd.org Subject: Re: progress on powl. In-Reply-To: <20171216135205.D971@besplex.bde.org> Message-ID: <20171216163632.W1413@besplex.bde.org> References: <20171216013325.GA27344@troutmask.apl.washington.edu> <20171216135205.D971@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=bc8baKHB c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=pWyhuXww-wSbjqJyOqQA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 16 Dec 2017 06:15:12 -0000 On Sat, 16 Dec 2017, Bruce Evans wrote: > ... > It is suprising that P* is accurate enough for pow(). exp() > exponentiates errors. It is apparently enough to calculate y*log2(x) > = n + y' accurately. Since |y'| <= 0.5, the errors don't get > ... > > My clog*(z) has similar problems for the real part. At the end, it must > calculate log*(x) where x has extra precision. Unfortunately, only > the internal kernels of logl() and of my uncommitted version of log() > and logf() support extra precision. Since I don't want to bloat clog() > by open-coding these internals like pow() does for exp()'s internals, > I just call the log*() functions. This blows out the error to up to ~1.8 > ulps. > [... using extra precision of exp*() kernels] Also, the internal extra precision of log*() might be enough for y*log2(x) = n+y'. It is enough for plain log2*(x). logl() has 8-10 extra bits internally, so log2l() has 6-8. Except for my versions, log() and logf() barely have enough bits to lose any for log2() and log2f(). Probably not enough for pow*(). y' also needs to retain extra precision, but that is routine using the method in log2l() (it just discards extra precision at the end). The 4.4BSD pow() is almost trivial for non-exceptional args using its kernels: X /* Return exp(y*log(x)), using simulated extended */ X /* precision for the log and the multiply. */ X X s = log__D(x); X t.a = y; X TRUNC(t.a); X t.b = y - t.a; X t.b = s.b*y + t.b*s.a; X t.a *= s.a; X s.a = t.a + t.b; X s.b = (t.a - s.a) + t.b; X return (exp__D(s.a, s.b)); while the log2l() non-kernel with debugging and i386 macros removed is: X long double X log2l(long double x) X { X struct ld r; X long double hi, lo; X X k_logl(x, &r); X if (!r.lo_set) X return (r.hi); X _2sumF(r.hi, r.lo); X hi = (float)r.hi; X lo = r.lo + (r.hi - hi); X return (invln2_hi * hi + X (invln2_lo + invln2_hi) * lo + invln2_lo * hi); X } You should see _2sumF() open-coded in this pow(). _2sumF() is delicate and the open-coding is probably broken. Compiler bugs break most implementations of 2sum*() if there is extra precision. _2sumF() has complications to work around this. The complications are mostly in the instructions in the comments for using it. For double precision, the the args must be double_t, not double. The 4.4BSD library doesn't know anything about this, so its open coding is just broken on i386. It is unclear if or how fdlibm pow() avoids these bugs. We fixed a couple for exp(). math_private.h has debugging variants of _2sumF() and variants with less restrictions on use. pow() also has to do a multiplication for this. We don't have any macros like _2sumF() to help for this. In general, for real values errors for extra-precision multiplications are unimportant since there are no cancelations. Howver a + b*c might need extra precision since there might be cancelations for the addition. Not a problem for the final step of pow*(), but a problem for the y' = y*log2(x) -n step. Another problem with exp__D() and log__L() is that their API is asymmetric. log__L() returns extra precision but doesn't take it. exp__D() takes extra precision but doesn't return it. Both as needed for pow() and perhaps *gamma*(), but too specialized. General versions were too much slower in C90, but now inline functions can take and return extra precision and hope that the compiler optimizes it away when the caller supplies it as arg.lo = 0 and the callee discards it by not using result.lo. Bruce From owner-freebsd-numerics@freebsd.org Sat Dec 16 16:54:07 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D6DA3E87E49 for ; Sat, 16 Dec 2017 16:54:07 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id BB3EB6A83F for ; Sat, 16 Dec 2017 16:54:07 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id vBGGs6BW019378 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 16 Dec 2017 08:54:06 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id vBGGs6RH019377; Sat, 16 Dec 2017 08:54:06 -0800 (PST) (envelope-from sgk) Date: Sat, 16 Dec 2017 08:54:06 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: progress on powl. Message-ID: <20171216165406.GA18979@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20171216013325.GA27344@troutmask.apl.washington.edu> <20171216135205.D971@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171216135205.D971@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 16 Dec 2017 16:54:07 -0000 On Sat, Dec 16, 2017 at 03:08:52PM +1100, Bruce Evans wrote: > On Fri, 15 Dec 2017, Steve Kargl wrote: > > > As expect no one elase is working on powl, I have > > dusted off my work-in-progress that is now some > > 1+ year in the works. I've now got a sort of > > working version in very limited testing. > > ... > > I'm still slowing working out some thresholds and > > few other kinks. The biggest problem is that > > src/e_pow.c may be the worse commented code that > > I've had the pleasure to read. There are 3 polynomial > > approximations. One is easy to work. The other two > > are gaint mysteries at moment. > I spoke too soon. It seems that some of my bit twiddling has gone off the rails. :( -- Steve