From owner-freebsd-numerics@FreeBSD.ORG Sun Dec 8 18:33:46 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EF2624B3 for ; Sun, 8 Dec 2013 18:33:45 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C10CB15B5 for ; Sun, 8 Dec 2013 18:33:45 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB8IXdep083385; Sun, 8 Dec 2013 10:33:39 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB8IXdXb083384; Sun, 8 Dec 2013 10:33:39 -0800 (PST) (envelope-from sgk) Date: Sun, 8 Dec 2013 10:33:39 -0800 From: Steve Kargl To: Bruce Evans Subject: Re: dead code in lgamma_r[f]? Message-ID: <20131208183339.GA83010@troutmask.apl.washington.edu> References: <20131205173700.GA64575@troutmask.apl.washington.edu> <20131205182324.GA65135@troutmask.apl.washington.edu> <20131205195225.GA65732@troutmask.apl.washington.edu> <20131206102724.W1033@besplex.bde.org> <20131206114426.I1329@besplex.bde.org> <20131207064602.GA76042@troutmask.apl.washington.edu> <20131208012939.GA80145@troutmask.apl.washington.edu> <20131208141627.F1181@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131208141627.F1181@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org, Filipe Maia X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Dec 2013 18:33:46 -0000 On Sun, Dec 08, 2013 at 03:21:04PM +1100, Bruce Evans wrote: > On Sat, 7 Dec 2013, Steve Kargl wrote: > > > I really need to stop looking at fdlibm code. The threshold(s) in > > e_lgammaf.c could be chosen better (and actually documented). In > > the float code > > > > /* 8.0 <= x < 2**58 */ > > } else if (ix < 0x4e800000) { // (ix < 0x5c800000) { > > t = __ieee754_logf(x); > > z = one/x; > > y = z*z; > > w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6))))); > > r = (x-half)*(t-one)+w; > > } else > > /* 2**58 <= x <= inf */ > > r = x*(__ieee754_logf(x)-one); > > > > the 2**58 is much too large. Asymptotically, we have > > That's Cygnus code, not fdlibm :-). > > > lgamma(x)~(x-0.5)log(x)-x = x*(log(x)-1) - 0.5*log(x) > > > > where the 0.5*log(x) is insignificant when compared to 2**58. > > I suspect that 2**58 is a sloppy threshold for double, which > > can be reduced to 2**30 (or smaller) for float. I suppose I > > could look for a tighter bound, but 2**(p+slop) with p, the > > precision, probably is sufficient. > > The Cygnus translation to float didn't look closely enough at > the thresholds. It also did not recompute minimax coefficients as there are too many terms in the approximation. Looks like Cygnus simply rounded the double coefficients to float. > gamma() was the first part of libm that I looked closely at (even > before trig functions in 2004/5). I didn't understand it at first and > tried to write my own. My first idea was to use the asymptotic > expansion as a kernel for all cases, especially for complex args. > To use the asymptotic expansion, the real part of the arg must be > pushed up to about 16 or greater using repeated multiplications. > It is difficult to do these multiplications accurately enough, > but the float case should be very easy using double precision > multiplication. I tried extra precision in software, but this > was too slow. This method also works poorly for args with negative > real part, especially for lgamma() near its zeros. Even using > doubled double precision didn't get near 1 ulp of accuracy in float > precision for lgamma() near its zeros, and the only fix that I > know of is to use a special approximation near each zero down > to about real part -70 (70 cases). fdlibm is also bad for lgamma() > near its zeros. Interesting idea. For x in [2,8), fdlibm code (as you probably know) uses recursion to reduce the evaluation to a log of a product and lgamma(2+s) with s in [0,1). One can then show that lgamma(2+s) = s * (1 - E) + s**2 * P(s) where E is Euler's constant and P(s) is a polynomial in s. But, instead of using a minimax polynomial approximation fdlibm code uses a rational approximation lgamma(2+s) = 0.5*s + p(s) / q(s). This is the origin of the s0, ..., s6, r1, ..., r6 coefficients. > Here is a pari version of this (the C version is harder to understand > due to its technical details and less suitable language; pari syntax > is worse than C syntax for most things (including things that pari > should be able to do well) but slightly better for this use: I suppose one of these days, I'll learn how to use pari. A quick scan of n1256.pdf shows that C99 does not require lgamma of a complex argument, but it does reserve the names clgamma[fl]. > % lg(z) = > % { > % local(bol, rnew); > % > % if (abs(z - 1) < 1.e-1, return (lngamma(z));); > % if (abs(z - 2) < 1.e-1, return (lngamma(z));); > > Near 1 and 2, I planned to use a poly approximation, but that is not > so easy to do in pari so I call the pari builtin lgamma() to fake it. fdlibm's lgammaf_r does not appear to have an issue near 1 or 2. Starting at the lower bound of each interval and using nextafterf to scan up to the upper bound, I'm seeing % make testf && ./testf Interval: Time ULP Value [4.7683716e-07, 2.0000000e+00): 0.04590 0.98491 1.231645e+00 [2.0000000e+00, 8.0000000e+00): 0.08754 0.82825 4.012439e+00 [8.0000000e+00, 2.8823038e+17): 0.05841 1.43789 8.942273e+06 [2.8823038e+17, 2.6584560e+36): 0.05095 0.50000 1.491249e+23 where the reference value is from lgamma_r. The different intervals test specific branches in lgammaf_r. The upper bound of 2.6584560e36 is 0x1p121. Time is the average value for all calls in the interval in usec/call. > % bol = 0; > % r = 1; > % while (abs(z) < M, > % rnew = r * z; > % if (imag(r) > 0 && imag(rnew) < 0, bol += 1); > % if (imag(r) < 0 && imag(rnew) > 0, bol -= 1); > % r = rnew; > % z = z + 1; > % ); > > This pushes up z, almost as above. > > % bol = 0; > % return (lgb(z) - log(r) - 2 * Pi * bol); > > Then use the asymptotic expansion. > > % } > % > % lgb(z) = (z - 0.5) * log(z) - z + 0.5 * log(2 * Pi) + \ > % sum(n = 1, N, b[n] / z^(2 * n - 1)) > > This is the asymptotic expansion. Pari syntax is only perfect for > numeric things like this (and probably for some number theory things > which I don't know well). b[n] is a table of coefficients. > > I had forgotten the details of even the first term in this. You > mentioned using an approximation of just the first term or two above > a huge threshold. It might be better to go much lower using another > term or two (but not as many as medium-sized args). Well, I was only interested in the poorly chosen 2**58 threshold. In the [8,2**58) interval the asymptotic expansion is used, but its written in such a way that the >% + 0.5 * log(2 * Pi) + sum(n = 1, N, b[n] / z^(2 * n - 1)) portion of your above code is approximated by a minimax polynomial. /* 8.0 <= x < 2**58 */ } else if (ix < 0x5c800000) { t = __ieee754_logf(x); z = one/x; y = z*z; w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6))))); r = (x-half)*(t-one)+w; } else fdlibm use only the first 2 terms for the x >= 2**58. > Cephes has a very complicated lgamma() or gamma(), with many special > poly approximations for 0 < x < 8 or similar. fdlibm's approximations in the [0,2) range are mostly black magic to me. -- Steve