From owner-freebsd-numerics@FreeBSD.ORG  Sun Dec  8 04:21:15 2013
Date: Sun, 8 Dec 2013 15:21:04 +1100 (EST)
From: Bruce Evans
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl
Cc: freebsd-numerics@FreeBSD.org, Filipe Maia
Subject: Re: dead code in lgamma_r[f]?
In-Reply-To: <20131208012939.GA80145@troutmask.apl.washington.edu>
Message-ID: <20131208141627.F1181@besplex.bde.org>
References: <20131205173700.GA64575@troutmask.apl.washington.edu>
 <20131205182324.GA65135@troutmask.apl.washington.edu>
 <20131205195225.GA65732@troutmask.apl.washington.edu>
 <20131206102724.W1033@besplex.bde.org>
 <20131206114426.I1329@besplex.bde.org>
 <20131207064602.GA76042@troutmask.apl.washington.edu>
 <20131208012939.GA80145@troutmask.apl.washington.edu>
List-Id: "Discussions of high quality implementation of libm functions."
On Sat, 7 Dec 2013, Steve Kargl wrote:

> I really need to stop looking at fdlibm code.  The threshold(s) in
> e_lgammaf.c could be chosen better (and actually documented).  In
> the float code
>
>         /* 8.0 <= x < 2**58 */
>         } else if (ix < 0x4e800000) {  // (ix < 0x5c800000) {
>             t = __ieee754_logf(x);
>             z = one/x;
>             y = z*z;
>             w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6)))));
>             r = (x-half)*(t-one)+w;
>         } else
>             /* 2**58 <= x <= inf */
>             r = x*(__ieee754_logf(x)-one);
>
> the 2**58 is much too large.  Asymptotically, we have

That's Cygnus code, not fdlibm :-).

> lgamma(x) ~ (x-0.5)*log(x) - x = x*(log(x)-1) - 0.5*log(x)
>
> where the 0.5*log(x) is insignificant when compared to 2**58.
> I suspect that 2**58 is a sloppy threshold for double, which
> can be reduced to 2**30 (or smaller) for float.  I suppose I
> could look for a tighter bound, but 2**(p+slop) with p, the
> precision, probably is sufficient.

The Cygnus translation to float didn't look closely enough at the
thresholds.

gamma() was the first part of libm that I looked closely at (even
before trig functions in 2004/5).  I didn't understand it at first
and tried to write my own.  My first idea was to use the asymptotic
expansion as a kernel for all cases, especially for complex args.
To use the asymptotic expansion, the real part of the arg must be
pushed up to about 16 or greater using repeated multiplications.
It is difficult to do these multiplications accurately enough, but
the float case should be very easy using double precision
multiplication.  I tried extra precision in software, but this was
too slow.  This method also works poorly for args with negative real
part, especially for lgamma() near its zeros.
Even using doubled double precision didn't get near 1 ulp of accuracy
in float precision for lgamma() near its zeros, and the only fix that
I know of is to use a special approximation near each zero down to
about real part -70 (70 cases).  fdlibm is also bad for lgamma() near
its zeros.

Here is a pari version of this.  (The C version is harder to
understand due to its technical details and less suitable language;
pari syntax is worse than C syntax for most things (including things
that pari should be able to do well) but slightly better for this
use.)

% g(z) =
% {
%     local(r);
%
%     r = 1;
%     while (abs(z) < M,
%         r = r * z;
%         z = z + 1;
%     );
%     return (exp(lgb(z)) / r);
% }

This is cgamma() -- the whole thing, modulo not handling exceptional
args and depending on subroutines.  It first pushes up Real(z).  It
uses abs(z) instead of Real(z) for some reason.  Apparently it only
works for real z and is buggy for large negative z (small negative z
get pushed up towards +Inf).  Then it calls the local routine lgb()
to do the asymptotic expansion and the pari builtin exp() to convert
from clgamma() to cgamma().

% lg(z) =
% {
%     local(bol, rnew);
%
%     if (abs(z - 1) < 1.e-1, return (lngamma(z)););
%     if (abs(z - 2) < 1.e-1, return (lngamma(z)););

Near 1 and 2, I planned to use a poly approximation, but that is not
so easy to do in pari, so I call the pari builtin lngamma() to fake
it.

%     bol = 0;
%     r = 1;
%     while (abs(z) < M,
%         rnew = r * z;
%         if (imag(r) > 0 && imag(rnew) < 0, bol += 1);
%         if (imag(r) < 0 && imag(rnew) > 0, bol -= 1);
%         r = rnew;
%         z = z + 1;
%     );

This pushes up z, almost as above.

%     bol = 0;
%     return (lgb(z) - log(r) - 2 * Pi * bol);

Then use the asymptotic expansion.

% }
%
% lgb(z) = (z - 0.5) * log(z) - z + 0.5 * log(2 * Pi) + \
%     sum(n = 1, N, b[n] / z^(2 * n - 1))

This is the asymptotic expansion.  Pari syntax is perfect only for
numeric things like this (and probably for some number theory things
which I don't know well).  b[n] is a table of coefficients.
I had forgotten the details of even the first term in this.  You
mentioned using an approximation of just the first term or two above
a huge threshold.  It might be better to go much lower using another
term or two (but not as many as for medium-sized args).

% setb(N) =
% {
%     b = vector(N);
%     for (n = 1, N,
%         b[n] = roundn(bernreal(2 * n) / (2 * n * (2 * n - 1)), P);
%     );
%     b = precision(b, 100);
% }

This initializes b[] using the builtin bernreal(), which gives
Bernoulli numbers.  roundn() is a local function (not shown) that
rounds to 64 bits in IEEE round-to-nearest mode.  IEEE arithmetic is
too hard to fake everywhere here: it would need a rounding step after
every operation.  I only added these steps while developing clog(),
mainly to understand the limitations of tripled double precision.

% pg() = plot(x = A, B, abs(g(x + C * I) / gamma(x + C * I) - 1))
% pq() = plot(x = A, B, imag(g(x + C * I)) / imag(gamma(x + C * I)) - 1)
% xgamma(z) =
% {
%     if ((truncate(z) - z) < 1e-20, return (100000););
%     if (abs(gamma(z) - 1) < 1.e-40, return (1 + 1.e-40););
%     return (gamma(z));
% }
% pr() = plot(x = A, B, abs(1 / (xgamma(x) - 1)))
%
% plg() = plot(x = A, B, abs(lg(x + C * I) / lngamma(x + C * I) - 1))
%
% A = 1;
% B = 100;
% C = 0;
% M = 17;
% N = 8;
% P = 64;
% setb(100);
% gamma(10)
% g(10)

Testing stuff.  Only M, N and P are of interest.  M gives the
threshold for the asymptotic expansion.  N gives the number of terms
in it that are used, not counting the first couple.  P gives the
precision.  Results:

362880.0000000000000000000000
362879.9999999999999999714634

Pari's gamma(10) gave the exact result (9 factorial).  g(10) got 22
digits correct, which is more than can be hoped for (there would be
more rounding errors with finite precision throughout).
g() is sloppy for complex args, but after changing 10 to 10+I it
works well:

-217255.6436196325446121126363 + 267132.0341468011581535241915*I (gamma)
-217255.6436196325446121611110 + 267132.0341468011581534939537*I (g)

Cephes has a very complicated lgamma() or gamma(), with many special
poly approximations for 0 < x < 8 or similar.  glibc adopted this.
It seems too hard to scale this for complex args.  You just can't
write a special approximation that even converges in a strip
1 < Re(z) < 8 or similar.  But pushing up the real part until the
asymptotic expansion works is a very general method.  For complex
long double args, maybe we should just use this method and not worry
about the losses of accuracy from the multiplications.
16 multiplications would lose about 32 ulps.  This would give
imprecise.c ;-(.  A few more than 16 would be needed for more than
float precision.

Bruce