From owner-freebsd-numerics@FreeBSD.ORG  Sun Dec  8 18:33:46 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id EF2624B3
 for <freebsd-numerics@FreeBSD.org>; Sun,  8 Dec 2013 18:33:45 +0000 (UTC)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id C10CB15B5
 for <freebsd-numerics@FreeBSD.org>; Sun,  8 Dec 2013 18:33:45 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id rB8IXdep083385; 
 Sun, 8 Dec 2013 10:33:39 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id rB8IXdXb083384;
 Sun, 8 Dec 2013 10:33:39 -0800 (PST) (envelope-from sgk)
Date: Sun, 8 Dec 2013 10:33:39 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: dead code in lgamma_r[f]?
Message-ID: <20131208183339.GA83010@troutmask.apl.washington.edu>
References: <20131205173700.GA64575@troutmask.apl.washington.edu>
 <20131205182324.GA65135@troutmask.apl.washington.edu>
 <CAN5hRiV6N1arc4RDv=9JbRiXp-J9o3WzAbeZSOpAxks2ZeG+_w@mail.gmail.com>
 <20131205195225.GA65732@troutmask.apl.washington.edu>
 <20131206102724.W1033@besplex.bde.org>
 <20131206114426.I1329@besplex.bde.org>
 <20131207064602.GA76042@troutmask.apl.washington.edu>
 <20131208012939.GA80145@troutmask.apl.washington.edu>
 <20131208141627.F1181@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20131208141627.F1181@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@FreeBSD.org, Filipe Maia <filipe.c.maia@gmail.com>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 08 Dec 2013 18:33:46 -0000

On Sun, Dec 08, 2013 at 03:21:04PM +1100, Bruce Evans wrote:
> On Sat, 7 Dec 2013, Steve Kargl wrote:
> 
> > I really need to stop looking at fdlibm code.  The threshold(s) in
> > e_lgammaf.c could be chosen better (and actually documented).  In
> > the float code
> >
> >    /* 8.0 <= x < 2**58 */
> > 	} else if (ix < 0x4e800000) { // (ix < 0x5c800000) {
> > 	    t = __ieee754_logf(x);
> > 	    z = one/x;
> > 	    y = z*z;
> > 	    w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6)))));
> > 	    r = (x-half)*(t-one)+w;
> > 	} else
> >    /* 2**58 <= x <= inf */
> > 	    r =  x*(__ieee754_logf(x)-one);
> >
> > the 2**58 is much too large.  Asymptotically, we have
> 
> That's Cygnus code, not fdlibm :-).
> 
> > lgamma(x)~(x-0.5)log(x)-x = x*(log(x)-1) - 0.5*log(x)
> >
> > where the 0.5*log(x) is insignificant when compared to 2**58.
> > I suspect that 2**58 is a sloppy threshold for double, which
> > can be reduced to 2**30 (or smaller) for float.  I suppose I
> > could look for a tighter bound, but 2**(p+slop) with p, the
> > precision, probably is sufficient.
> 
> The Cygnus translation to float didn't look closely enough at
> the thresholds.

It also did not recompute minimax coefficients as there are too
many terms in the approximation.  Looks like Cygnus simply rounded
the double coefficients to float.

> gamma() was the first part of libm that I looked closely at (even
> before trig functions in 2004/5).  I didn't understand it at first and
> tried to write my own.  My first idea was to use the asymptotic
> expansion as a kernel for all cases, especially for complex args.
> To use the asymptotic expansion, the real part of the arg must be
> pushed up to about 16 or greater using repeated multiplications.
> It is difficult to do these multiplications accurately enough,
> but the float case should be very easy using double precision
> multiplication.  I tried extra precision in software, but this
> was too slow.  This method also works poorly for args with negative
> real part, especially for lgamma() near its zeros.  Even using
> doubled double precision didn't get near 1 ulp of accuracy in float
> precision for lgamma() near its zeros, and the only fix that I
> know of is to use a special approximation near each zero down
> to about real part -70 (70 cases).  fdlibm is also bad for lgamma()
> near its zeros.

Interesting idea.  For x in [2,8), fdlibm code (as you probably
know) uses recursion to reduce the evaluation to a log of a 
product and lgamma(2+s) with s in [0,1).  One can then show that
lgamma(2+s) = s * (1 - E) + s**2 * P(s) where E is Euler's constant
and P(s) is a polynomial in s.  But, instead of using a minimax
polynomial approximation fdlibm code uses a rational approximation
lgamma(2+s) = 0.5*s + p(s) / q(s).  This is the origin of the 
s0, ..., s6, r1, ..., r6 coefficients.

> Here is a pari version of this (the C version is harder to understand
> due to its technical details and less suitable language; pari syntax
> is worse than C syntax for most things (including things that pari
> should be able to do well) but slightly better for this use:

I suppose one of these days, I'll learn how to use pari.  
A quick scan of n1256.pdf shows that C99 does not require
lgamma of a complex argument, but it does reserve the 
names clgamma[fl].   

> % lg(z) =
> % {
> % 	local(bol, rnew);
> % 
> % 	if (abs(z - 1) < 1.e-1, return (lngamma(z)););
> % 	if (abs(z - 2) < 1.e-1, return (lngamma(z)););
> 
> Near 1 and 2, I planned to use a poly approximation, but that is not
> so easy to do in pari so I call the pari builtin lgamma() to fake it.

fdlibm's lgammaf_r does not appear to have an issue near 1 or 2.
Starting at the lower bound of each interval and using nextafterf
to scan up to the upper bound, I'm seeing

% make testf && ./testf
Interval:                       Time    ULP      Value
[4.7683716e-07, 2.0000000e+00): 0.04590 0.98491  1.231645e+00
[2.0000000e+00, 8.0000000e+00): 0.08754 0.82825  4.012439e+00
[8.0000000e+00, 2.8823038e+17): 0.05841 1.43789  8.942273e+06
[2.8823038e+17, 2.6584560e+36): 0.05095 0.50000  1.491249e+23

where the reference value is from lgamma_r.  The different 
intervals test specific branches in lgammaf_r.  The upper
bound of 2.6584560e36 is 0x1p121.  Time is the average value
for all calls in the interval in usec/call.

> % 	bol = 0;
> % 	r = 1;
> % 	while (abs(z) < M,
> % 		rnew = r * z;
> % 		if (imag(r) > 0 && imag(rnew) < 0, bol += 1);
> % 		if (imag(r) < 0 && imag(rnew) > 0, bol -= 1);
> % 		r = rnew;
> % 		z = z + 1;
> % 	);
> 
> This pushes up z, almost as above.
> 
> % 	bol = 0;
> % 	return (lgb(z) - log(r) - 2 * Pi * bol);
> 
> Then use the asymptotic expansion.
> 
> % }
> % 
> % lgb(z) = (z - 0.5) * log(z) - z + 0.5 * log(2 * Pi) + \
> %     sum(n = 1, N, b[n] / z^(2 * n - 1))
> 
> This is the asymptotic expansion.  Pari syntax is only perfect for
> numeric things like this (and probably for some number theory things
> which I don't know well).   b[n] is a table of coefficients.
> 
> I had forgotten the details of even the first term in this.  You
> mentioned using an approximation of just the first term or two above
> a huge threshold.  It might be better to go much lower using another
> term or two (but not as many as medium-sized args).

Well, I was only interested in the poorly chosen 2**58 threshold. 
In the [8,2**58) interval the asymptotic expansion is used, but
its written in such a way that the

>%  + 0.5 * log(2 * Pi) +  sum(n = 1, N, b[n] / z^(2 * n - 1))

portion of your above code is approximated by a minimax polynomial.

    /* 8.0 <= x < 2**58 */
	} else if (ix < 0x5c800000) {
	    t = __ieee754_logf(x);
	    z = one/x;
	    y = z*z;
	    w = w0+z*(w1+y*(w2+y*(w3+y*(w4+y*(w5+y*w6)))));
	    r = (x-half)*(t-one)+w;
	} else

fdlibm use only the first 2 terms for the x >= 2**58.

> Cephes has a very complicated lgamma() or gamma(), with many special
> poly approximations for 0 < x < 8 or similar.

fdlibm's approximations in the [0,2) range are mostly black magic
to me. 

--  
Steve