From owner-freebsd-numerics@freebsd.org  Sun Feb 19 10:24:10 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4603FCDB7C9
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sun, 19 Feb 2017 10:24:10 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id CDC221538
 for <freebsd-numerics@freebsd.org>; Sun, 19 Feb 2017 10:24:09 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 09C653C86F4;
 Sun, 19 Feb 2017 20:51:13 +1100 (AEDT)
Date: Sun, 19 Feb 2017 20:51:12 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
cc: Steve Kargl <sgk@troutmask.apl.washington.edu>, 
 freebsd-numerics@freebsd.org
Subject: Re: [PATCH] avoid function call overhead in tgammaf
In-Reply-To: <20170219190343.Y1567@besplex.bde.org>
Message-ID: <20170219204114.Q3800@besplex.bde.org>
References: <20170219024101.GA79177@troutmask.apl.washington.edu>
 <20170219190343.Y1567@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=tELu-FXlebb3VOcLUAUA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 19 Feb 2017 10:24:10 -0000

On Sun, 19 Feb 2017, Bruce Evans wrote:

> On Sat, 18 Feb 2017, Steve Kargl wrote:
>
>> The following patch treats special values (i.e., +-inf, nan)
>> and values outside a limited domain to avoid the function
>> call overhead of tgamma.  Anyone with a commit bit is more
>> than welcomed to commit the patch (after of course a review).
>
> Since it is claimed to be "essentially useless", optimizing it doesn't
> seem useful.
>
>> Index: src/s_tgammaf.c
>> ===================================================================
>> --- src/s_tgammaf.c	(revision 1857)
>> +++ src/s_tgammaf.c	(working copy)
>> ...
>> +static u_int32_t overflow = 0x420c290f;		/*  35.0400981 */
>> +static u_int32_t underflow = 0x421a67d8;	/*  38.6014118 */
>
> This is a confusing name.  tgamma doesn't underflow like exp.  It overflows
> near negative integers, but below a threshold overflow can't happen because
> it is impossible to get near enough to an integer.  It also underflows for
> large negative args not near an integer.  The threshold here must be the
> minimum of the 2 thresholds.  Apparently, the one for underflow is always
> smaller (more negative) for float precision.  It is easy to check all cases
> for float precision to find the minimum.  Double precision uses a fuzzier
> threshold and spells it -190.  It uses the less fuzzy threshold -177.69 in
> a comment, except it misspells this as [plus] 177.69.

Testing shows that the threshold is too high (above the "underflow"
threshold).  Perhaps you forgot about gradual underflow.  The precise
value seems to be:

static u_int32_t underflow = 0x4224000b;	/*  -(-38.something) */

>
>> +static volatile float huge = 1.e30, tiny = 1.e-30;
>> +
>> float
>> tgammaf(float x)
>> {
>> +	u_int32_t hx;
>> +	int32_t ix, sg;
>> +
>> +	GET_FLOAT_WORD(hx, x);
>> +	ix = hx & 0x7fffffff;
>> +	sg = hx & 0x80000000;
>> +
>> +	if (ix > overflow) {
>> +		if (ix >= 0x7f800000)
>> +			return (sg ? x / x : x + x);
>> +		if (!sg)
>> +		 	return (huge * huge);
>> +	}
>> +
>> +	if (ix == 0)
>> +		return (1 / x);
>> +
>> +	if (sg && ix > underflow) {
>> +		/*
>> +		 * tgammaf(x) for integral x returns an NaN, so we implement
>> +		 * a poor man's rintf().
>> +		 */
>> +		volatile float vz;
>> +		float y,z;
>> +
>> +		y = -x;
>> +
>> +		vz = y + 0x1p23F;	/* depend on 0 <= y < 0x1p23 */
>> +		z = vz - 0x1p23F;	/* rintf(y) for the above range */
>> +		if (z == y)
>> +	    		return ((x - x) / (x - x));
>
> This has delicate optimizations which belong more in tgamma().  tgamma()
> just uses (x == ceil(x)) to classify negative integers.  tgamma() also
> has fewer/different style bugs.

The optimizations are quite broken.  x can be any finite non-integer
above ~38, so it doesn't satisfy y < 0x1p23.

I used the quick fix of copying the slow method from tgamma():

 		if (y == ceil(y))
 	    		return ((x - x) / (x - x));
>
>> +
>> +		return (tiny * tiny);
>
> The first error is immediately below the underflow threshold:
>
> X x =                           0xc21a67d9 -38.6014137
> X tgamma(x) = 0xb68ffffca9859b0c 0x80000000 -7.00648117e-46
> X tgammaf(x) =                  0          0 0
> X err = -0x800000000ffffe54 17179869184.50000
>
> This is just a sign error.  tgamma(x) is the small negative value
> ~ 7e-46.  This underflows to -0.0F, but tiny*tiny gives +0.0F.
>
> tgamma() does similar things plus extras to get the sign right.  The
> sign is -1 if ceil(x) is even for x below the underflow threshold.
> It is not so easy to determine the sign from the raw bits.

I used the quick fix of copying the slow method from tgamma():

 		z = ceil(y);
 		z = z / 2;
 		if (z == ceil(z))
 			return (tiny * tiny);
 		return (-tiny * tiny);

>
>> +	}
>> 
>> 	return (tgamma(x));
>> }

I tested this for all cases where hx >= 0xc2240000 in the default rounding
mode.  (Other modes aren't supposed to work, but they might, and ceil()
works right for them more clearly than the optimizations.)

Bruce