From owner-freebsd-numerics@freebsd.org  Sun Feb 19 02:41:01 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id BD661CDBD0A
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sun, 19 Feb 2017 02:41:01 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id A411A3C3
 for <freebsd-numerics@freebsd.org>; Sun, 19 Feb 2017 02:41:01 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v1J2f1UR079192
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO)
 for <freebsd-numerics@freebsd.org>; Sat, 18 Feb 2017 18:41:01 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v1J2f1oF079191
 for freebsd-numerics@freebsd.org; Sat, 18 Feb 2017 18:41:01 -0800 (PST)
 (envelope-from sgk)
Date: Sat, 18 Feb 2017 18:41:01 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: freebsd-numerics@freebsd.org
Subject: [PATCH] avoid function call overhead in tgammaf
Message-ID: <20170219024101.GA79177@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 19 Feb 2017 02:41:01 -0000

The following patch treats special values (i.e., +-inf, nan)
and values outside a limited domain to avoid the function
call overhead of tgamma.  Anyone with a commit bit is more
than welcomed to commit the patch (after of course a review).

Index: src/s_tgammaf.c
===================================================================
--- src/s_tgammaf.c	(revision 1857)
+++ src/s_tgammaf.c	(working copy)
@@ -27,17 +27,59 @@
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD: head/lib/msun/src/s_tgammaf.c 176388 2008-02-18 17:27:11Z das $");
 
-#include <math.h>
+#include "math.h"
+#include "math_private.h"
 
 /*
- * We simply call tgamma() rather than bloating the math library with
- * a float-optimized version of it. The reason is that tgammaf() is
- * essentially useless, since the function is superexponential and
- * floats have very limited range.
+ * The gamma function is superexponential, which means that floats have
+ * a very limited domain.  Rather than bloating the math library with a
+ * float-optimized version of tgammaf, we call tgamma() within the limited
+ * domain of [-underflow,overflow].  However, to avoid function call overhead,
+ * tgammaf() directly treats special values and values outside the limited
+ * domain.
  */
+
+static u_int32_t overflow = 0x420c290f;		/*  35.0400981 */
+static u_int32_t underflow = 0x421a67d8;	/*  38.6014118 */
+static volatile float huge = 1.e30, tiny = 1.e-30;
+
 float
 tgammaf(float x)
 {
+	u_int32_t hx;
+	int32_t ix, sg;
+
+	GET_FLOAT_WORD(hx, x);
+	ix = hx & 0x7fffffff;
+	sg = hx & 0x80000000;
+
+	if (ix > overflow) {
+		if (ix >= 0x7f800000)
+			return (sg ? x / x : x + x);
+		if (!sg)
+		 	return (huge * huge);
+	}
+
+	if (ix == 0)
+		return (1 / x);
+
+	if (sg && ix > underflow) {
+		/*
+		 * tgammaf(x) for integral x returns an NaN, so we implement
+		 * a poor man's rintf().
+		 */
+		volatile float vz;
+		float y,z;
+
+		y = -x;
+
+		vz = y + 0x1p23F;	/* depend on 0 <= y < 0x1p23 */
+		z = vz - 0x1p23F;	/* rintf(y) for the above range */
+		if (z == y)
+	    		return ((x - x) / (x - x));
+
+		return (tiny * tiny);
+	}
 
 	return (tgamma(x));
 }


-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Sun Feb 19 08:59:59 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 95455CE521E
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sun, 19 Feb 2017 08:59:59 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 60981106E
 for <freebsd-numerics@freebsd.org>; Sun, 19 Feb 2017 08:59:58 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 14D2142A65C;
 Sun, 19 Feb 2017 19:59:48 +1100 (AEDT)
Date: Sun, 19 Feb 2017 19:59:48 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: [PATCH] avoid function call overhead in tgammaf
In-Reply-To: <20170219024101.GA79177@troutmask.apl.washington.edu>
Message-ID: <20170219190343.Y1567@besplex.bde.org>
References: <20170219024101.GA79177@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=Ca543Pjl c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=RRXJm6L_Oh1pcGlzDb0A:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 19 Feb 2017 08:59:59 -0000

On Sat, 18 Feb 2017, Steve Kargl wrote:

> The following patch treats special values (i.e., +-inf, nan)
> and values outside a limited domain to avoid the function
> call overhead of tgamma.  Anyone with a commit bit is more
> than welcomed to commit the patch (after of course a review).

Since it is claimed to be "essentially useless", optimizing it doesn't
seem useful.

> Index: src/s_tgammaf.c
> ===================================================================
> --- src/s_tgammaf.c	(revision 1857)
> +++ src/s_tgammaf.c	(working copy)
> ...
> +static u_int32_t overflow = 0x420c290f;		/*  35.0400981 */
> +static u_int32_t underflow = 0x421a67d8;	/*  38.6014118 */

This is a confusing name.  tgamma doesn't underflow like exp.  It overflows
near negative integers, but below a threshold overflow can't happen because
it is impossible to get near enough to an integer.  It also underflows for
large negative args not near an integer.  The threshold here must be the
minimum of the 2 thresholds.  Apparently, the one for underflow is always
smaller (more negative) for float precision.  It is easy to check all cases
for float precision to find the minimum.  Double precision uses a fuzzier
threshold and spells it -190.  It uses the less fuzzy threshold -177.69 in
a comment, except it misspells this as [plus] 177.69.

> +static volatile float huge = 1.e30, tiny = 1.e-30;
> +
> float
> tgammaf(float x)
> {
> +	u_int32_t hx;
> +	int32_t ix, sg;
> +
> +	GET_FLOAT_WORD(hx, x);
> +	ix = hx & 0x7fffffff;
> +	sg = hx & 0x80000000;
> +
> +	if (ix > overflow) {
> +		if (ix >= 0x7f800000)
> +			return (sg ? x / x : x + x);
> +		if (!sg)
> +		 	return (huge * huge);
> +	}
> +
> +	if (ix == 0)
> +		return (1 / x);
> +
> +	if (sg && ix > underflow) {
> +		/*
> +		 * tgammaf(x) for integral x returns an NaN, so we implement
> +		 * a poor man's rintf().
> +		 */
> +		volatile float vz;
> +		float y,z;
> +
> +		y = -x;
> +
> +		vz = y + 0x1p23F;	/* depend on 0 <= y < 0x1p23 */
> +		z = vz - 0x1p23F;	/* rintf(y) for the above range */
> +		if (z == y)
> +	    		return ((x - x) / (x - x));

This has delicate optimizations which belong more in tgamma().  tgamma()
just uses (x == ceil(x)) to classify negative integers.  tgamma() also
has fewer/different style bugs.

> +
> +		return (tiny * tiny);

The first error is immediately below the underflow threshold:

X x =                           0xc21a67d9 -38.6014137
X tgamma(x) = 0xb68ffffca9859b0c 0x80000000 -7.00648117e-46
X tgammaf(x) =                  0          0 0
X err = -0x800000000ffffe54 17179869184.50000

This is just a sign error.  tgamma(x) is the small negative value
~ 7e-46.  This underflows to -0.0F, but tiny*tiny gives +0.0F.

tgamma() does similar things plus extras to get the sign right.  The
sign is -1 if ceil(x) is even for x below the underflow threshold.
It is not so easy to determine the sign from the raw bits.

> +	}
>
> 	return (tgamma(x));
> }

Bruce

From owner-freebsd-numerics@freebsd.org  Sun Feb 19 10:24:10 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4603FCDB7C9
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sun, 19 Feb 2017 10:24:10 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id CDC221538
 for <freebsd-numerics@freebsd.org>; Sun, 19 Feb 2017 10:24:09 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 09C653C86F4;
 Sun, 19 Feb 2017 20:51:13 +1100 (AEDT)
Date: Sun, 19 Feb 2017 20:51:12 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
cc: Steve Kargl <sgk@troutmask.apl.washington.edu>, 
 freebsd-numerics@freebsd.org
Subject: Re: [PATCH] avoid function call overhead in tgammaf
In-Reply-To: <20170219190343.Y1567@besplex.bde.org>
Message-ID: <20170219204114.Q3800@besplex.bde.org>
References: <20170219024101.GA79177@troutmask.apl.washington.edu>
 <20170219190343.Y1567@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=tELu-FXlebb3VOcLUAUA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 19 Feb 2017 10:24:10 -0000

On Sun, 19 Feb 2017, Bruce Evans wrote:

> On Sat, 18 Feb 2017, Steve Kargl wrote:
>
>> The following patch treats special values (i.e., +-inf, nan)
>> and values outside a limited domain to avoid the function
>> call overhead of tgamma.  Anyone with a commit bit is more
>> than welcomed to commit the patch (after of course a review).
>
> Since it is claimed to be "essentially useless", optimizing it doesn't
> seem useful.
>
>> Index: src/s_tgammaf.c
>> ===================================================================
>> --- src/s_tgammaf.c	(revision 1857)
>> +++ src/s_tgammaf.c	(working copy)
>> ...
>> +static u_int32_t overflow = 0x420c290f;		/*  35.0400981 */
>> +static u_int32_t underflow = 0x421a67d8;	/*  38.6014118 */
>
> This is a confusing name.  tgamma doesn't underflow like exp.  It overflows
> near negative integers, but below a threshold overflow can't happen because
> it is impossible to get near enough to an integer.  It also underflows for
> large negative args not near an integer.  The threshold here must be the
> minimum of the 2 thresholds.  Apparently, the one for underflow is always
> smaller (more negative) for float precision.  It is easy to check all cases
> for float precision to find the minimum.  Double precision uses a fuzzier
> threshold and spells it -190.  It uses the less fuzzy threshold -177.69 in
> a comment, except it misspells this as [plus] 177.69.

Testing shows that the threshold is too high (above the "underflow"
threshold).  Perhaps you forgot about gradual underflow.  The precise
value seems to be:

static u_int32_t underflow = 0x4224000b;	/*  -(-38.something) */

>
>> +static volatile float huge = 1.e30, tiny = 1.e-30;
>> +
>> float
>> tgammaf(float x)
>> {
>> +	u_int32_t hx;
>> +	int32_t ix, sg;
>> +
>> +	GET_FLOAT_WORD(hx, x);
>> +	ix = hx & 0x7fffffff;
>> +	sg = hx & 0x80000000;
>> +
>> +	if (ix > overflow) {
>> +		if (ix >= 0x7f800000)
>> +			return (sg ? x / x : x + x);
>> +		if (!sg)
>> +		 	return (huge * huge);
>> +	}
>> +
>> +	if (ix == 0)
>> +		return (1 / x);
>> +
>> +	if (sg && ix > underflow) {
>> +		/*
>> +		 * tgammaf(x) for integral x returns an NaN, so we implement
>> +		 * a poor man's rintf().
>> +		 */
>> +		volatile float vz;
>> +		float y,z;
>> +
>> +		y = -x;
>> +
>> +		vz = y + 0x1p23F;	/* depend on 0 <= y < 0x1p23 */
>> +		z = vz - 0x1p23F;	/* rintf(y) for the above range */
>> +		if (z == y)
>> +	    		return ((x - x) / (x - x));
>
> This has delicate optimizations which belong more in tgamma().  tgamma()
> just uses (x == ceil(x)) to classify negative integers.  tgamma() also
> has fewer/different style bugs.

The optimizations are quite broken.  x can be any finite non-integer
above ~38, so it doesn't satisfy y < 0x1p23.

I used the quick fix of copying the slow method from tgamma():

 		if (y == ceil(y))
 	    		return ((x - x) / (x - x));
>
>> +
>> +		return (tiny * tiny);
>
> The first error is immediately below the underflow threshold:
>
> X x =                           0xc21a67d9 -38.6014137
> X tgamma(x) = 0xb68ffffca9859b0c 0x80000000 -7.00648117e-46
> X tgammaf(x) =                  0          0 0
> X err = -0x800000000ffffe54 17179869184.50000
>
> This is just a sign error.  tgamma(x) is the small negative value
> ~ 7e-46.  This underflows to -0.0F, but tiny*tiny gives +0.0F.
>
> tgamma() does similar things plus extras to get the sign right.  The
> sign is -1 if ceil(x) is even for x below the underflow threshold.
> It is not so easy to determine the sign from the raw bits.

I used the quick fix of copying the slow method from tgamma():

 		z = ceil(y);
 		z = z / 2;
 		if (z == ceil(z))
 			return (tiny * tiny);
 		return (-tiny * tiny);

>
>> +	}
>> 
>> 	return (tgamma(x));
>> }

I tested this for all cases where hx >= 0xc2240000 in the default rounding
mode.  (Other modes aren't supposed to work, but they might, and ceil()
works right for them more clearly than the optimizations.)

Bruce