Date: Sun, 19 Feb 2017 20:51:12 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Bruce Evans <brde@optusnet.com.au> Cc: Steve Kargl <sgk@troutmask.apl.washington.edu>, freebsd-numerics@freebsd.org Subject: Re: [PATCH] avoid function call overhead in tgammaf Message-ID: <20170219204114.Q3800@besplex.bde.org> In-Reply-To: <20170219190343.Y1567@besplex.bde.org> References: <20170219024101.GA79177@troutmask.apl.washington.edu> <20170219190343.Y1567@besplex.bde.org>
index | next in thread | previous in thread | raw e-mail
On Sun, 19 Feb 2017, Bruce Evans wrote:
> On Sat, 18 Feb 2017, Steve Kargl wrote:
>
>> The following patch treats special values (i.e., +-inf, nan)
>> and values outside a limited domain to avoid the function
>> call overhead of tgamma. Anyone with a commit bit is more
>> than welcomed to commit the patch (after of course a review).
>
> Since it is claimed to be "essentially useless", optimizing it doesn't
> seem useful.
>
>> Index: src/s_tgammaf.c
>> ===================================================================
>> --- src/s_tgammaf.c (revision 1857)
>> +++ src/s_tgammaf.c (working copy)
>> ...
>> +static u_int32_t overflow = 0x420c290f; /* 35.0400981 */
>> +static u_int32_t underflow = 0x421a67d8; /* 38.6014118 */
>
> This is a confusing name. tgamma doesn't underflow like exp. It overflows
> near negative integers, but below a threshold overflow can't happen because
> it is impossible to get near enough to an integer. It also underflows for
> large negative args not near an integer. The threshold here must be the
> minimum of the 2 thresholds. Apparently, the one for underflow is always
> smaller (more negative) for float precision. It is easy to check all cases
> for float precision to find the minimum. Double precision uses a fuzzier
> threshold and spells it -190. It uses the less fuzzy threshold -177.69 in
> a comment, except it misspells this as [plus] 177.69.
Testing shows that the threshold is too high (above the "underflow"
threshold). Perhaps you forgot about gradual underflow. The precise
value seems to be:
static u_int32_t underflow = 0x4224000b; /* -(-38.something) */
>
>> +static volatile float huge = 1.e30, tiny = 1.e-30;
>> +
>> float
>> tgammaf(float x)
>> {
>> + u_int32_t hx;
>> + int32_t ix, sg;
>> +
>> + GET_FLOAT_WORD(hx, x);
>> + ix = hx & 0x7fffffff;
>> + sg = hx & 0x80000000;
>> +
>> + if (ix > overflow) {
>> + if (ix >= 0x7f800000)
>> + return (sg ? x / x : x + x);
>> + if (!sg)
>> + return (huge * huge);
>> + }
>> +
>> + if (ix == 0)
>> + return (1 / x);
>> +
>> + if (sg && ix > underflow) {
>> + /*
>> + * tgammaf(x) for integral x returns an NaN, so we implement
>> + * a poor man's rintf().
>> + */
>> + volatile float vz;
>> + float y,z;
>> +
>> + y = -x;
>> +
>> + vz = y + 0x1p23F; /* depend on 0 <= y < 0x1p23 */
>> + z = vz - 0x1p23F; /* rintf(y) for the above range */
>> + if (z == y)
>> + return ((x - x) / (x - x));
>
> This has delicate optimizations which belong more in tgamma(). tgamma()
> just uses (x == ceil(x)) to classify negative integers. tgamma() also
> has fewer/different style bugs.
The optimizations are quite broken. x can be any finite non-integer
above ~38, so it doesn't satisfy y < 0x1p23.
I used the quick fix of copying the slow method from tgamma():
if (y == ceil(y))
return ((x - x) / (x - x));
>
>> +
>> + return (tiny * tiny);
>
> The first error is immediately below the underflow threshold:
>
> X x = 0xc21a67d9 -38.6014137
> X tgamma(x) = 0xb68ffffca9859b0c 0x80000000 -7.00648117e-46
> X tgammaf(x) = 0 0 0
> X err = -0x800000000ffffe54 17179869184.50000
>
> This is just a sign error. tgamma(x) is the small negative value
> ~ 7e-46. This underflows to -0.0F, but tiny*tiny gives +0.0F.
>
> tgamma() does similar things plus extras to get the sign right. The
> sign is -1 if ceil(x) is even for x below the underflow threshold.
> It is not so easy to determine the sign from the raw bits.
I used the quick fix of copying the slow method from tgamma():
z = ceil(y);
z = z / 2;
if (z == ceil(z))
return (tiny * tiny);
return (-tiny * tiny);
>
>> + }
>>
>> return (tgamma(x));
>> }
I tested this for all cases where hx >= 0xc2240000 in the default rounding
mode. (Other modes aren't supposed to work, but they might, and ceil()
works right for them more clearly than the optimizations.)
Bruce
home |
help
Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170219204114.Q3800>
