From owner-freebsd-numerics@freebsd.org Sun Feb 19 10:24:10 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4603FCDB7C9 for ; Sun, 19 Feb 2017 10:24:10 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id CDC221538 for ; Sun, 19 Feb 2017 10:24:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 09C653C86F4; Sun, 19 Feb 2017 20:51:13 +1100 (AEDT) Date: Sun, 19 Feb 2017 20:51:12 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Steve Kargl , freebsd-numerics@freebsd.org Subject: Re: [PATCH] avoid function call overhead in tgammaf In-Reply-To: <20170219190343.Y1567@besplex.bde.org> Message-ID: <20170219204114.Q3800@besplex.bde.org> References: <20170219024101.GA79177@troutmask.apl.washington.edu> <20170219190343.Y1567@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=tELu-FXlebb3VOcLUAUA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Feb 2017 10:24:10 -0000 On Sun, 19 Feb 2017, Bruce Evans wrote: > On Sat, 18 Feb 2017, Steve Kargl wrote: > >> The following patch treats special values (i.e., +-inf, nan) >> and values outside a limited domain to avoid the function >> call overhead of tgamma. Anyone with a commit bit is more >> than welcomed to commit the patch (after of course a review). > > Since it is claimed to be "essentially useless", optimizing it doesn't > seem useful. > >> Index: src/s_tgammaf.c >> =================================================================== >> --- src/s_tgammaf.c (revision 1857) >> +++ src/s_tgammaf.c (working copy) >> ... >> +static u_int32_t overflow = 0x420c290f; /* 35.0400981 */ >> +static u_int32_t underflow = 0x421a67d8; /* 38.6014118 */ > > This is a confusing name. tgamma doesn't underflow like exp. It overflows > near negative integers, but below a threshold overflow can't happen because > it is impossible to get near enough to an integer. It also underflows for > large negative args not near an integer. The threshold here must be the > minimum of the 2 thresholds. Apparently, the one for underflow is always > smaller (more negative) for float precision. It is easy to check all cases > for float precision to find the minimum. Double precision uses a fuzzier > threshold and spells it -190. It uses the less fuzzy threshold -177.69 in > a comment, except it misspells this as [plus] 177.69. Testing shows that the threshold is too high (above the "underflow" threshold). Perhaps you forgot about gradual underflow. The precise value seems to be: static u_int32_t underflow = 0x4224000b; /* -(-38.something) */ > >> +static volatile float huge = 1.e30, tiny = 1.e-30; >> + >> float >> tgammaf(float x) >> { >> + u_int32_t hx; >> + int32_t ix, sg; >> + >> + GET_FLOAT_WORD(hx, x); >> + ix = hx & 0x7fffffff; >> + sg = hx & 0x80000000; >> + >> + if (ix > overflow) { >> + if (ix >= 0x7f800000) >> + return (sg ? x / x : x + x); >> + if (!sg) >> + return (huge * huge); >> + } >> + >> + if (ix == 0) >> + return (1 / x); >> + >> + if (sg && ix > underflow) { >> + /* >> + * tgammaf(x) for integral x returns an NaN, so we implement >> + * a poor man's rintf(). >> + */ >> + volatile float vz; >> + float y,z; >> + >> + y = -x; >> + >> + vz = y + 0x1p23F; /* depend on 0 <= y < 0x1p23 */ >> + z = vz - 0x1p23F; /* rintf(y) for the above range */ >> + if (z == y) >> + return ((x - x) / (x - x)); > > This has delicate optimizations which belong more in tgamma(). tgamma() > just uses (x == ceil(x)) to classify negative integers. tgamma() also > has fewer/different style bugs. The optimizations are quite broken. x can be any finite non-integer above ~38, so it doesn't satisfy y < 0x1p23. I used the quick fix of copying the slow method from tgamma(): if (y == ceil(y)) return ((x - x) / (x - x)); > >> + >> + return (tiny * tiny); > > The first error is immediately below the underflow threshold: > > X x = 0xc21a67d9 -38.6014137 > X tgamma(x) = 0xb68ffffca9859b0c 0x80000000 -7.00648117e-46 > X tgammaf(x) = 0 0 0 > X err = -0x800000000ffffe54 17179869184.50000 > > This is just a sign error. tgamma(x) is the small negative value > ~ 7e-46. This underflows to -0.0F, but tiny*tiny gives +0.0F. > > tgamma() does similar things plus extras to get the sign right. The > sign is -1 if ceil(x) is even for x below the underflow threshold. > It is not so easy to determine the sign from the raw bits. I used the quick fix of copying the slow method from tgamma(): z = ceil(y); z = z / 2; if (z == ceil(z)) return (tiny * tiny); return (-tiny * tiny); > >> + } >> >> return (tgamma(x)); >> } I tested this for all cases where hx >= 0xc2240000 in the default rounding mode. (Other modes aren't supposed to work, but they might, and ceil() works right for them more clearly than the optimizations.) Bruce