From owner-freebsd-numerics@FreeBSD.ORG Sun May 10 00:38:35 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9347EFFD for ; Sun, 10 May 2015 00:38:35 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7554B16D0 for ; Sun, 10 May 2015 00:38:32 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.14.9/8.14.9) with ESMTP id t4A0TAdA082271 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Sat, 9 May 2015 17:29:10 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.9/8.14.9/Submit) id t4A0TAAm082270 for freebsd-numerics@freebsd.org; Sat, 9 May 2015 17:29:10 -0700 (PDT) (envelope-from sgk) Date: Sat, 9 May 2015 17:29:10 -0700 From: Steve Kargl To: freebsd-numerics@freebsd.org Subject: small cleanup patch for e_pow.c Message-ID: <20150510002910.GA82261@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 May 2015 00:38:35 -0000 In reading, e_pow.c I found a small piece of code that can be remove. Anyone object? Index: src/e_pow.c =================================================================== --- src/e_pow.c (revision 1603) +++ src/e_pow.c (working copy) @@ -187,10 +187,6 @@ __ieee754_pow(double x, double y) /* |y| is huge */ if(iy>0x41e00000) { /* if |y| > 2**31 */ - if(iy>0x43f00000){ /* if |y| > 2**64, must o/uflow */ - if(ix<=0x3fefffff) return (hy<0)? huge*huge:tiny*tiny; - if(ix>=0x3ff00000) return (hy>0)? huge*huge:tiny*tiny; - } /* over/underflow if x is not close to one */ if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny; if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny; -- Steve From owner-freebsd-numerics@FreeBSD.ORG Sun May 10 02:24:06 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 841CF5E4 for ; Sun, 10 May 2015 02:24:06 +0000 (UTC) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id 4A7221FA2 for ; Sun, 10 May 2015 02:24:05 +0000 (UTC) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id EC8AF784016; Sun, 10 May 2015 12:00:00 +1000 (AEST) Date: Sun, 10 May 2015 11:59:55 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: small cleanup patch for e_pow.c In-Reply-To: <20150510002910.GA82261@troutmask.apl.washington.edu> Message-ID: <20150510113454.O841@besplex.bde.org> References: <20150510002910.GA82261@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=A5NVYcmG c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=LUJW0xp84oLmpHL8iQMA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 May 2015 02:24:06 -0000 On Sat, 9 May 2015, Steve Kargl wrote: > In reading, e_pow.c I found a small piece of code that > can be remove. Anyone object? > > Index: src/e_pow.c > =================================================================== > --- src/e_pow.c (revision 1603) > +++ src/e_pow.c (working copy) > @@ -187,10 +187,6 @@ __ieee754_pow(double x, double y) > > /* |y| is huge */ > if(iy>0x41e00000) { /* if |y| > 2**31 */ > - if(iy>0x43f00000){ /* if |y| > 2**64, must o/uflow */ > - if(ix<=0x3fefffff) return (hy<0)? huge*huge:tiny*tiny; > - if(ix>=0x3ff00000) return (hy>0)? huge*huge:tiny*tiny; > - } > /* over/underflow if x is not close to one */ > if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny; > if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny; It seems to be just an optimization. It is a large optimization for the huge args, but those are not common, and is at most a tiny pessimization for non-huge args (just an extra branch which can be predicted perfectly if non-huge args are never used). My tests cover huge args uniformly in float space, so they benefit from optimizations like this more than normal programs. However, on some CPUs the exceptions for calculating huge*huge and tiny*tiny when they overflow/underflow are the main source of slowness, so it doesn't help much to avoid large code that has to be executed for normal args, unless the latter would generate multiple exceptions. Here if you are correct that the above code can be removed, then the exceptions generated by the main path can't be too complicated or it would be hard to see that there are no spurious ones. We can see this easily for functions like exp(), and pow() is not very different, because the algorithm is basically to calculate a small result and then multiply it by 2**k. The overflows and underflows only occur in the scaling step. Bruce From owner-freebsd-numerics@FreeBSD.ORG Sun May 10 06:15:00 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7E616CDE for ; Sun, 10 May 2015 06:15:00 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 460E9140F for ; Sun, 10 May 2015 06:15:00 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.14.9/8.14.9) with ESMTP id t4A6Ex4d083030 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 9 May 2015 23:14:59 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.9/8.14.9/Submit) id t4A6Ewj2083029; Sat, 9 May 2015 23:14:58 -0700 (PDT) (envelope-from sgk) Date: Sat, 9 May 2015 23:14:58 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: small cleanup patch for e_pow.c Message-ID: <20150510061458.GA82518@troutmask.apl.washington.edu> References: <20150510002910.GA82261@troutmask.apl.washington.edu> <20150510113454.O841@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150510113454.O841@besplex.bde.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 May 2015 06:15:00 -0000 On Sun, May 10, 2015 at 11:59:55AM +1000, Bruce Evans wrote: > On Sat, 9 May 2015, Steve Kargl wrote: > > > In reading, e_pow.c I found a small piece of code that > > can be remove. Anyone object? > > > > Index: src/e_pow.c > > =================================================================== > > --- src/e_pow.c (revision 1603) > > +++ src/e_pow.c (working copy) > > @@ -187,10 +187,6 @@ __ieee754_pow(double x, double y) > > > > /* |y| is huge */ > > if(iy>0x41e00000) { /* if |y| > 2**31 */ > > - if(iy>0x43f00000){ /* if |y| > 2**64, must o/uflow */ > > - if(ix<=0x3fefffff) return (hy<0)? huge*huge:tiny*tiny; > > - if(ix>=0x3ff00000) return (hy>0)? huge*huge:tiny*tiny; > > - } > > /* over/underflow if x is not close to one */ > > if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny; > > if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny; > > It seems to be just an optimization. It is a large optimization for > the huge args, but those are not common, and is at most a tiny > pessimization for non-huge args (just an extra branch which can be > predicted perfectly if non-huge args are never used). > I don't see how this can be an optimization. The code has the form if (|y| > 2**31) { if (|y| > 2**64) { if (a) return if (b) return } if (a') return if (b') return ... } The difference between a and a' is <= instead of <, and similar for b and b' with >= and >. If either a or b would have return, so will a' and b'. If neither a nor b return, then neither a' nor b' will return. The a' and b' lines were added by das in r141296, where the commit message says the change is for fdlibm 5.3 compatibility. I'll also note that block like 'if (|y|>2**64) { }' is not present in e_powf.c -- Steve From owner-freebsd-numerics@FreeBSD.ORG Sun May 10 10:44:12 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0BF6533D for ; Sun, 10 May 2015 10:44:12 +0000 (UTC) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id C48F81E11 for ; Sun, 10 May 2015 10:44:11 +0000 (UTC) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 7D29D3C50AE; Sun, 10 May 2015 20:16:15 +1000 (AEST) Date: Sun, 10 May 2015 20:16:14 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: small cleanup patch for e_pow.c In-Reply-To: <20150510061458.GA82518@troutmask.apl.washington.edu> Message-ID: <20150510172810.E1812@besplex.bde.org> References: <20150510002910.GA82261@troutmask.apl.washington.edu> <20150510113454.O841@besplex.bde.org> <20150510061458.GA82518@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=A5NVYcmG c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=1_pA0lLU-LACsFVV59MA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 May 2015 10:44:12 -0000 On Sat, 9 May 2015, Steve Kargl wrote: > On Sun, May 10, 2015 at 11:59:55AM +1000, Bruce Evans wrote: >> On Sat, 9 May 2015, Steve Kargl wrote: >> >>> In reading, e_pow.c I found a small piece of code that >>> can be remove. Anyone object? >>> >>> Index: src/e_pow.c >>> =================================================================== >>> --- src/e_pow.c (revision 1603) >>> +++ src/e_pow.c (working copy) >>> @@ -187,10 +187,6 @@ __ieee754_pow(double x, double y) >>> >>> /* |y| is huge */ >>> if(iy>0x41e00000) { /* if |y| > 2**31 */ >>> - if(iy>0x43f00000){ /* if |y| > 2**64, must o/uflow */ >>> - if(ix<=0x3fefffff) return (hy<0)? huge*huge:tiny*tiny; >>> - if(ix>=0x3ff00000) return (hy>0)? huge*huge:tiny*tiny; >>> - } >>> /* over/underflow if x is not close to one */ >>> if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny; >>> if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny; >> >> It seems to be just an optimization. It is a large optimization for >> the huge args, but those are not common, and is at most a tiny >> pessimization for non-huge args (just an extra branch which can be >> predicted perfectly if non-huge args are never used). > > I don't see how this can be an optimization. The code has the form > > if (|y| > 2**31) { > if (|y| > 2**64) { > if (a) return > if (b) return > } > if (a') return > if (b') return > ... > } > > The difference between a and a' is <= instead of <, and similar for > b and b' with >= and >. If either a or b would have return, so will > a' and b'. If neither a nor b return, then neither a' nor b' will > return. The a' and b' lines were added by das in r141296, where the > commit message says the change is for fdlibm 5.3 compatibility. Hmm, I read the comment and not the code. According to the comment, there are no tests a and b. But these are apparently needed. Obviously, we must return 1 when x is exactly 1, and it is apparently possible for overflow/underflow to occur for values near 1 even when y is huge. > I'll also note that block like 'if (|y|>2**64) { }' is not present > in e_powf.c Apparently just another translation error. The tests for the double precision case are probably fuzzy just for efficiency -- we want to look at only the high word. In float precision, we can look at all the bits. Since even 1 bit away from 1 is relatively large in float precision, overflow/underflow is more likely to happen automatically. Indeed, (1+2**-23)**(2**30) is a large multiple of FLT_MAX although (1+2**-23)**(2**29) is a small fraction of FLT_MAX; N >= 2**30 and above also gives underflow for (1-2**-24)**N; N >= 2**62 gives overflow in double precision for (1+2**-52)**N, ... So 1 is the only numbers near 1 that doesn't give overflow. Here is the code again, with explanations: X if(iy>0x41e00000) { /* if |y| > 2**31 */ X if(iy>0x43f00000){ /* if |y| > 2**64, must o/uflow */ The correct fuzzy (power of 2) threshold seems to be >= 2**62. X if(ix<=0x3fefffff) return (hy<0)? huge*huge:tiny*tiny; X if(ix>=0x3ff00000) return (hy>0)? huge*huge:tiny*tiny; Then we must determine on which side of 1 |x| lies. ix <= 0x3fefffff clearly imples that |x| < 1. Otherwise, ix >= 0x3ff00000 and |x| >= 1. The last inequality is strict since |x| = 1 has already been handled. The check that ix >= 0x3ff00000 is redundant and should be omitted. The first check should be written more naturally as ix < 0x3ff00000. In float precision, the first check can be written even more easily as ix < 0x3f800000. X } X /* over/underflow if x is not close to one */ X if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny; X if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny; Removing the equal signs from the inequalities makes a signficant difference. There are 2**32 values with ix == 0x3fefffff and another 2**32 values with ix == 0x3ff000000. All of these are close to 1. Since |iy| is not so large now, |x| has to be signficantly different from 1 for overflow/underflow to occur. It probably occurs for many of the 2 * 2**32 values not found by these tests; the tests are just quick fuzzy ones using ix. In float precision, we can find the exact thresholds for overflow and underflow. Perhaps the Cygnus thresholds are exact. das didn't actually add a' and b', but just changed the spelling of sn in them: X @@ -192,9 +191,9 @@ X } X /* over/underflow if x is not close to one */ X - if(ix<0x3fefffff) return (hy<0)? sn*huge*huge:sn*tiny*tiny; X - if(ix>0x3ff00000) return (hy>0)? sn*huge*huge:sn*tiny*tiny; X - /* now |1-x| is tiny <= 2**-20, suffice to compute X + if(ix<0x3fefffff) return (hy<0)? s*huge*huge:s*tiny*tiny; X + if(ix>0x3ff00000) return (hy>0)? s*huge*huge:s*tiny*tiny; X + /* now |1-x| is tiny <= 2**-20, suffice to compute X log(x) by x-x^2/2+x^3/3-x^4/4 */ X - t = ax-1; /* t has 20 trailing zeros */ X + t = ax-one; /* t has 20 trailing zeros */ X w = (t*t)*(0.5-t*(0.3333333333333333333333-t*0.25)); X u = ivln2_h*t; /* ivln2_h has 21 sig. bits */ Bruce From owner-freebsd-numerics@FreeBSD.ORG Sun May 10 19:01:17 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2344E1D2 for ; Sun, 10 May 2015 19:01:17 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id DD8331E70 for ; Sun, 10 May 2015 19:01:16 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.14.9/8.14.9) with ESMTP id t4AJ1FI4085423 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sun, 10 May 2015 12:01:15 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.9/8.14.9/Submit) id t4AJ1FQt085422; Sun, 10 May 2015 12:01:15 -0700 (PDT) (envelope-from sgk) Date: Sun, 10 May 2015 12:01:14 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: small cleanup patch for e_pow.c Message-ID: <20150510190114.GA85376@troutmask.apl.washington.edu> References: <20150510002910.GA82261@troutmask.apl.washington.edu> <20150510113454.O841@besplex.bde.org> <20150510061458.GA82518@troutmask.apl.washington.edu> <20150510172810.E1812@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150510172810.E1812@besplex.bde.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 May 2015 19:01:17 -0000 On Sun, May 10, 2015 at 08:16:14PM +1000, Bruce Evans wrote: > On Sat, 9 May 2015, Steve Kargl wrote: > > > > I don't see how this can be an optimization. The code has the form > > > > if (|y| > 2**31) { > > if (|y| > 2**64) { > > if (a) return > > if (b) return > > } > > if (a') return > > if (b') return > > ... > > } > > > > The difference between a and a' is <= instead of <, and similar for > > b and b' with >= and >. If either a or b would have return, so will > > a' and b'. If neither a nor b return, then neither a' nor b' will > > return. The a' and b' lines were added by das in r141296, where the > > commit message says the change is for fdlibm 5.3 compatibility. > > Hmm, I read the comment and not the code. According to the comment, > there are no tests a and b. But these are apparently needed. > Obviously, we must return 1 when x is exactly 1, and it is apparently > possible for overflow/underflow to occur for values near 1 even when > y is huge. > > > I'll also note that block like 'if (|y|>2**64) { }' is not present > > in e_powf.c > > Apparently just another translation error. > > The tests for the double precision case are probably fuzzy just for > efficiency -- we want to look at only the high word. In float > precision, we can look at all the bits. Since even 1 bit away from 1 > is relatively large in float precision, overflow/underflow is more > likely to happen automatically. Indeed, (1+2**-23)**(2**30) is a large > multiple of FLT_MAX although (1+2**-23)**(2**29) is a small fraction > of FLT_MAX; N >= 2**30 and above also gives underflow for (1-2**-24)**N; > N >= 2**62 gives overflow in double precision for (1+2**-52)**N, ... > > So 1 is the only numbers near 1 that doesn't give overflow. > Thanks for the explanation! That help dislodge a mental block. To find the magic numbers, it seems I need to consider (1-2**(-p))**(2**N) = 2**(emin-p) for underflow (1+2**(-p))**(2**N) = (1-2**(-p))*2**emax for overflow With p = [24, 53, 64, 113], emin = [-125, -1021, -16381, -16381], emax = [128, 1024, 16384, 16384], and the use of log(1+z) = z for |z| << 1, I find underflow: N = [30.7, 62.5, 77.5, 126.5] overflow: N = [30.5, 62.5, 77.5, 126.5] PS: Yes, I'm slowly poking away at powl(). I have the 19 special cases working for ld80. Now, comes the hard part. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Sun May 10 20:37:00 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 268B026D for ; Sun, 10 May 2015 20:37:00 +0000 (UTC) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id DFB1517F8 for ; Sun, 10 May 2015 20:36:59 +0000 (UTC) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 5C76478419D; Mon, 11 May 2015 06:36:55 +1000 (AEST) Date: Mon, 11 May 2015 06:36:51 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: small cleanup patch for e_pow.c In-Reply-To: <20150510190114.GA85376@troutmask.apl.washington.edu> Message-ID: <20150511054753.L1656@besplex.bde.org> References: <20150510002910.GA82261@troutmask.apl.washington.edu> <20150510113454.O841@besplex.bde.org> <20150510061458.GA82518@troutmask.apl.washington.edu> <20150510172810.E1812@besplex.bde.org> <20150510190114.GA85376@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=L/MkHYj8 c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=6QxUbEKU0hjXAd6ngnwA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 May 2015 20:37:00 -0000 On Sun, 10 May 2015, Steve Kargl wrote: > On Sun, May 10, 2015 at 08:16:14PM +1000, Bruce Evans wrote: >> ... >> So 1 is the only numbers near 1 that doesn't give overflow. > > Thanks for the explanation! That help dislodge a mental block. > > To find the magic numbers, it seems I need to consider > > (1-2**(-p))**(2**N) = 2**(emin-p) for underflow > (1+2**(-p))**(2**N) = (1-2**(-p))*2**emax for overflow > > With p = [24, 53, 64, 113], emin = [-125, -1021, -16381, -16381], > emax = [128, 1024, 16384, 16384], and the use of log(1+z) = z for > |z| << 1, I find > > underflow: N = [30.7, 62.5, 77.5, 126.5] > overflow: N = [30.5, 62.5, 77.5, 126.5] I plugged some numbers into pari to get sloppy estimates. E.g., (1 + 2^-23.0)^(2^N) / 2^128; then bump up N until the result is > 1. I get this at N = 29.5, so 30.7 seems too high. > PS: Yes, I'm slowly poking away at powl(). I have the 19 special cases > working for ld80. Now, comes the hard part. Excellent. The only thing I would change immediately in the fdlibm implementation is its manually inlined exp(). Use the exp*() kernel instead. Hopefully the kernel used for hyperbolic functions is also suitable for pow(). This kernel has only been committed for long double precision. Look at the old BSD libm for use of other exp() kernels and their use by pow(). It has 2 kernels: exp__D(), which is used by pow(), erf() and [t]gamma(), and exp__E(), which is used by cosh() and expm1(). We still have exp__D() (renamed with more underscores) in bsdsrc for use in tgamma(). A general kernel would take an extra-precision arg and return an extra- precision result, but that would be inefficient and these kernels only support an extra-precision arg. I don't see how that is useful for expm1(), and my kernel for exp*() only gives extra precision for the result, as needed for hyperbolic functions. I didn't notice before that old BSD libm used an exp() kernel for cosh(). Ng (or other Sun hackers?) apparently dumbed down his implementation of cosh() a bit for the version in fdlibm, so FreeBSD's cosh() is probably still less accurate than the old (1985) version :-(. This is fixed in my version, but only the long double version of the fixes has been committed. The accuracy of cosh() was presumably better than 1 ulp in 1985; it is 2 ulps (maybe 1.5) in FreeBSD, and about 0.9 ulps in my version (the kernel for expl() is more accurate, so coshl( has an accuracy of closer to 0.5 ulps). OTOH, pow() may have been improved over the 1985 version in fdlibm. It manually inlines kernels for both exp() and log() (named exp__D() and log__L(), and these kernels seem to be accurate enough -- I haven't found any errors larger than 1 ulp in pow(). The manual inlining makes the code harder to read (about 50% larger, with many more special constants for the kernels). Bruce From owner-freebsd-numerics@FreeBSD.ORG Sun May 10 20:59:48 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7BD238D6 for ; Sun, 10 May 2015 20:59:48 +0000 (UTC) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 5AF901A4B for ; Sun, 10 May 2015 20:59:48 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.14.9/8.14.9) with ESMTP id t4AKxkpl085971 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sun, 10 May 2015 13:59:46 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.9/8.14.9/Submit) id t4AKxkQS085970; Sun, 10 May 2015 13:59:46 -0700 (PDT) (envelope-from sgk) Date: Sun, 10 May 2015 13:59:46 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: small cleanup patch for e_pow.c Message-ID: <20150510205946.GA85935@troutmask.apl.washington.edu> References: <20150510002910.GA82261@troutmask.apl.washington.edu> <20150510113454.O841@besplex.bde.org> <20150510061458.GA82518@troutmask.apl.washington.edu> <20150510172810.E1812@besplex.bde.org> <20150510190114.GA85376@troutmask.apl.washington.edu> <20150511054753.L1656@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150511054753.L1656@besplex.bde.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 May 2015 20:59:48 -0000 On Mon, May 11, 2015 at 06:36:51AM +1000, Bruce Evans wrote: > On Sun, 10 May 2015, Steve Kargl wrote: > > > On Sun, May 10, 2015 at 08:16:14PM +1000, Bruce Evans wrote: > >> ... > >> So 1 is the only numbers near 1 that doesn't give overflow. > > > > Thanks for the explanation! That help dislodge a mental block. > > > > To find the magic numbers, it seems I need to consider > > > > (1-2**(-p))**(2**N) = 2**(emin-p) for underflow > > (1+2**(-p))**(2**N) = (1-2**(-p))*2**emax for overflow > > > > With p = [24, 53, 64, 113], emin = [-125, -1021, -16381, -16381], > > emax = [128, 1024, 16384, 16384], and the use of log(1+z) = z for > > |z| << 1, I find > > > > underflow: N = [30.7, 62.5, 77.5, 126.5] > > overflow: N = [30.5, 62.5, 77.5, 126.5] > > I plugged some numbers into pari to get sloppy estimates. E.g., > (1 + 2^-23.0)^(2^N) / 2^128; then bump up N until the result is > 1. > I get this at N = 29.5, so 30.7 seems too high. > I used p = 24 in my (1+-2^(-p)). It seems that you're using p-1=23. (1-2**(-p))**(2**N) = 2**(emin-p) Using log() and log(1+z) = z for |z| << 1, the above equation gives N = p + log((p-emin) * log(2)) / log(2) where the equality is of course approximation. p = 24 --> N = 30.690 p = 23 --> N = 29.681 -- Steve From owner-freebsd-numerics@FreeBSD.ORG Sun May 10 23:09:55 2015 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 18B28829 for ; Sun, 10 May 2015 23:09:55 +0000 (UTC) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id CFA0D177D for ; Sun, 10 May 2015 23:09:53 +0000 (UTC) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id DA60C3C72E0; Mon, 11 May 2015 09:09:50 +1000 (AEST) Date: Mon, 11 May 2015 09:09:46 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: small cleanup patch for e_pow.c In-Reply-To: <20150510205946.GA85935@troutmask.apl.washington.edu> Message-ID: <20150511084012.R2314@besplex.bde.org> References: <20150510002910.GA82261@troutmask.apl.washington.edu> <20150510113454.O841@besplex.bde.org> <20150510061458.GA82518@troutmask.apl.washington.edu> <20150510172810.E1812@besplex.bde.org> <20150510190114.GA85376@troutmask.apl.washington.edu> <20150511054753.L1656@besplex.bde.org> <20150510205946.GA85935@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=A5NVYcmG c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=MDOUD4c6SDqqUN_XlSwA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 May 2015 23:09:55 -0000 On Sun, 10 May 2015, Steve Kargl wrote: > On Mon, May 11, 2015 at 06:36:51AM +1000, Bruce Evans wrote: >> On Sun, 10 May 2015, Steve Kargl wrote: >>> >>> To find the magic numbers, it seems I need to consider >>> >>> (1-2**(-p))**(2**N) = 2**(emin-p) for underflow >>> (1+2**(-p))**(2**N) = (1-2**(-p))*2**emax for overflow >>> >>> With p = [24, 53, 64, 113], emin = [-125, -1021, -16381, -16381], >>> emax = [128, 1024, 16384, 16384], and the use of log(1+z) = z for >>> |z| << 1, I find >>> >>> underflow: N = [30.7, 62.5, 77.5, 126.5] >>> overflow: N = [30.5, 62.5, 77.5, 126.5] >> >> I plugged some numbers into pari to get sloppy estimates. E.g., >> (1 + 2^-23.0)^(2^N) / 2^128; then bump up N until the result is > 1. >> I get this at N = 29.5, so 30.7 seems too high. > > I used p = 24 in my (1+-2^(-p)). It seems that you're using p-1=23. It should be 1+2**-(p-1) and 1-2**-p, but the 1-epsilon side dominates; also 2**128 is sloppy with gradual underflow; in the scale factor for the 1-epsilon side, 128 must be replaced by something like 150. So 30.7 is correct if we don't bother to treat the sides differently. There is a related problem with extra precision. Expressions like huge*huge and tiny*tiny are always wrong in the presence of extra precision/exponent range, since they don't overflow or underflow but just give a wrong result if extra exponent range actually works. But extra precision/exponent is often broken: - C11 broke it and also broke efficiency by requiring extra precision/ exponent range to be destroyed on function return. clang doesn't implement this breakage, but gcc-4.8 does (with -std=c99 but not with -std=gnu99) - at least old versions of clang miscompile expressions like huge*huge unless huge is declared with the volatile hack. The C11 breakage makes returning the value INFINITY correct and then the only error in evaluating this at compile time is not raising the overflow exception. - at least old versons of both clang and gcc miscompile expressions like tiny*tiny, except... Functions like exp() are especially affected by this bug. cexp() is most noticeably broken by it, since it uses expressions like exp(x) * cos(y), written otherwise carefully by scaling exp(x) so as to avoid spurious overflow, so some non-overflowing cases return garbage related to huge*huge in extra precision. We mostly don't worry about this bug, except I fixed it for cexp*(). My test don't see it, since they assign function results to a variable without the extra precision, so as to destroy it like the C11 breakage requires. The bug should be fixed someday by replacing code like 'return huge*huge;' with 'return Inf_with_OE();'. Cases where the expression is actually x*x where x is known to be >= huge are harder to fix. Bruce