From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 25 02:30:30 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 3FF16E2A for ; Sun, 25 Aug 2013 02:30:30 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 13C2A21A6 for ; Sun, 25 Aug 2013 02:30:30 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7P2UTGP056799; Sat, 24 Aug 2013 19:30:29 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7P2UTTg056798; Sat, 24 Aug 2013 19:30:29 -0700 (PDT) (envelope-from sgk) Date: Sat, 24 Aug 2013 19:30:29 -0700 From: Steve Kargl To: Bruce Evans Subject: Re: (2nd time) tweaks to erff() threshold values Message-ID: <20130825023029.GA56772@troutmask.apl.washington.edu> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> <20130824202102.GA54981@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130824202102.GA54981@troutmask.apl.washington.edu> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Aug 2013 02:30:30 -0000 On Sat, Aug 24, 2013 at 01:21:02PM -0700, Steve Kargl wrote: > On Fri, Aug 23, 2013 at 09:57:44AM -0700, Steve Kargl wrote: > > On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote: > > > >> The whole erf implementation is probably not the best way for > >> float precision, but is good for understanding higher precisions. > > > > I'm fairly certain that the implementation of erff() is not very > > efficient. The polynomials, used in the rational approximations, > > are the same order as those used in the double precision approximation. > > I'll check the polys when update my P(x)/Q(x) remes algorithm for > > erfl and erfcl. > > I seem to be right (although I haven't iterated on the P's and Q's). > Now, with pretty testing and (perhaps) better coefficients: | usec/call | Max ULP | X Max ULP ---------+-----------+---------+--------------------------------- new erfc | 0.02757 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1 old erfc | 0.03348 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1 ---------+-----------+---------+--------------------------------- new erf | 0.02302 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 old erf | 0.03028 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 OK to commit? -- Steve Index: src/s_erff.c =================================================================== --- src/s_erff.c (revision 1361) +++ src/s_erff.c (working copy) @@ -31,16 +31,16 @@ erx = 8.4506291151e-01, /* 0x3f58560b * */ efx = 1.2837916613e-01, /* 0x3e0375d4 */ efx8= 1.0270333290e+00, /* 0x3f8375d4 */ -pp0 = 1.2837916613e-01, /* 0x3e0375d4 */ -pp1 = -3.2504209876e-01, /* 0xbea66beb */ -pp2 = -2.8481749818e-02, /* 0xbce9528f */ -pp3 = -5.7702702470e-03, /* 0xbbbd1489 */ -pp4 = -2.3763017452e-05, /* 0xb7c756b1 */ -qq1 = 3.9791721106e-01, /* 0x3ecbbbce */ -qq2 = 6.5022252500e-02, /* 0x3d852a63 */ -qq3 = 5.0813062117e-03, /* 0x3ba68116 */ -qq4 = 1.3249473704e-04, /* 0x390aee49 */ -qq5 = -3.9602282413e-06, /* 0xb684e21a */ +/* + * Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]: + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31. + */ +pp0 = 1.28379166e-01F, /* 0x1.06eba8p-3 */ +pp1 = -3.36030394e-01F, /* -0x1.58185ap-2 */ +pp2 = -1.86260219e-03F, /* -0x1.e8451ep-10 */ +qq1 = 3.12324286e-01F, /* 0x1.3fd1f0p-2 */ +qq2 = 2.16070302e-02F, /* 0x1.620274p-6 */ +qq3 = -1.98859419e-03F, /* -0x1.04a626p-9 */ /* * Coefficients for approximation to erf in [0.84375,1.25] */ @@ -114,8 +114,8 @@ erff(float x) return x + efx*x; } z = x*x; - r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4))); - s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5)))); + r = pp0+z*(pp1+z*pp2); + s = one+z*(qq1+z*(qq2+z*qq3)); y = r/s; return x + x*y; } @@ -163,8 +163,8 @@ erfcf(float x) if(ix < 0x23800000) /* |x|<2**-56 */ return one-x; z = x*x; - r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4))); - s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5)))); + r = pp0+z*(pp1+z*pp2); + s = one+z*(qq1+z*(qq2+z*qq3)); y = r/s; if(hx < 0x3e800000) { /* x<1/4 */ return one-(x+x*y); From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 25 02:51:13 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id BBCB61000 for ; Sun, 25 Aug 2013 02:51:13 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 7DE50225D for ; Sun, 25 Aug 2013 02:51:13 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7P2pCBN056878; Sat, 24 Aug 2013 19:51:12 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7P2pCIQ056877; Sat, 24 Aug 2013 19:51:12 -0700 (PDT) (envelope-from sgk) Date: Sat, 24 Aug 2013 19:51:12 -0700 From: Steve Kargl To: Bruce Evans Subject: Re: (2nd time) tweaks to erff() threshold values Message-ID: <20130825025112.GA56868@troutmask.apl.washington.edu> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> <20130824202102.GA54981@troutmask.apl.washington.edu> <20130825023029.GA56772@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130825023029.GA56772@troutmask.apl.washington.edu> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Aug 2013 02:51:13 -0000 On Sat, Aug 24, 2013 at 07:30:29PM -0700, Steve Kargl wrote: > On Sat, Aug 24, 2013 at 01:21:02PM -0700, Steve Kargl wrote: > > On Fri, Aug 23, 2013 at 09:57:44AM -0700, Steve Kargl wrote: > > > On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote: > > > > > >> The whole erf implementation is probably not the best way for > > >> float precision, but is good for understanding higher precisions. > > > > > > I'm fairly certain that the implementation of erff() is not very > > > efficient. The polynomials, used in the rational approximations, > > > are the same order as those used in the double precision approximation. > > > I'll check the polys when update my P(x)/Q(x) remes algorithm for > > > erfl and erfcl. > > > > I seem to be right (although I haven't iterated on the P's and Q's). > > > > Now, with pretty testing and (perhaps) better coefficients: > LMAO. I have not idea where "pretty" came from. s/pretty/better -- Steve From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 25 17:19:11 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 73373BC3 for ; Sun, 25 Aug 2013 17:19:11 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 49AC82825 for ; Sun, 25 Aug 2013 17:19:11 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7PHJAqr060446; Sun, 25 Aug 2013 10:19:10 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7PHJAh4060445; Sun, 25 Aug 2013 10:19:10 -0700 (PDT) (envelope-from sgk) Date: Sun, 25 Aug 2013 10:19:10 -0700 From: Steve Kargl To: Bruce Evans Subject: Re: (2nd time) tweaks to erff() threshold values Message-ID: <20130825171910.GA60396@troutmask.apl.washington.edu> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> <20130824202102.GA54981@troutmask.apl.washington.edu> <20130825023029.GA56772@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130825023029.GA56772@troutmask.apl.washington.edu> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Aug 2013 17:19:11 -0000 On Sat, Aug 24, 2013 at 07:30:29PM -0700, Steve Kargl wrote: > On Sat, Aug 24, 2013 at 01:21:02PM -0700, Steve Kargl wrote: > > On Fri, Aug 23, 2013 at 09:57:44AM -0700, Steve Kargl wrote: > > > On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote: > > > > > >> The whole erf implementation is probably not the best way for > > >> float precision, but is good for understanding higher precisions. > > > > > > I'm fairly certain that the implementation of erff() is not very > > > efficient. The polynomials, used in the rational approximations, > > > are the same order as those used in the double precision approximation. > > > I'll check the polys when update my P(x)/Q(x) remes algorithm for > > > erfl and erfcl. > > > > I seem to be right (although I haven't iterated on the P's and Q's). > > > > Now, with pretty testing and (perhaps) better coefficients: These values where for the interval [0,0.84375]. > > | usec/call | Max ULP | X Max ULP > ---------+-----------+---------+--------------------------------- > new erfc | 0.02757 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1 > old erfc | 0.03348 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1 > ---------+-----------+---------+--------------------------------- > new erf | 0.02302 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 > old erf | 0.03028 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 New interval [0.84375,1.25]. | usecs/call| Max ULP | X Max ULP ---------+-----------+---------+------------------------------ Old erfc | 0.03469 | 0.95158 | 1.24999416e+00, 0x1.3fff9ep+0 New erfc | 0.02781 | 0.76497 | 1.24971473e+00, 0x1.3fed4ep+0 ---------+-----------+---------+------------------------------ Old erf | 0.03404 | 0.55617 | 1.24996567e+00, 0x1.3ffdcp+0 New erf | 0.02774 | 0.54213 | 8.52452755e-01, 0x1.b474bp-1 Note that s_erf.c claims that in this interval, a taylor series about erf(1) is used and suggests that the constant erx = 8.4506291151e-01 is erf(1). That's bogus as erf(1) = 8.42700779e-01F (in float). Note**2, I did not record a domain and range as my routine that generates plots seems to give a really messed up result for the range [-0.08, 0.06]. However, the error estimate from solving the Remes matrix equation is err = 0x1.39c84809b7ed2p-38. New diff: Index: src/s_erff.c =================================================================== --- src/s_erff.c (revision 1361) +++ src/s_erff.c (working copy) @@ -25,38 +25,34 @@ half= 5.0000000000e-01, /* 0x3F000000 * one = 1.0000000000e+00, /* 0x3F800000 */ two = 2.0000000000e+00, /* 0x40000000 */ /* c = (subfloat)0.84506291151 */ -erx = 8.4506291151e-01, /* 0x3f58560b */ +erx = 8.42700779e-01F, /* 0x1.af767ap-1 */ + /* * Coefficients for approximation to erf on [0,0.84375] */ efx = 1.2837916613e-01, /* 0x3e0375d4 */ efx8= 1.0270333290e+00, /* 0x3f8375d4 */ -pp0 = 1.2837916613e-01, /* 0x3e0375d4 */ -pp1 = -3.2504209876e-01, /* 0xbea66beb */ -pp2 = -2.8481749818e-02, /* 0xbce9528f */ -pp3 = -5.7702702470e-03, /* 0xbbbd1489 */ -pp4 = -2.3763017452e-05, /* 0xb7c756b1 */ -qq1 = 3.9791721106e-01, /* 0x3ecbbbce */ -qq2 = 6.5022252500e-02, /* 0x3d852a63 */ -qq3 = 5.0813062117e-03, /* 0x3ba68116 */ -qq4 = 1.3249473704e-04, /* 0x390aee49 */ -qq5 = -3.9602282413e-06, /* 0xb684e21a */ +/* + * Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]: + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31. + */ +pp0 = 1.28379166e-01F, /* 0x1.06eba8p-3 */ +pp1 = -3.36030394e-01F, /* -0x1.58185ap-2 */ +pp2 = -1.86260219e-03F, /* -0x1.e8451ep-10 */ +qq1 = 3.12324286e-01F, /* 0x1.3fd1f0p-2 */ +qq2 = 2.16070302e-02F, /* 0x1.620274p-6 */ +qq3 = -1.98859419e-03F, /* -0x1.04a626p-9 */ /* * Coefficients for approximation to erf in [0.84375,1.25] */ -pa0 = -2.3621185683e-03, /* 0xbb1acdc6 */ -pa1 = 4.1485610604e-01, /* 0x3ed46805 */ -pa2 = -3.7220788002e-01, /* 0xbebe9208 */ -pa3 = 3.1834661961e-01, /* 0x3ea2fe54 */ -pa4 = -1.1089469492e-01, /* 0xbde31cc2 */ -pa5 = 3.5478305072e-02, /* 0x3d1151b3 */ -pa6 = -2.1663755178e-03, /* 0xbb0df9c0 */ -qa1 = 1.0642088205e-01, /* 0x3dd9f331 */ -qa2 = 5.4039794207e-01, /* 0x3f0a5785 */ -qa3 = 7.1828655899e-02, /* 0x3d931ae7 */ -qa4 = 1.2617121637e-01, /* 0x3e013307 */ -qa5 = 1.3637083583e-02, /* 0x3c5f6e13 */ -qa6 = 1.1984500103e-02, /* 0x3c445aa3 */ +pa0 = 1.35131621e-08F, /* 0x1.d04f08p-27 */ +pa1 = 4.15107518e-01F, /* 0x1.a911f2p-2 */ +pa2 = -1.63339108e-01F, /* -0x1.4e84bcp-3 */ +pa3 = 1.12098485e-01F, /* 0x1.cb27c8p-4 */ +qa1 = 6.06513679e-01F, /* 0x1.3688f6p-1 */ +qa2 = 5.43227255e-01F, /* 0x1.1621e2p-1 */ +qa3 = 1.74396917e-01F, /* 0x1.652a36p-3 */ +qa4 = 5.88681065e-02F, /* 0x1.e23f5ep-5 */ /* * Coefficients for approximation to erfc in [1.25,1/0.35] */ @@ -114,15 +110,15 @@ erff(float x) return x + efx*x; } z = x*x; - r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4))); - s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5)))); + r = pp0+z*(pp1+z*pp2); + s = one+z*(qq1+z*(qq2+z*qq3)); y = r/s; return x + x*y; } if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ s = fabsf(x)-one; - P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6))))); - Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6))))); + P = pa0+s*(pa1+s*(pa2+s*pa3)); + Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4))); if(hx>=0) return erx + P/Q; else return -erx - P/Q; } if (ix >= 0x40c00000) { /* inf>|x|>=6 */ @@ -163,8 +159,8 @@ erfcf(float x) if(ix < 0x23800000) /* |x|<2**-56 */ return one-x; z = x*x; - r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4))); - s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5)))); + r = pp0+z*(pp1+z*pp2); + s = one+z*(qq1+z*(qq2+z*qq3)); y = r/s; if(hx < 0x3e800000) { /* x<1/4 */ return one-(x+x*y); @@ -176,8 +172,8 @@ erfcf(float x) } if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ s = fabsf(x)-one; - P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6))))); - Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6))))); + P = pa0+s*(pa1+s*(pa2+s*pa3)); + Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4))); if(hx>=0) { z = one-erx; return z - P/Q; } else { -- Steve From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 25 17:58:18 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 85E1E2A7 for ; Sun, 25 Aug 2013 17:58:18 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 4ABF329B1 for ; Sun, 25 Aug 2013 17:58:18 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7PHwHvO060609; Sun, 25 Aug 2013 10:58:17 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7PHwH61060608; Sun, 25 Aug 2013 10:58:17 -0700 (PDT) (envelope-from sgk) Date: Sun, 25 Aug 2013 10:58:17 -0700 From: Steve Kargl To: Bruce Evans Subject: Re: (2nd time) tweaks to erff() threshold values Message-ID: <20130825175817.GA60599@troutmask.apl.washington.edu> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> <20130824202102.GA54981@troutmask.apl.washington.edu> <20130825023029.GA56772@troutmask.apl.washington.edu> <20130825171910.GA60396@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130825171910.GA60396@troutmask.apl.washington.edu> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Aug 2013 17:58:18 -0000 On Sun, Aug 25, 2013 at 10:19:10AM -0700, Steve Kargl wrote: > > Note**2, I did not record a domain and range as my routine that > generates plots seems to give a really messed up result for > the range [-0.08, 0.06]. However, the error estimate from > solving the Remes matrix equation is err = 0x1.39c84809b7ed2p-38. Found the bug in the graphing routine. /* * Domain [0.84375, 1.25], range ~[-1.954e-10,1.940e-11]: * |(erf(x) - erx) - p(x)/q(x)| < 2**-36. */ > +pa0 = 1.35131621e-08F, /* 0x1.d04f08p-27 */ > +pa1 = 4.15107518e-01F, /* 0x1.a911f2p-2 */ > +pa2 = -1.63339108e-01F, /* -0x1.4e84bcp-3 */ > +pa3 = 1.12098485e-01F, /* 0x1.cb27c8p-4 */ > +qa1 = 6.06513679e-01F, /* 0x1.3688f6p-1 */ > +qa2 = 5.43227255e-01F, /* 0x1.1621e2p-1 */ > +qa3 = 1.74396917e-01F, /* 0x1.652a36p-3 */ > +qa4 = 5.88681065e-02F, /* 0x1.e23f5ep-5 */ Note, the range and 2**-36 estimate are from the results of the Remes algorithm run with 1024-bits of precision. The above coefficients were simply rounded to single precision, as I haven't investigated a better method for running, yet. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Mon Aug 26 11:06:49 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 07088197 for ; Mon, 26 Aug 2013 11:06:49 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E8CE82868 for ; Mon, 26 Aug 2013 11:06:48 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r7QB6mOV066021 for ; Mon, 26 Aug 2013 11:06:48 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r7QB6m0e066019 for freebsd-numerics@FreeBSD.org; Mon, 26 Aug 2013 11:06:48 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 26 Aug 2013 11:06:48 GMT Message-Id: <201308261106.r7QB6m0e066019@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-numerics@FreeBSD.org Subject: Current problem reports assigned to freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 11:06:49 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o stand/175811 numerics libstdc++ needs complex support in order use C99 o bin/170206 numerics [msun] [patch] complex arcsinh, log, etc. o stand/82654 numerics C99 long double math functions are missing 3 problems total. From owner-freebsd-numerics@FreeBSD.ORG Mon Aug 26 14:27:43 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 7E191B23 for ; Mon, 26 Aug 2013 14:27:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 4311A27D3 for ; Mon, 26 Aug 2013 14:27:43 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id B4BAAD64236; Tue, 27 Aug 2013 00:27:31 +1000 (EST) Date: Tue, 27 Aug 2013 00:27:30 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: (2nd time) tweaks to erff() threshold values In-Reply-To: <20130823165744.GA47369@troutmask.apl.washington.edu> Message-ID: <20130826235215.X2165@besplex.bde.org> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=YYGEuWhf c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10 a=EmHTJ-zS--LAX7py56IA:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 14:27:43 -0000 Catching up with old mail... On Fri, 23 Aug 2013, Steve Kargl wrote: > On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote: > ... >> Some modified lines: >> >> @ if(ix < 0x3f580000) { /* |x|<0.84375 */ >> @ if(ix < 0x38800000) { /* |x|<2**-14 */ >> @ if (ix < 0x04000000) /* |x|<0x1p-119 */ >> @ return (8*x+efx8*x)/8; /*avoid spurious underflow */ >> @ } >> @ return x + efx*x; >> @ } > > OK. I'll update my patch to use the above. Did you see my > suggested changes to erfcf()? Yes. I didn't try them because I neglected to test erfc*() before so I didn't have anything to compare with. I thought that erfc*() was so similar to erf*() that it didn't need testing. But testing shows that it did need testing since it is quite broken. I get errors like the following, and the changes in your later mail don't affect this much: (oops, the change of the threshold from 28 to 9.194705 in this mail did affect this and gave very large errors of up to 2048 starting at the new threshold. The extra errors seem to be the result of underflow to 0 (even on i386 with some extra precision). With that backed out): Very large errors start about here: % x = 0x407b9000 3.93066406 % erfc(x) = 0x3e5d2abbd6aee678 0x32e955df 2.71638191e-08 % erfcf(x) = 0x3e5d2abdc0000000 0x32e955ee 2.71638463e-08 % err = +0x1e9511988 15.29115 This was the largest one found: % x = 0x40ffd000 7.99414062 % erfc(x) = 0x39ef467e0a9d6031 0xf7a33f0 1.23359547e-29 % erfcf(x) = 0x39ef4685e0000000 0xf7a342f 1.23360018e-29 % err = +0x7d5629fcf 62.66829 This is due to a large cancelation for subtracting 1. The fdlibm algorithm is no good. erfc() is much better in old BSD libm. It is almost the same as fdlibm except it avoids this problem: fdlibm erfc(): % z = x; % SET_LOW_WORD(z,0); % r = __ieee754_exp(-z*z-0.5625)* % __ieee754_exp((z-x)*(z+x)+R/S); % if(hx>0) return r/x; else return two-r/x; Old BSD erfc() (from FreeBSD-1.1.5): % /* return exp(-x^2 - lsqrtPI_hi + R + y)/x; */ % s = ((R + y) - lsqrtPI_hi) + z; % y = (((z-s) - lsqrtPI_hi) + R) + y; % r = exp__D(s, y)/x; % if (x>0) % return r; % else % return two-r; The extra-precision function exp__D() gives some chance of a not very large error. I think it only gives about 10 more bits of accuracy. I don't quite see how it can work for very large x when the cancelation is almost complete. Replacing the expf()'s by exp()'s in the erfc() reduced the error similarly, but it is still above 1 ulp. The old BSD erf*() also has more comments about what the error actually is, but these don't clearly distinguish erfc() from erf(). The old BSD erf*()'s file name is not spelled with an s_. > ... Bruce From owner-freebsd-numerics@FreeBSD.ORG Mon Aug 26 14:30:25 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 6A9AAC1B for ; Mon, 26 Aug 2013 14:30:25 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id 2F95A282E for ; Mon, 26 Aug 2013 14:30:24 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id E78DE783074; Tue, 27 Aug 2013 00:30:13 +1000 (EST) Date: Tue, 27 Aug 2013 00:30:12 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: (2nd time) tweaks to erff() threshold values In-Reply-To: <20130823171526.GA47736@troutmask.apl.washington.edu> Message-ID: <20130827002821.K2328@besplex.bde.org> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823171526.GA47736@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=bpB1Wiqi c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10 a=0N4hs-jxxM66dZRz4AkA:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@freebsd.org, Bruce Evans X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 14:30:25 -0000 On Fri, 23 Aug 2013, Steve Kargl wrote: > On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote: >> >> @ if(ix < 0x3f580000) { /* |x|<0.84375 */ >> @ if(ix < 0x38800000) { /* |x|<2**-14 */ >> @ if (ix < 0x04000000) /* |x|<0x1p-119 */ >> @ return (8*x+efx8*x)/8; /*avoid spurious underflow */ > > To keep the diff between s_erf.c and s_erff.c small, do you > want my to update s_erf.c with the last line above? Yes, including the style change from *0.125 to /8 and any similar style changes you can find (or don't change this anywhere), and the comment on the threshold for underflow. Bruce From owner-freebsd-numerics@FreeBSD.ORG Mon Aug 26 15:23:33 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 02EA0772 for ; Mon, 26 Aug 2013 15:23:33 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id A18FE2B57 for ; Mon, 26 Aug 2013 15:23:32 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id BF567D440CD; Tue, 27 Aug 2013 01:23:20 +1000 (EST) Date: Tue, 27 Aug 2013 01:23:19 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: (2nd time) tweaks to erff() threshold values In-Reply-To: <20130825171910.GA60396@troutmask.apl.washington.edu> Message-ID: <20130827003147.R2328@besplex.bde.org> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> <20130824202102.GA54981@troutmask.apl.washington.edu> <20130825023029.GA56772@troutmask.apl.washington.edu> <20130825171910.GA60396@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=DstvpgP+ c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10 a=NYRZoR52SQm7tVCai4AA:9 a=CjuIK1q_8ugA:10 a=udEmAkY8_B0A:10 Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 15:23:33 -0000 On Sun, 25 Aug 2013, Steve Kargl wrote: > On Sat, Aug 24, 2013 at 07:30:29PM -0700, Steve Kargl wrote: >> On Sat, Aug 24, 2013 at 01:21:02PM -0700, Steve Kargl wrote: >>> On Fri, Aug 23, 2013 at 09:57:44AM -0700, Steve Kargl wrote: >>>> On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote: >>>> >>>>> The whole erf implementation is probably not the best way for >>>>> float precision, but is good for understanding higher precisions. >>>> >>>> I'm fairly certain that the implementation of erff() is not very >>>> efficient. The polynomials, used in the rational approximations, >>>> are the same order as those used in the double precision approximation. >>>> I'll check the polys when update my P(x)/Q(x) remes algorithm for >>>> erfl and erfcl. >>> >>> I seem to be right (although I haven't iterated on the P's and Q's). >> >> Now, with pretty testing and (perhaps) better coefficients: You mean pretty ugly formatting of the declarations of th coefficients to match the existing style :-). > These values where for the interval [0,0.84375]. >> >> | usec/call | Max ULP | X Max ULP >> ---------+-----------+---------+--------------------------------- >> new erfc | 0.02757 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1 >> old erfc | 0.03348 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1 >> ---------+-----------+---------+--------------------------------- >> new erf | 0.02302 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 >> old erf | 0.03028 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 Be careful testing on i386. Its extra precision gives smaller errors. These look like the i386 errors. Old erff has a maximum error of more than 0.9 ulps on all args. > New interval [0.84375,1.25]. > > | usecs/call| Max ULP | X Max ULP > ---------+-----------+---------+------------------------------ > Old erfc | 0.03469 | 0.95158 | 1.24999416e+00, 0x1.3fff9ep+0 > New erfc | 0.02781 | 0.76497 | 1.24971473e+00, 0x1.3fed4ep+0 > ---------+-----------+---------+------------------------------ > Old erf | 0.03404 | 0.55617 | 1.24996567e+00, 0x1.3ffdcp+0 > New erf | 0.02774 | 0.54213 | 8.52452755e-01, 0x1.b474bp-1 I see an improvement like this on amd64 but not on i386. New erf is down from 0.9+ to 0.8+ on amd64 on all args, but still 0.6+ on i386. erfc has much larger errors on other intervals and I didn't test this interval by itself. > Note that s_erf.c claims that in this interval, a taylor series about > erf(1) is used and suggests that the constant erx = 8.4506291151e-01 > is erf(1). That's bogus as erf(1) = 8.42700779e-01F (in float). Well, not quite. It says: % * Remark: here we use the taylor series expansion at x=1. % * erf(1+s) = erf(1) + s*Poly(s) % * = 0.845.. + P1(s)/Q1(s) % * That is, we use rational approximation to approximate % * erf(1+s) - (c = (single)0.84506291151) % * Note that |P1/Q1|< 0.078 for x in [0.84375,1.25] % * where % * P1(s) = degree 6 poly in s % * Q1(s) = degree 6 poly in s It puts some of erf(1) in 0.845 and some in P1(s)/Q1(s). > New diff: It is relative to a version that already has the threshold changes. > Index: src/s_erff.c > ... > -pa0 = -2.3621185683e-03, /* 0xbb1acdc6 */ > ... > -qa1 = 1.0642088205e-01, /* 0x3dd9f331 */ > ... > +pa0 = 1.35131621e-08F, /* 0x1.d04f08p-27 */ > ... > +qa1 = 6.06513679e-01F, /* 0x1.3688f6p-1 */ (qa0 is spelled 'one' in both versions.) You still have some of erf(1) in pa0/qa0, but not as much as before. The decomposition should be something like hi+lo -- write erf(1) = hi+lo and put hi in erx and lo in pa0/qa0 = pa0. Is that exactly what you do? I wonder if there is a technical reason for putting more in pa0/qa0. Or maybe the whole expression should be written as hi+P/Q+lo, where P(0) = 0. Sometimes rational approximations should be done by splitting out even more terms from the P/Q part and adding the other terms more carefully. The parts that are split out are chosen to make them easier to add up carefully, while keeping the parts that are not split out small so that they don't need to be evaluated carefully. See k_tan.c for an example. Another bug in the comment is that it hasn't caught up the code. The variable 'erx' was spelled 'c' in old BSD libm. It is still spelled 'c' in the fdlibm comment. The comment does say that 'c' is rounded to single. That is good for a hi+lo decomposition, and it may be necessary for technical reasons to keep some of 'c's lower bits zero. Rounding it to single doesn't give this when the precision is already single. Old erff didn't round it further either. Bruce From owner-freebsd-numerics@FreeBSD.ORG Mon Aug 26 17:19:20 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 1D042198 for ; Mon, 26 Aug 2013 17:19:20 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail108.syd.optusnet.com.au (mail108.syd.optusnet.com.au [211.29.132.59]) by mx1.freebsd.org (Postfix) with ESMTP id BC0B7223E for ; Mon, 26 Aug 2013 17:19:19 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id F06991A054A; Tue, 27 Aug 2013 03:19:13 +1000 (EST) Date: Tue, 27 Aug 2013 03:19:13 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: (2nd time) tweaks to erff() threshold values In-Reply-To: <20130825175817.GA60599@troutmask.apl.washington.edu> Message-ID: <20130827012348.G2328@besplex.bde.org> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> <20130824202102.GA54981@troutmask.apl.washington.edu> <20130825023029.GA56772@troutmask.apl.washington.edu> <20130825171910.GA60396@troutmask.apl.washington.edu> <20130825175817.GA60599@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=YYGEuWhf c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10 a=xRB_yx2URKs_nkmv_ssA:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 17:19:20 -0000 On Sun, 25 Aug 2013, Steve Kargl wrote: > On Sun, Aug 25, 2013 at 10:19:10AM -0700, Steve Kargl wrote: >> >> Note**2, I did not record a domain and range as my routine that >> generates plots seems to give a really messed up result for >> the range [-0.08, 0.06]. However, the error estimate from >> solving the Remes matrix equation is err = 0x1.39c84809b7ed2p-38. > > Found the bug in the graphing routine. > > /* > * Domain [0.84375, 1.25], range ~[-1.954e-10,1.940e-11]: > * |(erf(x) - erx) - p(x)/q(x)| < 2**-36. > */ > > Note, the range and 2**-36 estimate are from the results of > the Remes algorithm run with 1024-bits of precision. The > above coefficients were simply rounded to single precision, > as I haven't investigated a better method for running, yet. I now have a good method for rounding the coefficients. It is like my old iterative method of rounding the the n'th coeff and then calculating all coeffs after the n'th again with all coeffs up to and including the n'th fixed, except I now use an almost pure Remez algorithm for all steps, and later another set of iterations to recaclulate all early coeffs after fixing some later ones. Here is the almost pure Remez algorithm in pari: % cpolinterpolate(p, d, a) = - p is a polynomial containing fixed leading coeffs (0 for basic Remez) - d is the order of the lowest non-fixed coeff (0 for basic Remez) - 'a' is an array of n + 2 "interpolation" points (basic Remez) - the function to be approximated is a global f(x) - the function giving the relative errors a global e(x) For example, f(x) might be cos(x) ~= 1 - 0.5*x**2 + ... After fixing the first 2 coeffs, p = 1 - 0.5*x**2, d = 4, and we have a modified Remez approximation problem of finding a poly starting with the x**4 term. Nonzero d requires one obvious change to basic Remez (set up the equations to be solved starting at the x**d term instead of the x**0 term), and one not so obvious change (if d is odd, then the signs in the +-1 column (the coefficients for the error E) of the matrix must be flipped for negative interpolation points). You can understand the need for something like the latter by understanding basic Remez staring at the matrix a little. Note that the signes in the +- column for basic Remez give a nonsingular matrix. This just fails for odd d, but is fixed by the adjustment just described, except when the interpolation points are symmetric with the origin. The symmetric case is easy to avoid in practice. % { % local(M, c, i, j, n, v); % % n = matsize(a)[2] - 2; % v = vector(n); % M = matrix(n, n); % for (i = 1, n, % v[i] = f(a[1 + i]) - subst(p, x, a[1 + i]); Function value at an interpolation point after adjusting for the fixed part p = p(x). % if (evenfunc, % for (j = 1, n - 1, M[i, j] = a[1 + i]^(d + 2*(j - 1));); % , % for (j = 1, n - 1, M[i, j] = a[1 + i]^(d + j - 1);); % ); % M[i, n] = (-1)^i; This sets up basic Remez, with the adjustment for x**d instead of x**0. The special case for even functions is not quite just an optimization. % if (d % 2 == 1 && a[1 + i] < 0, % M[i, n] = -M[i, n]; % ); This flips the signs. % M[i, n] *= e(a[1 + i]); This changes the +-1 column to a weighted version of the same. It is best for e(x) to be as smooth as f(x), but this algorithm does OK with even widly discontinuious e(x) (e.g., 2**ilogb(x)). % ); % v = precision(v, default(realprecision)); % M = precision(M, default(realprecision)); % c = matsolve(M, v~); % c = precision(c, default(realprecision)); Solve the equation, with technical complications to avoid problems with insufficient ot too much precision. % gE = c[n]; This is the global error (E). % if (evenfunc, % for (j = 1, n - 1, p += c[j] * x^(d + 2 * (j - 1));); % , % for (j = 1, n - 1, p += c[j] * x^(d + j - 1);); % ); Add the new terms to the fixed terms in p. % return (p); % } The iteration step in basic Remez needs an adjustment for signs too. In basic Remez, the signs of the errors at the interpolation points alternate (+-E), and a new set of interpolation points is chosen to preserve this alternation while choosing points where the error is larger than gE. The sign flipping must be applied similary. E.g., if for basic Remez the error pattern is 'E -E | E -E E' where '|' is the origin, the for odd d the error pattern becomes '-E E | E -E E'. Note that the signs no longer alternate across the origin. Complications like the following are handled very well by this algorithm: suppose that in approximating cos(x) we fix the x**2 coeff at -0.4999 instead of -0.5 (similar inaccuracies are unavoidable for higher terms). Then the naive method of trying to approximate (cos(x) - (1 - 0.4999*x**2) / x^4 using basic Remez doesn't work because the function to be approximated is singular at the origin. (We divided by x**4 to try to reduce to basic Remez.) The singularity tends to cause singular matrixes. I used to use essentially this method, with hack to avoid the singularity. It basically works to a not try to approximate near the singularity, since multiplying by x**4 will cancel the singularity and leave a good approximation near it. But it is hard to keep the interpolation points away from the singularity without having many special cases. The above algorithm hits a sort of removable singularity at 0 instead, It seems to be numerically stable near the singularity if f(x) is smooth. To handle some coeffs being rounded more severely than others, the following algorithm works suprisingly well: - do the above iterations to round all coeffs from the first to the last - fix all the (trailing) coeffs that are rounded more severely than the others - iterate from the first to he last non-trailing coeff to calculate all the leading coeffs again, now for the new function f(x) - p1(x) where p1 is the poly with the trailing coeffs, and of course the poly for the leading coeffs has degree reduced by the number of trailing coeffs compared with the poly found in the first pass. This makes the leading coeffs match the trailing coeffs better than is possible on the first pass. On the first pass, leading coeffs are chosen under the assumption that trailing coeffs are infinte-precision. When they are severely rounded, this assumption breaks badly. This algorithm works so well that trying to improve it using more passes gets nowhere or backwards. E.g., it usually makes no significant difference to caclulate all the trailing coeffs again with the recalulated set of leading coeffs. Someone at INRIA wrote a paper about poly approximations with some coeffs forced to 0. Normally, all the generators {x**0, x**1, x**2, ...} are allowed. Forcing some coeffs to 0 restricts the generators to a specifed subset of the standard one. The author noted that this can easily give a singular matrix for the corresponding extended Remez algorithm. I think too easily to be very useful. My extension works fairly generally for the special subset {x**d, x**(d+1), ...}. I have the same amount of proof that it works as the paper (none). I didn't notice anything about the complications for the sign flipping in the paper. Perhaps it required the subset to include {x**0} for general functions or {x**1} for odd functions. Without the function being 0 at 0, it is rarely possible to approximate it well enough if the x**0 coeff is forced to 0. Similarly for higher coeffs that want to be nonzero. The point of forcing coeffs to 0 is that small ones tend to give terms that are harder to add up accurately. But the method seems to have very limited use since only really tiny terms can be omitted without having to do too much compsensation in other terms. Bruce From owner-freebsd-numerics@FreeBSD.ORG Mon Aug 26 17:43:14 2013 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 37956E5C for ; Mon, 26 Aug 2013 17:43:14 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C4E5B23CD for ; Mon, 26 Aug 2013 17:43:13 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7QHh7lk067685; Mon, 26 Aug 2013 10:43:07 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7QHh7tR067684; Mon, 26 Aug 2013 10:43:07 -0700 (PDT) (envelope-from sgk) Date: Mon, 26 Aug 2013 10:43:07 -0700 From: Steve Kargl To: Bruce Evans Subject: Re: (2nd time) tweaks to erff() threshold values Message-ID: <20130826174307.GA67511@troutmask.apl.washington.edu> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> <20130824202102.GA54981@troutmask.apl.washington.edu> <20130825023029.GA56772@troutmask.apl.washington.edu> <20130825171910.GA60396@troutmask.apl.washington.edu> <20130827003147.R2328@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130827003147.R2328@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@freebsd.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Aug 2013 17:43:14 -0000 On Tue, Aug 27, 2013 at 01:23:19AM +1000, Bruce Evans wrote: > On Sun, 25 Aug 2013, Steve Kargl wrote: > >> These values where for the interval [0,0.84375]. >>> >>> | usec/call | Max ULP | X Max ULP >>> ---------+-----------+---------+--------------------------------- >>> new erfc | 0.02757 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1 >>> old erfc | 0.03348 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1 >>> ---------+-----------+---------+--------------------------------- >>> new erf | 0.02302 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 >>> old erf | 0.03028 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 > > Be careful testing on i386. Its extra precision gives smaller errors. > These look like the i386 errors. Old erff has a maximum error of more > than 0.9 ulps on all args. Have I ever mentioned how much I dislike the i386? :-) New test results for both i386 and amd64 follow my .sig. New diff that merges all previous diff follows the test results. > > Note that s_erf.c claims that in this interval, a taylor series about > > erf(1) is used and suggests that the constant erx = 8.4506291151e-01 > > is erf(1). That's bogus as erf(1) = 8.42700779e-01F (in float). > > Well, not quite. It says: > > % * Remark: here we use the taylor series expansion at x=1. > % * erf(1+s) = erf(1) + s*Poly(s) > % * = 0.845.. + P1(s)/Q1(s) > % * That is, we use rational approximation to approximate > % * erf(1+s) - (c = (single)0.84506291151) > % * Note that |P1/Q1|< 0.078 for x in [0.84375,1.25] > % * where > % * P1(s) = degree 6 poly in s > % * Q1(s) = degree 6 poly in s > > It puts some of erf(1) in 0.845 and some in P1(s)/Q1(s). I see. I (obviously) not did glean this from the comment. The (c = (single)0.84506291151) isn't the leading 24-bits of erf(1). It's the leading 24-bits of a constant near erf(1). I can regenerate polynomials with a hi+lo split. > > Index: src/s_erff.c > > ... > > -pa0 = -2.3621185683e-03, /* 0xbb1acdc6 */ > > ... > > -qa1 = 1.0642088205e-01, /* 0x3dd9f331 */ > > ... > > +pa0 = 1.35131621e-08F, /* 0x1.d04f08p-27 */ > > ... > > +qa1 = 6.06513679e-01F, /* 0x1.3688f6p-1 */ > > (qa0 is spelled 'one' in both versions.) Yes 'qa0' is spelled as 'one'. I may remove this, and spell 'qa0' as '1'. > You still have some of erf(1) in pa0/qa0, but not as much as before. The > decomposition should be something like hi+lo -- write erf(1) = hi+lo and > put hi in erx and lo in pa0/qa0 = pa0. Is that exactly what you do? I > wonder if there is a technical reason for putting more in pa0/qa0. I did not do any splitting. I set erx = erf(1) in float precision. pa0 ~ 1e-8f is on the order of epsilon, and the results of solving a matrix equation. Perhaps, hi+lo was used to ensure the stability of the matrix solver. > maybe the whole expression should be written as hi+P/Q+lo, where P(0) = 0. > Sometimes rational approximations should be done by splitting out even > more terms from the P/Q part and adding the other terms more carefully. > The parts that are split out are chosen to make them easier to add up > carefully, while keeping the parts that are not split out small so that > they don't need to be evaluated carefully. See k_tan.c for an example. I left my copy of Hildebrand at work over the weekend, so I haven't reviewed the pitfalls of rational approximations, yet. > The comment does say that 'c' is rounded to single. That is good for > a hi+lo decomposition, and it may be necessary for technical reasons > to keep some of 'c's lower bits zero. Rounding it to single doesn't > give this when the precision is already single. Old erff didn't round > it further either. I wasn't aware of old BSD libm code. After working through the code in s_erf[f].c, it seemed to me that a better algorithm may exist, but I haven't tried to look for one. As you may guess, I'm gearing up to translate s_erf.c into a s_erfl.c. -- Steve erfc (Look at the old and new ULP for 3rd and 4th interval. This is due to the change in mask from 0xfffff000 to 0xffffe000.) i386 -----+ Code | Interval | us/call | Max ULP | X Max ULP -----+----------------+---------+---------+------------------------------- Old | [0, 0.84375] | 0.03435 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1 Old | [0.84375,1.25] | 0.03486 | 0.95158 | 1.24999416e+00, 0x1.3fff9ep+0 Old | [1.25,2.857143]| 0.14772 | 5.81600 | 1.88727951e+00, 0x1.e324c0p+0 Old | [2.857143,12] | 0.27741 | 63.6824 | 7.99420977e+00, 0x1.ffa122p+2 -----+----------------+---------+---------+------------------------------- New | [0, 0.84375] | 0.02809 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1 New | [0.84375,1.25] | 0.02780 | 0.76497 | 1.24971473e+00, 0x1.3fed4ep+0 New | [1.25,2.857143]| 0.13546 | 2.10176 | 1.88294506e+00, 0x1.e208b0p+0 New | [2.857143,12] | 0.18424 | 1.94002 | 4.00527334e+00, 0x1.005666p+2 -----+----------------+---------+---------+------------------------------- Amd64 -----+ Code | Interval | us/call | Max ULP | X Max ULP -----+----------------+---------+---------+------------------------------- Old | [0, 0.84375] | 0.02847 | 2.77642 | 8.43427539e-01, 0x1.afd5bcp-1 Old | [0.84375,1.25] | 0.02755 | 2.35308 | 1.24922514e+00, 0x1.3fcd38p+0 Old | [1.25,2.857143]| 0.10872 | 6.27739 | 1.88167512e+00, 0x1.e1b576p+0 Old | [2.857143,12] | 0.14263 | 63.8095 | 7.99418974e+00, 0x1.ffa0cep+2 -----+----------------+---------+---------+------------------------------- New | [0, 0.84375] | 0.02440 | 2.77756 | 8.33204687e-01, 0x1.aa99cep-1 New | [0.84375,1.25] | 0.02337 | 2.31164 | 1.24426317e+00, 0x1.3e8808p+0 New | [1.25,2.857143]| 0.10051 | 3.16774 | 1.34512150e+00, 0x1.5859e2p+0 New | [2.857143,12] | 0.10205 | 2.90042 | 4.08927298e+00, 0x1.05b6a6p+2 -----+----------------+---------+---------+------------------------------- erf (Not much change in ULP, but increase in speed). i386 -----+ Code | Interval | usec/call | Max ULP | X Max ULP -----+----------------+-----------+---------+-------------------------------- Old | [0, 0.84375] | 0.03074 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 Old | [0.84375,1.25] | 0.03405 | 0.55617 | 1.24996567e+00, 0x1.3ffdc0p+0 Old | [1.25,2.857143]| 0.14923 | 0.61903 | 1.25788522e+00, 0x1.4204c4p+0 Old | [2.857143,12] | 0.05959 | 0.50002 | 2.86163688e+00, 0x1.6e4a1ep+1 -----+----------------+-----------+---------+-------------------------------- New | [0, 0.84375] | 0.02347 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 New | [0.84375,1.25] | 0.02774 | 0.54213 | 8.52452755e-01, 0x1.b474b0p-1 New | [1.25,2.857143]| 0.14229 | 0.59585 | 1.26142204e+00, 0x1.42ec8ep+0 New | [2.857143,12] | 0.02932 | 0.50002 | 2.86393452e+00, 0x1.6e9568p+1 -----+----------------+-----------+---------+-------------------------------- Amd64 -----+ Code | Interval | us/call | Max ULP | X Max ULP -----+----------------+---------+---------+-------------------------------- Old | [0, 0.84375] | 0.02699 | 0.96792 | 8.36685717e-01, 0x1.ac6212p-1 Old | [0.84375,1.25] | 0.02912 | 0.76611 | 1.23045731e+00, 0x1.3aff4p+0 Old | [1.25,2.857143]| 0.10780 | 0.74929 | 1.25055075e+00, 0x1.402418p+0 Old | [2.857143,12] | 0.04671 | 0.50008 | 2.89601469e+00, 0x1.72b09cp+1 -----+----------------+---------+---------+-------------------------------- New | [0, 0.84375] | 0.02241 | 0.94208 | 8.36147904e-01, 0x1.ac1b94p-1 New | [0.84375,1.25] | 0.02486 | 0.76446 | 1.18880725e+00, 0x1.3055acp+0 New | [1.25,2.857143]| 0.10124 | 0.72032 | 1.28225052e+00, 0x1.484192p+0 New | [2.857143,12] | 0.02954 | 0.50003 | 2.89976478e+00, 0x1.732b7ep+1 -----+----------------+---------+---------+-------------------------------- Index: src/s_erff.c =================================================================== --- src/s_erff.c (revision 1361) +++ src/s_erff.c (working copy) @@ -24,75 +24,61 @@ tiny = 1e-30, half= 5.0000000000e-01, /* 0x3F000000 */ one = 1.0000000000e+00, /* 0x3F800000 */ two = 2.0000000000e+00, /* 0x40000000 */ - /* c = (subfloat)0.84506291151 */ -erx = 8.4506291151e-01, /* 0x3f58560b */ + +erx = 8.42700779e-01F, /* 0x1.af767ap-1 */ + /* * Coefficients for approximation to erf on [0,0.84375] */ efx = 1.2837916613e-01, /* 0x3e0375d4 */ efx8= 1.0270333290e+00, /* 0x3f8375d4 */ -pp0 = 1.2837916613e-01, /* 0x3e0375d4 */ -pp1 = -3.2504209876e-01, /* 0xbea66beb */ -pp2 = -2.8481749818e-02, /* 0xbce9528f */ -pp3 = -5.7702702470e-03, /* 0xbbbd1489 */ -pp4 = -2.3763017452e-05, /* 0xb7c756b1 */ -qq1 = 3.9791721106e-01, /* 0x3ecbbbce */ -qq2 = 6.5022252500e-02, /* 0x3d852a63 */ -qq3 = 5.0813062117e-03, /* 0x3ba68116 */ -qq4 = 1.3249473704e-04, /* 0x390aee49 */ -qq5 = -3.9602282413e-06, /* 0xb684e21a */ /* - * Coefficients for approximation to erf in [0.84375,1.25] + * Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]: + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31. + */ +pp0 = 1.28379166e-01F, /* 0x1.06eba8p-3 */ +pp1 = -3.36030394e-01F, /* -0x1.58185ap-2 */ +pp2 = -1.86260219e-03F, /* -0x1.e8451ep-10 */ +qq1 = 3.12324286e-01F, /* 0x1.3fd1f0p-2 */ +qq2 = 2.16070302e-02F, /* 0x1.620274p-6 */ +qq3 = -1.98859419e-03F, /* -0x1.04a626p-9 */ +/* + * Domain [0.84375, 1.25], range ~[-1.954e-10,1.940e-11]: + * |(erf(x) - erx) - p(x)/q(x)| < 2**-36. */ -pa0 = -2.3621185683e-03, /* 0xbb1acdc6 */ -pa1 = 4.1485610604e-01, /* 0x3ed46805 */ -pa2 = -3.7220788002e-01, /* 0xbebe9208 */ -pa3 = 3.1834661961e-01, /* 0x3ea2fe54 */ -pa4 = -1.1089469492e-01, /* 0xbde31cc2 */ -pa5 = 3.5478305072e-02, /* 0x3d1151b3 */ -pa6 = -2.1663755178e-03, /* 0xbb0df9c0 */ -qa1 = 1.0642088205e-01, /* 0x3dd9f331 */ -qa2 = 5.4039794207e-01, /* 0x3f0a5785 */ -qa3 = 7.1828655899e-02, /* 0x3d931ae7 */ -qa4 = 1.2617121637e-01, /* 0x3e013307 */ -qa5 = 1.3637083583e-02, /* 0x3c5f6e13 */ -qa6 = 1.1984500103e-02, /* 0x3c445aa3 */ +pa0 = 1.35131621e-08F, /* 0x1.d04f08p-27 */ +pa1 = 4.15107518e-01F, /* 0x1.a911f2p-2 */ +pa2 = -1.63339108e-01F, /* -0x1.4e84bcp-3 */ +pa3 = 1.12098485e-01F, /* 0x1.cb27c8p-4 */ +qa1 = 6.06513679e-01F, /* 0x1.3688f6p-1 */ +qa2 = 5.43227255e-01F, /* 0x1.1621e2p-1 */ +qa3 = 1.74396917e-01F, /* 0x1.652a36p-3 */ +qa4 = 5.88681065e-02F, /* 0x1.e23f5ep-5 */ /* - * Coefficients for approximation to erfc in [1.25,1/0.35] + * Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]: + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30 */ -ra0 = -9.8649440333e-03, /* 0xbc21a093 */ -ra1 = -6.9385856390e-01, /* 0xbf31a0b7 */ -ra2 = -1.0558626175e+01, /* 0xc128f022 */ -ra3 = -6.2375331879e+01, /* 0xc2798057 */ -ra4 = -1.6239666748e+02, /* 0xc322658c */ -ra5 = -1.8460508728e+02, /* 0xc3389ae7 */ -ra6 = -8.1287437439e+01, /* 0xc2a2932b */ -ra7 = -9.8143291473e+00, /* 0xc11d077e */ -sa1 = 1.9651271820e+01, /* 0x419d35ce */ -sa2 = 1.3765776062e+02, /* 0x4309a863 */ -sa3 = 4.3456588745e+02, /* 0x43d9486f */ -sa4 = 6.4538726807e+02, /* 0x442158c9 */ -sa5 = 4.2900814819e+02, /* 0x43d6810b */ -sa6 = 1.0863500214e+02, /* 0x42d9451f */ -sa7 = 6.5702495575e+00, /* 0x40d23f7c */ -sa8 = -6.0424413532e-02, /* 0xbd777f97 */ +ra0 = -9.87132732e-03F, /* -0x1.4376b2p-7 */ +ra1 = -5.53605914e-01F, /* -0x1.1b723cp-1 */ +ra2 = -2.17589188e+00F, /* -0x1.1683ap+1 */ +ra3 = -1.43268085e+00F, /* -0x1.6ec42cp+0 */ +sa1 = 5.45995426e+00F, /* 0x1.5d6fe4p+2 */ +sa2 = 6.69798088e+00F, /* 0x1.acabb8p+2 */ +sa3 = 1.43113089e+00F, /* 0x1.6e5e98p+0 */ +sa4 = -5.77397496e-02F, /* -0x1.d90108p-5 */ /* - * Coefficients for approximation to erfc in [1/.35,28] + * Domain [1.25,1/0.35], range [-3.753e-12,3.875e-12]: + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-43 */ -rb0 = -9.8649431020e-03, /* 0xbc21a092 */ -rb1 = -7.9928326607e-01, /* 0xbf4c9dd4 */ -rb2 = -1.7757955551e+01, /* 0xc18e104b */ -rb3 = -1.6063638306e+02, /* 0xc320a2ea */ -rb4 = -6.3756646729e+02, /* 0xc41f6441 */ -rb5 = -1.0250950928e+03, /* 0xc480230b */ -rb6 = -4.8351919556e+02, /* 0xc3f1c275 */ -sb1 = 3.0338060379e+01, /* 0x41f2b459 */ -sb2 = 3.2579251099e+02, /* 0x43a2e571 */ -sb3 = 1.5367296143e+03, /* 0x44c01759 */ -sb4 = 3.1998581543e+03, /* 0x4547fdbb */ -sb5 = 2.5530502930e+03, /* 0x451f90ce */ -sb6 = 4.7452853394e+02, /* 0x43ed43a7 */ -sb7 = -2.2440952301e+01; /* 0xc1b38712 */ +rb0 = -9.86494310e-03F, /* -0x1.434124p-7 */ +rb1 = -6.24557793e-01F, /* -0x1.3fc60ap-1 */ +rb2 = -6.12944698e+00F, /* -0x1.8848dcp+2 */ +rb3 = -1.64760838e+01F, /* -0x1.079e0ap+4 */ +rb4 = -9.36094189e+00F, /* -0x1.2b8cd6p+3 */ +sb1 = 1.26263084e+01F, /* 0x1.940ab8p+3 */ +sb2 = 4.47332840e+01F, /* 0x1.65ddc4p+5 */ +sb3 = 4.65590134e+01F, /* 0x1.7478dcp+5 */ +sb4 = 8.74471664e+00F; /* 0x1.17d4b8p+3 */ float erff(float x) @@ -107,43 +93,38 @@ erff(float x) } if(ix < 0x3f580000) { /* |x|<0.84375 */ - if(ix < 0x31800000) { /* |x|<2**-28 */ - if (ix < 0x04000000) - /*avoid underflow */ - return (float)0.125*((float)8.0*x+efx8*x); + if(ix < 0x38800000) { /* |x|<2**-14 */ + if (ix < 0x04000000) /* |x|<0x1p-119 */ + return (8*x+efx8*x)/8; /* avoid spurious underflow */ return x + efx*x; } z = x*x; - r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4))); - s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5)))); + r = pp0+z*(pp1+z*pp2); + s = one+z*(qq1+z*(qq2+z*qq3)); y = r/s; return x + x*y; } if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ s = fabsf(x)-one; - P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6))))); - Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6))))); + P = pa0+s*(pa1+s*(pa2+s*pa3)); + Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4))); if(hx>=0) return erx + P/Q; else return -erx - P/Q; } - if (ix >= 0x40c00000) { /* inf>|x|>=6 */ + if (ix >= 0x40800000) { /* inf>|x|>=4 */ if(hx>=0) return one-tiny; else return tiny-one; } x = fabsf(x); s = one/(x*x); if(ix< 0x4036DB6E) { /* |x| < 1/0.35 */ - R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*( - ra5+s*(ra6+s*ra7)))))); - S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*( - sa5+s*(sa6+s*(sa7+s*sa8))))))); + R=ra0+s*(ra1+s*(ra2+s*ra3)); + S=one+s*(sa1+s*(sa2+s*(sa3+s*sa4))); } else { /* |x| >= 1/0.35 */ - R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*( - rb5+s*rb6))))); - S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*( - sb5+s*(sb6+s*sb7)))))); + R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4))); + S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4))); } GET_FLOAT_WORD(ix,x); - SET_FLOAT_WORD(z,ix&0xfffff000); - r = __ieee754_expf(-z*z-(float)0.5625)*__ieee754_expf((z-x)*(z+x)+R/S); + SET_FLOAT_WORD(z,ix&0xffffe000); + r = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S); if(hx>=0) return one-r/x; else return r/x-one; } @@ -160,11 +141,11 @@ erfcf(float x) } if(ix < 0x3f580000) { /* |x|<0.84375 */ - if(ix < 0x23800000) /* |x|<2**-56 */ + if(ix < 0x33800000) /* |x|<2**-24 */ return one-x; z = x*x; - r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4))); - s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5)))); + r = pp0+z*(pp1+z*pp2); + s = one+z*(qq1+z*(qq2+z*qq3)); y = r/s; if(hx < 0x3e800000) { /* x<1/4 */ return one-(x+x*y); @@ -176,33 +157,28 @@ erfcf(float x) } if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ s = fabsf(x)-one; - P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6))))); - Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6))))); + P = pa0+s*(pa1+s*(pa2+s*pa3)); + Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4))); if(hx>=0) { z = one-erx; return z - P/Q; } else { z = erx+P/Q; return one+z; } } - if (ix < 0x41e00000) { /* |x|<28 */ + if (ix < 0x41200000) { /* |x|<10 */ x = fabsf(x); s = one/(x*x); if(ix< 0x4036DB6D) { /* |x| < 1/.35 ~ 2.857143*/ - R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*( - ra5+s*(ra6+s*ra7)))))); - S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*( - sa5+s*(sa6+s*(sa7+s*sa8))))))); + R=ra0+s*(ra1+s*(ra2+s*ra3)); + S=one+s*(sa1+s*(sa2+s*(sa3+s*sa4))); } else { /* |x| >= 1/.35 ~ 2.857143 */ - if(hx<0&&ix>=0x40c00000) return two-tiny;/* x < -6 */ - R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*( - rb5+s*rb6))))); - S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*( - sb5+s*(sb6+s*sb7)))))); + if(hx<0&&ix>=0x40a00000) return two-tiny;/* x < -5 */ + R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4))); + S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4))); } GET_FLOAT_WORD(ix,x); - SET_FLOAT_WORD(z,ix&0xfffff000); - r = __ieee754_expf(-z*z-(float)0.5625)* - __ieee754_expf((z-x)*(z+x)+R/S); + SET_FLOAT_WORD(z,ix&0xffffe000); + r = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S); if(hx>0) return r/x; else return two-r/x; } else { if(hx>0) return tiny*tiny; else return two-tiny; From owner-freebsd-numerics@FreeBSD.ORG Tue Aug 27 11:24:48 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 1BF729FD for ; Tue, 27 Aug 2013 11:24:48 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 5480D2005 for ; Tue, 27 Aug 2013 11:24:46 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 369D5D4507F; Tue, 27 Aug 2013 21:24:40 +1000 (EST) Date: Tue, 27 Aug 2013 21:24:39 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: (2nd time) tweaks to erff() threshold values In-Reply-To: <20130826174307.GA67511@troutmask.apl.washington.edu> Message-ID: <20130827202751.X1787@besplex.bde.org> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> <20130824202102.GA54981@troutmask.apl.washington.edu> <20130825023029.GA56772@troutmask.apl.washington.edu> <20130825171910.GA60396@troutmask.apl.washington.edu> <20130827003147.R2328@besplex.bde.org> <20130826174307.GA67511@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=DstvpgP+ c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10 a=zMXrinGMM86ty1FFVUAA:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Aug 2013 11:24:48 -0000 On Mon, 26 Aug 2013, Steve Kargl wrote: > On Tue, Aug 27, 2013 at 01:23:19AM +1000, Bruce Evans wrote: >> On Sun, 25 Aug 2013, Steve Kargl wrote: >> >>> These values where for the interval [0,0.84375]. >>>> >>>> | usec/call | Max ULP | X Max ULP >>>> ---------+-----------+---------+--------------------------------- >>>> new erfc | 0.02757 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1 >>>> old erfc | 0.03348 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1 >>>> ---------+-----------+---------+--------------------------------- >>>> new erf | 0.02302 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 >>>> old erf | 0.03028 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 >> >> Be careful testing on i386. Its extra precision gives smaller errors. >> These look like the i386 errors. Old erff has a maximum error of more >> than 0.9 ulps on all args. > > Have I ever mentioned how much I dislike the i386? :-) You don't like it because it works better. :-) > New test results for both i386 and amd64 follow my .sig. > New diff that merges all previous diff follows the test > results. > ... > I wasn't aware of old BSD libm code. After working through the > code in s_erf[f].c, it seemed to me that a better algorithm may > exist, but I haven't tried to look for one. As you may guess, > I'm gearing up to translate s_erf.c into a s_erfl.c. You fixed the large error, so using the old extra-precision methods can wait. > erfc (Look at the old and new ULP for 3rd and 4th interval. This is due > to the change in mask from 0xfffff000 to 0xffffe000.) A very interesting fix. > i386 > -----+ > Code | Interval | us/call | Max ULP | X Max ULP > -----+----------------+---------+---------+------------------------------- > Old | [0, 0.84375] | 0.03435 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1 > Old | [0.84375,1.25] | 0.03486 | 0.95158 | 1.24999416e+00, 0x1.3fff9ep+0 > Old | [1.25,2.857143]| 0.14772 | 5.81600 | 1.88727951e+00, 0x1.e324c0p+0 > Old | [2.857143,12] | 0.27741 | 63.6824 | 7.99420977e+00, 0x1.ffa122p+2 > -----+----------------+---------+---------+------------------------------- > New | [0, 0.84375] | 0.02809 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1 > New | [0.84375,1.25] | 0.02780 | 0.76497 | 1.24971473e+00, 0x1.3fed4ep+0 > New | [1.25,2.857143]| 0.13546 | 2.10176 | 1.88294506e+00, 0x1.e208b0p+0 > New | [2.857143,12] | 0.18424 | 1.94002 | 4.00527334e+00, 0x1.005666p+2 > -----+----------------+---------+---------+------------------------------- I only tested i386. I tested all values. I'm getting smaller errors, with 1 additional fix, probably due to my version of expf() being more accurate. I had to make it and exp() more accurate for use in the hyperbolic functions. These don't need as much accuracy accuracy as your expl() provides, but they benefit from a little more accuracy than the old expf() provides. On i386, exp() was fairly accurate since it used the i387 and extra precision. On amd64, it was as inaccurate as expf(). I made the fdlibm version more accurate than the i387 version, without using extra precision. I akso made it faster. It was about as slow as the i387 version since the latter does a mode switch. First x with an error above 1 ulp (the error is always below 1 ulp for negative x): % x = 0x3f503c5a 0.813420892 % erfc(x) = 0x3fcffffae66cfabc 0x3e7fffd7 0.249999392 % erfcf(x) = 0x3fcffffac0000000 0x3e7fffd6 0.249999374 % err = -0x266cfabc 1.20080 x with the largest error: % x = 0x3ff11a66 1.88361812 % erfc(x) = 0x3f7fa4bdd41171f5 0x3bfd25ef 0.00772546913 % erfcf(x) = 0x3f7fa4bda0000000 0x3bfd25ed 0.00772546837 % err = -0x341171f5 1.62713 Last x with an error above 1.5 ulps: % x = 0x3ff11d09 1.88369858 % erfc(x) = 0x3f7fa20070976afc 0x3bfd1004 0.00772285625 % erfcf(x) = 0x3f7fa20040000000 0x3bfd1002 0.00772285555 % err = -0x30976afc 1.51848 Last x with an error above 1 ulp (this depends on fixing the threshold from 10 to 11. 10 gave underflow errors and many more errors of > 1.5 ulps): % x = 0x4112bf56 9.17171288 % erfc(x) = 0x381865b08003770c 0xc32d84 1.79242497e-38 % erfcf(x) = 0x381865b060000000 0xc32d83 1.79242483e-38 % err = -0x2003770c 1.00042 > Amd64 > -----+ > Code | Interval | us/call | Max ULP | X Max ULP > -----+----------------+---------+---------+------------------------------- > Old | [0, 0.84375] | 0.02847 | 2.77642 | 8.43427539e-01, 0x1.afd5bcp-1 > Old | [0.84375,1.25] | 0.02755 | 2.35308 | 1.24922514e+00, 0x1.3fcd38p+0 > Old | [1.25,2.857143]| 0.10872 | 6.27739 | 1.88167512e+00, 0x1.e1b576p+0 > Old | [2.857143,12] | 0.14263 | 63.8095 | 7.99418974e+00, 0x1.ffa0cep+2 > -----+----------------+---------+---------+------------------------------- > New | [0, 0.84375] | 0.02440 | 2.77756 | 8.33204687e-01, 0x1.aa99cep-1 > New | [0.84375,1.25] | 0.02337 | 2.31164 | 1.24426317e+00, 0x1.3e8808p+0 > New | [1.25,2.857143]| 0.10051 | 3.16774 | 1.34512150e+00, 0x1.5859e2p+0 > New | [2.857143,12] | 0.10205 | 2.90042 | 4.08927298e+00, 0x1.05b6a6p+2 > -----+----------------+---------+---------+------------------------------- See how much better i386 is :-). In a quick test on i386 handicapped by using float precision, I get similar errors. Subtracting 1 without increasing the error is remarkably difficult. > erf (Not much change in ULP, but increase in speed). > > i386 > -----+ > Code | Interval | usec/call | Max ULP | X Max ULP > -----+----------------+-----------+---------+-------------------------------- > Old | [0, 0.84375] | 0.03074 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 > Old | [0.84375,1.25] | 0.03405 | 0.55617 | 1.24996567e+00, 0x1.3ffdc0p+0 > Old | [1.25,2.857143]| 0.14923 | 0.61903 | 1.25788522e+00, 0x1.4204c4p+0 > Old | [2.857143,12] | 0.05959 | 0.50002 | 2.86163688e+00, 0x1.6e4a1ep+1 > -----+----------------+-----------+---------+-------------------------------- > New | [0, 0.84375] | 0.02347 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 > New | [0.84375,1.25] | 0.02774 | 0.54213 | 8.52452755e-01, 0x1.b474b0p-1 > New | [1.25,2.857143]| 0.14229 | 0.59585 | 1.26142204e+00, 0x1.42ec8ep+0 > New | [2.857143,12] | 0.02932 | 0.50002 | 2.86393452e+00, 0x1.6e9568p+1 > -----+----------------+-----------+---------+-------------------------------- The largest error is now: % x = 0x3f56086a 0.836065888 % erf(x) = 0x3fe86a082a46ed45 0x3f435041 0.762943347 % erff(x) = 0x3fe86a0840000000 0x3f435042 0.762943387 % err = +0x15b912bb 0.67884 > Amd64 > -----+ > Code | Interval | us/call | Max ULP | X Max ULP > -----+----------------+---------+---------+-------------------------------- > Old | [0, 0.84375] | 0.02699 | 0.96792 | 8.36685717e-01, 0x1.ac6212p-1 > Old | [0.84375,1.25] | 0.02912 | 0.76611 | 1.23045731e+00, 0x1.3aff4p+0 > Old | [1.25,2.857143]| 0.10780 | 0.74929 | 1.25055075e+00, 0x1.402418p+0 > Old | [2.857143,12] | 0.04671 | 0.50008 | 2.89601469e+00, 0x1.72b09cp+1 > -----+----------------+---------+---------+-------------------------------- > New | [0, 0.84375] | 0.02241 | 0.94208 | 8.36147904e-01, 0x1.ac1b94p-1 > New | [0.84375,1.25] | 0.02486 | 0.76446 | 1.18880725e+00, 0x1.3055acp+0 > New | [1.25,2.857143]| 0.10124 | 0.72032 | 1.28225052e+00, 0x1.484192p+0 > New | [2.857143,12] | 0.02954 | 0.50003 | 2.89976478e+00, 0x1.732b7ep+1 > -----+----------------+---------+---------+-------------------------------- The largest error on i386 handicapped with float precision is the same: % x = 0x3f560dca 0.836147904 % erf(x) = 0x3fe86a68a1da81dc 0x3f435345 0.762989346 % erff(x) = 0x3fe86a68c0000000 0x3f435346 0.762989402 % err = +0x1e257e24 0.94208 Apparently the splitting point 0.84375 is not very well chosen, since we get largest errors just before it and much smaller errors above it by using another algorithm. > Index: src/s_erff.c > =================================================================== > --- src/s_erff.c (revision 1361) > +++ src/s_erff.c (working copy) > ... > /* > - * Coefficients for approximation to erf in [0.84375,1.25] > + * Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]: > + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31. > + */ Extra spaces. > ... > /* > - * Coefficients for approximation to erfc in [1.25,1/0.35] > + * Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]: > + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30 > */ It is even less clear than for the others that the scaling of the error is correct. The comment could make this clearer by giving the ratio of the correct error scale with the one used in the approximation, at the endpoints (max and min for this, usually at the endpoints). > +sa4 = -5.77397496e-02F, /* -0x1.d90108p-5 */ > /* > - * Coefficients for approximation to erfc in [1/.35,28] > + * Domain [1.25,1/0.35], range [-3.753e-12,3.875e-12]: > + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-43 > */ Domain needs more editing. > ... > @@ -176,33 +157,28 @@ erfcf(float x) > } > if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ > s = fabsf(x)-one; > - P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6))))); > - Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6))))); > + P = pa0+s*(pa1+s*(pa2+s*pa3)); > + Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4))); > if(hx>=0) { > z = one-erx; return z - P/Q; > } else { > z = erx+P/Q; return one+z; > } > } > - if (ix < 0x41e00000) { /* |x|<28 */ > + if (ix < 0x41200000) { /* |x|<10 */ 10 is too small. It gives spurious underflow to 0 and larger than necessary errors, at least on i386. 11 works. > x = fabsf(x); > s = one/(x*x); > if(ix< 0x4036DB6D) { /* |x| < 1/.35 ~ 2.857143*/ > - R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*( > - ra5+s*(ra6+s*ra7)))))); > - S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*( > - sa5+s*(sa6+s*(sa7+s*sa8))))))); > + R=ra0+s*(ra1+s*(ra2+s*ra3)); > + S=one+s*(sa1+s*(sa2+s*(sa3+s*sa4))); > } else { /* |x| >= 1/.35 ~ 2.857143 */ > - if(hx<0&&ix>=0x40c00000) return two-tiny;/* x < -6 */ > - R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*( > - rb5+s*rb6))))); > - S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*( > - sb5+s*(sb6+s*sb7)))))); > + if(hx<0&&ix>=0x40a00000) return two-tiny;/* x < -5 */ > + R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4))); > + S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4))); > } > GET_FLOAT_WORD(ix,x); This and the GET_FLOAT_WORD() for erff() are bogus. The word has already been read into hx. > - SET_FLOAT_WORD(z,ix&0xfffff000); > - r = __ieee754_expf(-z*z-(float)0.5625)* > - __ieee754_expf((z-x)*(z+x)+R/S); > + SET_FLOAT_WORD(z,ix&0xffffe000); Change further to use hx. > + r = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S); Good fix. Is the point that -z*z-0.5625F must be exact, but the old splitting only made z*z exact? Is -z*z-0.5625F now always exact? Anyway, the error for this case is now smaller than near 1.88, and changing the expf()'s to exp()s doesn't reduce it much more, so the old extra-precision methods can't be much better. Keep the old style for the lines not related to the fix. erfc() still uses the old style. > if(hx>0) return r/x; else return two-r/x; > } else { > if(hx>0) return tiny*tiny; else return two-tiny; Bruce From owner-freebsd-numerics@FreeBSD.ORG Tue Aug 27 18:47:08 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 83960B72 for ; Tue, 27 Aug 2013 18:47:08 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 594D92B3E for ; Tue, 27 Aug 2013 18:47:08 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7RIl2QI076154; Tue, 27 Aug 2013 11:47:02 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7RIl10g076153; Tue, 27 Aug 2013 11:47:01 -0700 (PDT) (envelope-from sgk) Date: Tue, 27 Aug 2013 11:47:01 -0700 From: Steve Kargl To: Bruce Evans Subject: Re: (2nd time) tweaks to erff() threshold values Message-ID: <20130827184701.GA76013@troutmask.apl.washington.edu> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> <20130824202102.GA54981@troutmask.apl.washington.edu> <20130825023029.GA56772@troutmask.apl.washington.edu> <20130825171910.GA60396@troutmask.apl.washington.edu> <20130827003147.R2328@besplex.bde.org> <20130826174307.GA67511@troutmask.apl.washington.edu> <20130827202751.X1787@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130827202751.X1787@besplex.bde.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Aug 2013 18:47:08 -0000 On Tue, Aug 27, 2013 at 09:24:39PM +1000, Bruce Evans wrote: > On Mon, 26 Aug 2013, Steve Kargl wrote: > > > erfc (Look at the old and new ULP for 3rd and 4th interval. This is due > > to the change in mask from 0xfffff000 to 0xffffe000.) > > A very interesting fix. Well, I stumped across this fix. See below. > Apparently the splitting point 0.84375 is not very well chosen, since > we get largest errors just before it and much smaller errors above it > by using another algorithm. The only hint about the choice of 0.84375 is from the comment int s_erf.c. * is close to one. The interval is chosen because the fix * point of erf(x) is near 0.6174 (i.e., erf(x)=x when x is * near 0.6174), and by some experiment, 0.84375 is chosen to * guarantee the error is less than one ulp for erf. I haven't investigated whether 0.84375 is best choice. However, exhaustive testing in [0.6, 0.62] gives a max ulp of 0.666. > > /* > > - * Coefficients for approximation to erf in [0.84375,1.25] > > + * Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]: > > + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31. > > + */ > > Extra spaces. Fixed. > > ... > > /* > > - * Coefficients for approximation to erfc in [1.25,1/0.35] > > + * Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]: > > + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30 > > */ > > It is even less clear than for the others that the scaling of the error > is correct. The comment could make this clearer by giving the ratio of > the correct error scale with the one used in the approximation, at the > endpoints (max and min for this, usually at the endpoints). I'm not sure what you mean here. In looking at error curves, it turns out that one of the 2 limits in the range is taken from an endpoint. > > +sa4 = -5.77397496e-02F, /* -0x1.d90108p-5 */ > > > /* > > - * Coefficients for approximation to erfc in [1/.35,28] > > + * Domain [1.25,1/0.35], range [-3.753e-12,3.875e-12]: > > + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-43 > > */ > > Domain needs more editing. Fixed. > > - if (ix < 0x41e00000) { /* |x|<28 */ > > + if (ix < 0x41200000) { /* |x|<10 */ > > 10 is too small. It gives spurious underflow to 0 and larger than necessary > errors, at least on i386. 11 works. I changed to 11. This required a new set of rb and sb coefficients as the old set was for [1/0.35, 10]. With the new set the max ulp for erfc goes up, but it is still much better that 63.something . ./testf -m 2.857143 -M 11 -E double Interval tested: [2.8571,11.0000] Max ULP: 2.92095 x Max ULP: 4.08856726e+00, 0x1.05ab16p+2 > > + R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4))); > > + S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4))); > > } > > GET_FLOAT_WORD(ix,x); > > This and the GET_FLOAT_WORD() for erff() are bogus. The word has already > been read into hx. Well spotted. Fixed. > > - SET_FLOAT_WORD(z,ix&0xfffff000); > > - r = __ieee754_expf(-z*z-(float)0.5625)* > > - __ieee754_expf((z-x)*(z+x)+R/S); > > + SET_FLOAT_WORD(z,ix&0xffffe000); > > Change further to use hx. Fixed. > > + r = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S); > > Good fix. Is the point that -z*z-0.5625F must be exact, but the old > splitting only made z*z exact? Is -z*z-0.5625F now always exact? > Anyway, the error for this case is now smaller than near 1.88, and > changing the expf()'s to exp()s doesn't reduce it much more, so the > old extra-precision methods can't be much better. Back to the choice of mask. I was trying to understand the origins of the constant 0.5625F. The asymptotic expansion of erfc(x) is (exp(-x**2) / (x*sqrt(pi))) * P(x) with P(x) being some polynomial. It turns out exp(-0.5625) = 0.56978 and 1/sqrt(pi) = 0.56419. So, the algorithm drops the 1/sqrt(pi) and approximates it by exp(-0.5625). Well, in reading and re-reading the comment in s_erf.c, this part caught my attention: * Note1: * To compute exp(-x*x-0.5625+R/S), let s be a single * precision number and s := x; then * -x*x = -s*s + (s-x)*(s+x) * exp(-x*x-0.5626+R/S) = * exp(-s*s-0.5625)*exp((s-x)*(s+x)+R/S); This applies to the double precision version, ie erf(), so I simply ask the question whether 0xfffff000 was adequate masking out the low order bits. Fortunately, I only needed to test 0xffff8000, 0xffffc0000, and 0xffffe000. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Wed Aug 28 11:31:30 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 27EFBA88 for ; Wed, 28 Aug 2013 11:31:30 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id A5A932FB1 for ; Wed, 28 Aug 2013 11:31:29 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 00F10D631B8; Wed, 28 Aug 2013 21:31:22 +1000 (EST) Date: Wed, 28 Aug 2013 21:31:21 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl Subject: Re: (2nd time) tweaks to erff() threshold values In-Reply-To: <20130827184701.GA76013@troutmask.apl.washington.edu> Message-ID: <20130828205705.S1447@besplex.bde.org> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> <20130824202102.GA54981@troutmask.apl.washington.edu> <20130825023029.GA56772@troutmask.apl.washington.edu> <20130825171910.GA60396@troutmask.apl.washington.edu> <20130827003147.R2328@besplex.bde.org> <20130826174307.GA67511@troutmask.apl.washington.edu> <20130827202751.X1787@besplex.bde.org> <20130827184701.GA76013@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=DstvpgP+ c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10 a=fiHkbJjbcZh7ihRK9ZwA:9 a=CjuIK1q_8ugA:10 Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Aug 2013 11:31:30 -0000 On Tue, 27 Aug 2013, Steve Kargl wrote: > On Tue, Aug 27, 2013 at 09:24:39PM +1000, Bruce Evans wrote: >> On Mon, 26 Aug 2013, Steve Kargl wrote: >>> ... >>> /* >>> - * Coefficients for approximation to erfc in [1.25,1/0.35] >>> + * Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]: >>> + * |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30 >>> */ >> >> It is even less clear than for the others that the scaling of the error >> is correct. The comment could make this clearer by giving the ratio of >> the correct error scale with the one used in the approximation, at the >> endpoints (max and min for this, usually at the endpoints). > > I'm not sure what you mean here. In looking at error curves, > it turns out that one of the 2 limits in the range is taken > from an endpoint. I mean that these are absolute errors for a function related to the result that is especially obscure in this case -- the function is log(x*result). The error that needs to be minimized is the relative error of the result, where "relative" is not quite the ratio of the error and the result -- it is the ratio of the error and [result rounded to a nearby power of 2; I'm not sure if the rounding is up or down]. For the absolute error to be close enough to the relative error, the relative error must not vary much across the interval, AND the scaling for the abslute error must be compatible. This is far from clear here. I think taking logs changes the scale for the absolute error to logarithmic, so this error is not really absolute. Hopefully the scale for the result is similarly logarithmic, so that the scales are compatible. There is also a factor of x before taking logs, so the scaling is not logarithmic either. >>> - if (ix < 0x41e00000) { /* |x|<28 */ >>> + if (ix < 0x41200000) { /* |x|<10 */ >> >> 10 is too small. It gives spurious underflow to 0 and larger than necessary >> errors, at least on i386. 11 works. > > I changed to 11. This required a new set of rb and sb coefficients > as the old set was for [1/0.35, 10]. With the new set the max ulp > for erfc goes up, but it is still much better that 63.something . Something wrong here. I got smaller errors by changing 10 to 11 without changing the coeffs. Now with updated coeffs, I get larger errors, at least on i386 (previously, there were no errors of more than 1.5 ulps above 1.88... Now there are some above 2.04). >>> + r = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S); >> >> Good fix. Is the point that -z*z-0.5625F must be exact, but the old >> splitting only made z*z exact? Is -z*z-0.5625F now always exact? >> Anyway, the error for this case is now smaller than near 1.88, and >> changing the expf()'s to exp()s doesn't reduce it much more, so the >> old extra-precision methods can't be much better. > > Back to the choice of mask. I was trying to understand > the origins of the constant 0.5625F. The asymptotic > expansion of erfc(x) is (exp(-x**2) / (x*sqrt(pi))) * P(x) > with P(x) being some polynomial. It turns out > exp(-0.5625) = 0.56978 and 1/sqrt(pi) = 0.56419. So, the > algorithm drops the 1/sqrt(pi) and approximates it by > exp(-0.5625). It is a hi+lo decomposition done multiplicatively and exponentially: 1/sqrt(pi) = exp(-0.5625) * exp(lo) where -0.5625 and lo are folded into other terms, and the hi part must be done exactly before exp() on it for enough accuracy. Taking exp() loses between 0.5 and 1 ulps of accuracy, and more is lost in the produce of the exp()'s. Hopefully the arg of the second exp() has a small error, so not much more accuracy is lost for it. > Well, in reading and re-reading the comment in s_erf.c, this > part caught my attention: > > * Note1: > * To compute exp(-x*x-0.5625+R/S), let s be a single > * precision number and s := x; then > * -x*x = -s*s + (s-x)*(s+x) > * exp(-x*x-0.5626+R/S) = The comment but not the code has this off-by-.0001 0.5626. > * exp(-s*s-0.5625)*exp((s-x)*(s+x)+R/S); > > This applies to the double precision version, ie erf(), so I > simply ask the question whether 0xfffff000 was adequate masking > out the low order bits. Fortunately, I only needed to test > 0xffff8000, 0xffffc0000, and 0xffffe000. I tried the others too, and they worked less well. But I haven't checked that -s*s-0.5625[F] is exact. In double precision, many more bits are stripped from x to create x, so exactness is less surprising. Bruce