From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 25 17:19:11 2013 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 73373BC3 for ; Sun, 25 Aug 2013 17:19:11 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 49AC82825 for ; Sun, 25 Aug 2013 17:19:11 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7PHJAqr060446; Sun, 25 Aug 2013 10:19:10 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7PHJAh4060445; Sun, 25 Aug 2013 10:19:10 -0700 (PDT) (envelope-from sgk) Date: Sun, 25 Aug 2013 10:19:10 -0700 From: Steve Kargl To: Bruce Evans Subject: Re: (2nd time) tweaks to erff() threshold values Message-ID: <20130825171910.GA60396@troutmask.apl.washington.edu> References: <20130822213315.GA6708@troutmask.apl.washington.edu> <20130823202257.Q1593@besplex.bde.org> <20130823165744.GA47369@troutmask.apl.washington.edu> <20130824202102.GA54981@troutmask.apl.washington.edu> <20130825023029.GA56772@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130825023029.GA56772@troutmask.apl.washington.edu> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-numerics@FreeBSD.org X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Aug 2013 17:19:11 -0000 On Sat, Aug 24, 2013 at 07:30:29PM -0700, Steve Kargl wrote: > On Sat, Aug 24, 2013 at 01:21:02PM -0700, Steve Kargl wrote: > > On Fri, Aug 23, 2013 at 09:57:44AM -0700, Steve Kargl wrote: > > > On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote: > > > > > >> The whole erf implementation is probably not the best way for > > >> float precision, but is good for understanding higher precisions. > > > > > > I'm fairly certain that the implementation of erff() is not very > > > efficient. The polynomials, used in the rational approximations, > > > are the same order as those used in the double precision approximation. > > > I'll check the polys when update my P(x)/Q(x) remes algorithm for > > > erfl and erfcl. > > > > I seem to be right (although I haven't iterated on the P's and Q's). > > > > Now, with pretty testing and (perhaps) better coefficients: These values where for the interval [0,0.84375]. > > | usec/call | Max ULP | X Max ULP > ---------+-----------+---------+--------------------------------- > new erfc | 0.02757 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1 > old erfc | 0.03348 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1 > ---------+-----------+---------+--------------------------------- > new erf | 0.02302 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 > old erf | 0.03028 | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127 New interval [0.84375,1.25]. | usecs/call| Max ULP | X Max ULP ---------+-----------+---------+------------------------------ Old erfc | 0.03469 | 0.95158 | 1.24999416e+00, 0x1.3fff9ep+0 New erfc | 0.02781 | 0.76497 | 1.24971473e+00, 0x1.3fed4ep+0 ---------+-----------+---------+------------------------------ Old erf | 0.03404 | 0.55617 | 1.24996567e+00, 0x1.3ffdcp+0 New erf | 0.02774 | 0.54213 | 8.52452755e-01, 0x1.b474bp-1 Note that s_erf.c claims that in this interval, a taylor series about erf(1) is used and suggests that the constant erx = 8.4506291151e-01 is erf(1). That's bogus as erf(1) = 8.42700779e-01F (in float). Note**2, I did not record a domain and range as my routine that generates plots seems to give a really messed up result for the range [-0.08, 0.06]. However, the error estimate from solving the Remes matrix equation is err = 0x1.39c84809b7ed2p-38. New diff: Index: src/s_erff.c =================================================================== --- src/s_erff.c (revision 1361) +++ src/s_erff.c (working copy) @@ -25,38 +25,34 @@ half= 5.0000000000e-01, /* 0x3F000000 * one = 1.0000000000e+00, /* 0x3F800000 */ two = 2.0000000000e+00, /* 0x40000000 */ /* c = (subfloat)0.84506291151 */ -erx = 8.4506291151e-01, /* 0x3f58560b */ +erx = 8.42700779e-01F, /* 0x1.af767ap-1 */ + /* * Coefficients for approximation to erf on [0,0.84375] */ efx = 1.2837916613e-01, /* 0x3e0375d4 */ efx8= 1.0270333290e+00, /* 0x3f8375d4 */ -pp0 = 1.2837916613e-01, /* 0x3e0375d4 */ -pp1 = -3.2504209876e-01, /* 0xbea66beb */ -pp2 = -2.8481749818e-02, /* 0xbce9528f */ -pp3 = -5.7702702470e-03, /* 0xbbbd1489 */ -pp4 = -2.3763017452e-05, /* 0xb7c756b1 */ -qq1 = 3.9791721106e-01, /* 0x3ecbbbce */ -qq2 = 6.5022252500e-02, /* 0x3d852a63 */ -qq3 = 5.0813062117e-03, /* 0x3ba68116 */ -qq4 = 1.3249473704e-04, /* 0x390aee49 */ -qq5 = -3.9602282413e-06, /* 0xb684e21a */ +/* + * Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]: + * |(erf(x) - x)/x - p(x)/q(x)| < 2**-31. + */ +pp0 = 1.28379166e-01F, /* 0x1.06eba8p-3 */ +pp1 = -3.36030394e-01F, /* -0x1.58185ap-2 */ +pp2 = -1.86260219e-03F, /* -0x1.e8451ep-10 */ +qq1 = 3.12324286e-01F, /* 0x1.3fd1f0p-2 */ +qq2 = 2.16070302e-02F, /* 0x1.620274p-6 */ +qq3 = -1.98859419e-03F, /* -0x1.04a626p-9 */ /* * Coefficients for approximation to erf in [0.84375,1.25] */ -pa0 = -2.3621185683e-03, /* 0xbb1acdc6 */ -pa1 = 4.1485610604e-01, /* 0x3ed46805 */ -pa2 = -3.7220788002e-01, /* 0xbebe9208 */ -pa3 = 3.1834661961e-01, /* 0x3ea2fe54 */ -pa4 = -1.1089469492e-01, /* 0xbde31cc2 */ -pa5 = 3.5478305072e-02, /* 0x3d1151b3 */ -pa6 = -2.1663755178e-03, /* 0xbb0df9c0 */ -qa1 = 1.0642088205e-01, /* 0x3dd9f331 */ -qa2 = 5.4039794207e-01, /* 0x3f0a5785 */ -qa3 = 7.1828655899e-02, /* 0x3d931ae7 */ -qa4 = 1.2617121637e-01, /* 0x3e013307 */ -qa5 = 1.3637083583e-02, /* 0x3c5f6e13 */ -qa6 = 1.1984500103e-02, /* 0x3c445aa3 */ +pa0 = 1.35131621e-08F, /* 0x1.d04f08p-27 */ +pa1 = 4.15107518e-01F, /* 0x1.a911f2p-2 */ +pa2 = -1.63339108e-01F, /* -0x1.4e84bcp-3 */ +pa3 = 1.12098485e-01F, /* 0x1.cb27c8p-4 */ +qa1 = 6.06513679e-01F, /* 0x1.3688f6p-1 */ +qa2 = 5.43227255e-01F, /* 0x1.1621e2p-1 */ +qa3 = 1.74396917e-01F, /* 0x1.652a36p-3 */ +qa4 = 5.88681065e-02F, /* 0x1.e23f5ep-5 */ /* * Coefficients for approximation to erfc in [1.25,1/0.35] */ @@ -114,15 +110,15 @@ erff(float x) return x + efx*x; } z = x*x; - r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4))); - s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5)))); + r = pp0+z*(pp1+z*pp2); + s = one+z*(qq1+z*(qq2+z*qq3)); y = r/s; return x + x*y; } if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ s = fabsf(x)-one; - P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6))))); - Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6))))); + P = pa0+s*(pa1+s*(pa2+s*pa3)); + Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4))); if(hx>=0) return erx + P/Q; else return -erx - P/Q; } if (ix >= 0x40c00000) { /* inf>|x|>=6 */ @@ -163,8 +159,8 @@ erfcf(float x) if(ix < 0x23800000) /* |x|<2**-56 */ return one-x; z = x*x; - r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4))); - s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5)))); + r = pp0+z*(pp1+z*pp2); + s = one+z*(qq1+z*(qq2+z*qq3)); y = r/s; if(hx < 0x3e800000) { /* x<1/4 */ return one-(x+x*y); @@ -176,8 +172,8 @@ erfcf(float x) } if(ix < 0x3fa00000) { /* 0.84375 <= |x| < 1.25 */ s = fabsf(x)-one; - P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6))))); - Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6))))); + P = pa0+s*(pa1+s*(pa2+s*pa3)); + Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4))); if(hx>=0) { z = one-erx; return z - P/Q; } else { -- Steve