From owner-freebsd-numerics@FreeBSD.ORG  Sun Aug 25 02:30:30 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 3FF16E2A
 for <freebsd-numerics@FreeBSD.org>; Sun, 25 Aug 2013 02:30:30 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 13C2A21A6
 for <freebsd-numerics@FreeBSD.org>; Sun, 25 Aug 2013 02:30:30 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7P2UTGP056799; 
 Sat, 24 Aug 2013 19:30:29 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7P2UTTg056798;
 Sat, 24 Aug 2013 19:30:29 -0700 (PDT) (envelope-from sgk)
Date: Sat, 24 Aug 2013 19:30:29 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: (2nd time) tweaks to erff() threshold values
Message-ID: <20130825023029.GA56772@troutmask.apl.washington.edu>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130824202102.GA54981@troutmask.apl.washington.edu>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 25 Aug 2013 02:30:30 -0000

On Sat, Aug 24, 2013 at 01:21:02PM -0700, Steve Kargl wrote:
> On Fri, Aug 23, 2013 at 09:57:44AM -0700, Steve Kargl wrote:
> > On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote:
> > 
> >> The whole erf implementation is probably not the best way for
> >> float precision, but is good for understanding higher precisions.
> > 
> > I'm fairly certain that the implementation of erff() is not very
> > efficient.  The polynomials, used in the rational approximations,
> > are the same order as those used in the double precision approximation.
> > I'll check the polys when update my P(x)/Q(x) remes algorithm for
> > erfl and erfcl.
> 
> I seem to be right (although I haven't iterated on the P's and Q's).
> 

Now, with pretty testing and (perhaps) better coefficients:

         | usec/call | Max ULP | X Max ULP
---------+-----------+---------+---------------------------------
new erfc |  0.02757  | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1
old erfc |  0.03348  | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1
---------+-----------+---------+---------------------------------
new erf  |  0.02302  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
old erf  |  0.03028  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127

OK to commit?

-- 
Steve

Index: src/s_erff.c
===================================================================
--- src/s_erff.c	(revision 1361)
+++ src/s_erff.c	(working copy)
@@ -31,16 +31,16 @@ erx =  8.4506291151e-01, /* 0x3f58560b *
  */
 efx =  1.2837916613e-01, /* 0x3e0375d4 */
 efx8=  1.0270333290e+00, /* 0x3f8375d4 */
-pp0  =  1.2837916613e-01, /* 0x3e0375d4 */
-pp1  = -3.2504209876e-01, /* 0xbea66beb */
-pp2  = -2.8481749818e-02, /* 0xbce9528f */
-pp3  = -5.7702702470e-03, /* 0xbbbd1489 */
-pp4  = -2.3763017452e-05, /* 0xb7c756b1 */
-qq1  =  3.9791721106e-01, /* 0x3ecbbbce */
-qq2  =  6.5022252500e-02, /* 0x3d852a63 */
-qq3  =  5.0813062117e-03, /* 0x3ba68116 */
-qq4  =  1.3249473704e-04, /* 0x390aee49 */
-qq5  = -3.9602282413e-06, /* 0xb684e21a */
+/*
+ *  Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]:
+ *  |(erf(x) - x)/x - p(x)/q(x)| < 2**-31.
+ */
+pp0  =  1.28379166e-01F,	/*  0x1.06eba8p-3 */
+pp1  = -3.36030394e-01F,	/* -0x1.58185ap-2 */
+pp2  = -1.86260219e-03F,	/* -0x1.e8451ep-10 */
+qq1  =  3.12324286e-01F,	/*  0x1.3fd1f0p-2 */
+qq2  =  2.16070302e-02F,	/*  0x1.620274p-6 */
+qq3  = -1.98859419e-03F, 	/* -0x1.04a626p-9 */
 /*
  * Coefficients for approximation to  erf  in [0.84375,1.25]
  */
@@ -114,8 +114,8 @@ erff(float x)
 		return x + efx*x;
 	    }
 	    z = x*x;
-	    r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4)));
-	    s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5))));
+	    r = pp0+z*(pp1+z*pp2);
+	    s = one+z*(qq1+z*(qq2+z*qq3));
 	    y = r/s;
 	    return x + x*y;
 	}
@@ -163,8 +163,8 @@ erfcf(float x)
 	    if(ix < 0x23800000)  	/* |x|<2**-56 */
 		return one-x;
 	    z = x*x;
-	    r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4)));
-	    s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5))));
+	    r = pp0+z*(pp1+z*pp2);
+	    s = one+z*(qq1+z*(qq2+z*qq3));
 	    y = r/s;
 	    if(hx < 0x3e800000) {  	/* x<1/4 */
 		return one-(x+x*y);

From owner-freebsd-numerics@FreeBSD.ORG  Sun Aug 25 02:51:13 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id BBCB61000
 for <freebsd-numerics@FreeBSD.org>; Sun, 25 Aug 2013 02:51:13 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 7DE50225D
 for <freebsd-numerics@FreeBSD.org>; Sun, 25 Aug 2013 02:51:13 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7P2pCBN056878; 
 Sat, 24 Aug 2013 19:51:12 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7P2pCIQ056877;
 Sat, 24 Aug 2013 19:51:12 -0700 (PDT) (envelope-from sgk)
Date: Sat, 24 Aug 2013 19:51:12 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: (2nd time) tweaks to erff() threshold values
Message-ID: <20130825025112.GA56868@troutmask.apl.washington.edu>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130825023029.GA56772@troutmask.apl.washington.edu>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 25 Aug 2013 02:51:13 -0000

On Sat, Aug 24, 2013 at 07:30:29PM -0700, Steve Kargl wrote:
> On Sat, Aug 24, 2013 at 01:21:02PM -0700, Steve Kargl wrote:
> > On Fri, Aug 23, 2013 at 09:57:44AM -0700, Steve Kargl wrote:
> > > On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote:
> > > 
> > >> The whole erf implementation is probably not the best way for
> > >> float precision, but is good for understanding higher precisions.
> > > 
> > > I'm fairly certain that the implementation of erff() is not very
> > > efficient.  The polynomials, used in the rational approximations,
> > > are the same order as those used in the double precision approximation.
> > > I'll check the polys when update my P(x)/Q(x) remes algorithm for
> > > erfl and erfcl.
> > 
> > I seem to be right (although I haven't iterated on the P's and Q's).
> > 
> 
> Now, with pretty testing and (perhaps) better coefficients:
> 

LMAO.  I have not idea where "pretty" came from.  s/pretty/better
-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Sun Aug 25 17:19:11 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 73373BC3
 for <freebsd-numerics@FreeBSD.org>; Sun, 25 Aug 2013 17:19:11 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 49AC82825
 for <freebsd-numerics@FreeBSD.org>; Sun, 25 Aug 2013 17:19:11 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7PHJAqr060446; 
 Sun, 25 Aug 2013 10:19:10 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7PHJAh4060445;
 Sun, 25 Aug 2013 10:19:10 -0700 (PDT) (envelope-from sgk)
Date: Sun, 25 Aug 2013 10:19:10 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: (2nd time) tweaks to erff() threshold values
Message-ID: <20130825171910.GA60396@troutmask.apl.washington.edu>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130825023029.GA56772@troutmask.apl.washington.edu>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 25 Aug 2013 17:19:11 -0000

On Sat, Aug 24, 2013 at 07:30:29PM -0700, Steve Kargl wrote:
> On Sat, Aug 24, 2013 at 01:21:02PM -0700, Steve Kargl wrote:
> > On Fri, Aug 23, 2013 at 09:57:44AM -0700, Steve Kargl wrote:
> > > On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote:
> > > 
> > >> The whole erf implementation is probably not the best way for
> > >> float precision, but is good for understanding higher precisions.
> > > 
> > > I'm fairly certain that the implementation of erff() is not very
> > > efficient.  The polynomials, used in the rational approximations,
> > > are the same order as those used in the double precision approximation.
> > > I'll check the polys when update my P(x)/Q(x) remes algorithm for
> > > erfl and erfcl.
> > 
> > I seem to be right (although I haven't iterated on the P's and Q's).
> > 
> 
> Now, with pretty testing and (perhaps) better coefficients:

These values where for the interval [0,0.84375].

> 
>          | usec/call | Max ULP | X Max ULP
> ---------+-----------+---------+---------------------------------
> new erfc |  0.02757  | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1
> old erfc |  0.03348  | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1
> ---------+-----------+---------+---------------------------------
> new erf  |  0.02302  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
> old erf  |  0.03028  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127


New interval [0.84375,1.25].

         | usecs/call| Max ULP | X Max ULP
---------+-----------+---------+------------------------------
Old erfc |  0.03469  | 0.95158 | 1.24999416e+00, 0x1.3fff9ep+0
New erfc |  0.02781  | 0.76497 | 1.24971473e+00, 0x1.3fed4ep+0
---------+-----------+---------+------------------------------
Old erf  |  0.03404  | 0.55617 | 1.24996567e+00, 0x1.3ffdcp+0
New erf  |  0.02774  | 0.54213 | 8.52452755e-01, 0x1.b474bp-1

Note that s_erf.c claims that in this interval, a taylor series about
erf(1) is used and suggests that the constant erx =  8.4506291151e-01
is erf(1).  That's bogus as erf(1) = 8.42700779e-01F (in float).

Note**2, I did not record a domain and range as my routine that
generates plots seems to give a really messed up result for
the range [-0.08, 0.06].  However, the error estimate from
solving the Remes matrix equation is err = 0x1.39c84809b7ed2p-38.


New diff:

Index: src/s_erff.c
===================================================================
--- src/s_erff.c	(revision 1361)
+++ src/s_erff.c	(working copy)
@@ -25,38 +25,34 @@ half=  5.0000000000e-01, /* 0x3F000000 *
 one =  1.0000000000e+00, /* 0x3F800000 */
 two =  2.0000000000e+00, /* 0x40000000 */
 	/* c = (subfloat)0.84506291151 */
-erx =  8.4506291151e-01, /* 0x3f58560b */
+erx =  8.42700779e-01F, /* 0x1.af767ap-1 */
+
 /*
  * Coefficients for approximation to  erf on [0,0.84375]
  */
 efx =  1.2837916613e-01, /* 0x3e0375d4 */
 efx8=  1.0270333290e+00, /* 0x3f8375d4 */
-pp0  =  1.2837916613e-01, /* 0x3e0375d4 */
-pp1  = -3.2504209876e-01, /* 0xbea66beb */
-pp2  = -2.8481749818e-02, /* 0xbce9528f */
-pp3  = -5.7702702470e-03, /* 0xbbbd1489 */
-pp4  = -2.3763017452e-05, /* 0xb7c756b1 */
-qq1  =  3.9791721106e-01, /* 0x3ecbbbce */
-qq2  =  6.5022252500e-02, /* 0x3d852a63 */
-qq3  =  5.0813062117e-03, /* 0x3ba68116 */
-qq4  =  1.3249473704e-04, /* 0x390aee49 */
-qq5  = -3.9602282413e-06, /* 0xb684e21a */
+/*
+ *  Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]:
+ *  |(erf(x) - x)/x - p(x)/q(x)| < 2**-31.
+ */
+pp0  =  1.28379166e-01F,	/*  0x1.06eba8p-3 */
+pp1  = -3.36030394e-01F,	/* -0x1.58185ap-2 */
+pp2  = -1.86260219e-03F,	/* -0x1.e8451ep-10 */
+qq1  =  3.12324286e-01F,	/*  0x1.3fd1f0p-2 */
+qq2  =  2.16070302e-02F,	/*  0x1.620274p-6 */
+qq3  = -1.98859419e-03F, 	/* -0x1.04a626p-9 */
 /*
  * Coefficients for approximation to  erf  in [0.84375,1.25]
  */
-pa0  = -2.3621185683e-03, /* 0xbb1acdc6 */
-pa1  =  4.1485610604e-01, /* 0x3ed46805 */
-pa2  = -3.7220788002e-01, /* 0xbebe9208 */
-pa3  =  3.1834661961e-01, /* 0x3ea2fe54 */
-pa4  = -1.1089469492e-01, /* 0xbde31cc2 */
-pa5  =  3.5478305072e-02, /* 0x3d1151b3 */
-pa6  = -2.1663755178e-03, /* 0xbb0df9c0 */
-qa1  =  1.0642088205e-01, /* 0x3dd9f331 */
-qa2  =  5.4039794207e-01, /* 0x3f0a5785 */
-qa3  =  7.1828655899e-02, /* 0x3d931ae7 */
-qa4  =  1.2617121637e-01, /* 0x3e013307 */
-qa5  =  1.3637083583e-02, /* 0x3c5f6e13 */
-qa6  =  1.1984500103e-02, /* 0x3c445aa3 */
+pa0  =  1.35131621e-08F, /*  0x1.d04f08p-27 */
+pa1  =  4.15107518e-01F, /*  0x1.a911f2p-2 */
+pa2  = -1.63339108e-01F, /* -0x1.4e84bcp-3 */
+pa3  =  1.12098485e-01F, /*  0x1.cb27c8p-4 */
+qa1  =  6.06513679e-01F, /*  0x1.3688f6p-1 */
+qa2  =  5.43227255e-01F, /*  0x1.1621e2p-1 */
+qa3  =  1.74396917e-01F, /*  0x1.652a36p-3 */
+qa4  =  5.88681065e-02F, /*  0x1.e23f5ep-5 */
 /*
  * Coefficients for approximation to  erfc in [1.25,1/0.35]
  */
@@ -114,15 +110,15 @@ erff(float x)
 		return x + efx*x;
 	    }
 	    z = x*x;
-	    r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4)));
-	    s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5))));
+	    r = pp0+z*(pp1+z*pp2);
+	    s = one+z*(qq1+z*(qq2+z*qq3));
 	    y = r/s;
 	    return x + x*y;
 	}
 	if(ix < 0x3fa00000) {		/* 0.84375 <= |x| < 1.25 */
 	    s = fabsf(x)-one;
-	    P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6)))));
-	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6)))));
+	    P = pa0+s*(pa1+s*(pa2+s*pa3));
+	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4)));
 	    if(hx>=0) return erx + P/Q; else return -erx - P/Q;
 	}
 	if (ix >= 0x40c00000) {		/* inf>|x|>=6 */
@@ -163,8 +159,8 @@ erfcf(float x)
 	    if(ix < 0x23800000)  	/* |x|<2**-56 */
 		return one-x;
 	    z = x*x;
-	    r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4)));
-	    s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5))));
+	    r = pp0+z*(pp1+z*pp2);
+	    s = one+z*(qq1+z*(qq2+z*qq3));
 	    y = r/s;
 	    if(hx < 0x3e800000) {  	/* x<1/4 */
 		return one-(x+x*y);
@@ -176,8 +172,8 @@ erfcf(float x)
 	}
 	if(ix < 0x3fa00000) {		/* 0.84375 <= |x| < 1.25 */
 	    s = fabsf(x)-one;
-	    P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6)))));
-	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6)))));
+	    P = pa0+s*(pa1+s*(pa2+s*pa3));
+	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4)));
 	    if(hx>=0) {
 	        z  = one-erx; return z - P/Q;
 	    } else {


-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Sun Aug 25 17:58:18 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 85E1E2A7
 for <freebsd-numerics@FreeBSD.org>; Sun, 25 Aug 2013 17:58:18 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 4ABF329B1
 for <freebsd-numerics@FreeBSD.org>; Sun, 25 Aug 2013 17:58:18 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7PHwHvO060609; 
 Sun, 25 Aug 2013 10:58:17 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7PHwH61060608;
 Sun, 25 Aug 2013 10:58:17 -0700 (PDT) (envelope-from sgk)
Date: Sun, 25 Aug 2013 10:58:17 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: (2nd time) tweaks to erff() threshold values
Message-ID: <20130825175817.GA60599@troutmask.apl.washington.edu>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
 <20130825171910.GA60396@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130825171910.GA60396@troutmask.apl.washington.edu>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 25 Aug 2013 17:58:18 -0000

On Sun, Aug 25, 2013 at 10:19:10AM -0700, Steve Kargl wrote:
> 
> Note**2, I did not record a domain and range as my routine that
> generates plots seems to give a really messed up result for
> the range [-0.08, 0.06].  However, the error estimate from
> solving the Remes matrix equation is err = 0x1.39c84809b7ed2p-38.

Found the bug in the graphing routine.

/*
 *  Domain [0.84375, 1.25], range ~[-1.954e-10,1.940e-11]:
 *  |(erf(x) - erx) - p(x)/q(x)| < 2**-36.
 */
> +pa0  =  1.35131621e-08F, /*  0x1.d04f08p-27 */
> +pa1  =  4.15107518e-01F, /*  0x1.a911f2p-2 */
> +pa2  = -1.63339108e-01F, /* -0x1.4e84bcp-3 */
> +pa3  =  1.12098485e-01F, /*  0x1.cb27c8p-4 */
> +qa1  =  6.06513679e-01F, /*  0x1.3688f6p-1 */
> +qa2  =  5.43227255e-01F, /*  0x1.1621e2p-1 */
> +qa3  =  1.74396917e-01F, /*  0x1.652a36p-3 */
> +qa4  =  5.88681065e-02F, /*  0x1.e23f5ep-5 */

Note, the range and 2**-36 estimate are from the results of
the Remes algorithm run with 1024-bits of precision.  The
above coefficients were simply rounded to single precision,
as I haven't investigated a better method for running, yet.

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Mon Aug 26 11:06:49 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 07088197
 for <freebsd-numerics@FreeBSD.org>; Mon, 26 Aug 2013 11:06:49 +0000 (UTC)
 (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id E8CE82868
 for <freebsd-numerics@FreeBSD.org>; Mon, 26 Aug 2013 11:06:48 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r7QB6mOV066021
 for <freebsd-numerics@FreeBSD.org>; Mon, 26 Aug 2013 11:06:48 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
 by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r7QB6m0e066019
 for freebsd-numerics@FreeBSD.org; Mon, 26 Aug 2013 11:06:48 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 26 Aug 2013 11:06:48 GMT
Message-Id: <201308261106.r7QB6m0e066019@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
 owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@freebsd.org>
To: freebsd-numerics@FreeBSD.org
Subject: Current problem reports assigned to freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Aug 2013 11:06:49 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o stand/175811 numerics   libstdc++ needs complex support in order use C99
o bin/170206   numerics   [msun] [patch] complex arcsinh, log, etc.
o stand/82654  numerics   C99 long double math functions are missing

3 problems total.


From owner-freebsd-numerics@FreeBSD.ORG  Mon Aug 26 14:27:43 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 7E191B23
 for <freebsd-numerics@FreeBSD.org>; Mon, 26 Aug 2013 14:27:43 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au
 [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 4311A27D3
 for <freebsd-numerics@FreeBSD.org>; Mon, 26 Aug 2013 14:27:43 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id B4BAAD64236;
 Tue, 27 Aug 2013 00:27:31 +1000 (EST)
Date: Tue, 27 Aug 2013 00:27:30 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: (2nd time) tweaks to erff() threshold values
In-Reply-To: <20130823165744.GA47369@troutmask.apl.washington.edu>
Message-ID: <20130826235215.X2165@besplex.bde.org>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=YYGEuWhf c=1 sm=1 tr=0
 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10
 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10
 a=EmHTJ-zS--LAX7py56IA:9 a=CjuIK1q_8ugA:10
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Aug 2013 14:27:43 -0000

Catching up with old mail...

On Fri, 23 Aug 2013, Steve Kargl wrote:

> On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote:
> ...
>> Some modified lines:
>>
>> @ 	if(ix < 0x3f580000) {		/* |x|<0.84375 */
>> @ 	    if(ix < 0x38800000) { 	/* |x|<2**-14 */
>> @ 	        if (ix < 0x04000000)	/* |x|<0x1p-119 */
>> @ 		    return (8*x+efx8*x)/8;  /*avoid spurious underflow */
>> @ 		}
>> @ 		return x + efx*x;
>> @ 	    }
>
> OK.  I'll update my patch to use the above.  Did you see my
> suggested changes to erfcf()?

Yes.  I didn't try them because I neglected to test erfc*() before so I
didn't have anything to compare with.  I thought that erfc*() was so
similar to erf*() that it didn't need testing.  But testing shows that
it did need testing since it is quite broken.  I get errors like the
following, and the changes in your later mail don't affect this much:
(oops, the change of the threshold from 28 to 9.194705 in this mail
did affect this and gave very large errors of up to 2048 starting at
the new threshold.  The extra errors seem to be the result of underflow
to 0 (even on i386 with some extra precision).  With that backed out):

Very large errors start about here:
% x =                           0x407b9000 3.93066406
% erfc(x) =  0x3e5d2abbd6aee678 0x32e955df 2.71638191e-08
% erfcf(x) = 0x3e5d2abdc0000000 0x32e955ee 2.71638463e-08
% err = +0x1e9511988 15.29115

This was the largest one found:
% x =                           0x40ffd000 7.99414062
% erfc(x) =  0x39ef467e0a9d6031  0xf7a33f0 1.23359547e-29
% erfcf(x) = 0x39ef4685e0000000  0xf7a342f 1.23360018e-29
% err = +0x7d5629fcf 62.66829

This is due to a large cancelation for subtracting 1.  The fdlibm algorithm
is no good.  erfc() is much better in old BSD libm.  It is almost the
same as fdlibm except it avoids this problem:

fdlibm erfc():

% 	    z  = x;
% 	    SET_LOW_WORD(z,0);
% 	    r  =  __ieee754_exp(-z*z-0.5625)*
% 			__ieee754_exp((z-x)*(z+x)+R/S);
% 	    if(hx>0) return r/x; else return two-r/x;

Old BSD erfc() (from FreeBSD-1.1.5):

% 	/* return exp(-x^2 - lsqrtPI_hi + R + y)/x;	*/
% 	s = ((R + y) - lsqrtPI_hi) + z;
% 	y = (((z-s) - lsqrtPI_hi) + R) + y;
% 	r = exp__D(s, y)/x;
% 	if (x>0)
% 		return r;
% 	else
% 		return two-r;

The extra-precision function exp__D() gives some chance of a not very
large error.  I think it only gives about 10 more bits of accuracy.
I don't quite see how it can work for very large x when the cancelation
is almost complete.  Replacing the expf()'s by exp()'s in the erfc()
reduced the error similarly, but it is still above 1 ulp.

The old BSD erf*() also has more comments about what the error actually
is, but these don't clearly distinguish erfc() from erf().

The old BSD erf*()'s file name is not spelled with an s_.

> ...

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Mon Aug 26 14:30:25 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 6A9AAC1B
 for <freebsd-numerics@freebsd.org>; Mon, 26 Aug 2013 14:30:25 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au
 [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id 2F95A282E
 for <freebsd-numerics@freebsd.org>; Mon, 26 Aug 2013 14:30:24 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id E78DE783074;
 Tue, 27 Aug 2013 00:30:13 +1000 (EST)
Date: Tue, 27 Aug 2013 00:30:12 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: (2nd time) tweaks to erff() threshold values
In-Reply-To: <20130823171526.GA47736@troutmask.apl.washington.edu>
Message-ID: <20130827002821.K2328@besplex.bde.org>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823171526.GA47736@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=bpB1Wiqi c=1 sm=1 tr=0
 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10
 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10
 a=0N4hs-jxxM66dZRz4AkA:9 a=CjuIK1q_8ugA:10
Cc: freebsd-numerics@freebsd.org, Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Aug 2013 14:30:25 -0000

On Fri, 23 Aug 2013, Steve Kargl wrote:

> On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote:
>>
>> @ 	if(ix < 0x3f580000) {		/* |x|<0.84375 */
>> @ 	    if(ix < 0x38800000) { 	/* |x|<2**-14 */
>> @ 	        if (ix < 0x04000000)	/* |x|<0x1p-119 */
>> @ 		    return (8*x+efx8*x)/8;  /*avoid spurious underflow */
>
> To keep the diff between s_erf.c and s_erff.c small, do you
> want my to update s_erf.c with the last line above?

Yes, including the style change from *0.125 to /8 and any similar style
changes you can find (or don't change this anywhere), and the comment
on the threshold for underflow.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Mon Aug 26 15:23:33 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 02EA0772
 for <freebsd-numerics@freebsd.org>; Mon, 26 Aug 2013 15:23:33 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au
 [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id A18FE2B57
 for <freebsd-numerics@freebsd.org>; Mon, 26 Aug 2013 15:23:32 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id BF567D440CD;
 Tue, 27 Aug 2013 01:23:20 +1000 (EST)
Date: Tue, 27 Aug 2013 01:23:19 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: (2nd time) tweaks to erff() threshold values
In-Reply-To: <20130825171910.GA60396@troutmask.apl.washington.edu>
Message-ID: <20130827003147.R2328@besplex.bde.org>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
 <20130825171910.GA60396@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=DstvpgP+ c=1 sm=1 tr=0
 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10
 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10
 a=NYRZoR52SQm7tVCai4AA:9 a=CjuIK1q_8ugA:10 a=udEmAkY8_B0A:10
Cc: freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Aug 2013 15:23:33 -0000

On Sun, 25 Aug 2013, Steve Kargl wrote:

> On Sat, Aug 24, 2013 at 07:30:29PM -0700, Steve Kargl wrote:
>> On Sat, Aug 24, 2013 at 01:21:02PM -0700, Steve Kargl wrote:
>>> On Fri, Aug 23, 2013 at 09:57:44AM -0700, Steve Kargl wrote:
>>>> On Fri, Aug 23, 2013 at 09:12:33PM +1000, Bruce Evans wrote:
>>>>
>>>>> The whole erf implementation is probably not the best way for
>>>>> float precision, but is good for understanding higher precisions.
>>>>
>>>> I'm fairly certain that the implementation of erff() is not very
>>>> efficient.  The polynomials, used in the rational approximations,
>>>> are the same order as those used in the double precision approximation.
>>>> I'll check the polys when update my P(x)/Q(x) remes algorithm for
>>>> erfl and erfcl.
>>>
>>> I seem to be right (although I haven't iterated on the P's and Q's).
>>
>> Now, with pretty testing and (perhaps) better coefficients:

You mean pretty ugly formatting of the declarations of th coefficients to
match the existing style :-).

> These values where for the interval [0,0.84375].
>>
>>          | usec/call | Max ULP | X Max ULP
>> ---------+-----------+---------+---------------------------------
>> new erfc |  0.02757  | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1
>> old erfc |  0.03348  | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1
>> ---------+-----------+---------+---------------------------------
>> new erf  |  0.02302  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
>> old erf  |  0.03028  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127

Be careful testing on i386.  Its extra precision gives smaller errors.
These look like the i386 errors.  Old erff has a maximum error of more
than 0.9 ulps on all args.

> New interval [0.84375,1.25].
>
>         | usecs/call| Max ULP | X Max ULP
> ---------+-----------+---------+------------------------------
> Old erfc |  0.03469  | 0.95158 | 1.24999416e+00, 0x1.3fff9ep+0
> New erfc |  0.02781  | 0.76497 | 1.24971473e+00, 0x1.3fed4ep+0
> ---------+-----------+---------+------------------------------
> Old erf  |  0.03404  | 0.55617 | 1.24996567e+00, 0x1.3ffdcp+0
> New erf  |  0.02774  | 0.54213 | 8.52452755e-01, 0x1.b474bp-1

I see an improvement like this on amd64 but not on i386.  New erf is
down from 0.9+ to 0.8+ on amd64 on all args, but still 0.6+ on i386.
erfc has much larger errors on other intervals and I didn't test this
interval by itself.

> Note that s_erf.c claims that in this interval, a taylor series about
> erf(1) is used and suggests that the constant erx =  8.4506291151e-01
> is erf(1).  That's bogus as erf(1) = 8.42700779e-01F (in float).

Well, not quite.  It says:

%  *	   Remark: here we use the taylor series expansion at x=1.
%  *		erf(1+s) = erf(1) + s*Poly(s)
%  *			 = 0.845.. + P1(s)/Q1(s)
%  *	   That is, we use rational approximation to approximate
%  *			erf(1+s) - (c = (single)0.84506291151)
%  *	   Note that |P1/Q1|< 0.078 for x in [0.84375,1.25]
%  *	   where 
%  *		P1(s) = degree 6 poly in s
%  *		Q1(s) = degree 6 poly in s

It puts some of erf(1) in 0.845 and some in P1(s)/Q1(s).

> New diff:

It is relative to a version that already has the threshold changes.

> Index: src/s_erff.c
> ...
> -pa0  = -2.3621185683e-03, /* 0xbb1acdc6 */
> ...
> -qa1  =  1.0642088205e-01, /* 0x3dd9f331 */
> ...
> +pa0  =  1.35131621e-08F, /*  0x1.d04f08p-27 */
> ...
> +qa1  =  6.06513679e-01F, /*  0x1.3688f6p-1 */

(qa0 is spelled 'one' in both versions.)

You still have some of erf(1) in pa0/qa0, but not as much as before.  The
decomposition should be something like hi+lo -- write erf(1) = hi+lo and
put hi in erx and lo in pa0/qa0 = pa0.  Is that exactly what you do?  I
wonder if there is a technical reason for putting more in pa0/qa0.  Or
maybe the whole expression should be written as hi+P/Q+lo, where P(0) = 0.
Sometimes rational approximations should be done by splitting out even
more terms from the P/Q part and adding the other terms more carefully.
The parts that are split out are chosen to make them easier to add up
carefully, while keeping the parts that are not split out small so that
they don't need to be evaluated carefully.  See k_tan.c for an example.

Another bug in the comment is that it hasn't caught up the code.  The
variable 'erx' was spelled 'c' in old BSD libm.  It is still spelled
'c' in the fdlibm comment.

The comment does say that 'c' is rounded to single.  That is good for
a hi+lo decomposition, and it may be necessary for technical reasons
to keep some of 'c's lower bits zero.  Rounding it to single doesn't
give this when the precision is already single.  Old erff didn't round
it further either.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Mon Aug 26 17:19:20 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 1D042198
 for <freebsd-numerics@freebsd.org>; Mon, 26 Aug 2013 17:19:20 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail108.syd.optusnet.com.au (mail108.syd.optusnet.com.au
 [211.29.132.59]) by mx1.freebsd.org (Postfix) with ESMTP id BC0B7223E
 for <freebsd-numerics@freebsd.org>; Mon, 26 Aug 2013 17:19:19 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id F06991A054A;
 Tue, 27 Aug 2013 03:19:13 +1000 (EST)
Date: Tue, 27 Aug 2013 03:19:13 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: (2nd time) tweaks to erff() threshold values
In-Reply-To: <20130825175817.GA60599@troutmask.apl.washington.edu>
Message-ID: <20130827012348.G2328@besplex.bde.org>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
 <20130825171910.GA60396@troutmask.apl.washington.edu>
 <20130825175817.GA60599@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=YYGEuWhf c=1 sm=1 tr=0
 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10
 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10
 a=xRB_yx2URKs_nkmv_ssA:9 a=CjuIK1q_8ugA:10
Cc: freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Aug 2013 17:19:20 -0000

On Sun, 25 Aug 2013, Steve Kargl wrote:

> On Sun, Aug 25, 2013 at 10:19:10AM -0700, Steve Kargl wrote:
>>
>> Note**2, I did not record a domain and range as my routine that
>> generates plots seems to give a really messed up result for
>> the range [-0.08, 0.06].  However, the error estimate from
>> solving the Remes matrix equation is err = 0x1.39c84809b7ed2p-38.
>
> Found the bug in the graphing routine.
>
> /*
> *  Domain [0.84375, 1.25], range ~[-1.954e-10,1.940e-11]:
> *  |(erf(x) - erx) - p(x)/q(x)| < 2**-36.
> */
>
> Note, the range and 2**-36 estimate are from the results of
> the Remes algorithm run with 1024-bits of precision.  The
> above coefficients were simply rounded to single precision,
> as I haven't investigated a better method for running, yet.

I now have a good method for rounding the coefficients.  It is like
my old iterative method of rounding the the n'th coeff and then
calculating all coeffs after the n'th again with all coeffs up to and
including the n'th fixed, except I now use an almost pure Remez
algorithm for all steps, and later another set of iterations to
recaclulate all early coeffs after fixing some later ones.

Here is the almost pure Remez algorithm in pari:

% cpolinterpolate(p, d, a) =

- p is a polynomial containing fixed leading coeffs (0 for basic Remez)
- d is the order of the lowest non-fixed coeff (0 for basic Remez)
- 'a' is an array of n + 2 "interpolation" points (basic Remez)
- the function to be approximated is a global f(x)
- the function giving the relative errors a global e(x)

For example, f(x) might be cos(x) ~= 1 - 0.5*x**2 + ...  After fixing
the first 2 coeffs, p = 1 - 0.5*x**2, d = 4, and we have a modified
Remez approximation problem of finding a poly starting with the x**4
term.  Nonzero d requires one obvious change to basic Remez (set up the
equations to be solved starting at the x**d term instead of the x**0
term), and one not so obvious change (if d is odd, then the signs in
the +-1 column (the coefficients for the error E) of the matrix must
be flipped for negative interpolation points).  You can understand the
need for something like the latter by understanding basic Remez staring
at the matrix a little.  Note that the signes in the +- column for
basic Remez give a nonsingular matrix.  This just fails for odd d, but
is fixed by the adjustment just described, except when the interpolation
points are symmetric with the origin.  The symmetric case is easy to avoid
in practice.

% {
% 	local(M, c, i, j, n, v);
% 
% 	n = matsize(a)[2] - 2;
% 	v = vector(n);
% 	M = matrix(n, n);
% 	for (i = 1, n,
% 		v[i] = f(a[1 + i]) - subst(p, x, a[1 + i]);

Function value at an interpolation point after adjusting for the fixed
part p = p(x).

% 		if (evenfunc,
% 			for (j = 1, n - 1, M[i, j] = a[1 + i]^(d + 2*(j - 1)););
% 		,
% 			for (j = 1, n - 1, M[i, j] = a[1 + i]^(d + j - 1););
% 		);
% 		M[i, n] = (-1)^i;

This sets up basic Remez, with the adjustment for x**d instead of x**0.
The special case for even functions is not quite just an optimization.

% 		if (d % 2 == 1 && a[1 + i] < 0,
% 			M[i, n] = -M[i, n];
% 		);

This flips the signs.

% 		M[i, n] *= e(a[1 + i]);

This changes the +-1 column to a weighted version of the same.  It is best
for e(x) to be as smooth as f(x), but this algorithm does OK with even
widly discontinuious e(x) (e.g., 2**ilogb(x)).

% 	);
% 	v = precision(v, default(realprecision));
% 	M = precision(M, default(realprecision));
% 	c = matsolve(M, v~);
% 	c = precision(c, default(realprecision));

Solve the equation, with technical complications to avoid problems with
insufficient ot too much precision.

% 	gE = c[n];

This is the global error (E).

% 	if (evenfunc,
% 		for (j = 1, n - 1, p += c[j] * x^(d + 2 * (j - 1)););
% 	,
% 		for (j = 1, n - 1, p += c[j] * x^(d + j - 1););
% 	);

Add the new terms to the fixed terms in p.

% 	return (p);
% }

The iteration step in basic Remez needs an adjustment for signs too.  In
basic Remez, the signs of the errors at the interpolation points alternate
(+-E), and a new set of interpolation points is chosen to preserve this
alternation while choosing points where the error is larger than gE.  The
sign flipping must be applied similary.  E.g., if for basic Remez the
error pattern is 'E -E | E -E E' where '|' is the origin, the for odd d
the error pattern becomes '-E E | E -E E'.  Note that the signs no longer
alternate across the origin.

Complications like the following are handled very well by this algorithm:
suppose that in approximating cos(x) we fix the x**2 coeff at -0.4999
instead of -0.5 (similar inaccuracies are unavoidable for higher terms).
Then the naive method of trying to approximate
(cos(x) - (1 - 0.4999*x**2) / x^4 using basic Remez doesn't work because
the function to be approximated is singular at the origin.  (We divided
by x**4 to try to reduce to basic Remez.)  The singularity tends to cause
singular matrixes.  I used to use essentially this method, with hack to
avoid the singularity.  It basically works to a not try to approximate
near the singularity, since multiplying by x**4 will cancel the singularity
and leave a good approximation near it.  But it is hard to keep the
interpolation points away from the singularity without having many special
cases.  The above algorithm hits a sort of removable singularity at 0
instead,  It seems to be numerically stable near the singularity if f(x)
is smooth.

To handle some coeffs being rounded more severely than others, the following
algorithm works suprisingly well:
- do the above iterations to round all coeffs from the first to the last
- fix all the (trailing) coeffs that are rounded more severely than the
   others
- iterate from the first to he last non-trailing coeff to calculate all
   the leading coeffs again, now for the new function f(x) - p1(x) where
   p1 is the poly with the trailing coeffs, and of course the poly for the
   leading coeffs has degree reduced by the number of trailing coeffs
   compared with the poly found in the first pass.  This makes the leading
   coeffs match the trailing coeffs better than is possible on the first
   pass.  On the first pass, leading coeffs are chosen under the assumption
   that trailing coeffs are infinte-precision.  When they are severely
   rounded, this assumption breaks badly.
This algorithm works so well that trying to improve it using more passes
gets nowhere or backwards.  E.g., it usually makes no significant difference
to caclulate all the trailing coeffs again with the recalulated set of
leading coeffs.

Someone at INRIA wrote a paper about poly approximations with some
coeffs forced to 0.  Normally, all the generators {x**0, x**1, x**2, ...}
are allowed.  Forcing some coeffs to 0 restricts the generators to a
specifed subset of the standard one.  The author noted that this can
easily give a singular matrix for the corresponding extended Remez
algorithm.  I think too easily to be very useful.  My extension works
fairly generally for the special subset {x**d, x**(d+1), ...}.  I have
the same amount of proof that it works as the paper (none).  I didn't
notice anything about the complications for the sign flipping in the
paper.  Perhaps it required the subset to include {x**0} for general
functions or {x**1} for odd functions.  Without the function being 0
at 0, it is rarely possible to approximate it well enough if the x**0
coeff is forced to 0.  Similarly for higher coeffs that want to be
nonzero.  The point of forcing coeffs to 0 is that small ones tend to
give terms that are harder to add up accurately.  But the method seems
to have very limited use since only really tiny terms can be omitted
without having to do too much compsensation in other terms.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Mon Aug 26 17:43:14 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 37956E5C
 for <freebsd-numerics@freebsd.org>; Mon, 26 Aug 2013 17:43:14 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id C4E5B23CD
 for <freebsd-numerics@freebsd.org>; Mon, 26 Aug 2013 17:43:13 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7QHh7lk067685; 
 Mon, 26 Aug 2013 10:43:07 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7QHh7tR067684;
 Mon, 26 Aug 2013 10:43:07 -0700 (PDT) (envelope-from sgk)
Date: Mon, 26 Aug 2013 10:43:07 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: (2nd time) tweaks to erff() threshold values
Message-ID: <20130826174307.GA67511@troutmask.apl.washington.edu>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
 <20130825171910.GA60396@troutmask.apl.washington.edu>
 <20130827003147.R2328@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130827003147.R2328@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@freebsd.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Aug 2013 17:43:14 -0000

On Tue, Aug 27, 2013 at 01:23:19AM +1000, Bruce Evans wrote:
> On Sun, 25 Aug 2013, Steve Kargl wrote:
> 
>> These values where for the interval [0,0.84375].
>>>
>>>          | usec/call | Max ULP | X Max ULP
>>> ---------+-----------+---------+---------------------------------
>>> new erfc |  0.02757  | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1
>>> old erfc |  0.03348  | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1
>>> ---------+-----------+---------+---------------------------------
>>> new erf  |  0.02302  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
>>> old erf  |  0.03028  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
> 
> Be careful testing on i386.  Its extra precision gives smaller errors.
> These look like the i386 errors.  Old erff has a maximum error of more
> than 0.9 ulps on all args.

Have I ever mentioned how much I dislike the i386? :-)
New test results for both i386 and amd64 follow my .sig.
New diff that merges all previous diff follows the test
results.

> > Note that s_erf.c claims that in this interval, a taylor series about
> > erf(1) is used and suggests that the constant erx =  8.4506291151e-01
> > is erf(1).  That's bogus as erf(1) = 8.42700779e-01F (in float).
> 
> Well, not quite.  It says:
> 
> %  *	   Remark: here we use the taylor series expansion at x=1.
> %  *		erf(1+s) = erf(1) + s*Poly(s)
> %  *			 = 0.845.. + P1(s)/Q1(s)
> %  *	   That is, we use rational approximation to approximate
> %  *			erf(1+s) - (c = (single)0.84506291151)
> %  *	   Note that |P1/Q1|< 0.078 for x in [0.84375,1.25]
> %  *	   where 
> %  *		P1(s) = degree 6 poly in s
> %  *		Q1(s) = degree 6 poly in s
> 
> It puts some of erf(1) in 0.845 and some in P1(s)/Q1(s).

I see.  I (obviously) not did glean this from the comment.
The (c = (single)0.84506291151) isn't the leading 24-bits 
of erf(1).  It's the leading 24-bits of a constant near
erf(1).  I can regenerate polynomials with a hi+lo split.


> > Index: src/s_erff.c
> > ...
> > -pa0  = -2.3621185683e-03, /* 0xbb1acdc6 */
> > ...
> > -qa1  =  1.0642088205e-01, /* 0x3dd9f331 */
> > ...
> > +pa0  =  1.35131621e-08F, /*  0x1.d04f08p-27 */
> > ...
> > +qa1  =  6.06513679e-01F, /*  0x1.3688f6p-1 */
> 
> (qa0 is spelled 'one' in both versions.)

Yes 'qa0' is spelled as 'one'.  I may remove this,
and spell 'qa0' as '1'.

> You still have some of erf(1) in pa0/qa0, but not as much as before.  The
> decomposition should be something like hi+lo -- write erf(1) = hi+lo and
> put hi in erx and lo in pa0/qa0 = pa0.  Is that exactly what you do?  I
> wonder if there is a technical reason for putting more in pa0/qa0.

I did not do any splitting.  I set erx = erf(1) in float precision.
pa0 ~ 1e-8f is on the order of epsilon, and the results of solving
a matrix equation.  Perhaps, hi+lo was used to ensure the stability
of the matrix solver.

> maybe the whole expression should be written as hi+P/Q+lo, where P(0) = 0.
> Sometimes rational approximations should be done by splitting out even
> more terms from the P/Q part and adding the other terms more carefully.
> The parts that are split out are chosen to make them easier to add up
> carefully, while keeping the parts that are not split out small so that
> they don't need to be evaluated carefully.  See k_tan.c for an example.

I left my copy of Hildebrand at work over the weekend, so I haven't
reviewed the pitfalls of rational approximations, yet.

> The comment does say that 'c' is rounded to single.  That is good for
> a hi+lo decomposition, and it may be necessary for technical reasons
> to keep some of 'c's lower bits zero.  Rounding it to single doesn't
> give this when the precision is already single.  Old erff didn't round
> it further either.

I wasn't aware of old BSD libm code.  After working through the
code in s_erf[f].c, it seemed to me that a better algorithm may
exist, but I haven't tried to look for one.  As you may guess,
I'm gearing up to translate s_erf.c into a s_erfl.c.

-- 
Steve

erfc  (Look at the old and new ULP for 3rd and 4th interval.  This is due
       to the change in mask from 0xfffff000 to 0xffffe000.)

i386
-----+
Code |    Interval    | us/call | Max ULP | X Max ULP
-----+----------------+---------+---------+-------------------------------
Old  | [0, 0.84375]   | 0.03435 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1
Old  | [0.84375,1.25] | 0.03486 | 0.95158 | 1.24999416e+00, 0x1.3fff9ep+0
Old  | [1.25,2.857143]| 0.14772 | 5.81600 | 1.88727951e+00, 0x1.e324c0p+0
Old  | [2.857143,12]  | 0.27741 | 63.6824 | 7.99420977e+00, 0x1.ffa122p+2
-----+----------------+---------+---------+-------------------------------
New  | [0, 0.84375]   | 0.02809 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1
New  | [0.84375,1.25] | 0.02780 | 0.76497 | 1.24971473e+00, 0x1.3fed4ep+0
New  | [1.25,2.857143]| 0.13546 | 2.10176 | 1.88294506e+00, 0x1.e208b0p+0
New  | [2.857143,12]  | 0.18424 | 1.94002 | 4.00527334e+00, 0x1.005666p+2
-----+----------------+---------+---------+-------------------------------

Amd64
-----+
Code |    Interval    | us/call | Max ULP | X Max ULP
-----+----------------+---------+---------+-------------------------------
Old  | [0, 0.84375]   | 0.02847 | 2.77642 | 8.43427539e-01, 0x1.afd5bcp-1
Old  | [0.84375,1.25] | 0.02755 | 2.35308 | 1.24922514e+00, 0x1.3fcd38p+0
Old  | [1.25,2.857143]| 0.10872 | 6.27739 | 1.88167512e+00, 0x1.e1b576p+0
Old  | [2.857143,12]  | 0.14263 | 63.8095 | 7.99418974e+00, 0x1.ffa0cep+2
-----+----------------+---------+---------+-------------------------------
New  | [0, 0.84375]   | 0.02440 | 2.77756 | 8.33204687e-01, 0x1.aa99cep-1
New  | [0.84375,1.25] | 0.02337 | 2.31164 | 1.24426317e+00, 0x1.3e8808p+0
New  | [1.25,2.857143]| 0.10051 | 3.16774 | 1.34512150e+00, 0x1.5859e2p+0
New  | [2.857143,12]  | 0.10205 | 2.90042 | 4.08927298e+00, 0x1.05b6a6p+2
-----+----------------+---------+---------+-------------------------------

erf (Not much change in ULP, but increase in speed).

i386
-----+
Code |    Interval    | usec/call | Max ULP | X Max ULP
-----+----------------+-----------+---------+--------------------------------
Old  | [0, 0.84375]   | 0.03074   | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
Old  | [0.84375,1.25] | 0.03405   | 0.55617 | 1.24996567e+00, 0x1.3ffdc0p+0
Old  | [1.25,2.857143]| 0.14923   | 0.61903 | 1.25788522e+00, 0x1.4204c4p+0
Old  | [2.857143,12]  | 0.05959   | 0.50002 | 2.86163688e+00, 0x1.6e4a1ep+1
-----+----------------+-----------+---------+--------------------------------
New  | [0, 0.84375]   | 0.02347   | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
New  | [0.84375,1.25] | 0.02774   | 0.54213 | 8.52452755e-01, 0x1.b474b0p-1
New  | [1.25,2.857143]| 0.14229   | 0.59585 | 1.26142204e+00, 0x1.42ec8ep+0
New  | [2.857143,12]  | 0.02932   | 0.50002 | 2.86393452e+00, 0x1.6e9568p+1
-----+----------------+-----------+---------+--------------------------------

Amd64
-----+
Code |    Interval    | us/call | Max ULP | X Max ULP
-----+----------------+---------+---------+--------------------------------
Old  | [0, 0.84375]   | 0.02699 | 0.96792 | 8.36685717e-01, 0x1.ac6212p-1
Old  | [0.84375,1.25] | 0.02912 | 0.76611 | 1.23045731e+00, 0x1.3aff4p+0
Old  | [1.25,2.857143]| 0.10780 | 0.74929 | 1.25055075e+00, 0x1.402418p+0
Old  | [2.857143,12]  | 0.04671 | 0.50008 | 2.89601469e+00, 0x1.72b09cp+1
-----+----------------+---------+---------+--------------------------------
New  | [0, 0.84375]   | 0.02241 | 0.94208 | 8.36147904e-01, 0x1.ac1b94p-1
New  | [0.84375,1.25] | 0.02486 | 0.76446 | 1.18880725e+00, 0x1.3055acp+0
New  | [1.25,2.857143]| 0.10124 | 0.72032 | 1.28225052e+00, 0x1.484192p+0
New  | [2.857143,12]  | 0.02954 | 0.50003 | 2.89976478e+00, 0x1.732b7ep+1
-----+----------------+---------+---------+--------------------------------

Index: src/s_erff.c
===================================================================
--- src/s_erff.c	(revision 1361)
+++ src/s_erff.c	(working copy)
@@ -24,75 +24,61 @@ tiny	    = 1e-30,
 half=  5.0000000000e-01, /* 0x3F000000 */
 one =  1.0000000000e+00, /* 0x3F800000 */
 two =  2.0000000000e+00, /* 0x40000000 */
-	/* c = (subfloat)0.84506291151 */
-erx =  8.4506291151e-01, /* 0x3f58560b */
+
+erx =  8.42700779e-01F, /* 0x1.af767ap-1 */
+
 /*
  * Coefficients for approximation to  erf on [0,0.84375]
  */
 efx =  1.2837916613e-01, /* 0x3e0375d4 */
 efx8=  1.0270333290e+00, /* 0x3f8375d4 */
-pp0  =  1.2837916613e-01, /* 0x3e0375d4 */
-pp1  = -3.2504209876e-01, /* 0xbea66beb */
-pp2  = -2.8481749818e-02, /* 0xbce9528f */
-pp3  = -5.7702702470e-03, /* 0xbbbd1489 */
-pp4  = -2.3763017452e-05, /* 0xb7c756b1 */
-qq1  =  3.9791721106e-01, /* 0x3ecbbbce */
-qq2  =  6.5022252500e-02, /* 0x3d852a63 */
-qq3  =  5.0813062117e-03, /* 0x3ba68116 */
-qq4  =  1.3249473704e-04, /* 0x390aee49 */
-qq5  = -3.9602282413e-06, /* 0xb684e21a */
 /*
- * Coefficients for approximation to  erf  in [0.84375,1.25]
+ *  Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]:
+ *  |(erf(x) - x)/x - p(x)/q(x)| < 2**-31.
+ */
+pp0  =  1.28379166e-01F,	/*  0x1.06eba8p-3 */
+pp1  = -3.36030394e-01F,	/* -0x1.58185ap-2 */
+pp2  = -1.86260219e-03F,	/* -0x1.e8451ep-10 */
+qq1  =  3.12324286e-01F,	/*  0x1.3fd1f0p-2 */
+qq2  =  2.16070302e-02F,	/*  0x1.620274p-6 */
+qq3  = -1.98859419e-03F, 	/* -0x1.04a626p-9 */
+/*
+ *  Domain [0.84375, 1.25], range ~[-1.954e-10,1.940e-11]:
+ *  |(erf(x) - erx) - p(x)/q(x)| < 2**-36.
  */
-pa0  = -2.3621185683e-03, /* 0xbb1acdc6 */
-pa1  =  4.1485610604e-01, /* 0x3ed46805 */
-pa2  = -3.7220788002e-01, /* 0xbebe9208 */
-pa3  =  3.1834661961e-01, /* 0x3ea2fe54 */
-pa4  = -1.1089469492e-01, /* 0xbde31cc2 */
-pa5  =  3.5478305072e-02, /* 0x3d1151b3 */
-pa6  = -2.1663755178e-03, /* 0xbb0df9c0 */
-qa1  =  1.0642088205e-01, /* 0x3dd9f331 */
-qa2  =  5.4039794207e-01, /* 0x3f0a5785 */
-qa3  =  7.1828655899e-02, /* 0x3d931ae7 */
-qa4  =  1.2617121637e-01, /* 0x3e013307 */
-qa5  =  1.3637083583e-02, /* 0x3c5f6e13 */
-qa6  =  1.1984500103e-02, /* 0x3c445aa3 */
+pa0  =  1.35131621e-08F, /*  0x1.d04f08p-27 */
+pa1  =  4.15107518e-01F, /*  0x1.a911f2p-2 */
+pa2  = -1.63339108e-01F, /* -0x1.4e84bcp-3 */
+pa3  =  1.12098485e-01F, /*  0x1.cb27c8p-4 */
+qa1  =  6.06513679e-01F, /*  0x1.3688f6p-1 */
+qa2  =  5.43227255e-01F, /*  0x1.1621e2p-1 */
+qa3  =  1.74396917e-01F, /*  0x1.652a36p-3 */
+qa4  =  5.88681065e-02F, /*  0x1.e23f5ep-5 */
 /*
- * Coefficients for approximation to  erfc in [1.25,1/0.35]
+ *  Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]:
+ *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30
  */
-ra0  = -9.8649440333e-03, /* 0xbc21a093 */
-ra1  = -6.9385856390e-01, /* 0xbf31a0b7 */
-ra2  = -1.0558626175e+01, /* 0xc128f022 */
-ra3  = -6.2375331879e+01, /* 0xc2798057 */
-ra4  = -1.6239666748e+02, /* 0xc322658c */
-ra5  = -1.8460508728e+02, /* 0xc3389ae7 */
-ra6  = -8.1287437439e+01, /* 0xc2a2932b */
-ra7  = -9.8143291473e+00, /* 0xc11d077e */
-sa1  =  1.9651271820e+01, /* 0x419d35ce */
-sa2  =  1.3765776062e+02, /* 0x4309a863 */
-sa3  =  4.3456588745e+02, /* 0x43d9486f */
-sa4  =  6.4538726807e+02, /* 0x442158c9 */
-sa5  =  4.2900814819e+02, /* 0x43d6810b */
-sa6  =  1.0863500214e+02, /* 0x42d9451f */
-sa7  =  6.5702495575e+00, /* 0x40d23f7c */
-sa8  = -6.0424413532e-02, /* 0xbd777f97 */
+ra0  = -9.87132732e-03F, /* -0x1.4376b2p-7 */
+ra1  = -5.53605914e-01F, /* -0x1.1b723cp-1 */
+ra2  = -2.17589188e+00F, /* -0x1.1683ap+1 */
+ra3  = -1.43268085e+00F, /* -0x1.6ec42cp+0 */
+sa1  =  5.45995426e+00F, /*  0x1.5d6fe4p+2 */
+sa2  =  6.69798088e+00F, /*  0x1.acabb8p+2 */
+sa3  =  1.43113089e+00F, /*  0x1.6e5e98p+0 */
+sa4  = -5.77397496e-02F, /* -0x1.d90108p-5 */
 /*
- * Coefficients for approximation to  erfc in [1/.35,28]
+ *  Domain [1.25,1/0.35], range [-3.753e-12,3.875e-12]:
+ *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-43
  */
-rb0  = -9.8649431020e-03, /* 0xbc21a092 */
-rb1  = -7.9928326607e-01, /* 0xbf4c9dd4 */
-rb2  = -1.7757955551e+01, /* 0xc18e104b */
-rb3  = -1.6063638306e+02, /* 0xc320a2ea */
-rb4  = -6.3756646729e+02, /* 0xc41f6441 */
-rb5  = -1.0250950928e+03, /* 0xc480230b */
-rb6  = -4.8351919556e+02, /* 0xc3f1c275 */
-sb1  =  3.0338060379e+01, /* 0x41f2b459 */
-sb2  =  3.2579251099e+02, /* 0x43a2e571 */
-sb3  =  1.5367296143e+03, /* 0x44c01759 */
-sb4  =  3.1998581543e+03, /* 0x4547fdbb */
-sb5  =  2.5530502930e+03, /* 0x451f90ce */
-sb6  =  4.7452853394e+02, /* 0x43ed43a7 */
-sb7  = -2.2440952301e+01; /* 0xc1b38712 */
+rb0  = -9.86494310e-03F, /* -0x1.434124p-7 */
+rb1  = -6.24557793e-01F, /* -0x1.3fc60ap-1 */
+rb2  = -6.12944698e+00F, /* -0x1.8848dcp+2 */
+rb3  = -1.64760838e+01F, /* -0x1.079e0ap+4 */
+rb4  = -9.36094189e+00F, /* -0x1.2b8cd6p+3 */
+sb1  =  1.26263084e+01F, /*  0x1.940ab8p+3 */
+sb2  =  4.47332840e+01F, /*  0x1.65ddc4p+5 */
+sb3  =  4.65590134e+01F, /*  0x1.7478dcp+5 */
+sb4  =  8.74471664e+00F; /*  0x1.17d4b8p+3 */
 
 float
 erff(float x)
@@ -107,43 +93,38 @@ erff(float x)
 	}
 
 	if(ix < 0x3f580000) {		/* |x|<0.84375 */
-	    if(ix < 0x31800000) { 	/* |x|<2**-28 */
-	        if (ix < 0x04000000)
-		    /*avoid underflow */
-		    return (float)0.125*((float)8.0*x+efx8*x);
+	    if(ix < 0x38800000) { 	/* |x|<2**-14 */
+	        if (ix < 0x04000000)	/* |x|<0x1p-119 */
+		    return (8*x+efx8*x)/8;	/* avoid spurious underflow */
 		return x + efx*x;
 	    }
 	    z = x*x;
-	    r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4)));
-	    s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5))));
+	    r = pp0+z*(pp1+z*pp2);
+	    s = one+z*(qq1+z*(qq2+z*qq3));
 	    y = r/s;
 	    return x + x*y;
 	}
 	if(ix < 0x3fa00000) {		/* 0.84375 <= |x| < 1.25 */
 	    s = fabsf(x)-one;
-	    P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6)))));
-	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6)))));
+	    P = pa0+s*(pa1+s*(pa2+s*pa3));
+	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4)));
 	    if(hx>=0) return erx + P/Q; else return -erx - P/Q;
 	}
-	if (ix >= 0x40c00000) {		/* inf>|x|>=6 */
+	if (ix >= 0x40800000) {		/* inf>|x|>=4 */
 	    if(hx>=0) return one-tiny; else return tiny-one;
 	}
 	x = fabsf(x);
  	s = one/(x*x);
 	if(ix< 0x4036DB6E) {	/* |x| < 1/0.35 */
-	    R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(
-				ra5+s*(ra6+s*ra7))))));
-	    S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(
-				sa5+s*(sa6+s*(sa7+s*sa8)))))));
+	    R=ra0+s*(ra1+s*(ra2+s*ra3));
+	    S=one+s*(sa1+s*(sa2+s*(sa3+s*sa4)));
 	} else {	/* |x| >= 1/0.35 */
-	    R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(
-				rb5+s*rb6)))));
-	    S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(
-				sb5+s*(sb6+s*sb7))))));
+	    R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4)));
+	    S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4)));
 	}
 	GET_FLOAT_WORD(ix,x);
-	SET_FLOAT_WORD(z,ix&0xfffff000);
-	r  =  __ieee754_expf(-z*z-(float)0.5625)*__ieee754_expf((z-x)*(z+x)+R/S);
+	SET_FLOAT_WORD(z,ix&0xffffe000);
+	r  = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S);
 	if(hx>=0) return one-r/x; else return  r/x-one;
 }
 
@@ -160,11 +141,11 @@ erfcf(float x)
 	}
 
 	if(ix < 0x3f580000) {		/* |x|<0.84375 */
-	    if(ix < 0x23800000)  	/* |x|<2**-56 */
+	    if(ix < 0x33800000)  	/* |x|<2**-24 */
 		return one-x;
 	    z = x*x;
-	    r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4)));
-	    s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5))));
+	    r = pp0+z*(pp1+z*pp2);
+	    s = one+z*(qq1+z*(qq2+z*qq3));
 	    y = r/s;
 	    if(hx < 0x3e800000) {  	/* x<1/4 */
 		return one-(x+x*y);
@@ -176,33 +157,28 @@ erfcf(float x)
 	}
 	if(ix < 0x3fa00000) {		/* 0.84375 <= |x| < 1.25 */
 	    s = fabsf(x)-one;
-	    P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6)))));
-	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6)))));
+	    P = pa0+s*(pa1+s*(pa2+s*pa3));
+	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4)));
 	    if(hx>=0) {
 	        z  = one-erx; return z - P/Q;
 	    } else {
 		z = erx+P/Q; return one+z;
 	    }
 	}
-	if (ix < 0x41e00000) {		/* |x|<28 */
+	if (ix < 0x41200000) {		/* |x|<10 */
 	    x = fabsf(x);
  	    s = one/(x*x);
 	    if(ix< 0x4036DB6D) {	/* |x| < 1/.35 ~ 2.857143*/
-	        R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(
-				ra5+s*(ra6+s*ra7))))));
-	        S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(
-				sa5+s*(sa6+s*(sa7+s*sa8)))))));
+	        R=ra0+s*(ra1+s*(ra2+s*ra3));
+	        S=one+s*(sa1+s*(sa2+s*(sa3+s*sa4)));
 	    } else {			/* |x| >= 1/.35 ~ 2.857143 */
-		if(hx<0&&ix>=0x40c00000) return two-tiny;/* x < -6 */
-	        R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(
-				rb5+s*rb6)))));
-	        S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(
-				sb5+s*(sb6+s*sb7))))));
+		if(hx<0&&ix>=0x40a00000) return two-tiny;/* x < -5 */
+	        R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4)));
+		S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4)));
 	    }
 	    GET_FLOAT_WORD(ix,x);
-	    SET_FLOAT_WORD(z,ix&0xfffff000);
-	    r  =  __ieee754_expf(-z*z-(float)0.5625)*
-			__ieee754_expf((z-x)*(z+x)+R/S);
+	    SET_FLOAT_WORD(z,ix&0xffffe000);
+	    r  = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S);
 	    if(hx>0) return r/x; else return two-r/x;
 	} else {
 	    if(hx>0) return tiny*tiny; else return two-tiny;

From owner-freebsd-numerics@FreeBSD.ORG  Tue Aug 27 11:24:48 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 1BF729FD
 for <freebsd-numerics@FreeBSD.org>; Tue, 27 Aug 2013 11:24:48 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au
 [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 5480D2005
 for <freebsd-numerics@FreeBSD.org>; Tue, 27 Aug 2013 11:24:46 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 369D5D4507F;
 Tue, 27 Aug 2013 21:24:40 +1000 (EST)
Date: Tue, 27 Aug 2013 21:24:39 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: (2nd time) tweaks to erff() threshold values
In-Reply-To: <20130826174307.GA67511@troutmask.apl.washington.edu>
Message-ID: <20130827202751.X1787@besplex.bde.org>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
 <20130825171910.GA60396@troutmask.apl.washington.edu>
 <20130827003147.R2328@besplex.bde.org>
 <20130826174307.GA67511@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=DstvpgP+ c=1 sm=1 tr=0
 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10
 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10
 a=zMXrinGMM86ty1FFVUAA:9 a=CjuIK1q_8ugA:10
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Aug 2013 11:24:48 -0000

On Mon, 26 Aug 2013, Steve Kargl wrote:

> On Tue, Aug 27, 2013 at 01:23:19AM +1000, Bruce Evans wrote:
>> On Sun, 25 Aug 2013, Steve Kargl wrote:
>>
>>> These values where for the interval [0,0.84375].
>>>>
>>>>          | usec/call | Max ULP | X Max ULP
>>>> ---------+-----------+---------+---------------------------------
>>>> new erfc |  0.02757  | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1
>>>> old erfc |  0.03348  | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1
>>>> ---------+-----------+---------+---------------------------------
>>>> new erf  |  0.02302  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
>>>> old erf  |  0.03028  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
>>
>> Be careful testing on i386.  Its extra precision gives smaller errors.
>> These look like the i386 errors.  Old erff has a maximum error of more
>> than 0.9 ulps on all args.
>
> Have I ever mentioned how much I dislike the i386? :-)

You don't like it because it works better. :-)

> New test results for both i386 and amd64 follow my .sig.
> New diff that merges all previous diff follows the test
> results.
> ...
> I wasn't aware of old BSD libm code.  After working through the
> code in s_erf[f].c, it seemed to me that a better algorithm may
> exist, but I haven't tried to look for one.  As you may guess,
> I'm gearing up to translate s_erf.c into a s_erfl.c.

You fixed the large error, so using the old extra-precision methods
can wait.

> erfc  (Look at the old and new ULP for 3rd and 4th interval.  This is due
>       to the change in mask from 0xfffff000 to 0xffffe000.)

A very interesting fix.

> i386
> -----+
> Code |    Interval    | us/call | Max ULP | X Max ULP
> -----+----------------+---------+---------+-------------------------------
> Old  | [0, 0.84375]   | 0.03435 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1
> Old  | [0.84375,1.25] | 0.03486 | 0.95158 | 1.24999416e+00, 0x1.3fff9ep+0
> Old  | [1.25,2.857143]| 0.14772 | 5.81600 | 1.88727951e+00, 0x1.e324c0p+0
> Old  | [2.857143,12]  | 0.27741 | 63.6824 | 7.99420977e+00, 0x1.ffa122p+2
> -----+----------------+---------+---------+-------------------------------
> New  | [0, 0.84375]   | 0.02809 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1
> New  | [0.84375,1.25] | 0.02780 | 0.76497 | 1.24971473e+00, 0x1.3fed4ep+0
> New  | [1.25,2.857143]| 0.13546 | 2.10176 | 1.88294506e+00, 0x1.e208b0p+0
> New  | [2.857143,12]  | 0.18424 | 1.94002 | 4.00527334e+00, 0x1.005666p+2
> -----+----------------+---------+---------+-------------------------------

I only tested i386.  I tested all values.  I'm getting smaller errors, with
1 additional fix, probably due to my version of expf() being more accurate.
I had to make it and exp() more accurate for use in the hyperbolic functions.
These don't need as much accuracy accuracy as your expl() provides, but
they benefit from a little more accuracy than the old expf() provides.
On i386, exp() was fairly accurate since it used the i387 and extra
precision.  On amd64, it was as inaccurate as expf().  I made the fdlibm
version more accurate than the i387 version, without using extra precision.
I akso made it faster.  It was about as slow as the i387 version since the
latter does a mode switch.

First x with an error above 1 ulp (the error is always below 1 ulp for
negative x):
% x =                           0x3f503c5a 0.813420892
% erfc(x) =  0x3fcffffae66cfabc 0x3e7fffd7 0.249999392
% erfcf(x) = 0x3fcffffac0000000 0x3e7fffd6 0.249999374
% err = -0x266cfabc 1.20080

x with the largest error:
% x =                           0x3ff11a66 1.88361812
% erfc(x) =  0x3f7fa4bdd41171f5 0x3bfd25ef 0.00772546913
% erfcf(x) = 0x3f7fa4bda0000000 0x3bfd25ed 0.00772546837
% err = -0x341171f5 1.62713

Last x with an error above 1.5 ulps:
% x =                           0x3ff11d09 1.88369858
% erfc(x) =  0x3f7fa20070976afc 0x3bfd1004 0.00772285625
% erfcf(x) = 0x3f7fa20040000000 0x3bfd1002 0.00772285555
% err = -0x30976afc 1.51848

Last x with an error above 1 ulp (this depends on fixing the threshold
from 10 to 11.  10 gave underflow errors and many more errors of > 1.5
ulps):
% x =                           0x4112bf56 9.17171288
% erfc(x) =  0x381865b08003770c   0xc32d84 1.79242497e-38
% erfcf(x) = 0x381865b060000000   0xc32d83 1.79242483e-38
% err = -0x2003770c 1.00042

> Amd64
> -----+
> Code |    Interval    | us/call | Max ULP | X Max ULP
> -----+----------------+---------+---------+-------------------------------
> Old  | [0, 0.84375]   | 0.02847 | 2.77642 | 8.43427539e-01, 0x1.afd5bcp-1
> Old  | [0.84375,1.25] | 0.02755 | 2.35308 | 1.24922514e+00, 0x1.3fcd38p+0
> Old  | [1.25,2.857143]| 0.10872 | 6.27739 | 1.88167512e+00, 0x1.e1b576p+0
> Old  | [2.857143,12]  | 0.14263 | 63.8095 | 7.99418974e+00, 0x1.ffa0cep+2
> -----+----------------+---------+---------+-------------------------------
> New  | [0, 0.84375]   | 0.02440 | 2.77756 | 8.33204687e-01, 0x1.aa99cep-1
> New  | [0.84375,1.25] | 0.02337 | 2.31164 | 1.24426317e+00, 0x1.3e8808p+0
> New  | [1.25,2.857143]| 0.10051 | 3.16774 | 1.34512150e+00, 0x1.5859e2p+0
> New  | [2.857143,12]  | 0.10205 | 2.90042 | 4.08927298e+00, 0x1.05b6a6p+2
> -----+----------------+---------+---------+-------------------------------

See how much better i386 is :-).

In a quick test on i386 handicapped by using float precision, I get similar
errors.  Subtracting 1 without increasing the error is remarkably difficult.

> erf (Not much change in ULP, but increase in speed).
>
> i386
> -----+
> Code |    Interval    | usec/call | Max ULP | X Max ULP
> -----+----------------+-----------+---------+--------------------------------
> Old  | [0, 0.84375]   | 0.03074   | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
> Old  | [0.84375,1.25] | 0.03405   | 0.55617 | 1.24996567e+00, 0x1.3ffdc0p+0
> Old  | [1.25,2.857143]| 0.14923   | 0.61903 | 1.25788522e+00, 0x1.4204c4p+0
> Old  | [2.857143,12]  | 0.05959   | 0.50002 | 2.86163688e+00, 0x1.6e4a1ep+1
> -----+----------------+-----------+---------+--------------------------------
> New  | [0, 0.84375]   | 0.02347   | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
> New  | [0.84375,1.25] | 0.02774   | 0.54213 | 8.52452755e-01, 0x1.b474b0p-1
> New  | [1.25,2.857143]| 0.14229   | 0.59585 | 1.26142204e+00, 0x1.42ec8ep+0
> New  | [2.857143,12]  | 0.02932   | 0.50002 | 2.86393452e+00, 0x1.6e9568p+1
> -----+----------------+-----------+---------+--------------------------------


The largest error is now:
% x =                           0x3f56086a 0.836065888
% erf(x) =   0x3fe86a082a46ed45 0x3f435041 0.762943347
% erff(x) =  0x3fe86a0840000000 0x3f435042 0.762943387
% err = +0x15b912bb 0.67884

> Amd64
> -----+
> Code |    Interval    | us/call | Max ULP | X Max ULP
> -----+----------------+---------+---------+--------------------------------
> Old  | [0, 0.84375]   | 0.02699 | 0.96792 | 8.36685717e-01, 0x1.ac6212p-1
> Old  | [0.84375,1.25] | 0.02912 | 0.76611 | 1.23045731e+00, 0x1.3aff4p+0
> Old  | [1.25,2.857143]| 0.10780 | 0.74929 | 1.25055075e+00, 0x1.402418p+0
> Old  | [2.857143,12]  | 0.04671 | 0.50008 | 2.89601469e+00, 0x1.72b09cp+1
> -----+----------------+---------+---------+--------------------------------
> New  | [0, 0.84375]   | 0.02241 | 0.94208 | 8.36147904e-01, 0x1.ac1b94p-1
> New  | [0.84375,1.25] | 0.02486 | 0.76446 | 1.18880725e+00, 0x1.3055acp+0
> New  | [1.25,2.857143]| 0.10124 | 0.72032 | 1.28225052e+00, 0x1.484192p+0
> New  | [2.857143,12]  | 0.02954 | 0.50003 | 2.89976478e+00, 0x1.732b7ep+1
> -----+----------------+---------+---------+--------------------------------

The largest error on i386 handicapped with float precision is the same:

% x =                           0x3f560dca 0.836147904
% erf(x) =   0x3fe86a68a1da81dc 0x3f435345 0.762989346
% erff(x) =  0x3fe86a68c0000000 0x3f435346 0.762989402
% err = +0x1e257e24 0.94208

Apparently the splitting point 0.84375 is not very well chosen, since
we get largest errors just before it and much smaller errors above it
by using another algorithm.

> Index: src/s_erff.c
> ===================================================================
> --- src/s_erff.c	(revision 1361)
> +++ src/s_erff.c	(working copy)
> ...
> /*
> - * Coefficients for approximation to  erf  in [0.84375,1.25]
> + *  Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]:
> + *  |(erf(x) - x)/x - p(x)/q(x)| < 2**-31.
> + */

Extra spaces.

> ...
> /*
> - * Coefficients for approximation to  erfc in [1.25,1/0.35]
> + *  Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]:
> + *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30
>  */

It is even less clear than for the others that the scaling of the error
is correct.  The comment could make this clearer by giving the ratio of
the correct error scale with the one used in the approximation, at the
endpoints (max and min for this, usually at the endpoints).
> +sa4  = -5.77397496e-02F, /* -0x1.d90108p-5 */

> /*
> - * Coefficients for approximation to  erfc in [1/.35,28]
> + *  Domain [1.25,1/0.35], range [-3.753e-12,3.875e-12]:
> + *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-43
>  */

Domain needs more editing.
> ...
> @@ -176,33 +157,28 @@ erfcf(float x)
> 	}
> 	if(ix < 0x3fa00000) {		/* 0.84375 <= |x| < 1.25 */
> 	    s = fabsf(x)-one;
> -	    P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6)))));
> -	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6)))));
> +	    P = pa0+s*(pa1+s*(pa2+s*pa3));
> +	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4)));
> 	    if(hx>=0) {
> 	        z  = one-erx; return z - P/Q;
> 	    } else {
> 		z = erx+P/Q; return one+z;
> 	    }
> 	}
> -	if (ix < 0x41e00000) {		/* |x|<28 */
> +	if (ix < 0x41200000) {		/* |x|<10 */

10 is too small.  It gives spurious underflow to 0 and larger than necessary
errors, at least on i386.  11 works.

> 	    x = fabsf(x);
>  	    s = one/(x*x);
> 	    if(ix< 0x4036DB6D) {	/* |x| < 1/.35 ~ 2.857143*/
> -	        R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(
> -				ra5+s*(ra6+s*ra7))))));
> -	        S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(
> -				sa5+s*(sa6+s*(sa7+s*sa8)))))));
> +	        R=ra0+s*(ra1+s*(ra2+s*ra3));
> +	        S=one+s*(sa1+s*(sa2+s*(sa3+s*sa4)));
> 	    } else {			/* |x| >= 1/.35 ~ 2.857143 */
> -		if(hx<0&&ix>=0x40c00000) return two-tiny;/* x < -6 */
> -	        R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(
> -				rb5+s*rb6)))));
> -	        S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(
> -				sb5+s*(sb6+s*sb7))))));
> +		if(hx<0&&ix>=0x40a00000) return two-tiny;/* x < -5 */
> +	        R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4)));
> +		S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4)));
> 	    }
> 	    GET_FLOAT_WORD(ix,x);

This and the GET_FLOAT_WORD() for erff() are bogus.  The word has already
been read into hx.

> -	    SET_FLOAT_WORD(z,ix&0xfffff000);
> -	    r  =  __ieee754_expf(-z*z-(float)0.5625)*
> -			__ieee754_expf((z-x)*(z+x)+R/S);
> +	    SET_FLOAT_WORD(z,ix&0xffffe000);

Change further to use hx.

> +	    r  = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S);

Good fix.  Is the point that -z*z-0.5625F must be exact, but the old
splitting only made z*z exact?  Is -z*z-0.5625F now always exact?
Anyway, the error for this case is now smaller than near 1.88, and
changing the expf()'s to exp()s doesn't reduce it much more, so the
old extra-precision methods can't be much better.

Keep the old style for the lines not related to the fix.  erfc() still
uses the old style.

> 	    if(hx>0) return r/x; else return two-r/x;
> 	} else {
> 	    if(hx>0) return tiny*tiny; else return two-tiny;

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Tue Aug 27 18:47:08 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 83960B72
 for <freebsd-numerics@FreeBSD.org>; Tue, 27 Aug 2013 18:47:08 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 594D92B3E
 for <freebsd-numerics@FreeBSD.org>; Tue, 27 Aug 2013 18:47:08 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
 [127.0.0.1])
 by troutmask.apl.washington.edu (8.14.7/8.14.7) with ESMTP id r7RIl2QI076154; 
 Tue, 27 Aug 2013 11:47:02 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.14.7/8.14.7/Submit) id r7RIl10g076153;
 Tue, 27 Aug 2013 11:47:01 -0700 (PDT) (envelope-from sgk)
Date: Tue, 27 Aug 2013 11:47:01 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: (2nd time) tweaks to erff() threshold values
Message-ID: <20130827184701.GA76013@troutmask.apl.washington.edu>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
 <20130825171910.GA60396@troutmask.apl.washington.edu>
 <20130827003147.R2328@besplex.bde.org>
 <20130826174307.GA67511@troutmask.apl.washington.edu>
 <20130827202751.X1787@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130827202751.X1787@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Aug 2013 18:47:08 -0000

On Tue, Aug 27, 2013 at 09:24:39PM +1000, Bruce Evans wrote:
> On Mon, 26 Aug 2013, Steve Kargl wrote:
> 
> > erfc  (Look at the old and new ULP for 3rd and 4th interval.  This is due
> >       to the change in mask from 0xfffff000 to 0xffffe000.)
> 
> A very interesting fix.

Well, I stumped across this fix.  See below.

> Apparently the splitting point 0.84375 is not very well chosen, since
> we get largest errors just before it and much smaller errors above it
> by using another algorithm.

The only hint about the choice of 0.84375 is from the comment
int s_erf.c.

 *	   is close to one. The interval is chosen because the fix
 *	   point of erf(x) is near 0.6174 (i.e., erf(x)=x when x is
 *	   near 0.6174), and by some experiment, 0.84375 is chosen to
 * 	   guarantee the error is less than one ulp for erf.

I haven't investigated whether 0.84375 is best choice.  However,
exhaustive testing in [0.6, 0.62] gives a max ulp of 0.666.

> > /*
> > - * Coefficients for approximation to  erf  in [0.84375,1.25]
> > + *  Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]:
> > + *  |(erf(x) - x)/x - p(x)/q(x)| < 2**-31.
> > + */
> 
> Extra spaces.

Fixed.

> > ...
> > /*
> > - * Coefficients for approximation to  erfc in [1.25,1/0.35]
> > + *  Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]:
> > + *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30
> >  */
> 
> It is even less clear than for the others that the scaling of the error
> is correct.  The comment could make this clearer by giving the ratio of
> the correct error scale with the one used in the approximation, at the
> endpoints (max and min for this, usually at the endpoints).

I'm not sure what you mean here.  In looking at error curves,
it turns out that one of the 2 limits in the range is taken
from an endpoint.

> > +sa4  = -5.77397496e-02F, /* -0x1.d90108p-5 */
> 
> > /*
> > - * Coefficients for approximation to  erfc in [1/.35,28]
> > + *  Domain [1.25,1/0.35], range [-3.753e-12,3.875e-12]:
> > + *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-43
> >  */
> 
> Domain needs more editing.

Fixed.

> > -	if (ix < 0x41e00000) {		/* |x|<28 */
> > +	if (ix < 0x41200000) {		/* |x|<10 */
> 
> 10 is too small.  It gives spurious underflow to 0 and larger than necessary
> errors, at least on i386.  11 works.

I changed to 11.  This required a new set of rb and sb coefficients
as the old set was for [1/0.35, 10].  With the new set the max ulp
for erfc goes up, but it is still much better that 63.something .

./testf -m 2.857143 -M 11 -E double
Interval tested: [2.8571,11.0000]
    Max ULP: 2.92095
  x Max ULP: 4.08856726e+00, 0x1.05ab16p+2

> > +	        R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4)));
> > +		S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4)));
> > 	    }
> > 	    GET_FLOAT_WORD(ix,x);
> 
> This and the GET_FLOAT_WORD() for erff() are bogus.  The word has already
> been read into hx.

Well spotted.  Fixed.

> > -	    SET_FLOAT_WORD(z,ix&0xfffff000);
> > -	    r  =  __ieee754_expf(-z*z-(float)0.5625)*
> > -			__ieee754_expf((z-x)*(z+x)+R/S);
> > +	    SET_FLOAT_WORD(z,ix&0xffffe000);
> 
> Change further to use hx.

Fixed.

> > +	    r  = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S);
> 
> Good fix.  Is the point that -z*z-0.5625F must be exact, but the old
> splitting only made z*z exact?  Is -z*z-0.5625F now always exact?
> Anyway, the error for this case is now smaller than near 1.88, and
> changing the expf()'s to exp()s doesn't reduce it much more, so the
> old extra-precision methods can't be much better.

Back to the choice of mask.  I was trying to understand
the origins of the constant 0.5625F.  The asymptotic 
expansion of erfc(x) is (exp(-x**2) / (x*sqrt(pi))) * P(x)
with P(x) being some polynomial.  It turns out 
exp(-0.5625) = 0.56978 and 1/sqrt(pi) = 0.56419.  So, the
algorithm drops the 1/sqrt(pi) and approximates it by
exp(-0.5625).

Well, in reading and re-reading the comment in s_erf.c, this 
part caught my attention:

 *     Note1:
 *	   To compute exp(-x*x-0.5625+R/S), let s be a single
 *	   precision number and s := x; then
 *		-x*x = -s*s + (s-x)*(s+x)
 *	        exp(-x*x-0.5626+R/S) =
 *			exp(-s*s-0.5625)*exp((s-x)*(s+x)+R/S);

This applies to the double precision version, ie erf(), so I 
simply ask the question whether 0xfffff000 was adequate masking
out the low order bits. Fortunately, I only needed to test
0xffff8000, 0xffffc0000, and 0xffffe000.

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Wed Aug 28 11:31:30 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 27EFBA88
 for <freebsd-numerics@FreeBSD.org>; Wed, 28 Aug 2013 11:31:30 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au
 [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id A5A932FB1
 for <freebsd-numerics@FreeBSD.org>; Wed, 28 Aug 2013 11:31:29 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 00F10D631B8;
 Wed, 28 Aug 2013 21:31:22 +1000 (EST)
Date: Wed, 28 Aug 2013 21:31:21 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: (2nd time) tweaks to erff() threshold values
In-Reply-To: <20130827184701.GA76013@troutmask.apl.washington.edu>
Message-ID: <20130828205705.S1447@besplex.bde.org>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
 <20130825171910.GA60396@troutmask.apl.washington.edu>
 <20130827003147.R2328@besplex.bde.org>
 <20130826174307.GA67511@troutmask.apl.washington.edu>
 <20130827202751.X1787@besplex.bde.org>
 <20130827184701.GA76013@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=DstvpgP+ c=1 sm=1 tr=0
 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10
 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10
 a=fiHkbJjbcZh7ihRK9ZwA:9 a=CjuIK1q_8ugA:10
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 28 Aug 2013 11:31:30 -0000

On Tue, 27 Aug 2013, Steve Kargl wrote:

> On Tue, Aug 27, 2013 at 09:24:39PM +1000, Bruce Evans wrote:
>> On Mon, 26 Aug 2013, Steve Kargl wrote:

>>> ...
>>> /*
>>> - * Coefficients for approximation to  erfc in [1.25,1/0.35]
>>> + *  Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]:
>>> + *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30
>>>  */
>>
>> It is even less clear than for the others that the scaling of the error
>> is correct.  The comment could make this clearer by giving the ratio of
>> the correct error scale with the one used in the approximation, at the
>> endpoints (max and min for this, usually at the endpoints).
>
> I'm not sure what you mean here.  In looking at error curves,
> it turns out that one of the 2 limits in the range is taken
> from an endpoint.

I mean that these are absolute errors for a function related to the
result that is especially obscure in this case -- the function is
log(x*result).  The error that needs to be minimized is the relative
error of the result, where "relative" is not quite the ratio of the
error and the result -- it is the ratio of the error and [result
rounded to a nearby power of 2; I'm not sure if the rounding is
up or down].  For the absolute error to be close enough to the
relative error, the relative error must not vary much across the
interval, AND the scaling for the abslute error must be compatible.
This is far from clear here.  I think taking logs changes the scale
for the absolute error to logarithmic, so this error is not really
absolute.  Hopefully the scale for the result is similarly logarithmic,
so that the scales are compatible.  There is also a factor of x before
taking logs, so the scaling is not logarithmic either.

>>> -	if (ix < 0x41e00000) {		/* |x|<28 */
>>> +	if (ix < 0x41200000) {		/* |x|<10 */
>>
>> 10 is too small.  It gives spurious underflow to 0 and larger than necessary
>> errors, at least on i386.  11 works.
>
> I changed to 11.  This required a new set of rb and sb coefficients
> as the old set was for [1/0.35, 10].  With the new set the max ulp
> for erfc goes up, but it is still much better that 63.something .

Something wrong here.  I got smaller errors by changing 10 to 11 without
changing the coeffs.  Now with updated coeffs, I get larger errors,
at least on i386 (previously, there were no errors of more than 1.5 ulps
above 1.88...  Now there are some above 2.04).

>>> +	    r  = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S);
>>
>> Good fix.  Is the point that -z*z-0.5625F must be exact, but the old
>> splitting only made z*z exact?  Is -z*z-0.5625F now always exact?
>> Anyway, the error for this case is now smaller than near 1.88, and
>> changing the expf()'s to exp()s doesn't reduce it much more, so the
>> old extra-precision methods can't be much better.
>
> Back to the choice of mask.  I was trying to understand
> the origins of the constant 0.5625F.  The asymptotic
> expansion of erfc(x) is (exp(-x**2) / (x*sqrt(pi))) * P(x)
> with P(x) being some polynomial.  It turns out
> exp(-0.5625) = 0.56978 and 1/sqrt(pi) = 0.56419.  So, the
> algorithm drops the 1/sqrt(pi) and approximates it by
> exp(-0.5625).

It is a hi+lo decomposition done multiplicatively and exponentially:

 	1/sqrt(pi) = exp(-0.5625) * exp(lo)

where -0.5625 and lo are folded into other terms, and the hi part must
be done exactly before exp() on it for enough accuracy.  Taking exp()
loses between 0.5 and 1 ulps of accuracy, and more is lost in the
produce of the exp()'s.  Hopefully the arg of the second exp() has
a small error, so not much more accuracy is lost for it.

> Well, in reading and re-reading the comment in s_erf.c, this
> part caught my attention:
>
> *     Note1:
> *	   To compute exp(-x*x-0.5625+R/S), let s be a single
> *	   precision number and s := x; then
> *		-x*x = -s*s + (s-x)*(s+x)
> *	        exp(-x*x-0.5626+R/S) =

The comment but not the code has this off-by-.0001 0.5626.

> *			exp(-s*s-0.5625)*exp((s-x)*(s+x)+R/S);
>
> This applies to the double precision version, ie erf(), so I
> simply ask the question whether 0xfffff000 was adequate masking
> out the low order bits. Fortunately, I only needed to test
> 0xffff8000, 0xffffc0000, and 0xffffe000.

I tried the others too, and they worked less well.  But I haven't
checked that -s*s-0.5625[F] is exact.  In double precision, many
more bits are stripped from x to create x, so exactness is less
surprising.

Bruce