From owner-freebsd-numerics@FreeBSD.ORG  Tue Aug 27 11:24:48 2013
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 1BF729FD
 for <freebsd-numerics@FreeBSD.org>; Tue, 27 Aug 2013 11:24:48 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au
 [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 5480D2005
 for <freebsd-numerics@FreeBSD.org>; Tue, 27 Aug 2013 11:24:46 +0000 (UTC)
Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au
 (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23])
 by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 369D5D4507F;
 Tue, 27 Aug 2013 21:24:40 +1000 (EST)
Date: Tue, 27 Aug 2013 21:24:39 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Subject: Re: (2nd time) tweaks to erff() threshold values
In-Reply-To: <20130826174307.GA67511@troutmask.apl.washington.edu>
Message-ID: <20130827202751.X1787@besplex.bde.org>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
 <20130825171910.GA60396@troutmask.apl.washington.edu>
 <20130827003147.R2328@besplex.bde.org>
 <20130826174307.GA67511@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=DstvpgP+ c=1 sm=1 tr=0
 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=F6uwjYSw4mAA:10
 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=bRjWgPqadKYA:10
 a=zMXrinGMM86ty1FFVUAA:9 a=CjuIK1q_8ugA:10
Cc: freebsd-numerics@FreeBSD.org
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Aug 2013 11:24:48 -0000

On Mon, 26 Aug 2013, Steve Kargl wrote:

> On Tue, Aug 27, 2013 at 01:23:19AM +1000, Bruce Evans wrote:
>> On Sun, 25 Aug 2013, Steve Kargl wrote:
>>
>>> These values where for the interval [0,0.84375].
>>>>
>>>>          | usec/call | Max ULP | X Max ULP
>>>> ---------+-----------+---------+---------------------------------
>>>> new erfc |  0.02757  | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1
>>>> old erfc |  0.03348  | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1
>>>> ---------+-----------+---------+---------------------------------
>>>> new erf  |  0.02302  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
>>>> old erf  |  0.03028  | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
>>
>> Be careful testing on i386.  Its extra precision gives smaller errors.
>> These look like the i386 errors.  Old erff has a maximum error of more
>> than 0.9 ulps on all args.
>
> Have I ever mentioned how much I dislike the i386? :-)

You don't like it because it works better. :-)

> New test results for both i386 and amd64 follow my .sig.
> New diff that merges all previous diff follows the test
> results.
> ...
> I wasn't aware of old BSD libm code.  After working through the
> code in s_erf[f].c, it seemed to me that a better algorithm may
> exist, but I haven't tried to look for one.  As you may guess,
> I'm gearing up to translate s_erf.c into a s_erfl.c.

You fixed the large error, so using the old extra-precision methods
can wait.

> erfc  (Look at the old and new ULP for 3rd and 4th interval.  This is due
>       to the change in mask from 0xfffff000 to 0xffffe000.)

A very interesting fix.

> i386
> -----+
> Code |    Interval    | us/call | Max ULP | X Max ULP
> -----+----------------+---------+---------+-------------------------------
> Old  | [0, 0.84375]   | 0.03435 | 0.68218 | 8.43257010e-01, 0x1.afbf62p-1
> Old  | [0.84375,1.25] | 0.03486 | 0.95158 | 1.24999416e+00, 0x1.3fff9ep+0
> Old  | [1.25,2.857143]| 0.14772 | 5.81600 | 1.88727951e+00, 0x1.e324c0p+0
> Old  | [2.857143,12]  | 0.27741 | 63.6824 | 7.99420977e+00, 0x1.ffa122p+2
> -----+----------------+---------+---------+-------------------------------
> New  | [0, 0.84375]   | 0.02809 | 0.65947 | 8.19824636e-01, 0x1.a3c00ep-1
> New  | [0.84375,1.25] | 0.02780 | 0.76497 | 1.24971473e+00, 0x1.3fed4ep+0
> New  | [1.25,2.857143]| 0.13546 | 2.10176 | 1.88294506e+00, 0x1.e208b0p+0
> New  | [2.857143,12]  | 0.18424 | 1.94002 | 4.00527334e+00, 0x1.005666p+2
> -----+----------------+---------+---------+-------------------------------

I only tested i386.  I tested all values.  I'm getting smaller errors, with
1 additional fix, probably due to my version of expf() being more accurate.
I had to make it and exp() more accurate for use in the hyperbolic functions.
These don't need as much accuracy accuracy as your expl() provides, but
they benefit from a little more accuracy than the old expf() provides.
On i386, exp() was fairly accurate since it used the i387 and extra
precision.  On amd64, it was as inaccurate as expf().  I made the fdlibm
version more accurate than the i387 version, without using extra precision.
I akso made it faster.  It was about as slow as the i387 version since the
latter does a mode switch.

First x with an error above 1 ulp (the error is always below 1 ulp for
negative x):
% x =                           0x3f503c5a 0.813420892
% erfc(x) =  0x3fcffffae66cfabc 0x3e7fffd7 0.249999392
% erfcf(x) = 0x3fcffffac0000000 0x3e7fffd6 0.249999374
% err = -0x266cfabc 1.20080

x with the largest error:
% x =                           0x3ff11a66 1.88361812
% erfc(x) =  0x3f7fa4bdd41171f5 0x3bfd25ef 0.00772546913
% erfcf(x) = 0x3f7fa4bda0000000 0x3bfd25ed 0.00772546837
% err = -0x341171f5 1.62713

Last x with an error above 1.5 ulps:
% x =                           0x3ff11d09 1.88369858
% erfc(x) =  0x3f7fa20070976afc 0x3bfd1004 0.00772285625
% erfcf(x) = 0x3f7fa20040000000 0x3bfd1002 0.00772285555
% err = -0x30976afc 1.51848

Last x with an error above 1 ulp (this depends on fixing the threshold
from 10 to 11.  10 gave underflow errors and many more errors of > 1.5
ulps):
% x =                           0x4112bf56 9.17171288
% erfc(x) =  0x381865b08003770c   0xc32d84 1.79242497e-38
% erfcf(x) = 0x381865b060000000   0xc32d83 1.79242483e-38
% err = -0x2003770c 1.00042

> Amd64
> -----+
> Code |    Interval    | us/call | Max ULP | X Max ULP
> -----+----------------+---------+---------+-------------------------------
> Old  | [0, 0.84375]   | 0.02847 | 2.77642 | 8.43427539e-01, 0x1.afd5bcp-1
> Old  | [0.84375,1.25] | 0.02755 | 2.35308 | 1.24922514e+00, 0x1.3fcd38p+0
> Old  | [1.25,2.857143]| 0.10872 | 6.27739 | 1.88167512e+00, 0x1.e1b576p+0
> Old  | [2.857143,12]  | 0.14263 | 63.8095 | 7.99418974e+00, 0x1.ffa0cep+2
> -----+----------------+---------+---------+-------------------------------
> New  | [0, 0.84375]   | 0.02440 | 2.77756 | 8.33204687e-01, 0x1.aa99cep-1
> New  | [0.84375,1.25] | 0.02337 | 2.31164 | 1.24426317e+00, 0x1.3e8808p+0
> New  | [1.25,2.857143]| 0.10051 | 3.16774 | 1.34512150e+00, 0x1.5859e2p+0
> New  | [2.857143,12]  | 0.10205 | 2.90042 | 4.08927298e+00, 0x1.05b6a6p+2
> -----+----------------+---------+---------+-------------------------------

See how much better i386 is :-).

In a quick test on i386 handicapped by using float precision, I get similar
errors.  Subtracting 1 without increasing the error is remarkably difficult.

> erf (Not much change in ULP, but increase in speed).
>
> i386
> -----+
> Code |    Interval    | usec/call | Max ULP | X Max ULP
> -----+----------------+-----------+---------+--------------------------------
> Old  | [0, 0.84375]   | 0.03074   | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
> Old  | [0.84375,1.25] | 0.03405   | 0.55617 | 1.24996567e+00, 0x1.3ffdc0p+0
> Old  | [1.25,2.857143]| 0.14923   | 0.61903 | 1.25788522e+00, 0x1.4204c4p+0
> Old  | [2.857143,12]  | 0.05959   | 0.50002 | 2.86163688e+00, 0x1.6e4a1ep+1
> -----+----------------+-----------+---------+--------------------------------
> New  | [0, 0.84375]   | 0.02347   | 0.62437 | 1.04175471e-38, 0x1.c5bf88p-127
> New  | [0.84375,1.25] | 0.02774   | 0.54213 | 8.52452755e-01, 0x1.b474b0p-1
> New  | [1.25,2.857143]| 0.14229   | 0.59585 | 1.26142204e+00, 0x1.42ec8ep+0
> New  | [2.857143,12]  | 0.02932   | 0.50002 | 2.86393452e+00, 0x1.6e9568p+1
> -----+----------------+-----------+---------+--------------------------------


The largest error is now:
% x =                           0x3f56086a 0.836065888
% erf(x) =   0x3fe86a082a46ed45 0x3f435041 0.762943347
% erff(x) =  0x3fe86a0840000000 0x3f435042 0.762943387
% err = +0x15b912bb 0.67884

> Amd64
> -----+
> Code |    Interval    | us/call | Max ULP | X Max ULP
> -----+----------------+---------+---------+--------------------------------
> Old  | [0, 0.84375]   | 0.02699 | 0.96792 | 8.36685717e-01, 0x1.ac6212p-1
> Old  | [0.84375,1.25] | 0.02912 | 0.76611 | 1.23045731e+00, 0x1.3aff4p+0
> Old  | [1.25,2.857143]| 0.10780 | 0.74929 | 1.25055075e+00, 0x1.402418p+0
> Old  | [2.857143,12]  | 0.04671 | 0.50008 | 2.89601469e+00, 0x1.72b09cp+1
> -----+----------------+---------+---------+--------------------------------
> New  | [0, 0.84375]   | 0.02241 | 0.94208 | 8.36147904e-01, 0x1.ac1b94p-1
> New  | [0.84375,1.25] | 0.02486 | 0.76446 | 1.18880725e+00, 0x1.3055acp+0
> New  | [1.25,2.857143]| 0.10124 | 0.72032 | 1.28225052e+00, 0x1.484192p+0
> New  | [2.857143,12]  | 0.02954 | 0.50003 | 2.89976478e+00, 0x1.732b7ep+1
> -----+----------------+---------+---------+--------------------------------

The largest error on i386 handicapped with float precision is the same:

% x =                           0x3f560dca 0.836147904
% erf(x) =   0x3fe86a68a1da81dc 0x3f435345 0.762989346
% erff(x) =  0x3fe86a68c0000000 0x3f435346 0.762989402
% err = +0x1e257e24 0.94208

Apparently the splitting point 0.84375 is not very well chosen, since
we get largest errors just before it and much smaller errors above it
by using another algorithm.

> Index: src/s_erff.c
> ===================================================================
> --- src/s_erff.c	(revision 1361)
> +++ src/s_erff.c	(working copy)
> ...
> /*
> - * Coefficients for approximation to  erf  in [0.84375,1.25]
> + *  Domain [0, 0.84375], range ~[-5.4446e-10,5.5197e-10]:
> + *  |(erf(x) - x)/x - p(x)/q(x)| < 2**-31.
> + */

Extra spaces.

> ...
> /*
> - * Coefficients for approximation to  erfc in [1.25,1/0.35]
> + *  Domain [1.25,1/0.35], range [-7.043e-10,7.457e-10]:
> + *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-30
>  */

It is even less clear than for the others that the scaling of the error
is correct.  The comment could make this clearer by giving the ratio of
the correct error scale with the one used in the approximation, at the
endpoints (max and min for this, usually at the endpoints).
> +sa4  = -5.77397496e-02F, /* -0x1.d90108p-5 */

> /*
> - * Coefficients for approximation to  erfc in [1/.35,28]
> + *  Domain [1.25,1/0.35], range [-3.753e-12,3.875e-12]:
> + *  |log(x*erfc(x)) + x**2 + 0.5625 - r(x)/s(x)| < 2**-43
>  */

Domain needs more editing.
> ...
> @@ -176,33 +157,28 @@ erfcf(float x)
> 	}
> 	if(ix < 0x3fa00000) {		/* 0.84375 <= |x| < 1.25 */
> 	    s = fabsf(x)-one;
> -	    P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6)))));
> -	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6)))));
> +	    P = pa0+s*(pa1+s*(pa2+s*pa3));
> +	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*qa4)));
> 	    if(hx>=0) {
> 	        z  = one-erx; return z - P/Q;
> 	    } else {
> 		z = erx+P/Q; return one+z;
> 	    }
> 	}
> -	if (ix < 0x41e00000) {		/* |x|<28 */
> +	if (ix < 0x41200000) {		/* |x|<10 */

10 is too small.  It gives spurious underflow to 0 and larger than necessary
errors, at least on i386.  11 works.

> 	    x = fabsf(x);
>  	    s = one/(x*x);
> 	    if(ix< 0x4036DB6D) {	/* |x| < 1/.35 ~ 2.857143*/
> -	        R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(
> -				ra5+s*(ra6+s*ra7))))));
> -	        S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(
> -				sa5+s*(sa6+s*(sa7+s*sa8)))))));
> +	        R=ra0+s*(ra1+s*(ra2+s*ra3));
> +	        S=one+s*(sa1+s*(sa2+s*(sa3+s*sa4)));
> 	    } else {			/* |x| >= 1/.35 ~ 2.857143 */
> -		if(hx<0&&ix>=0x40c00000) return two-tiny;/* x < -6 */
> -	        R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(
> -				rb5+s*rb6)))));
> -	        S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(
> -				sb5+s*(sb6+s*sb7))))));
> +		if(hx<0&&ix>=0x40a00000) return two-tiny;/* x < -5 */
> +	        R=rb0+s*(rb1+s*(rb2+s*(rb3+s*rb4)));
> +		S=one+s*(sb1+s*(sb2+s*(sb3+s*sb4)));
> 	    }
> 	    GET_FLOAT_WORD(ix,x);

This and the GET_FLOAT_WORD() for erff() are bogus.  The word has already
been read into hx.

> -	    SET_FLOAT_WORD(z,ix&0xfffff000);
> -	    r  =  __ieee754_expf(-z*z-(float)0.5625)*
> -			__ieee754_expf((z-x)*(z+x)+R/S);
> +	    SET_FLOAT_WORD(z,ix&0xffffe000);

Change further to use hx.

> +	    r  = expf(-z*z-0.5625F)*expf((z-x)*(z+x)+R/S);

Good fix.  Is the point that -z*z-0.5625F must be exact, but the old
splitting only made z*z exact?  Is -z*z-0.5625F now always exact?
Anyway, the error for this case is now smaller than near 1.88, and
changing the expf()'s to exp()s doesn't reduce it much more, so the
old extra-precision methods can't be much better.

Keep the old style for the lines not related to the fix.  erfc() still
uses the old style.

> 	    if(hx>0) return r/x; else return two-r/x;
> 	} else {
> 	    if(hx>0) return tiny*tiny; else return two-tiny;

Bruce