Date: Tue, 27 Aug 2013 03:19:13 +1000 (EST)
From: Bruce Evans
To: Steve Kargl
Cc: freebsd-numerics@freebsd.org
Subject: Re: (2nd time) tweaks to erff() threshold values
In-Reply-To: <20130825175817.GA60599@troutmask.apl.washington.edu>
Message-ID: <20130827012348.G2328@besplex.bde.org>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
 <20130825171910.GA60396@troutmask.apl.washington.edu>
 <20130825175817.GA60599@troutmask.apl.washington.edu>

On Sun, 25 Aug 2013, Steve Kargl wrote:

> On Sun, Aug 25, 2013 at 10:19:10AM -0700, Steve Kargl wrote:
>>
>> Note**2, I did not record a domain and range as my routine that
>> generates plots seems to give a really messed up result for
>> the range [-0.08, 0.06].  However, the error estimate from
>> solving the Remes matrix equation is err = 0x1.39c84809b7ed2p-38.
>
> Found the bug in the graphing routine.
>
> /*
>  * Domain [0.84375, 1.25], range ~[-1.954e-10,1.940e-11]:
>  * |(erf(x) - erx) - p(x)/q(x)| < 2**-36.
>  */
>
> Note, the range and 2**-36 estimate are from the results of
> the Remes algorithm run with 1024-bits of precision.  The
> above coefficients were simply rounded to single precision,
> as I haven't investigated a better method for running, yet.

I now have a good method for rounding the coefficients.  It is like my
old iterative method of rounding the n'th coeff and then calculating
all coeffs after the n'th again with all coeffs up to and including
the n'th fixed, except that I now use an almost pure Remez algorithm
for all steps, and later another set of iterations to recalculate all
early coeffs after fixing some later ones.
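A minimal sketch of the outer rounding pass in gp (assuming the
cpolinterpolate() routine shown below, the same globals f(x), e(x) and
evenfunc, and an initial array 'a' of interpolation points; rnd24() is
only illustrative, and a real run would redo the Remez exchange
between steps instead of just dropping the last point):

    \\ round the coeffs one at a time, refitting the rest each time
    rnd24(c) = round(c * 2^(23 - exponent(c))) * 2^(exponent(c) - 23);

    roundcoeffs(d0, a) =
    {
        local(p, q, d, k, step, nfree);

        p = 0;                         \\ nothing fixed yet
        d = d0;                        \\ degree of lowest non-fixed coeff
        step = if (evenfunc, 2, 1);
        nfree = matsize(a)[2] - 3;     \\ coeffs to round, one per pass
        for (k = 1, nfree,
            q = cpolinterpolate(p, d, a);      \\ refit all non-fixed coeffs
            p += rnd24(polcoeff(q, d)) * x^d;  \\ round the lowest and fix it
            d += step;
            \\ one fewer unknown now; just drop a point here, where the
            \\ exchange step would instead re-choose all the points
            a = vector(matsize(a)[2] - 1, i, a[i]);
        );
        return (p);
    }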
Here is the almost pure Remez algorithm in pari:

% cpolinterpolate(p, d, a) =

- p is a polynomial containing fixed leading coeffs (0 for basic Remez)
- d is the order of the lowest non-fixed coeff (0 for basic Remez)
- 'a' is an array of n + 2 "interpolation" points (basic Remez)
- the function to be approximated is a global f(x)
- the function giving the relative errors is a global e(x)
- whether f(x) is even is given by a global flag evenfunc

For example, f(x) might be cos(x) ~= 1 - 0.5*x**2 + ...  After fixing
the first 2 coeffs, p = 1 - 0.5*x**2, d = 4, and we have a modified
Remez approximation problem of finding a poly starting with the x**4
term.  Nonzero d requires one obvious change to basic Remez (set up
the equations to be solved starting at the x**d term instead of the
x**0 term), and one not so obvious change (if d is odd, then the signs
in the +-1 column (the coefficients for the error E) of the matrix
must be flipped for negative interpolation points).  You can
understand the need for something like the latter by understanding
basic Remez and staring at the matrix a little.  Note that the signs
in the +- column for basic Remez give a nonsingular matrix.  This just
fails for odd d, but is fixed by the adjustment just described, except
when the interpolation points are symmetric about the origin.  The
symmetric case is easy to avoid in practice.

% {
%     local(M, c, i, j, n, v);
%
%     n = matsize(a)[2] - 2;
%     v = vector(n);
%     M = matrix(n, n);
%     for (i = 1, n,
%         v[i] = f(a[1 + i]) - subst(p, x, a[1 + i]);

Function value at an interpolation point after adjusting for the fixed
part p = p(x).

%         if (evenfunc,
%             for (j = 1, n - 1, M[i, j] = a[1 + i]^(d + 2*(j - 1)););
%         ,
%             for (j = 1, n - 1, M[i, j] = a[1 + i]^(d + j - 1););
%         );
%         M[i, n] = (-1)^i;

This sets up basic Remez, with the adjustment for x**d instead of
x**0.  The special case for even functions is not quite just an
optimization.

%         if (d % 2 == 1 && a[1 + i] < 0,
%             M[i, n] = -M[i, n];
%         );

This flips the signs.

%         M[i, n] *= e(a[1 + i]);

This changes the +-1 column to a weighted version of the same.  It is
best for e(x) to be as smooth as f(x), but this algorithm does OK with
even wildly discontinuous e(x) (e.g., 2**ilogb(x)).

%     );
%     v = precision(v, default(realprecision));
%     M = precision(M, default(realprecision));
%     c = matsolve(M, v~);
%     c = precision(c, default(realprecision));

Solve the equation, with technical complications to avoid problems
with insufficient or too much precision.

%     gE = c[n];

This is the global error (E).

%     if (evenfunc,
%         for (j = 1, n - 1, p += c[j] * x^(d + 2 * (j - 1)););
%     ,
%         for (j = 1, n - 1, p += c[j] * x^(d + j - 1););
%     );

Add the new terms to the fixed terms in p.

%     return (p);
% }

The iteration step in basic Remez needs an adjustment for signs too.
In basic Remez, the signs of the errors at the interpolation points
alternate (+-E), and a new set of interpolation points is chosen to
preserve this alternation while choosing points where the error is
larger than gE.  The sign flipping must be applied similarly.  E.g.,
if for basic Remez the error pattern is 'E -E | E -E E' where '|' is
the origin, then for odd d the error pattern becomes '-E E | E -E E'.
Note that the signs no longer alternate across the origin.
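A hedged usage sketch for the cos(x) example above (the interval
[0, Pi/4], the working precision and the equally spaced points are
only illustrative starting choices, not proper Remez extrema):

    default(realprecision, 50);
    f(x) = cos(x);
    e(x) = f(x);            \\ weight the error column: relative error
    evenfunc = 1;           \\ cos(x) is even, so use only even powers
    p = 1 - 0.5 * x^2;      \\ fixed leading coeffs
    d = 4;                  \\ lowest non-fixed degree
    \\ 6 points in [0, Pi/4]; only a[2]..a[5] enter the matrix, giving
    \\ 3 free coeffs (x^4, x^6, x^8) plus the error E
    a = vector(6, i, (i - 1) * Pi / 20);
    p = cpolinterpolate(p, d, a);
    print(p);               \\ fixed part plus the new x^4..x^8 terms
    print(gE);              \\ the levelled (weighted) error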
Complications like the following are handled very well by this
algorithm: suppose that in approximating cos(x) we fix the x**2 coeff
at -0.4999 instead of -0.5 (similar inaccuracies are unavoidable for
higher terms).  Then the naive method of trying to approximate
(cos(x) - (1 - 0.4999*x**2)) / x**4 using basic Remez doesn't work
because the function to be approximated is singular at the origin.
(We divided by x**4 to try to reduce to basic Remez.)  The singularity
tends to cause singular matrices.  I used to use essentially this
method, with a hack to avoid the singularity.  It basically works to
not try to approximate near the singularity, since multiplying by x**4
will cancel the singularity and leave a good approximation near it.
But it is hard to keep the interpolation points away from the
singularity without having many special cases.  The above algorithm
hits a sort of removable singularity at 0 instead.  It seems to be
numerically stable near the singularity if f(x) is smooth.

To handle some coeffs being rounded more severely than others, the
following algorithm works surprisingly well:

- do the above iterations to round all coeffs from the first to the
  last
- fix all the (trailing) coeffs that are rounded more severely than
  the others
- iterate from the first to the last non-trailing coeff to calculate
  all the leading coeffs again, now for the new function f(x) - p1(x)
  where p1 is the poly with the trailing coeffs; of course the poly
  for the leading coeffs has its degree reduced by the number of
  trailing coeffs compared with the poly found in the first pass.

This makes the leading coeffs match the trailing coeffs better than is
possible on the first pass.  On the first pass, leading coeffs are
chosen under the assumption that trailing coeffs are
infinite-precision.  When they are severely rounded, this assumption
breaks badly.  This algorithm works so well that trying to improve it
using more passes gets nowhere or goes backwards.  E.g., it usually
makes no significant difference to calculate all the trailing coeffs
again with the recalculated set of leading coeffs.

Someone at INRIA wrote a paper about poly approximations with some
coeffs forced to 0.  Normally, all the generators {x**0, x**1, x**2,
...} are allowed.  Forcing some coeffs to 0 restricts the generators
to a specified subset of the standard one.  The author noted that this
can easily give a singular matrix for the corresponding extended Remez
algorithm.  I think too easily to be very useful.  My extension works
fairly generally for the special subset {x**d, x**(d+1), ...}.  I have
the same amount of proof that it works as the paper (none).  I didn't
notice anything about the complications for the sign flipping in the
paper.  Perhaps it required the subset to include {x**0} for general
functions or {x**1} for odd functions.  Without the function being 0
at 0, it is rarely possible to approximate it well enough if the x**0
coeff is forced to 0.  Similarly for higher coeffs that want to be
nonzero.  The point of forcing coeffs to 0 is that small ones tend to
give terms that are harder to add up accurately.  But the method seems
to have very limited use since only really tiny terms can be omitted
without having to do too much compensation in other terms.

Bruce