Date: Tue, 27 Aug 2013 03:19:13 +1000 (EST)
From: Bruce Evans
To: Steve Kargl
Cc: freebsd-numerics@freebsd.org
Subject: Re: (2nd time) tweaks to erff() threshold values
In-Reply-To: <20130825175817.GA60599@troutmask.apl.washington.edu>
Message-ID: <20130827012348.G2328@besplex.bde.org>
References: <20130822213315.GA6708@troutmask.apl.washington.edu>
 <20130823202257.Q1593@besplex.bde.org>
 <20130823165744.GA47369@troutmask.apl.washington.edu>
 <20130824202102.GA54981@troutmask.apl.washington.edu>
 <20130825023029.GA56772@troutmask.apl.washington.edu>
 <20130825171910.GA60396@troutmask.apl.washington.edu>
 <20130825175817.GA60599@troutmask.apl.washington.edu>

On Sun, 25 Aug 2013, Steve Kargl wrote:

> On Sun, Aug 25, 2013 at 10:19:10AM -0700, Steve Kargl wrote:
>>
>> Note**2, I did not record a domain and range as my routine that
>> generates plots seems to give a really messed up result for
>> the range [-0.08, 0.06].  However, the error estimate from
>> solving the Remes matrix equation is err = 0x1.39c84809b7ed2p-38.
>
> Found the bug in the graphing routine.
>
> /*
>  * Domain [0.84375, 1.25], range ~[-1.954e-10,1.940e-11]:
>  * |(erf(x) - erx) - p(x)/q(x)| < 2**-36.
>  */
>
> Note, the range and 2**-36 estimate are from the results of
> the Remes algorithm run with 1024-bits of precision.  The
> above coefficients were simply rounded to single precision,
> as I haven't investigated a better method for running, yet.

I now have a good method for rounding the coefficients.  It is like my
old iterative method of rounding the n'th coeff and then calculating
all coeffs after the n'th again with all coeffs up to and including
the n'th fixed, except that I now use an almost pure Remez algorithm
for all steps, and later another set of iterations to recalculate all
early coeffs after fixing some later ones.
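A minimal sketch of the outer rounding pass in gp (assuming the
cpolinterpolate() routine shown below, the same globals f(x), e(x) and
evenfunc, and an initial array 'a' of interpolation points; rnd24() is
only illustrative, and a real run would redo the Remez exchange
between steps instead of just dropping the last point):

    \\ round the coeffs one at a time, refitting the rest each time
    rnd24(c) = round(c * 2^(23 - exponent(c))) * 2^(exponent(c) - 23);

    roundcoeffs(d0, a) =
    {
        local(p, q, d, k, step, nfree);

        p = 0;                         \\ nothing fixed yet
        d = d0;                        \\ degree of lowest non-fixed coeff
        step = if (evenfunc, 2, 1);
        nfree = matsize(a)[2] - 3;     \\ coeffs to round, one per pass
        for (k = 1, nfree,
            q = cpolinterpolate(p, d, a);      \\ refit all non-fixed coeffs
            p += rnd24(polcoeff(q, d)) * x^d;  \\ round the lowest and fix it
            d += step;
            \\ one fewer unknown now; just drop a point here, where the
            \\ exchange step would instead re-choose all the points
            a = vector(matsize(a)[2] - 1, i, a[i]);
        );
        return (p);
    }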
Here is the almost pure Remez algorithm in pari:

% cpolinterpolate(p, d, a) =

- p is a polynomial containing fixed leading coeffs (0 for basic Remez)
- d is the order of the lowest non-fixed coeff (0 for basic Remez)
- 'a' is an array of n + 2 "interpolation" points (basic Remez)
- the function to be approximated is a global f(x)
- the function giving the relative errors is a global e(x)
- whether f(x) is even is given by a global flag evenfunc

For example, f(x) might be cos(x) ~= 1 - 0.5*x**2 + ...  After fixing
the first 2 coeffs, p = 1 - 0.5*x**2, d = 4, and we have a modified
Remez approximation problem of finding a poly starting with the x**4
term.  Nonzero d requires one obvious change to basic Remez (set up
the equations to be solved starting at the x**d term instead of the
x**0 term), and one not so obvious change (if d is odd, then the signs
in the +-1 column (the coefficients for the error E) of the matrix
must be flipped for negative interpolation points).  You can
understand the need for something like the latter by understanding
basic Remez and staring at the matrix a little.  Note that the signs
in the +- column for basic Remez give a nonsingular matrix.  This just
fails for odd d, but is fixed by the adjustment just described, except
when the interpolation points are symmetric about the origin.  The
symmetric case is easy to avoid in practice.

% {
%     local(M, c, i, j, n, v);
%
%     n = matsize(a)[2] - 2;
%     v = vector(n);
%     M = matrix(n, n);
%     for (i = 1, n,
%         v[i] = f(a[1 + i]) - subst(p, x, a[1 + i]);

Function value at an interpolation point after adjusting for the fixed
part p = p(x).

%         if (evenfunc,
%             for (j = 1, n - 1, M[i, j] = a[1 + i]^(d + 2*(j - 1)););
%         ,
%             for (j = 1, n - 1, M[i, j] = a[1 + i]^(d + j - 1););
%         );
%         M[i, n] = (-1)^i;

This sets up basic Remez, with the adjustment for x**d instead of
x**0.  The special case for even functions is not quite just an
optimization.

%         if (d % 2 == 1 && a[1 + i] < 0,
%             M[i, n] = -M[i, n];
%         );

This flips the signs.

%         M[i, n] *= e(a[1 + i]);

This changes the +-1 column to a weighted version of the same.  It is
best for e(x) to be as smooth as f(x), but this algorithm does OK with
even wildly discontinuous e(x) (e.g., 2**ilogb(x)).

%     );
%     v = precision(v, default(realprecision));
%     M = precision(M, default(realprecision));
%     c = matsolve(M, v~);
%     c = precision(c, default(realprecision));

Solve the equation, with technical complications to avoid problems
with insufficient or too much precision.

%     gE = c[n];

This is the global error (E).

%     if (evenfunc,
%         for (j = 1, n - 1, p += c[j] * x^(d + 2 * (j - 1)););
%     ,
%         for (j = 1, n - 1, p += c[j] * x^(d + j - 1););
%     );

Add the new terms to the fixed terms in p.

%     return (p);
% }

The iteration step in basic Remez needs an adjustment for signs too.
In basic Remez, the signs of the errors at the interpolation points
alternate (+-E), and a new set of interpolation points is chosen to
preserve this alternation while choosing points where the error is
larger than gE.  The sign flipping must be applied similarly.  E.g.,
if for basic Remez the error pattern is 'E -E | E -E E' where '|' is
the origin, then for odd d the error pattern becomes '-E E | E -E E'.
Note that the signs no longer alternate across the origin.
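A hedged usage sketch for the cos(x) example above (the interval
[0, Pi/4], the working precision and the equally spaced points are
only illustrative starting choices, not proper Remez extrema):

    default(realprecision, 50);
    f(x) = cos(x);
    e(x) = f(x);            \\ weight the error column: relative error
    evenfunc = 1;           \\ cos(x) is even, so use only even powers
    p = 1 - 0.5 * x^2;      \\ fixed leading coeffs
    d = 4;                  \\ lowest non-fixed degree
    \\ 6 points in [0, Pi/4]; only a[2]..a[5] enter the matrix, giving
    \\ 3 free coeffs (x^4, x^6, x^8) plus the error E
    a = vector(6, i, (i - 1) * Pi / 20);
    p = cpolinterpolate(p, d, a);
    print(p);               \\ fixed part plus the new x^4..x^8 terms
    print(gE);              \\ the levelled (weighted) error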
Complications like the following are handled very well by this
algorithm: suppose that in approximating cos(x) we fix the x**2 coeff
at -0.4999 instead of -0.5 (similar inaccuracies are unavoidable for
higher terms).  Then the naive method of trying to approximate
(cos(x) - (1 - 0.4999*x**2)) / x**4 using basic Remez doesn't work
because the function to be approximated is singular at the origin.
(We divided by x**4 to try to reduce to basic Remez.)  The singularity
tends to cause singular matrices.  I used to use essentially this
method, with a hack to avoid the singularity.  It basically works to
not try to approximate near the singularity, since multiplying by x**4
will cancel the singularity and leave a good approximation near it.
But it is hard to keep the interpolation points away from the
singularity without having many special cases.  The above algorithm
hits a sort of removable singularity at 0 instead.  It seems to be
numerically stable near the singularity if f(x) is smooth.

To handle some coeffs being rounded more severely than others, the
following algorithm works surprisingly well:

- do the above iterations to round all coeffs from the first to the
  last
- fix all the (trailing) coeffs that are rounded more severely than
  the others
- iterate from the first to the last non-trailing coeff to calculate
  all the leading coeffs again, now for the new function f(x) - p1(x)
  where p1 is the poly with the trailing coeffs; of course the poly
  for the leading coeffs has its degree reduced by the number of
  trailing coeffs compared with the poly found in the first pass.

This makes the leading coeffs match the trailing coeffs better than is
possible on the first pass.  On the first pass, leading coeffs are
chosen under the assumption that trailing coeffs are
infinite-precision.  When they are severely rounded, this assumption
breaks badly.  This algorithm works so well that trying to improve it
using more passes gets nowhere or goes backwards.  E.g., it usually
makes no significant difference to calculate all the trailing coeffs
again with the recalculated set of leading coeffs.

Someone at INRIA wrote a paper about poly approximations with some
coeffs forced to 0.  Normally, all the generators {x**0, x**1, x**2,
...} are allowed.  Forcing some coeffs to 0 restricts the generators
to a specified subset of the standard one.  The author noted that this
can easily give a singular matrix for the corresponding extended Remez
algorithm.  I think too easily to be very useful.  My extension works
fairly generally for the special subset {x**d, x**(d+1), ...}.  I have
the same amount of proof that it works as the paper (none).  I didn't
notice anything about the complications for the sign flipping in the
paper.  Perhaps it required the subset to include {x**0} for general
functions or {x**1} for odd functions.  Without the function being 0
at 0, it is rarely possible to approximate it well enough if the x**0
coeff is forced to 0.  Similarly for higher coeffs that want to be
nonzero.  The point of forcing coeffs to 0 is that small ones tend to
give terms that are harder to add up accurately.  But the method seems
to have very limited use since only really tiny terms can be omitted
without having to do too much compensation in other terms.

Bruce