From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 12 23:10:13 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2E1E010656B7 for ; Sun, 12 Aug 2012 23:10:13 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id 4D6338FC17 for ; Sun, 12 Aug 2012 23:10:12 +0000 (UTC) Received: from server.rulingia.com (c220-239-249-137.belrs5.nsw.optusnet.com.au [220.239.249.137]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q7CNACvT075811 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Mon, 13 Aug 2012 09:10:12 +1000 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q7CNA5vq021599 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Mon, 13 Aug 2012 09:10:06 +1000 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id q7CNA5Yg021598 for freebsd-numerics@freebsd.org; Mon, 13 Aug 2012 09:10:05 +1000 (EST) (envelope-from peter) Resent-From: Peter Jeremy Resent-Date: Mon, 13 Aug 2012 09:10:05 +1000 Resent-Message-ID: <20120812231005.GT20453@server.rulingia.com> Resent-To: freebsd-numerics@freebsd.org Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q6KGPYTX032431 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Sat, 21 Jul 2012 02:25:34 +1000 (EST) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q6KGPYnS087133 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Sat, 21 Jul 2012 02:25:34 +1000 (EST) (envelope-from brde@optusnet.com.au) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q6KGPHnj024560 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 21 Jul 2012 02:25:19 +1000 From: Bruce Evans Mail-Followup-To: freebsd-numerics@freebsd.org X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <50095CDE.4050507@missouri.edu> Message-ID: <20120721011112.D5008@besplex.bde.org> References: <20120714120432.GA70706@server.rulingia.com> <20120717084457.U3890@besplex.bde.org> <5004A5C7.1040405@missouri.edu> <5004DEA9.1050001@missouri.edu> <20120717200931.U6624@besplex.bde.org> <5006D13D.2080702@missouri.edu> <20120718205625.GA409@troutmask.apl.washington.edu> <500725F2.7060603@missouri.edu> <20120719025345.GA1376@troutmask.apl.washington.edu> <50077987.1080307@missouri.edu> <20120719032706.GA1558@troutmask.apl.washington.edu> <5007826D.7060806@missouri.edu> <5007AD41.9070000@missouri.edu> <20120719205347.T2601@besplex.bde.org> <50084322.7020401@missouri.edu> <20120720035001.W4053@besplex.bde.org> <50085441.4090305@missouri.edu> <20120720162953.N2162@besplex.bde.org> <20120720184114.B2790@besplex.bde.org> <50095CDE.4050507@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Mailman-Approved-At: Sun, 12 Aug 2012 23:55:59 +0000 Cc: Diane Bruce , Bruce Evans , John Baldwin , David Chisnall , Bruce Evans , Steve Kargl , David Schultz , Peter Jeremy , Warner Losh Subject: Re: Use of C99 extra long double math functions after r236148 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Date: Sun, 12 Aug 2012 23:10:13 -0000 X-Original-Date: Sat, 21 Jul 2012 02:25:16 +1000 (EST) X-List-Received-Date: Sun, 12 Aug 2012 23:10:13 -0000 On Fri, 20 Jul 2012, Stephen Montgomery-Smith wrote: > I worked quite a bit on my clog function last night. Could you look at this > one? Will look later. Another set of fixes for the old one now. The maximum error now seems to be about 1 ulp. I prefer it inline, but can duplicate it. % } else if (isfinite(x) && isfinite(y) && % (fabs(x) > 1e308 || fabs(y) > 1e308)) % /* % * To avoid unnecessary overflow, if x or y are very large, % * divide x and y by M_E, and then add 1 to the logarithm. % * This depends on M_E being larger than sqrt(2). % */ % return (cpack(log(hypot(x / M_E, y / M_E)) + 1, atan2(y, x))); Add 1 in the right place here. % else if (fabs(x) < 1e-50 && fabs(y) != 1 || % fabs(y) < 1e-50 && fabs(x) != 1 || % fabs(x) > 1e50 || fabs(y) > 1e50) Old modifications for |x| == 1 and |y| == 1. This needs to be expanded a bit, since cases near 1 are now handled too. % /* % * Because atan2 and hypot conform to C99, this also covers % * all the edge cases when x or y are 0 or infinite. % */ % return (cpack(log(hypot(x, y)), atan2(y, x))); % else { % /* We don't need to worry about overflow in x*x+y*y. */ % h = x * x + y * y; It is also good to avoid underflow. Underflow used to be avoided by using hypot() if x or y is tiny, but the above modification lets these cases through to here when |x| or |y| is 1. The square then underflows for values below about 1e-172. Except on i386, extra exponent range may prevent underflow, depending on optimization. To be fixed. % if (h < 0.1 || h > 10) % return (cpack(log(h) * 0.5, atan2(y, x))); Start replacing multiplications by divisions. % /* Take extra care if h is moderately close to 1 */ % else { % double ox = x; % double oy = y; I want to take absolute values and swap x and y so that 0 <= y <= x, so preserve the original values here. % x = fabs(x); % y = fabs(y); % if (x < y) { % x0 = x; % x = y; % y = x0; % } Normalize. % /* % * x0 and y0 are good approximations to x and y, but % * have their bits trimmed so that double precision % * floating point is capable of calculating x0*x0 + % * y0*y0 - 1 exactly. % */ It wasn't exact, even after fixing the hi+lo decompositions of x and y. Only the products were. The following fixes both. % x0 = (float)x; % x1 = x - x0; % y0 = (float)y; % y1 = y - y0; A good way to do the hi+lo decompositions. % if (y0 < 1e-50) { % t2 = y / x; % return (cpack(logf(x) + t2 * (t2 * 0.5), % atan2(oy, ox))); % } When y is tiny, this is necessary to handle the case x == 1. It also handles many nearby cases. Note that when y is tiny, x is fairly large, since h is fairly large. Thus t2 is also tiny. The formula is sloppy when x is not exactly 1, but works OK. t2*t2/2 must be parenthesized as above to avoid spurious underflow when t2 ~= sqrt(smallest denormal). Spurious underflow gives very large relative errors (I forget if they were ~8 ulps, ~436 ulps, or mega-ulps...). One of the main reasons for this special case is to avoid this spurious underflow. I developed this mainly for the float precision case on i386 and saw confusing behaviour when i386's extra precision sometimes avoided the underflow. % /* Notice that mathematically, h = t1*(1+t3). */ % t1 = x0 * x0; /* Exact. */ % t2 = y0 * y0; /* Exact. */ % STRICT_ASSIGN(double, t3, t1 + t2); % t2 = (t1 - t3) + t2; % t1 = t3; /* Now t1+t2 is hi+lo for x0*x0+y0*y0.*/ t1+t2 is not necessarily exact, and we must do multi-precision addition of it to get enough accuracy. Even this is not accurate enough when the final t3 is tiny. % t2 += 2 * x0 * x1 + x1 * x1 + 2 * y0 * y1 + y1 * y1; Add the rest of the low terms to t2. As before, except now t2 starts as usually nonzero to hold the low part of x0*x0+y0*y0. % t3 = t2 / t1; t2 is tiny relative to t1 (about 2**53 times smaller), so the accuracy of this is unimportant, except when t1 is near 1, in which case log() of it reduces it to near 1 and the error becomes dominated by that of t3. The special case for tiny y above is supposed to avoid reaching here in such cases, but the classification above is sloppy so it isn't clear that it does. Ww want to avoid reaching here unless t1 is a few tens or hundreds or thousands of ulps away from 1. % return (cpack((log(t1) + log1p(t3)) * 0.5, % atan2(oy, ox))); Suppose that t1 is exactly 1 and we reach here. t3 is only guaranteed to be less than about 2**-53. Even when it is 2**-205 times larger than that, we have a large error for it. One bad case is when it underflows. log1p(t3) * 0.5 gives a grouping of terms which allows t3 to underflow to 0. We can't put the factor of 0.5 inside log1p() without using a special case as above. Plain log(t1) is the same as log1p(t1 - 1) here. % } % } % } Bruce