From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 12 23:11:18 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 12F791065672 for ; Sun, 12 Aug 2012 23:11:18 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id 8BE3B8FC18 for ; Sun, 12 Aug 2012 23:11:17 +0000 (UTC) Received: from server.rulingia.com (c220-239-249-137.belrs5.nsw.optusnet.com.au [220.239.249.137]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q7CNBHdr075875 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Mon, 13 Aug 2012 09:11:17 +1000 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q7CNBBkR021721 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Mon, 13 Aug 2012 09:11:11 +1000 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id q7CNBBGo021720 for freebsd-numerics@freebsd.org; Mon, 13 Aug 2012 09:11:11 +1000 (EST) (envelope-from peter) Resent-From: Peter Jeremy Resent-Date: Mon, 13 Aug 2012 09:11:11 +1000 Resent-Message-ID: <20120812231111.GH20453@server.rulingia.com> Resent-To: freebsd-numerics@freebsd.org Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q6L3Ylxh063492 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Sat, 21 Jul 2012 13:34:47 +1000 (EST) (envelope-from brde@optusnet.com.au) Received: from mail34.syd.optusnet.com.au (mail34.syd.optusnet.com.au [211.29.133.218]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q6L3Ykb0095439 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Sat, 21 Jul 2012 13:34:46 +1000 (EST) (envelope-from brde@optusnet.com.au) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail34.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q6L3YW2C028811 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 21 Jul 2012 13:34:33 +1000 From: Bruce Evans Mail-Followup-To: freebsd-numerics@freebsd.org X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <5009BD6C.9050301@missouri.edu> Message-ID: <20120721123522.T877@besplex.bde.org> References: <20120714120432.GA70706@server.rulingia.com> <20120717084457.U3890@besplex.bde.org> <5004A5C7.1040405@missouri.edu> <5004DEA9.1050001@missouri.edu> <20120717200931.U6624@besplex.bde.org> <5006D13D.2080702@missouri.edu> <20120718205625.GA409@troutmask.apl.washington.edu> <500725F2.7060603@missouri.edu> <20120719025345.GA1376@troutmask.apl.washington.edu> <50077987.1080307@missouri.edu> <20120719032706.GA1558@troutmask.apl.washington.edu> <5007826D.7060806@missouri.edu> <5007AD41.9070000@missouri.edu> <20120719205347.T2601@besplex.bde.org> <50084322.7020401@missouri.edu> <20120720035001.W4053@besplex.bde.org> <50085441.4090305@missouri.edu> <20120720162953.N2162@besplex.bde.org> <20120720184114.B2790@besplex.bde.org> <50097128.6030405@missouri.edu> <20120721032448.X5744@besplex.bde.org> <5009BD6C.9050301@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Mailman-Approved-At: Sun, 12 Aug 2012 23:56:00 +0000 Cc: Diane Bruce , Bruce Evans , John Baldwin , David Chisnall , Bruce Evans , Steve Kargl , David Schultz , Peter Jeremy , Warner Losh Subject: Re: Use of C99 extra long double math functions after r236148 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Date: Sun, 12 Aug 2012 23:11:18 -0000 X-Original-Date: Sat, 21 Jul 2012 13:34:32 +1000 (EST) X-List-Received-Date: Sun, 12 Aug 2012 23:11:18 -0000 On Fri, 20 Jul 2012, Stephen Montgomery-Smith wrote: > Bruce, with both of us working at the same time on clog, it is getting hard > for me to follow. The version I sent this morning is the last change I made. > > How about if you come the owner of the code for a while. When you are > finished, send it back to me, and I will look over everything you have done. > I won't work on it until then. This works for me in other ways too, because > my life is very busy at the moment. I'd prefer you (or Somone Else) to keep working on it. I just plugged it into my test framework and started zapping errors... (I need to make my test framework easier to set up so that I don't have any investment in the not seeing the errors.) > If I do work on code, it will be on casinh/casin. I am looking over the > paper by Hull et al, and I am learning a lot from it. One thing I did It will be a larger task. I don't plan to do it (better not look at the paper :-). I looked at one of Kahan's old papers about this. For not seeing these papers, it helps that they are behind paywalls. I only saw the Kahan paper because someone sent me an obtained copy. I just remembered that log1p(x) has essentially the same problems with underflow that we are seeing for clog(): - log(1 + x) for small x. This is analytic with a zero at 0. Any such function shares the following numeric property: you expand it as P1*x + P2*x^2 + ... Even the x^2 term in this underflows for tiny x, so you must not evaluate. You must just return P1*x (with inexact if x != 0). There is a threshold below which this must be done to avoid underflow. There is a not much higher threshold above which this must not be done since it is too inaccurate. All functions in fdlibm try to be careful with these thresholds, and most succeed. I broke some of them by invalid optimizations to avoid branches, but learned better. - log(1 + x) for medium x. It has no direct problems with underflow or overflow. We have to be careful not to introduce underflow problems by doing things like evaluating x^100 where x is small or by writing x as hi+lo and evaluating lo^10. Naive code can produce the former problem by using too many terms in a power series. hi+lo decompositions are fairly immune to such problems, since in double precision the lo part is at most 2**54 times smaller than the hi part (usually about 2**26 times smaller). Underflow can never occur in the decomposition (except possibly in unusual rounding modes), and the lo part can be raised to a small power provided the original value is not too tiny. - log(1 + x) for large x. Think of it as log(x + 1), where x and 1 are the hi and lo parts of a decomposition. You scale this to 1 + 1/x, where 1 and 1/x are hi and lo parts. This is not a hi+lo decomposition of any representable number, since the whole point of log1p() is that 1+x cannot be represented exactly. Now the lo part can be up to about 2**1023 times smaller than the hi part, so even squaring it can underflow. The situation for 1+1/x is essentially the same as for 1+x, except there is a scale factor that gives extra sources if inaccuracies: log(1 + x) = log(x + 1) = log(x * (1 + 1/x)) = log(x) + log(1 + 1/x). For the log(1 + 1/x) part, there is a threshold (for x) above which 1/x must not be squared since it would underflow, and not much lower threshold below which the 1/x approximation must not be used since it is too inaccurate. It is technically easier to approximate log(1 + 1/x) by 0 instead of 1/x and use the threshold for that. For clog(), we need to evaluate log(x*x + y*y). This is harder than log(1 + x) since the first additive term is not constant and there are multiplications. We should first reduce so that |x| >= |y|. If |x| is 1 and y is small, we now have essentially the same setup as for log(1 + x) for small x. My initial quick fix was only for this case. In the general case, we should evaluate x*x + y*y as hi+lo to great accuracy (we use about 24 extra bits, and this seems to be enough). Then log(hi + lo) has essentially the setup as log(x + 1) for large x, after we pull out the hi factor from each: log(x + 1) = log(x) + log(1 + 1/x) log(hi + lo) = log(hi) + log(1 + lo/hi). We may or may not end up with a lo/hi term that causes underflow. There is the additional problem that the lo/hi term is itself a square (essentially (y/x)^2, so it may already have underflowed. Earlier special cases in our algorithm are apparently enough to avoid underflow for the general case. Bruce