Date: Sun, 12 Aug 2012 23:11:18 -0000 From: Bruce Evans <brde@optusnet.com.au> To: Stephen Montgomery-Smith <stephen@missouri.edu> Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>, John Baldwin <jhb@freebsd.org>, David Chisnall <theraven@freebsd.org>, Bruce Evans <bde@freebsd.org>, Steve Kargl <sgk@troutmask.apl.washington.edu>, David Schultz <das@freebsd.org>, Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com> Subject: Re: Use of C99 extra long double math functions after r236148 Message-ID: <20120721123522.T877@besplex.bde.org> Resent-Message-ID: <20120812231111.GH20453@server.rulingia.com> In-Reply-To: <5009BD6C.9050301@missouri.edu> References: <20120714120432.GA70706@server.rulingia.com> <20120717084457.U3890@besplex.bde.org> <5004A5C7.1040405@missouri.edu> <5004DEA9.1050001@missouri.edu> <20120717200931.U6624@besplex.bde.org> <5006D13D.2080702@missouri.edu> <20120718205625.GA409@troutmask.apl.washington.edu> <500725F2.7060603@missouri.edu> <20120719025345.GA1376@troutmask.apl.washington.edu> <50077987.1080307@missouri.edu> <20120719032706.GA1558@troutmask.apl.washington.edu> <5007826D.7060806@missouri.edu> <5007AD41.9070000@missouri.edu> <20120719205347.T2601@besplex.bde.org> <50084322.7020401@missouri.edu> <20120720035001.W4053@besplex.bde.org> <50085441.4090305@missouri.edu> <20120720162953.N2162@besplex.bde.org> <20120720184114.B2790@besplex.bde.org> <50097128.6030405@missouri.edu> <20120721032448.X5744@besplex.bde.org> <5009BD6C.9050301@missouri.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 20 Jul 2012, Stephen Montgomery-Smith wrote: > Bruce, with both of us working at the same time on clog, it is getting hard > for me to follow. The version I sent this morning is the last change I made. > > How about if you come the owner of the code for a while. When you are > finished, send it back to me, and I will look over everything you have done. > I won't work on it until then. This works for me in other ways too, because > my life is very busy at the moment. I'd prefer you (or Somone Else) to keep working on it. I just plugged it into my test framework and started zapping errors... (I need to make my test framework easier to set up so that I don't have any investment in the not seeing the errors.) > If I do work on code, it will be on casinh/casin. I am looking over the > paper by Hull et al, and I am learning a lot from it. One thing I did It will be a larger task. I don't plan to do it (better not look at the paper :-). I looked at one of Kahan's old papers about this. For not seeing these papers, it helps that they are behind paywalls. I only saw the Kahan paper because someone sent me an obtained copy. I just remembered that log1p(x) has essentially the same problems with underflow that we are seeing for clog(): - log(1 + x) for small x. This is analytic with a zero at 0. Any such function shares the following numeric property: you expand it as P1*x + P2*x^2 + ... Even the x^2 term in this underflows for tiny x, so you must not evaluate. You must just return P1*x (with inexact if x != 0). There is a threshold below which this must be done to avoid underflow. There is a not much higher threshold above which this must not be done since it is too inaccurate. All functions in fdlibm try to be careful with these thresholds, and most succeed. I broke some of them by invalid optimizations to avoid branches, but learned better. - log(1 + x) for medium x. It has no direct problems with underflow or overflow. We have to be careful not to introduce underflow problems by doing things like evaluating x^100 where x is small or by writing x as hi+lo and evaluating lo^10. Naive code can produce the former problem by using too many terms in a power series. hi+lo decompositions are fairly immune to such problems, since in double precision the lo part is at most 2**54 times smaller than the hi part (usually about 2**26 times smaller). Underflow can never occur in the decomposition (except possibly in unusual rounding modes), and the lo part can be raised to a small power provided the original value is not too tiny. - log(1 + x) for large x. Think of it as log(x + 1), where x and 1 are the hi and lo parts of a decomposition. You scale this to 1 + 1/x, where 1 and 1/x are hi and lo parts. This is not a hi+lo decomposition of any representable number, since the whole point of log1p() is that 1+x cannot be represented exactly. Now the lo part can be up to about 2**1023 times smaller than the hi part, so even squaring it can underflow. The situation for 1+1/x is essentially the same as for 1+x, except there is a scale factor that gives extra sources if inaccuracies: log(1 + x) = log(x + 1) = log(x * (1 + 1/x)) = log(x) + log(1 + 1/x). For the log(1 + 1/x) part, there is a threshold (for x) above which 1/x must not be squared since it would underflow, and not much lower threshold below which the 1/x approximation must not be used since it is too inaccurate. It is technically easier to approximate log(1 + 1/x) by 0 instead of 1/x and use the threshold for that. For clog(), we need to evaluate log(x*x + y*y). This is harder than log(1 + x) since the first additive term is not constant and there are multiplications. We should first reduce so that |x| >= |y|. If |x| is 1 and y is small, we now have essentially the same setup as for log(1 + x) for small x. My initial quick fix was only for this case. In the general case, we should evaluate x*x + y*y as hi+lo to great accuracy (we use about 24 extra bits, and this seems to be enough). Then log(hi + lo) has essentially the setup as log(x + 1) for large x, after we pull out the hi factor from each: log(x + 1) = log(x) + log(1 + 1/x) log(hi + lo) = log(hi) + log(1 + lo/hi). We may or may not end up with a lo/hi term that causes underflow. There is the additional problem that the lo/hi term is itself a square (essentially (y/x)^2, so it may already have underflowed. Earlier special cases in our algorithm are apparently enough to avoid underflow for the general case. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120721123522.T877>