Date: Sun, 12 Aug 2012 23:11:18 -0000 From: Bruce Evans <brde@optusnet.com.au> To: Stephen Montgomery-Smith <stephen@missouri.edu> Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>, John Baldwin <jhb@freebsd.org>, David Chisnall <theraven@freebsd.org>, Bruce Evans <bde@freebsd.org>, Steve Kargl <sgk@troutmask.apl.washington.edu>, David Schultz <das@freebsd.org>, Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com> Subject: Re: Use of C99 extra long double math functions after r236148 Message-ID: <20120721123522.T877@besplex.bde.org> Resent-Message-ID: <20120812231111.GH20453@server.rulingia.com> In-Reply-To: <5009BD6C.9050301@missouri.edu> References: <20120714120432.GA70706@server.rulingia.com> <20120717084457.U3890@besplex.bde.org> <5004A5C7.1040405@missouri.edu> <5004DEA9.1050001@missouri.edu> <20120717200931.U6624@besplex.bde.org> <5006D13D.2080702@missouri.edu> <20120718205625.GA409@troutmask.apl.washington.edu> <500725F2.7060603@missouri.edu> <20120719025345.GA1376@troutmask.apl.washington.edu> <50077987.1080307@missouri.edu> <20120719032706.GA1558@troutmask.apl.washington.edu> <5007826D.7060806@missouri.edu> <5007AD41.9070000@missouri.edu> <20120719205347.T2601@besplex.bde.org> <50084322.7020401@missouri.edu> <20120720035001.W4053@besplex.bde.org> <50085441.4090305@missouri.edu> <20120720162953.N2162@besplex.bde.org> <20120720184114.B2790@besplex.bde.org> <50097128.6030405@missouri.edu> <20120721032448.X5744@besplex.bde.org> <5009BD6C.9050301@missouri.edu>
index | next in thread | previous in thread | raw e-mail
On Fri, 20 Jul 2012, Stephen Montgomery-Smith wrote:
> Bruce, with both of us working at the same time on clog, it is getting hard
> for me to follow. The version I sent this morning is the last change I made.
>
> How about if you come the owner of the code for a while. When you are
> finished, send it back to me, and I will look over everything you have done.
> I won't work on it until then. This works for me in other ways too, because
> my life is very busy at the moment.
I'd prefer you (or Somone Else) to keep working on it. I just plugged it
into my test framework and started zapping errors... (I need to make my
test framework easier to set up so that I don't have any investment in
the not seeing the errors.)
> If I do work on code, it will be on casinh/casin. I am looking over the
> paper by Hull et al, and I am learning a lot from it. One thing I did
It will be a larger task. I don't plan to do it (better not look at the
paper :-). I looked at one of Kahan's old papers about this. For not
seeing these papers, it helps that they are behind paywalls. I only
saw the Kahan paper because someone sent me an obtained copy.
I just remembered that log1p(x) has essentially the same problems with
underflow that we are seeing for clog():
- log(1 + x) for small x. This is analytic with a zero at 0. Any such
function shares the following numeric property: you expand it as
P1*x + P2*x^2 + ... Even the x^2 term in this underflows for tiny x,
so you must not evaluate. You must just return P1*x (with inexact if
x != 0). There is a threshold below which this must be done to avoid
underflow. There is a not much higher threshold above which this must
not be done since it is too inaccurate. All functions in fdlibm try
to be careful with these thresholds, and most succeed. I broke some
of them by invalid optimizations to avoid branches, but learned better.
- log(1 + x) for medium x. It has no direct problems with underflow or
overflow. We have to be careful not to introduce underflow problems
by doing things like evaluating x^100 where x is small or by writing
x as hi+lo and evaluating lo^10. Naive code can produce the former
problem by using too many terms in a power series. hi+lo decompositions
are fairly immune to such problems, since in double precision the lo
part is at most 2**54 times smaller than the hi part (usually about
2**26 times smaller). Underflow can never occur in the decomposition
(except possibly in unusual rounding modes), and the lo part can be
raised to a small power provided the original value is not too tiny.
- log(1 + x) for large x. Think of it as log(x + 1), where x and 1 are
the hi and lo parts of a decomposition. You scale this to 1 + 1/x,
where 1 and 1/x are hi and lo parts. This is not a hi+lo decomposition
of any representable number, since the whole point of log1p() is that
1+x cannot be represented exactly. Now the lo part can be up to about
2**1023 times smaller than the hi part, so even squaring it can underflow.
The situation for 1+1/x is essentially the same as for 1+x, except there
is a scale factor that gives extra sources if inaccuracies:
log(1 + x) = log(x + 1) = log(x * (1 + 1/x)) = log(x) + log(1 + 1/x).
For the log(1 + 1/x) part, there is a threshold (for x) above which
1/x must not be squared since it would underflow, and not much lower
threshold below which the 1/x approximation must not be used since it
is too inaccurate. It is technically easier to approximate log(1 + 1/x)
by 0 instead of 1/x and use the threshold for that.
For clog(), we need to evaluate log(x*x + y*y). This is harder than
log(1 + x) since the first additive term is not constant and there are
multiplications. We should first reduce so that |x| >= |y|. If |x|
is 1 and y is small, we now have essentially the same setup as
for log(1 + x) for small x. My initial quick fix was only for this
case. In the general case, we should evaluate x*x + y*y as hi+lo
to great accuracy (we use about 24 extra bits, and this seems to be
enough). Then log(hi + lo) has essentially the setup as log(x + 1)
for large x, after we pull out the hi factor from each:
log(x + 1) = log(x) + log(1 + 1/x)
log(hi + lo) = log(hi) + log(1 + lo/hi).
We may or may not end up with a lo/hi term that causes underflow. There
is the additional problem that the lo/hi term is itself a square
(essentially (y/x)^2, so it may already have underflowed. Earlier
special cases in our algorithm are apparently enough to avoid underflow
for the general case.
Bruce
help
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120721123522.T877>
