From owner-freebsd-numerics@FreeBSD.ORG  Sun Aug 12 23:11:18 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 12F791065672
	for <freebsd-numerics@freebsd.org>;
	Sun, 12 Aug 2012 23:11:18 +0000 (UTC)
	(envelope-from peter@rulingia.com)
Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au
	[122.100.2.194])
	by mx1.freebsd.org (Postfix) with ESMTP id 8BE3B8FC18
	for <freebsd-numerics@freebsd.org>;
	Sun, 12 Aug 2012 23:11:17 +0000 (UTC)
Received: from server.rulingia.com
	(c220-239-249-137.belrs5.nsw.optusnet.com.au [220.239.249.137])
	by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q7CNBHdr075875
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <freebsd-numerics@freebsd.org>;
	Mon, 13 Aug 2012 09:11:17 +1000 (EST)
	(envelope-from peter@rulingia.com)
X-Bogosity: Ham, spamicity=0.000000
Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1])
	by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q7CNBBkR021721
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <freebsd-numerics@freebsd.org>;
	Mon, 13 Aug 2012 09:11:11 +1000 (EST)
	(envelope-from peter@server.rulingia.com)
Received: (from peter@localhost)
	by server.rulingia.com (8.14.5/8.14.5/Submit) id q7CNBBGo021720
	for freebsd-numerics@freebsd.org; Mon, 13 Aug 2012 09:11:11 +1000 (EST)
	(envelope-from peter)
Resent-From: Peter Jeremy <peter@rulingia.com>
Resent-Date: Mon, 13 Aug 2012 09:11:11 +1000
Resent-Message-ID: <20120812231111.GH20453@server.rulingia.com>
Resent-To: freebsd-numerics@freebsd.org
Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au
	[122.100.2.194])
	by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q6L3Ylxh063492
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <peter@home.rulingia.com>; Sat, 21 Jul 2012 13:34:47 +1000 (EST)
	(envelope-from brde@optusnet.com.au)
Received: from mail34.syd.optusnet.com.au (mail34.syd.optusnet.com.au
	[211.29.133.218])
	by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q6L3Ykb0095439
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <peter@rulingia.com>; Sat, 21 Jul 2012 13:34:46 +1000 (EST)
	(envelope-from brde@optusnet.com.au)
Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246])
	by mail34.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q6L3YW2C028811
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 21 Jul 2012 13:34:33 +1000
From: Bruce Evans <brde@optusnet.com.au>
Mail-Followup-To: freebsd-numerics@freebsd.org
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith <stephen@missouri.edu>
In-Reply-To: <5009BD6C.9050301@missouri.edu>
Message-ID: <20120721123522.T877@besplex.bde.org>
References: <20120714120432.GA70706@server.rulingia.com>
	<20120717084457.U3890@besplex.bde.org> <5004A5C7.1040405@missouri.edu>
	<5004DEA9.1050001@missouri.edu> <20120717200931.U6624@besplex.bde.org>
	<5006D13D.2080702@missouri.edu>
	<20120718205625.GA409@troutmask.apl.washington.edu>
	<500725F2.7060603@missouri.edu>
	<20120719025345.GA1376@troutmask.apl.washington.edu>
	<50077987.1080307@missouri.edu>
	<20120719032706.GA1558@troutmask.apl.washington.edu>
	<5007826D.7060806@missouri.edu> <5007AD41.9070000@missouri.edu>
	<20120719205347.T2601@besplex.bde.org> <50084322.7020401@missouri.edu>
	<20120720035001.W4053@besplex.bde.org> <50085441.4090305@missouri.edu>
	<20120720162953.N2162@besplex.bde.org>
	<20120720184114.B2790@besplex.bde.org>
	<50097128.6030405@missouri.edu> <20120721032448.X5744@besplex.bde.org>
	<5009BD6C.9050301@missouri.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Sun, 12 Aug 2012 23:56:00 +0000
Cc: Diane Bruce <db@db.net>, Bruce Evans <brde@optusnet.com.au>,
	John Baldwin <jhb@freebsd.org>, David Chisnall <theraven@freebsd.org>,
	Bruce Evans <bde@freebsd.org>,
	Steve Kargl <sgk@troutmask.apl.washington.edu>,
	David Schultz <das@freebsd.org>,
	Peter Jeremy <peter@rulingia.com>, Warner Losh <imp@bsdimp.com>
Subject: Re: Use of C99 extra long double math functions after r236148
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
Date: Sun, 12 Aug 2012 23:11:18 -0000
X-Original-Date: Sat, 21 Jul 2012 13:34:32 +1000 (EST)
X-List-Received-Date: Sun, 12 Aug 2012 23:11:18 -0000

On Fri, 20 Jul 2012, Stephen Montgomery-Smith wrote:

> Bruce, with both of us working at the same time on clog, it is getting hard 
> for me to follow.  The version I sent this morning is the last change I made.
>
> How about if you come the owner of the code for a while.  When you are 
> finished, send it back to me, and I will look over everything you have done. 
> I won't work on it until then.  This works for me in other ways too, because 
> my life is very busy at the moment.

I'd prefer you (or Somone Else) to keep working on it.  I just plugged it
into my test framework and started zapping errors...  (I need to make my
test framework easier to set up so that I don't have any investment in
the not seeing the errors.)

> If I do work on code, it will be on casinh/casin.  I am looking over the 
> paper by Hull et al, and I am learning a lot from it.  One thing I did

It will be a larger task.  I don't plan to do it (better not look at the
paper :-).  I looked at one of Kahan's old papers about this.  For not
seeing these papers, it helps that they are behind paywalls.  I only
saw the Kahan paper because someone sent me an obtained copy.

I just remembered that log1p(x) has essentially the same problems with
underflow that we are seeing for clog():

- log(1 + x) for small x.  This is analytic with a zero at 0.  Any such
   function shares the following numeric property: you expand it as
   P1*x + P2*x^2 + ...  Even the x^2 term in this underflows for tiny x,
   so you must not evaluate.  You must just return P1*x (with inexact if
   x != 0).  There is a threshold below which this must be done to avoid
   underflow.  There is a not much higher threshold above which this must
   not be done since it is too inaccurate.  All functions in fdlibm try
   to be careful with these thresholds, and most succeed.  I broke some
   of them by invalid optimizations to avoid branches, but learned better.

- log(1 + x) for medium x.  It has no direct problems with underflow or
   overflow.  We have to be careful not to introduce underflow problems
   by doing things like evaluating x^100 where x is small or by writing
   x as hi+lo and evaluating lo^10.  Naive code can produce the former
   problem by using too many terms in a power series.  hi+lo decompositions
   are fairly immune to such problems, since in double precision the lo
   part is at most 2**54 times smaller than the hi part (usually about
   2**26 times smaller).  Underflow can never occur in the decomposition
   (except possibly in unusual rounding modes), and the lo part can be
   raised to a small power provided the original value is not too tiny.

- log(1 + x) for large x.  Think of it as log(x + 1), where x and 1 are
   the hi and lo parts of a decomposition.  You scale this to 1 + 1/x,
   where 1 and 1/x are hi and lo parts.  This is not a hi+lo decomposition
   of any representable number, since the whole point of log1p() is that
   1+x cannot be represented exactly.  Now the lo part can be up to about
   2**1023 times smaller than the hi part, so even squaring it can underflow.
   The situation for 1+1/x is essentially the same as for 1+x, except there
   is a scale factor that gives extra sources if inaccuracies:
     log(1 + x) = log(x + 1) = log(x * (1 + 1/x)) = log(x) + log(1 + 1/x).
   For the log(1 + 1/x) part, there is a threshold (for x) above which
   1/x must not be squared since it would underflow, and not much lower
   threshold below which the 1/x approximation must not be used since it
   is too inaccurate.  It is technically easier to approximate log(1 + 1/x)
   by 0 instead of 1/x and use the threshold for that.

For clog(), we need to evaluate log(x*x + y*y).  This is harder than
log(1 + x) since the first additive term is not constant and there are
multiplications.  We should first reduce so that |x| >= |y|.  If |x|
is 1 and y is small, we now have essentially the same setup as
for log(1 + x) for small x.  My initial quick fix was only for this
case.  In the general case, we should evaluate x*x + y*y as hi+lo
to great accuracy (we use about 24 extra bits, and this seems to be
enough).  Then log(hi + lo) has essentially the setup as log(x + 1)
for large x, after we pull out the hi factor from each:
     log(x + 1)   = log(x)  + log(1 + 1/x)
     log(hi + lo) = log(hi) + log(1 + lo/hi).
We may or may not end up with a lo/hi term that causes underflow.  There
is the additional problem that the lo/hi term is itself a square
(essentially (y/x)^2, so it may already have underflowed.  Earlier
special cases in our algorithm are apparently enough to avoid underflow
for the general case.

Bruce