From owner-freebsd-numerics@FreeBSD.ORG  Sun Sep 23 00:08:10 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 867F1106564A
	for <freebsd-numerics@freebsd.org>;
	Sun, 23 Sep 2012 00:08:10 +0000 (UTC)
	(envelope-from stephen@missouri.edu)
Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu
	[128.206.184.213])
	by mx1.freebsd.org (Postfix) with ESMTP id 44B4F8FC0C
	for <freebsd-numerics@freebsd.org>;
	Sun, 23 Sep 2012 00:08:10 +0000 (UTC)
Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213])
	by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id
	q8N089Wi033037 for <freebsd-numerics@freebsd.org>;
	Sat, 22 Sep 2012 19:08:09 -0500 (CDT)
	(envelope-from stephen@missouri.edu)
Message-ID: <505E52EA.7090102@missouri.edu>
Date: Sat, 22 Sep 2012 19:08:10 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:15.0) Gecko/20120827 Thunderbird/15.0
MIME-Version: 1.0
To: freebsd-numerics@freebsd.org
References: <5017111E.6060003@missouri.edu>
	<20120917022614.R2943@besplex.bde.org>
	<50562213.9020400@missouri.edu>
	<20120917060116.G3825@besplex.bde.org>
	<50563C57.60806@missouri.edu>
	<20120918012459.V5094@besplex.bde.org>
	<5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu>
	<20120918162105.U991@besplex.bde.org>
	<20120918232850.N2144@besplex.bde.org>
	<20120919010613.T2493@besplex.bde.org>
	<505BD9B4.8020801@missouri.edu>
	<20120921172402.W945@besplex.bde.org>
	<20120921212525.W1732@besplex.bde.org>
	<505C7490.90600@missouri.edu>
	<20120922042112.E3044@besplex.bde.org>
	<505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu>
	<20120922081607.F3613@besplex.bde.org>
	<20120922091625.Y3828@besplex.b! de.org>
	<505D1037.8010202@missouri.edu>
	<20120922142349.X4599@besplex.bde.org>
	<20120923044814.S1465@besplex.bde.org>
	<505E2575.6030302@missouri.edu> <505E27CE.3060107@missouri.edu>
	<20120923073807.K2059@besplex.bde.org>
	<505E3AE6.2010006@missouri.edu>
In-Reply-To: <505E3AE6.2010006@missouri.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 23 Sep 2012 00:08:10 -0000

On 09/22/2012 05:25 PM, Stephen Montgomery-Smith wrote:
> On 09/22/2012 04:47 PM, Bruce Evans wrote:
>> On Sat, 22 Sep 2012, Stephen Montgomery-Smith wrote:
>>
>>> 2.  In my accuracy tests for casin(h), I have never seen the double or
>>> long double have an error greater than 4 ULP.  But for the float case
>>> I have seen 4.15 ULP.
>>
>> I haven't seen any larger than 3.4.  What is the worst case you found?
>> Errors found for float precision tend to be because the density of bad
>> cases is higher so it is easier to test more of them accidentally.  I
>> did do some non-random testing for all float cases in narrow strips
>> about x or y = 0 or 1, but not for all combinations of this with all
>> functions.
>
> Here are some examples for float.  In all these outputs:
> The first entry is the "count".
> The second entry is the function.
> The third and fourth entries are the real and imaginary part of the
> error in ULP.
> The fifth and sixth entries are the real and imaginary part of the input.
> The seventh and eighth and ninth and tenth entries are the real part and
> imaginary part of the answers from the float/double respectively
> (printed to few enough decimal places that you cannot tell they are
> different.)
>
> 2365614 acos 3.75621 0.86681 1.0338860750198364258
> -0.090228326618671417236 0.246582 0.361712 0.246582 0.361712
> 3087248 acos 3.56538 0.1165 2.3730618953704833984 0.26976472139358520508
> 0.124496 -1.51821 0.124496 -1.51821
> 5973027 asinh 3.61544 0.513 0.10977014899253845215
> 0.48254761099815368652 0.124712 0.499309 0.124712 0.499309
> 6558511 acosh 3.57286 0.419525 -0.29658588767051696777
> -0.11975207924842834473 0.124975 -1.8695 0.124975 -1.8695
> 9998127 acos 3.51324 1.09793 1.0892471075057983398
> -0.12541522085666656494 0.247452 0.491951 0.247452 0.491951
> 14879751 asinh 3.5643 1.83067 -0.11303693056106567383
> 0.4351412653923034668 -0.124994 0.446448 -0.124994 0.446448
> 19510082 asin 3.61922 0.0103899 0.46096378564834594727
> -0.01612871512770652771 0.478995 -0.0181731 0.478995 -0.0181731
>
> I can send more examples on request.  I'm not seeing a real pattern here.

Here is an error of about 4.15 ULP

171886645 acosh 0.504775 4.15104 1.0684067010879516602 
-0.049668960273265838623 0.389377 -0.124716 0.389377 -0.124716


From owner-freebsd-numerics@FreeBSD.ORG  Sun Sep 23 00:37:11 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 0D181106566B
	for <freebsd-numerics@FreeBSD.org>;
	Sun, 23 Sep 2012 00:37:11 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au
	[211.29.133.51])
	by mx1.freebsd.org (Postfix) with ESMTP id 8CB1D8FC08
	for <freebsd-numerics@FreeBSD.org>;
	Sun, 23 Sep 2012 00:37:09 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8N0b1no005538
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 23 Sep 2012 10:37:02 +1000
Date: Sun, 23 Sep 2012 10:37:01 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith <stephen@missouri.edu>
In-Reply-To: <505E3AE6.2010006@missouri.edu>
Message-ID: <20120923094809.O2556@besplex.bde.org>
References: <5017111E.6060003@missouri.edu> <50562213.9020400@missouri.edu>
	<20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu>
	<20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu>
	<5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org>
	<20120918232850.N2144@besplex.bde.org>
	<20120919010613.T2493@besplex.bde.org>
	<505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org>
	<20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu>
	<20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu>
	<505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org>
	<20120922091625.Y3828@besplex.b! de.org>
	<505D1037.8010202@missouri.edu>
	<20120922142349.X4599@besplex.bde.org>
	<20120923044814.S1465@besplex.bde.org>
	<505E2575.6030302@missouri.edu> <505E27CE.3060107@missouri.edu>
	<20120923073807.K2059@besplex.bde.org> <505E3AE6.2010006@missouri.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-numerics@FreeBSD.org
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 23 Sep 2012 00:37:11 -0000

On Sat, 22 Sep 2012, Stephen Montgomery-Smith wrote:

> On 09/22/2012 04:47 PM, Bruce Evans wrote:
>> On Sat, 22 Sep 2012, Stephen Montgomery-Smith wrote:
>> 
>>> 2.  In my accuracy tests for casin(h), I have never seen the double or
>>> long double have an error greater than 4 ULP.  But for the float case
>>> I have seen 4.15 ULP.
>> 
>> I haven't seen any larger than 3.4.  What is the worst case you found?
>> Errors found for float precision tend to be because the density of bad
>> cases is higher so it is easier to test more of them accidentally.  I
>> did do some non-random testing for all float cases in narrow strips
>> about x or y = 0 or 1, but not for all combinations of this with all
>> functions.
>
> Here are some examples for float.  In all these outputs:
> The first entry is the "count".
> The second entry is the function.
> The third and fourth entries are the real and imaginary part of the error in 
> ULP.
> The fifth and sixth entries are the real and imaginary part of the input.
> The seventh and eighth and ninth and tenth entries are the real part and 
> imaginary part of the answers from the float/double respectively (printed to 
> few enough decimal places that you cannot tell they are different.)
>
> 2365614 acos 3.75621 0.86681 1.0338860750198364258 -0.090228326618671417236 
> 0.246582 0.361712 0.246582 0.361712
> 3087248 acos 3.56538 0.1165 2.3730618953704833984 0.26976472139358520508 
> 0.124496 -1.51821 0.124496 -1.51821
> 5973027 asinh 3.61544 0.513 0.10977014899253845215 0.48254761099815368652 
> 0.124712 0.499309 0.124712 0.499309
> 6558511 acosh 3.57286 0.419525 -0.29658588767051696777 
> -0.11975207924842834473 0.124975 -1.8695 0.124975 -1.8695
> 9998127 acos 3.51324 1.09793 1.0892471075057983398 -0.12541522085666656494 
> 0.247452 0.491951 0.247452 0.491951
> 14879751 asinh 3.5643 1.83067 -0.11303693056106567383 0.4351412653923034668 
> -0.124994 0.446448 -0.124994 0.446448
> 19510082 asin 3.61922 0.0103899 0.46096378564834594727 
> -0.01612871512770652771 0.478995 -0.0181731 0.478995 -0.0181731
>
> I can send more examples on request.  I'm not seeing a real pattern here.

I think they really are that big.  3.75621 is still below 4, and only
slightly higher than the worse case I found for acosf (3.5303).
Searching more cases will find slightly worse cases.  4.15 only seems
bad because we thought we had 2 or 3.   The important thing is to find
all cases where the error might blow up and check near them.  Also check
near cases where the algorithm changes (these tend to be the same).  I
have no good way for generating the test data for this and have to
type it in for each case.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Sun Sep 23 00:46:08 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C23FA1065670
	for <freebsd-numerics@freebsd.org>;
	Sun, 23 Sep 2012 00:46:08 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au
	[211.29.133.51])
	by mx1.freebsd.org (Postfix) with ESMTP id 439398FC0A
	for <freebsd-numerics@freebsd.org>;
	Sun, 23 Sep 2012 00:46:08 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8N0jsDM023629
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 23 Sep 2012 10:46:06 +1000
Date: Sun, 23 Sep 2012 10:45:54 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
In-Reply-To: <CAOgwaMvw-u0yHs2RxBdGV+41knhvrzafO9umuswwBbRK+h3ahQ@mail.gmail.com>
Message-ID: <20120923103809.Y2696@besplex.bde.org>
References: <CAOgwaMvw-u0yHs2RxBdGV+41knhvrzafO9umuswwBbRK+h3ahQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-numerics@freebsd.org
Subject: Re: Book names about Computer Approximations
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 23 Sep 2012 00:46:08 -0000

On Sat, 22 Sep 2012, Mehmet Erol Sanliturk wrote:

> Dear All ,
>
> I want to buy some books about computer approximations to
> functions such as elementary , distributions , etc.
> like the books
>
> http://www.amazon.com/Approximations-Digital-Computers-Cecil-Hastings/dp/B000Q5GBG6/ref=sr_1_1?s=books&ie=UTF8&qid=1348357087&sr=1-1
> http://www.amazon.com/Computer-Approximations-John-Fraser-Hart/dp/0882756427
> http://www.amazon.com/Elementary-Functions-Implementation-Jean-Michel-Muller/dp/0817643729/ref=pd_sim_sbs_b_2

Muller's boot is probably good.  (I haven't found time to read it and hope
I already know most of it.)

Also one by Muller et al "Handbook of Floating Point ...".  It was too
expensive to buy for something I don't have time to read, but a google
search found it and it had exactly what I needed in google books.

> http://www.amazon.com/Elementary-Functions-Prentice-Hall-computational-mathematics/dp/0138220646/ref=pd_sim_sbs_b_3

I don't know anything about the others.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Sun Sep 23 00:49:46 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 9F72D106566B
	for <freebsd-numerics@freebsd.org>;
	Sun, 23 Sep 2012 00:49:46 +0000 (UTC)
	(envelope-from m.e.sanliturk@gmail.com)
Received: from mail-oa0-f54.google.com (mail-oa0-f54.google.com
	[209.85.219.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 5AF198FC12
	for <freebsd-numerics@freebsd.org>;
	Sun, 23 Sep 2012 00:49:46 +0000 (UTC)
Received: by oagm1 with SMTP id m1so5686987oag.13
	for <freebsd-numerics@freebsd.org>;
	Sat, 22 Sep 2012 17:49:45 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=DVFCmEI353iFjGt0m1f7Rww4pjaKxaaa0NgsvwLeP7A=;
	b=gwDxGV6FN5ikJrolf9JRRby6HzfXHPINR+ILwcQoIHURFAKetSKh+DGa7cnZk1Aiwg
	NhOX7g8j8z0abEZhEx35IIg8/K1QtS9JhOOs/CJkNPbKC+P3TFpBoh8Bd3ovrU2O4QgZ
	jAbWdXfV3vAnzjcGgHrreGTvnsSL364Rjcc+Q/MVo+Fn3GmnRFhIqnXOUupEP3v3efgm
	UpYwrUbjxzK8hrKaIz19LVgrHnjukvnFvGxRa364yCzB6Sm3bhPRPntVEfkXUYn7AFou
	TuWmehh5aCkF5PrmJYiCtUxHYo34eAQp4rikEpoNirkkHUuIgaaQdPp0C6a7DJ6/sddr
	scww==
MIME-Version: 1.0
Received: by 10.182.76.194 with SMTP id m2mr6986185obw.27.1348361385748; Sat,
	22 Sep 2012 17:49:45 -0700 (PDT)
Received: by 10.182.141.66 with HTTP; Sat, 22 Sep 2012 17:49:45 -0700 (PDT)
In-Reply-To: <20120923103809.Y2696@besplex.bde.org>
References: <CAOgwaMvw-u0yHs2RxBdGV+41knhvrzafO9umuswwBbRK+h3ahQ@mail.gmail.com>
	<20120923103809.Y2696@besplex.bde.org>
Date: Sat, 22 Sep 2012 17:49:45 -0700
Message-ID: <CAOgwaMtiCcduh06Da3h4Wx_0VVV5d5uwwsDr9RoYG-qGcg2Stg@mail.gmail.com>
From: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-numerics@freebsd.org
Subject: Re: Book names about Computer Approximations
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 23 Sep 2012 00:49:46 -0000

On Sat, Sep 22, 2012 at 5:45 PM, Bruce Evans <brde@optusnet.com.au> wrote:

> On Sat, 22 Sep 2012, Mehmet Erol Sanliturk wrote:
>
>  Dear All ,
>>
>> I want to buy some books about computer approximations to
>> functions such as elementary , distributions , etc.
>> like the books
>>
>> http://www.amazon.com/**Approximations-Digital-**
>> Computers-Cecil-Hastings/dp/**B000Q5GBG6/ref=sr_1_1?s=books&**
>> ie=UTF8&qid=1348357087&sr=1-1<http://www.amazon.com/Approximations-Digital-Computers-Cecil-Hastings/dp/B000Q5GBG6/ref=sr_1_1?s=books&ie=UTF8&qid=1348357087&sr=1-1>
>> http://www.amazon.com/**Computer-Approximations-John-**
>> Fraser-Hart/dp/0882756427<http://www.amazon.com/Computer-Approximations-John-Fraser-Hart/dp/0882756427>
>> http://www.amazon.com/**Elementary-Functions-**
>> Implementation-Jean-Michel-**Muller/dp/0817643729/ref=pd_**sim_sbs_b_2<http://www.amazon.com/Elementary-Functions-Implementation-Jean-Michel-Muller/dp/0817643729/ref=pd_sim_sbs_b_2>
>>
>
> Muller's boot is probably good.  (I haven't found time to read it and hope
> I already know most of it.)
>
> Also one by Muller et al "Handbook of Floating Point ...".  It was too
> expensive to buy for something I don't have time to read, but a google
> search found it and it had exactly what I needed in google books.
>
>  http://www.amazon.com/**Elementary-Functions-Prentice-**
>> Hall-computational-**mathematics/dp/0138220646/ref=**pd_sim_sbs_b_3<http://www.amazon.com/Elementary-Functions-Prentice-Hall-computational-mathematics/dp/0138220646/ref=pd_sim_sbs_b_3>
>>
>
> I don't know anything about the others.
>
> Bruce
>


Thank you very much .

Mehmet Erol Sanliturk

From owner-freebsd-numerics@FreeBSD.ORG  Sun Sep 23 01:18:14 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 04800106564A
	for <freebsd-numerics@freebsd.org>;
	Sun, 23 Sep 2012 01:18:14 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au
	[211.29.132.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 877828FC0A
	for <freebsd-numerics@freebsd.org>;
	Sun, 23 Sep 2012 01:18:13 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8N1IAKf025856
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 23 Sep 2012 11:18:11 +1000
Date: Sun, 23 Sep 2012 11:18:10 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith <stephen@missouri.edu>
In-Reply-To: <505E52EA.7090102@missouri.edu>
Message-ID: <20120923105546.J2746@besplex.bde.org>
References: <5017111E.6060003@missouri.edu>
	<20120917060116.G3825@besplex.bde.org>
	<50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org>
	<5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu>
	<20120918162105.U991@besplex.bde.org>
	<20120918232850.N2144@besplex.bde.org>
	<20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu>
	<20120921172402.W945@besplex.bde.org>
	<20120921212525.W1732@besplex.bde.org>
	<505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org>
	<505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu>
	<20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b!
	de.org> <505D1037.8010202@missouri.edu>
	<20120922142349.X4599@besplex.bde.org>
	<20120923044814.S1465@besplex.bde.org> <505E2575.6030302@missouri.edu>
	<505E27CE.3060107@missouri.edu> <20120923073807.K2059@besplex.bde.org>
	<505E3AE6.2010006@missouri.edu> <505E52EA.7090102@missouri.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-numerics@freebsd.org
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 23 Sep 2012 01:18:14 -0000

On Sat, 22 Sep 2012, Stephen Montgomery-Smith wrote:

> On 09/22/2012 05:25 PM, Stephen Montgomery-Smith wrote:
>> On 09/22/2012 04:47 PM, Bruce Evans wrote:
>>> On Sat, 22 Sep 2012, Stephen Montgomery-Smith wrote:
>>> 
>>>> 2.  In my accuracy tests for casin(h), I have never seen the double or
>>>> long double have an error greater than 4 ULP.  But for the float case
>>>> I have seen 4.15 ULP.
>>> 
>>> I haven't seen any larger than 3.4.  What is the worst case you found?
> ...
> Here is an error of about 4.15 ULP
>
> 171886645 acosh 0.504775 4.15104 1.0684067010879516602 
> -0.049668960273265838623 0.389377 -0.124716 0.389377 -0.124716

Even y+1 is inaccurate for this arg.  (do_hard_work() has several of
these.)

On i386, the error is only 2.15 ulps.  But changing all 1's to 1.0's and
all sqrtf()'s to sqrt()s in do_hard_work() made no difference, so it
must be other extra calculations that help on i386.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Mon Sep 24 19:05:48 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 6EAA4106566B
	for <numerics@freebsd.org>; Mon, 24 Sep 2012 19:05:48 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au
	[211.29.132.187])
	by mx1.freebsd.org (Postfix) with ESMTP id 0E8138FC0C
	for <numerics@freebsd.org>; Mon, 24 Sep 2012 19:05:44 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8OJ5gc3024699
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <numerics@freebsd.org>; Tue, 25 Sep 2012 05:05:43 +1000
Date: Tue, 25 Sep 2012 05:05:42 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: numerics@freebsd.org
Message-ID: <20120925035402.C1433@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: 
Subject: LDBL_MAX broken on i386
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 24 Sep 2012 19:05:48 -0000

I just noticed that LDBL_MAX is completely broken on i386.  gcc
supports i386's default of not-really-long-double precision to a
fault, by rounding all constant expressions including literal
constants in the same way that they would be rounded at runtime in the
FreeBSD default rounding precision of 53 bits.  The definition of
LDBL_MAX knows nothing of this.  It asks for a value that is
significantly larger than possible, so the value is rounded to
infinity:

#define LDBL_MAX	1.1897314953572317650E+4932L

It is hard to see what this is, but in hex it asks for the value that
would work with 64-bit precision: 0x0.ffffffffffffffffp16384L (it is
just this value rounded to 20 decimal digits.

   (This 20 or perhaps DECIMAL_DIG = 21 is another bug.  All the long
   double constants in x86 float.h are rounded to 20 decimal digits,
   but DECIMAL_DIG says that 21 should be used.  Of course, 21 is vary
   rarely needed, and the small set of constants in float.h is very
   unlikely to need it or even 20, but they should set a better example.
   DECIMAL_DIG is new in C99, and the number of digits in the constants
   wasn't increased from 20 to 21 when it was added.

   OTOH, sparc64 has off-by-1-or-2 errors in the opposite direction:
   - its DECIMAL_DIG is 36
   - it uses 37 decimal digits
   - its DECIMAL_DIG used to be off-by-1 in the opposite direction (was
     35)
   Using 37 digits might be the result of trying to use the correct number
   (36) all along, but printing the values using "%.36Le" format.  This
   gives 1 more digit than the number in the format.)

Rounding this to nearest fairly obviously rounds up, so it overflows.
The half-way case is 0x0.fffffffffffffc00p16384L.  This rounds up too,
to round to even.  The largest value that rounds down is thus
0x0.fffffffffffffbffp16384L, and the largest representable 53-bit
value is 0x0.fffffffffffff800p16384L.

clang is not compatible to a fault here.  This gives bugfeatures like
the following:
- the buggy definition of LDBL_MAX actually works.  It gives the 64-bit
   value 0x0.ffffffffffffffffp16384L.
- other constant literals work similarly
- compile-time evaluation is broken.  For example, let L be half of the
   64-bit LDBL_MAX (0x0.ffffffffffffffffp16383L).  Then:
   - L+L = Inf at runtime (with not-really-long-double precision -- just
     extra exponent range)
   - L+L = 64-bit LDBL_MAX at compile time
   But with gcc:
   - L is rounded to nearest, so it becomes 0x1p383L
   - L+L = Inf at runtime (with either non-really-long-double precision or
     full long double precision)
   - L+L = Inf at compile time

clang warns about overflow in too-large constant literals for LDBL_MAX.
This was useful for debugging.

I think the technically handling for gcc is to round constant literals
to full binary precision (64 bits for i386 long doubles), and only
evaluate FP constant expressions at compile time if the result doesn't
depend on the rounding mode and doesn't lose precision in the default
rounding precision (whatever that may be -- even in full precision,
the calcuation should be deferred to runtime if it would set inexact),
or at least have an option for this and/or warn if the calculation
loses precision.  Decimal constants cause a problem here -- 0.3 cannot
be represented without losing precision, but warning for that would
be unreasonable.  An option to calculate this at runtime as 3.0/10.0
might be useful, but wouldn't work for longer decimal constants like
1.1897314953572317650E+4932L (this must have more digits than can be
represented so that it gives the desired value after rounding).

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Mon Sep 24 19:51:27 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A9C431065677
	for <numerics@freebsd.org>; Mon, 24 Sep 2012 19:51:27 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 620338FC1E
	for <numerics@freebsd.org>; Mon, 24 Sep 2012 19:51:27 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id
	q8OJpLQ8022221; Mon, 24 Sep 2012 12:51:21 -0700 (PDT)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id
	q8OJpL2E022220; Mon, 24 Sep 2012 12:51:21 -0700 (PDT)
	(envelope-from sgk)
Date: Mon, 24 Sep 2012 12:51:21 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20120924195121.GA22138@troutmask.apl.washington.edu>
References: <20120925035402.C1433@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120925035402.C1433@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: numerics@freebsd.org
Subject: Re: LDBL_MAX broken on i386
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 24 Sep 2012 19:51:27 -0000

On Tue, Sep 25, 2012 at 05:05:42AM +1000, Bruce Evans wrote:
>
> I just noticed that LDBL_MAX is completely broken on i386.
>

It seems history is repeating itself.

http://lists.freebsd.org/pipermail/freebsd-hackers/2005-August/013007.html

I can't find the other email exchange I had with at
leaset Warner about LDBL_MAX.  I distinctly remember
asking why LDBL_MAX on i386 was 2^64 instead of 2^53.

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Mon Sep 24 20:02:11 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5B570106566B
	for <numerics@freebsd.org>; Mon, 24 Sep 2012 20:02:11 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 37D158FC08
	for <numerics@freebsd.org>; Mon, 24 Sep 2012 20:02:11 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id
	q8OK2Al7022293; Mon, 24 Sep 2012 13:02:10 -0700 (PDT)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id
	q8OK2AXm022292; Mon, 24 Sep 2012 13:02:10 -0700 (PDT)
	(envelope-from sgk)
Date: Mon, 24 Sep 2012 13:02:10 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20120924200210.GA22228@troutmask.apl.washington.edu>
References: <20120925035402.C1433@besplex.bde.org>
	<20120924195121.GA22138@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120924195121.GA22138@troutmask.apl.washington.edu>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: numerics@freebsd.org
Subject: Re: LDBL_MAX broken on i386
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 24 Sep 2012 20:02:11 -0000

On Mon, Sep 24, 2012 at 12:51:21PM -0700, Steve Kargl wrote:
> On Tue, Sep 25, 2012 at 05:05:42AM +1000, Bruce Evans wrote:
> >
> > I just noticed that LDBL_MAX is completely broken on i386.
> >
> 
> It seems history is repeating itself.
> 
> http://lists.freebsd.org/pipermail/freebsd-hackers/2005-August/013007.html
> 
> I can't find the other email exchange I had with at
> leaset Warner about LDBL_MAX.  I distinctly remember
> asking why LDBL_MAX on i386 was 2^64 instead of 2^53.
> 

The other thread (well at least part of it) is at

http://lists.freebsd.org/pipermail/freebsd-current/2005-August/053852.html

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Mon Sep 24 22:28:37 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B4C65106566C
	for <numerics@freebsd.org>; Mon, 24 Sep 2012 22:28:37 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au
	[211.29.132.189])
	by mx1.freebsd.org (Postfix) with ESMTP id 4F0AE8FC0C
	for <numerics@freebsd.org>; Mon, 24 Sep 2012 22:28:36 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8OMSYMc015706
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 25 Sep 2012 08:28:35 +1000
Date: Tue, 25 Sep 2012 08:28:34 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
In-Reply-To: <20120924195121.GA22138@troutmask.apl.washington.edu>
Message-ID: <20120925073508.M2077@besplex.bde.org>
References: <20120925035402.C1433@besplex.bde.org>
	<20120924195121.GA22138@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: numerics@freebsd.org
Subject: Re: LDBL_MAX broken on i386
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 24 Sep 2012 22:28:37 -0000

On Mon, 24 Sep 2012, Steve Kargl wrote:

> On Tue, Sep 25, 2012 at 05:05:42AM +1000, Bruce Evans wrote:
>>
>> I just noticed that LDBL_MAX is completely broken on i386.
>
> It seems history is repeating itself.
>
> http://lists.freebsd.org/pipermail/freebsd-hackers/2005-August/013007.html
>
> I can't find the other email exchange I had with at
> leaset Warner about LDBL_MAX.  I distinctly remember
> asking why LDBL_MAX on i386 was 2^64 instead of 2^53.

2**16384 less 1 ulp.  It really does go up to this and not
the double limit of 2**1024 less 1 ulp.

I should have objected more stongly when the constants were
changed prematurely in 2002:

% RCS file: /home/ncvs/src/sys/i386/include/float.h,v
% Working file: float.h
% head: 1.17
% ----------------------------
% revision 1.9
% date: 2002/10/25 07:02:52;  author: imp;  state: Exp;  lines: +10 -9
% Use the correct values for LDBL_*.  Libc doesn't completely support
% long doubles at the moment (printf truncates them to doubles).
% However, long doubles to appear to work to the ranges listed in this
% commit on both -stable (4.5) and -current.  There may be some slight
% rounding issues with long doubles, but that's an orthogonal issue to
% these constants.
% 
% I've had this in my local tree for 3 months, and in my company's local
% tree for 15 months with no ill effects.
% 
% Obtained from: NetBSD
% Not likely to like it: bde
% ----------------------------

Only the changes related to the exponents in this were correct, but
the incorrect LDBL_MAX mostly worked at the time, since gcc rounded
it in 64-bit precision so it didn't overflow.

LDBL_MAX stopped working with this change to gcc.  From gcc/ChangeLog-
2003:

% 2003-07-01  Richard Henderson  <rth@redhat.com>
% 	    (blame to: Loren James Rittle  <ljrittle@acm.org>)
% 
% 	* real.h (ieee_extended_intel_96_round_53_format): New.
% 	* real.c (ieee_extended_intel_96_round_53_format): New.
% 	* config/i386/freebsd.h (SUBTARGET_OVERRIDE_OPTIONS): Use it
% 	for XFmode and TFmode.

This was well enough discussed in FreeBSD lists before it was committed
(or even developed?), and I agreed with it at the time, but didn't
understand its full awfulness.  gcc-4.2.1 still hard-configures the
flag that enables this: from gcc/config/freebsd.h:

% /* FreeBSD sets the rounding precision of the FPU to 53 bits.  Let the
%    compiler get the contents of <float.h> and std::numeric_limits correct.  */
% #undef  TARGET_96_ROUND_53_LONG_DOUBLE
% #define TARGET_96_ROUND_53_LONG_DOUBLE (!TARGET_64BIT)

so i386 always gets it and amd64 never does.  There seems to be no way to
get 53-bit rounding for expressions without getting it for literals.

I don't understand C++ and forgot about it.  Values in headers in
/usr/include/c++ seems to depend on builtin definitions like the
following:

% #define __FLT_EVAL_METHOD__ 2

Broken.  I knew this, but I just looked at the gcc code for this and
the code seemed reasonable there (it seems to try to define it as -1
unless SSE is used).

% #define __LDBL_MAX__ 1.1897314953572316e+4932L

This is less than the value in <float.h>, so it has a chance of not
producing infinity.  It is also rounded to DBL_DECIMAL_DIG digits.
Reminder of the value in <float.h>  with formatting manged so that
it lines up:

% #define   LDBL_MAX   1.1897314953572317650E+4932L  (20 digits)
% #define   LDBL_MAX   1.1897314953572318E+4932L  (rounded to 17 digits)

But the builtin value isn't actually the max.  It is possible to produce
the actual max although not with gcc literals.  Altogether there are
(2**11 - 1) representable values exceeding gcc's builtin max.  Comparison
of these values with __LDBL_MAX__ would give interesting bugs.

% #define __LDBL_MAX_EXP__ 16384
% #define __LDBL_HAS_INFINITY__ 1
% #define __LDBL_MIN__ 3.3621031431120935e-4932L

This is a power of 2, so rounding it to 53 bits gives the correct value.
But clang's rounding to 64 bits would give a different value.  Spelling
it with 21 decimal digits would give the correct value for everyone.

% #define __LDBL_HAS_QUIET_NAN__ 1
% #define __LDBL_HAS_DENORM__ 1
% #define __LDBL_EPSILON__ 2.2204460492503131e-16L

Like __LDBL_MIN__ (power of 2), but only correct with the default
53-bit rounding precision.  It of course differs by a factor of
2**11 from the value in <float.h>.

% #define __LDBL_DIG__ 15
% #define __LDBL_MANT_DIG__ 53
% #define __LDBL_MIN_EXP__ (-16381)
% #define __LDBL_MAX_10_EXP__ 4932
% #define __LDBL_DENORM_MIN__ 7.4653686412953080e-4948L

Like __LDBL_MIN__ (power of 2).

% #define __LDBL_MIN_10_EXP__ (-4931)

gcc also predefines __DECIMAL_DIG__ as 17, consistently with the above.

So gcc's values are perfectly consistent, but are correct if no one
ever creates values with the lower 11 bits set.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Mon Sep 24 23:12:09 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 508BA106566B
	for <numerics@freebsd.org>; Mon, 24 Sep 2012 23:12:09 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 1C5B98FC08
	for <numerics@freebsd.org>; Mon, 24 Sep 2012 23:12:09 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id
	q8ONC8h8023075; Mon, 24 Sep 2012 16:12:08 -0700 (PDT)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id
	q8ONC8wD023074; Mon, 24 Sep 2012 16:12:08 -0700 (PDT)
	(envelope-from sgk)
Date: Mon, 24 Sep 2012 16:12:08 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20120924231208.GA22960@troutmask.apl.washington.edu>
References: <20120925035402.C1433@besplex.bde.org>
	<20120924195121.GA22138@troutmask.apl.washington.edu>
	<20120925073508.M2077@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120925073508.M2077@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: numerics@freebsd.org
Subject: Re: LDBL_MAX broken on i386
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 24 Sep 2012 23:12:09 -0000

On Tue, Sep 25, 2012 at 08:28:34AM +1000, Bruce Evans wrote:
> On Mon, 24 Sep 2012, Steve Kargl wrote:
> 
> > On Tue, Sep 25, 2012 at 05:05:42AM +1000, Bruce Evans wrote:
> >>
> >> I just noticed that LDBL_MAX is completely broken on i386.
> >
> > It seems history is repeating itself.
> >
> > http://lists.freebsd.org/pipermail/freebsd-hackers/2005-August/013007.html
> >
> > I can't find the other email exchange I had with at
> > leaset Warner about LDBL_MAX.  I distinctly remember
> > asking why LDBL_MAX on i386 was 2^64 instead of 2^53.
> 
> 2**16384 less 1 ulp.  It really does go up to this and not
> the double limit of 2**1024 less 1 ulp.

Yes, your right.  I was trying to remember the 2005 discussion,
and I clearly forgot/forget the details.  I suspect that the 
issue I saw back then was related to 2**emax where I was not
not getting what I was expecting.  This is back when I first
started working on the long double math functions.

> I should have objected more stongly when the constants were
> changed prematurely in 2002:
> 
> % RCS file: /home/ncvs/src/sys/i386/include/float.h,v
> % Working file: float.h
> % head: 1.17
> % ----------------------------
> % revision 1.9
> % date: 2002/10/25 07:02:52;  author: imp;  state: Exp;  lines: +10 -9
> % Use the correct values for LDBL_*.  Libc doesn't completely support
> % long doubles at the moment (printf truncates them to doubles).
> % However, long doubles to appear to work to the ranges listed in this
> % commit on both -stable (4.5) and -current.  There may be some slight
> % rounding issues with long doubles, but that's an orthogonal issue to
> % these constants.
> % 
> % I've had this in my local tree for 3 months, and in my company's local
> % tree for 15 months with no ill effects.
> % 
> % Obtained from: NetBSD
> % Not likely to like it: bde

It seems that imp was anticipating your dislike.

> Only the changes related to the exponents in this were correct, but
> the incorrect LDBL_MAX mostly worked at the time, since gcc rounded
> it in 64-bit precision so it didn't overflow.
> 
> LDBL_MAX stopped working with this change to gcc.  From gcc/ChangeLog-
> 2003:
> 
> % 2003-07-01  Richard Henderson  <rth@redhat.com>
> % 	    (blame to: Loren James Rittle  <ljrittle@acm.org>)
> % 
> % 	* real.h (ieee_extended_intel_96_round_53_format): New.
> % 	* real.c (ieee_extended_intel_96_round_53_format): New.
> % 	* config/i386/freebsd.h (SUBTARGET_OVERRIDE_OPTIONS): Use it
> % 	for XFmode and TFmode.
> 
> This was well enough discussed in FreeBSD lists before it was committed
> (or even developed?), and I agreed with it at the time, but didn't
> understand its full awfulness.  gcc-4.2.1 still hard-configures the
> flag that enables this: from gcc/config/freebsd.h:
> 
> % /* FreeBSD sets the rounding precision of the FPU to 53 bits.  Let the
> %    compiler get the contents of <float.h> and std::numeric_limits correct.  */
> % #undef  TARGET_96_ROUND_53_LONG_DOUBLE
> % #define TARGET_96_ROUND_53_LONG_DOUBLE (!TARGET_64BIT)

I just checked the head of gcc, the above is still present.
This suggests to me that i386 FreeBSD will never be free of
the npx feature of setting the FPU to 53 bits.

> so i386 always gets it and amd64 never does.  There seems to be no way to
> get 53-bit rounding for expressions without getting it for literals.

I think you're correct about literals.  gcc, since about version 
4.5.x, uses MPFR to do constant-folding and it does this in the
precision of literal constant as determined by gcc.  On the bright
side, MPFR claims to correctly round the folding.
> 
> I don't understand C++ and forgot about it.  Values in headers in
> /usr/include/c++ seems to depend on builtin definitions like the
> following:
> 
> % #define __FLT_EVAL_METHOD__ 2
> 
> Broken.  I knew this, but I just looked at the gcc code for this and
> the code seemed reasonable there (it seems to try to define it as -1
> unless SSE is used).
> 
> % #define __LDBL_MAX__ 1.1897314953572316e+4932L
> 
> This is less than the value in <float.h>, so it has a chance of not
> producing infinity.  It is also rounded to DBL_DECIMAL_DIG digits.
> Reminder of the value in <float.h>  with formatting manged so that
> it lines up:
> 
> % #define   LDBL_MAX   1.1897314953572317650E+4932L  (20 digits)
> % #define   LDBL_MAX   1.1897314953572318E+4932L  (rounded to 17 digits)
> 
> But the builtin value isn't actually the max.  It is possible to produce
> the actual max although not with gcc literals.  Altogether there are
> (2**11 - 1) representable values exceeding gcc's builtin max.  Comparison
> of these values with __LDBL_MAX__ would give interesting bugs.
> 
> % #define __LDBL_MAX_EXP__ 16384
> % #define __LDBL_HAS_INFINITY__ 1
> % #define __LDBL_MIN__ 3.3621031431120935e-4932L
> 
> This is a power of 2, so rounding it to 53 bits gives the correct value.
> But clang's rounding to 64 bits would give a different value.  Spelling
> it with 21 decimal digits would give the correct value for everyone.
> 
> % #define __LDBL_HAS_QUIET_NAN__ 1
> % #define __LDBL_HAS_DENORM__ 1
> % #define __LDBL_EPSILON__ 2.2204460492503131e-16L
> 
> Like __LDBL_MIN__ (power of 2), but only correct with the default
> 53-bit rounding precision.  It of course differs by a factor of
> 2**11 from the value in <float.h>.
> 
> % #define __LDBL_DIG__ 15
> % #define __LDBL_MANT_DIG__ 53
> % #define __LDBL_MIN_EXP__ (-16381)
> % #define __LDBL_MAX_10_EXP__ 4932
> % #define __LDBL_DENORM_MIN__ 7.4653686412953080e-4948L
> 
> Like __LDBL_MIN__ (power of 2).
> 
> % #define __LDBL_MIN_10_EXP__ (-4931)
> 
> gcc also predefines __DECIMAL_DIG__ as 17, consistently with the above.
> 
> So gcc's values are perfectly consistent, but are correct if no one
> ever creates values with the lower 11 bits set.

So, if I understand the above, should we try to correct float.h
to have 21 (36) digits on ld80 (ld128)?  Doing we limit LDBL_MAX
on i386 to 2**(emax - 11)?

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Tue Sep 25 15:46:08 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id F3CC61065670
	for <numerics@freebsd.org>; Tue, 25 Sep 2012 15:46:07 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id BEEE28FC0A
	for <numerics@freebsd.org>; Tue, 25 Sep 2012 15:46:07 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id
	q8PFk6lU028993
	for <numerics@freebsd.org>; Tue, 25 Sep 2012 08:46:06 -0700 (PDT)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id
	q8PFk66t028992
	for numerics@freebsd.org; Tue, 25 Sep 2012 08:46:06 -0700 (PDT)
	(envelope-from sgk)
Date: Tue, 25 Sep 2012 08:46:06 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: numerics@freebsd.org
Message-ID: <20120925154606.GA28919@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: 
Subject: [PATCH] implementation for expm1l()
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 25 Sep 2012 15:46:08 -0000

The patch following my .sig implements expm1l().  An example
of testing 10M values uniformly distributed in each interval
gives

amd64:
% ./testl -n 10 
Interval tested: [-11355.000000:-0.250000]
    Max ULP: 0.50325
  x Max ULP: -5.21316172131617162e+00, -0x1.4da4710f73f83p+2
Interval tested: [-0.250000:0.250000]
    Max ULP: 0.53926
  x Max ULP: -1.15709936570993657e-01, -0x1.d9f2a996507ec308p-4
Interval tested: [0.250000:11356.000000]
    Max ULP: 0.50584
  x Max ULP: 8.79177827642782764e+03, 0x1.12be39e8fde623b2p+13

sparc64 (only very limited testing due to speed of flame):
% ./testl -n 10
Interval tested: [-11355.000000:-0.287682]
    Max ULP: 0.50165
  x Max ULP: -5.34961332997700764996445801944829055e+00,
             -0x1.566010969fcd47a521247b4128p+2
Interval tested: [-0.287682:0.223144]
    Max ULP: 0.54843
  x Max ULP: -2.86792265208762197924244858751372070e-01,
             -0x1.25acdf1f45027985d7c04fa34168p-2
Interval tested: [0.223144:11356.000000]
    Max ULP: 0.50577
  x Max ULP: 7.83174792710817081424055325173740399e+03,
             0x1.e97bf7826a562b03aafc33340393p+12

Now, the problem!  On i386 in the interval [-0.287682:0.223144],
I see up to 15.5 ULP for the ld80 patch.  Note, if I don't
use ENTERI and RETURNI in the ld80 code, I see ULP that are
consistent with the amd64 results.  This of course leads back
to the 53- versus 64-bit precision settings on i386.

I have tried re-arranging the order of evaluation of the polynomial
and some of the intermediate results with little improvement.  I've
also increased the order of the polynomial.  Everything I've tried
seems to fail on i386.  So, anyone got a suggestion?

PS: the relevant section of code is in the 'if (T1 < x && x < T2)'
block of code.

-- 
Steve

Index: ld80/s_expl.c
===================================================================
--- ld80/s_expl.c	(revision 240889)
+++ ld80/s_expl.c	(working copy)
@@ -301,3 +301,149 @@
 		RETURNI(t * twopkp10000 * twom10000);
 	}
 }
+
+/*-
+ * Implementation of expm1l(x) in Intel 80-bit long double format.
+ * The significand is 64 bits.  This is based on 
+ * 
+ *   PTP Tang, "Table-driven implementation of the Expm1 function
+ *   in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 18,
+ *   211-222 (1992).
+ */
+
+#if 0
+static const long double
+T1 = -0x1.269621134db92786p-2L, /* log(1-1/4) = -2.876820724517809275e-01L */
+T2 =  0x1.c8ff7c79a9a21ac4p-3L; /* log(1+1/4) =  2.231435513142097558e-01L */
+#else
+static const long double
+T1 = -0.25L,
+T2 =  0.25L;
+#endif
+
+/*
+ * Remes polynomial coefficients on the interval [T1:T2] with an error
+ * estimate of log2(|expm1(x)-p(x)|) ~ -74.  These are the coefficients
+ * rounded to long double precision where B5 through B14 can be used with
+ * a double type.
+ */
+static const long double
+B3 = 0x1.5555555555555556p-3L,	/* 1.66666666666666666671e-01 */
+B4 = 0x1.5555555555555556p-5L;	/* 4.16666666666666666678e-02 */
+
+static const double __aligned(64)
+B5 = 0x1.1111111111111118p-7,	/* 8.33333333333333333627e-03 */
+B6 = 0x1.6c16c16c16c147cp-10,	/* 1.38888888888888839641e-03 */
+B7 = 0x1.a01a01a019ffd3b6p-13,	/* 1.98412698412697632766e-04 */
+B8 = 0x1.a01a01a01db26a7cp-16,	/* 2.48015873016385187861e-05 */
+B9 = 0x1.71de3a55a0c1ee4ep-19,	/* 2.75573192248932803421e-06 */
+B10 = 0x1.27e4fb50209391bep-22,	/* 2.75573190052072576834e-07 */
+B11 = 0x1.ae645165ce954702p-26,	/* 2.50521038560011694434e-08 */
+B12 = 0x1.1eef018ecbb0486cp-29,	/* 2.08771683916037414477e-09 */
+B13 = 0x1.615bbeecd7fc9bd8p-33,	/* 1.60688788144810303816e-10 */
+B14 = 0x1.89b55fd4d237b3a2p-37;	/* 1.11898684031101176243e-11 */
+
+long double
+expm1l(long double x)
+{
+	union IEEEl2bits u, v;
+
+	long double fn, q, r, r1, r2, t, t23, t45, twomk, twopk, z;
+	int k, n, n2;
+	uint16_t hx, ix;
+
+	/* Filter out exceptional cases. */
+	u.e = x;
+	hx = u.xbits.expsign;
+	ix = hx & 0x7fff;
+	if (ix >= BIAS + 13) {		/* |x| >= 8192 or x is NaN */
+		if (ix == BIAS + LDBL_MAX_EXP) {
+			if (hx & 0x8000 && u.xbits.man == 1ULL << 63)
+				return (-1.0L);	/* x is -Inf */
+			return (x + x);	/* x is +Inf, NaN or unsupported */
+		}
+		if (x > o_threshold.e)
+			return (huge * huge);
+		if (x < -50)		/* XXX s.b. ~66ln2, maybe in bits */
+			return (tiny - 1.0L);
+
+	} else if (ix < BIAS - 66) {	/* |x| < 0x1p-66 */
+		if ((int)x == 0)
+			return (x);	/* x with inexact if x != 0 */
+	}
+
+	ENTERI();
+
+	if (T1 < x && x < T2) {
+
+		long double t1, t2, t3, t4, u, v, x3, y;
+
+		u = (0x1.0p40 * x + x) - 0x1.0p40 * x;
+		v = x - u;
+		y = u * u * 0.5L;
+		z = v * (x + u) * 0.5L;
+		t1 = B3 + x * (B4 + B5 * x);
+		t2 = B6 + x * (B7 + B8 * x);
+		t3 = B9 + x * (B10 + B11 * x);
+		t4 = B12 + x * (B13 + B14 * x);
+		x3 = x * x * x;
+		q = x3 * (t1 + x3 * (t2 + x3 * (t3 + x3 * t4)));
+
+		if (y >= 0x1.p-7)
+			RETURNI((u + y) + (q + (v + z)));
+		else
+			RETURNI(x + (y + (q + z)));
+	}
+
+	/* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */
+	/* Use a specialized rint() to get fn.  Assume round-to-nearest. */
+	fn = x * INV_L + 0x1.8p63 - 0x1.8p63;
+	r = x - fn * L1 - fn * L2;	/* r = r1 + r2 done independently. */
+#if defined(HAVE_EFFICIENT_IRINTL)
+	n  = irintl(fn);
+#elif defined(HAVE_EFFICIENT_IRINT)
+	n  = irint(fn);
+#else
+	n  = (int)fn;
+#endif
+	n2 = (unsigned)n % INTERVALS;
+	k = (n - n2) / INTERVALS;
+	r1 = x - fn * L1;
+	r2 = -fn * L2;
+
+	/* Prepare scale factor. */
+	v.xbits.man = 1ULL << 63;
+	v.xbits.expsign = BIAS + k;
+	twopk = v.e;
+
+	/* Evaluate expl(midpoint[n2] + r1 + r2) = s[n2] * expl(r1 + r2). */
+	/* Here q = q(r), not q(r1), since r1 is lopped like L1. */
+	t45 = r * P5 + P4;
+	z = r * r;
+	t23 = r * P3 + P2;
+	q = r2 + z * t23 + z * z * t45 + z * z * z * P6;
+	t = (long double)s[n2].lo + s[n2].hi;
+
+	if (k < -7) {
+		t = s[n2].lo + t * (q + r1) + s[n2].hi;
+		RETURNI(t * twopk - 1.L);
+	}
+
+	if (k > 2 * LDBL_MANT_DIG - 1) {
+		t =  s[n2].lo + t * (q + r1) + s[n2].hi;
+		if (k == LDBL_MAX_EXP)
+			RETURNI(t * 2.L * 0x1p16383L);
+		RETURNI(t * twopk - 1.L);
+	}
+
+	v.xbits.man = 1ULL << 63;
+	v.xbits.expsign = BIAS - k;
+	twomk = v.e;
+
+	if (k > LDBL_MANT_DIG - 1)
+		t = s[n2].hi + (t * (q + r1) + (s[n2].lo - twomk));
+	else
+		t = (s[n2].hi - twomk) + (t * (q + r1) + s[n2].lo);
+
+	RETURNI(t * twopk);
+}
Index: ld128/s_expl.c
===================================================================
--- ld128/s_expl.c	(revision 240889)
+++ ld128/s_expl.c	(working copy)
@@ -259,3 +259,132 @@
 		return (t * twopkp10000 * twom10000);
 	}
 }
+
+static const long double
+T1 = -2.87682072451780927439219005993827491e-01L, /* log(1-1/4) */
+T2 =  2.23143551314209755766295090309834524e-01L; /* log(1+1/4) */
+
+/*
+ * Remes polynomial coefficients on the interval [T1:T2] with an error
+ * estimate of log2(|expm1(x)-p(x)|) ~ -118.
+ */
+static const long double
+B3 = 1.6666666666666666666666666666666666467e-01L,
+B4 = 4.1666666666666666666666666666666496958e-02L,
+B5 = 8.3333333333333333333333333333383308238e-03L,
+B6 = 1.3888888888888888888888888890337273186e-03L,
+B7 = 1.9841269841269841269841269642813099020e-04L,
+B8 = 2.4801587301587301587301550362968625647e-05L,
+B9 = 2.7557319223985890652560213443426485839e-06L,
+B10 = 2.7557319223985890652989939489574826910e-07L,
+B11 = 2.5052108385441718755414747818597290671e-08L,
+B12 = 2.0876756987868096239671566018121112987e-09L,
+B13 = 1.6059043836821679450153540085716850723e-10L,
+B14 = 1.1470745597739719945035140610973825690e-11L,
+B15 = 7.6471637317367933536887863417862982136e-13L,
+B16 = 4.7794773113757503628361216897593430644e-14L,
+B17 = 2.8114571923374448979404436413403060597e-15L,
+B18 = 1.5619443087615951176479734695316869815e-16L,
+B19 = 8.2233234109330560523187524072569835812e-18L,
+B20 = 4.0010880199748106995049690682517608541e-19L;
+
+long double
+expm1l(long double x)
+{
+	union IEEEl2bits u, v;
+
+	long double fn, q, r, r1, r2, t, t1, t2, t3, twomk, twopk, z;
+	int k, n, n2;
+	uint32_t hx, ix;
+
+	/* Filter out exceptional cases. */
+	u.e = x;
+	hx = u.xbits.expsign;
+	ix = hx & 0x7fff;
+	if (ix >= BIAS + 13) {		/* |x| >= 8192 or x is NaN */
+		if (ix == BIAS + LDBL_MAX_EXP) {
+			if (hx & 0x8000 && u.xbits.manh == 0 &&
+			    u.xbits.manl == 0)
+				return (-1.0L);	/* x is -Inf */
+			return (x + x);	/* x is +Inf or NaN */
+		}
+		if (x > o_threshold)
+			return (huge * huge);
+		if (x < -84)		/* XXX s.b. ~115ln2, maybe in bits */
+			return (tiny - 1.0L);
+	} else if (ix < BIAS - 115) {	/* |x| < 0x1p-115 */
+	    	if ((int)x == 0)	
+			return (x);	/* x with inexact if x != 0 */
+	}
+
+	if (T1 < x && x < T2) {
+
+		long double t1, t2, t3, t4, t5, t6, u, v, x3, y;
+
+		u = (0x1.0p80 * x + x) - 0x1.0p80 * x;
+		v = x - u;
+		y = u * u * 0.5L;
+		z = v * (x + u) * 0.5L;
+		t1 = B3 + x * (B4 + B5 * x);
+		t2 = B6 + x * (B7 + B8 * x);
+		t3 = B9 + x * (B10 + B11 * x);
+		t4 = B12 + x * (B13 + B14 * x);
+		t5 = B15 + x * (B16 + B17 * x);
+		t6 = B18 + x * (B19 + B20 * x);
+		x3 = x * x * x;
+		t4 += x3 * (t5 + x3 * t6);
+		q = x3 * (t1 + x3 * (t2 + x3 * (t3 + x3 * t4)));
+
+		if (y >= 0x1.p-7)
+			return ((u + y) + (q + (v + z)));
+		else
+			return (x + (y + (q + z)));
+	}
+
+	/* Reduce x to (k*ln2 + midpoint[n2] + r1 + r2). */
+	fn = x * INV_L + 0x1.8p112 - 0x1.8p112;
+	r = x - fn * L1 - fn * L2;	/* r = r1 + r2 done independently. */
+	n  = (int)fn;
+	n2 = (unsigned)n % INTERVALS;
+	k = (n - n2) / INTERVALS;
+	r1 = x - fn * L1;
+	r2 = -fn * L2;
+
+	/* Prepare scale factors. */
+	v.xbits.manh = 0;
+	v.xbits.manl = 0;
+	v.xbits.expsign = BIAS + k;
+	twopk = v.e;
+
+	z = r * r;
+	t1 = P2 + P3 * r + P4 * z;
+	t2 = P5 + P6 * r + P7 * z;
+	t3 = P8 + P9 * r + P10 * z;
+	q = r2 + z * (t1 + r * (t2 + r * (t3 + r * P11 * z) * z) * z);
+	t = s[n2].lo + s[n2].hi;
+
+	if (k < -7) {
+		t = s[n2].lo + t * (q + r1) + s[n2].hi;
+		return (t * twopk - 1.L);
+	}
+
+	if (k > 2 * LDBL_MANT_DIG - 1) {
+		t =  s[n2].lo + t * (q + r1) + s[n2].hi;
+		if (k == LDBL_MAX_EXP)
+			return (t * 2.L * 0x1p16383L);
+		return (t * twopk - 1.L);
+	}
+
+	/* Prepare scale factors. */
+	v.xbits.manh = 0;
+	v.xbits.manl = 0;
+	v.xbits.expsign = BIAS - k;
+	twomk = v.e;
+
+	if (k > LDBL_MANT_DIG - 1)
+		t = s[n2].hi + (t * (q + r1) + (s[n2].lo - twomk));
+	else
+		t = (s[n2].hi - twomk) + (t * (q + r1) + s[n2].lo);
+
+	return (t * twopk);
+}

From owner-freebsd-numerics@FreeBSD.ORG  Tue Sep 25 18:39:46 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id DF805106566B
	for <numerics@freebsd.org>; Tue, 25 Sep 2012 18:39:46 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 4A77E8FC15
	for <numerics@freebsd.org>; Tue, 25 Sep 2012 18:39:45 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8PIdaND018640
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 26 Sep 2012 04:39:37 +1000
Date: Wed, 26 Sep 2012 04:39:36 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
In-Reply-To: <20120925154606.GA28919@troutmask.apl.washington.edu>
Message-ID: <20120926021900.R2081@besplex.bde.org>
References: <20120925154606.GA28919@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: numerics@freebsd.org
Subject: Re: [PATCH] implementation for expm1l()
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 25 Sep 2012 18:39:47 -0000

On Tue, 25 Sep 2012, Steve Kargl wrote:

> The patch following my .sig implements expm1l().  An example
> of testing 10M values uniformly distributed in each interval
> gives
>
> amd64:
> % ./testl -n 10
> Interval tested: [-11355.000000:-0.250000]
>    Max ULP: 0.50325
>  x Max ULP: -5.21316172131617162e+00, -0x1.4da4710f73f83p+2
> Interval tested: [-0.250000:0.250000]
>    Max ULP: 0.53926
>  x Max ULP: -1.15709936570993657e-01, -0x1.d9f2a996507ec308p-4
> Interval tested: [0.250000:11356.000000]
>    Max ULP: 0.50584
>  x Max ULP: 8.79177827642782764e+03, 0x1.12be39e8fde623b2p+13
>
> sparc64 (only very limited testing due to speed of flame):
> % ./testl -n 10
> Interval tested: [-11355.000000:-0.287682]
>    Max ULP: 0.50165
>  x Max ULP: -5.34961332997700764996445801944829055e+00,
>             -0x1.566010969fcd47a521247b4128p+2
> Interval tested: [-0.287682:0.223144]
>    Max ULP: 0.54843
>  x Max ULP: -2.86792265208762197924244858751372070e-01,
>             -0x1.25acdf1f45027985d7c04fa34168p-2
> Interval tested: [0.223144:11356.000000]
>    Max ULP: 0.50577
>  x Max ULP: 7.83174792710817081424055325173740399e+03,
>             0x1.e97bf7826a562b03aafc33340393p+12
>
> Now, the problem!  On i386 in the interval [-0.287682:0.223144],
> I see up to 15.5 ULP for the ld80 patch.  Note, if I don't
                    ulps or ULPs
> use ENTERI and RETURNI in the ld80 code, I see ULP that are
> consistent with the amd64 results.  This of course leads back
> to the 53- versus 64-bit precision settings on i386.

I thought I fixed it, and you have all my patches, but...

It can't possibly work better without ENTERI.  Maybe a bug in the
test code.

You should test the internal values before they are added together,
and expect an error of < ~1/256 ulps (or ~0.5 ulps in 72-bit precision).
I thought I did that.  Maybe I only tested amd64.  Now that I look at
it again, I see that it wasn't worth testing on i386 since it can't
possibly work there...

> I have tried re-arranging the order of evaluation of the polynomial
> and some of the intermediate results with little improvement.  I've
> also increased the order of the polynomial.  Everything I've tried
> seems to fail on i386.  So, anyone got a suggestion?
>
> PS: the relevant section of code is in the 'if (T1 < x && x < T2)'
> block of code.
>
> -- 
> Steve
>
> Index: ld80/s_expl.c
> ===================================================================
> --- ld80/s_expl.c	(revision 240889)
> +++ ld80/s_expl.c	(working copy)
> @@ -301,3 +301,149 @@
> ...
> +#if 0
> +static const long double
> +T1 = -0x1.269621134db92786p-2L, /* log(1-1/4) = -2.876820724517809275e-01L */
> +T2 =  0x1.c8ff7c79a9a21ac4p-3L; /* log(1+1/4) =  2.231435513142097558e-01L */
> +#else
> +static const long double
> +T1 = -0.25L,
> +T2 =  0.25L;
> +#endif

Apparently these thresholds aren't critical (since fuzzy ones work on amd64).
The fuzzy ones should be declared double.

> +
> +/*
> + * Remes polynomial coefficients on the interval [T1:T2] with an error
> + * estimate of log2(|expm1(x)-p(x)|) ~ -74.  These are the coefficients

It's expm1(x)/x that must be approximated by p(x)/x, not expm1(x) by
p(x).  This combined with the nonstandard comment style confused me for
a while.  (Plain expm1(x) is just as easy to approximate as exp(x) and
doesn't require any long double coefficients.)

> + * rounded to long double precision where B5 through B14 can be used with
> + * a double type.
> + */
> +static const long double
> +B3 = 0x1.5555555555555556p-3L,	/* 1.66666666666666666671e-01 */
> +B4 = 0x1.5555555555555556p-5L;	/* 4.16666666666666666678e-02 */

These 2 aren't actually rounded to long double precision on i386, so the
point of having them in more precision than the others is completely lost.
gcc rounds them to nearest, so the result is the same as:

static const non_long double
B3 = 0x1.5555555555555800p-3,	/* some wrong value */
B4 = 0x1.5555555555555800p-5;	/* some wrong value */

This is the only obvious problem.

For a quick check that this is the only problem, try compiling with clang.
I think it is incompatible with gcc here and rounds to 64 bits.

LD80C must be used for most long double constants in ld80 (unless the
constants can be double).

I generated the following polynomial.  Using only 2 full-precision coeffs
limit the accuracy, so B14 doesn't improve accuracy significantly.

@ P0 =  1.00000000000000000000e0L,		/*  0x8000000000000000.0p-63L */
@ P1 =  5.00000000000000000000e-1L,	/*  0x8000000000000000.0p-64L */

P0 and P1 are implicit and/or should be manually reduced to doubles.
The above was automatically generated in a standard format that is too
general for them.

@ P2 =  1.66666666666666666671e-1L,	/*  0xaaaaaaaaaaaaaaab.0p-66L */
@ P2hi =  1.6666666666666666e-1,		/*  0x15555555555555.0p-55 */
@ P2lo =  9.2563760475949941e-18,		/*  0x15580000000000.0p-109 */
@ P3 =  4.16666666666666670913e-2L,	/*  0xaaaaaaaaaaaaab28.0p-68L */
@ P3hi =  4.1666666666666664e-2,		/*  0x15555555555555.0p-57 */
@ P3lo =  2.7376104855258987e-18,		/*  0x19400000000000.0p-111 */

This is still automatically generated for the old pre-LD80C() format in
which P2 is written as (P2hi + P2lo) and some manual editing was needed
to put P2lo in the volatile section.  The hex values are perhaps not
quite write for manyally copying them into LD80().

@ P4 =  8.3333333333333332e-3,		/*  0x11111111111111.0p-59 */
@ P5 =  1.3888888888888057e-3,		/*  0x16c16c16c16a97.0p-62 */
@ P6 =  1.9841269841266616e-4,		/*  0x1a01a01a019b74.0p-65 */
@ P7 =  2.4801587307106348e-5,		/*  0x1a01a01a1a7b21.0p-68 */
@ P8 =  2.7557319244569047e-6,		/*  0x171de3a5a0efd0.0p-71 */
@ P9 =  2.7557302872739145e-7,		/*  0x127e4eff5f5ccd.0p-74 */
@ P10 =  2.5052063395588939e-8,		/*  0x1ae6423d7e31ae.0p-78 */
@ P11 =  2.0899093104590682e-9,		/*  0x11f3c259569a3b.0p-81 */
@ P12 =  1.6096692806293200e-10,		/*  0x161f8531c9dc24.0p-85 */

The others are in a standard format.

Note that my P12 is your B13 (different by 1 for dividing by x in
expm1(x)/x).

@ 
@ 1.147e-22 |"'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''|
@           |:                                                             |
@           |:                                                             |
@           |::                                                            |
@           |::     ""                                                     |
@           |::                                             _x             |
@           : "    x  "                                       _            |
@           : :    :  :                                    "  :            |
@           : :   :    :                                       :       __  |
@           : :   :    "          """_                    "    "       ::  |
@           :  :  "    :         "    x                        :      : :  |
@           :,,:,,:,,,,,:,,,,,,,x,,,,,,"x___,,,,,,,,,,,,,_,,,,,,:,,,,,_,,:,,
@           _  : :      x                   ""x          :      x     :  : |
@           |  " :      :      x               "        :       :     :  _ |
@           |    "       :                      "       "        :   :   : |
@           |   _        x    x                  "               :   :   : |
@           |                                     x    x         "   "   : |
@           |             x  "                     x  _              :    :|
@           |              x"                       "x            _ :     :|
@           |                                                       "     :|
@           |                                                      "      :|
@ -9.54e-23 |............................................................._"
@           -0.25                                                       0.25
@ adj cp = 64 range ~[-1.1479e-22, 1.1501e-22] 2**-72.881

> Index: ld128/s_expl.c
> ===================================================================
> --- ld128/s_expl.c	(revision 240889)
> +++ ld128/s_expl.c	(working copy)
> @@ -259,3 +259,132 @@
>  		return (t * twopkp10000 * twom10000);
>  	}
>  }
> +
> +static const long double
> +T1 = -2.87682072451780927439219005993827491e-01L, /* log(1-1/4) */
> +T2 =  2.23143551314209755766295090309834524e-01L; /* log(1+1/4) */

Now I see that these aren't critical.  But isn't T1 = -0.25 not far
enough negative?  Probably T2 being larger than necessary compensates.

These should probably be double in all cases.

> +
> +/*
> + * Remes polynomial coefficients on the interval [T1:T2] with an error
> + * estimate of log2(|expm1(x)-p(x)|) ~ -118.
> + */
> +static const long double
> +B3 = 1.6666666666666666666666666666666666467e-01L,
> ...
> +B20 = 4.0010880199748106995049690682517608541e-19L;

I get 123 bits with the same number of terms (P19 corresponds to B20).

P0 =  1.0000000000000000000000000000000000e0L,		/*  0x10000000000000000000000000000.0p-112L */
P1 =  5.0000000000000000000000000000000000e-1L,		/*  0x10000000000000000000000000000.0p-113L */
P2 =  1.6666666666666666666666666666666661e-1L,		/*  0x15555555555555555555555555553.0p-115L */
P3 =  4.1666666666666666666666666666665461e-2L,		/*  0x1555555555555555555555555548d.0p-117L */
P4 =  8.3333333333333333333333333333702524e-3L,		/*  0x111111111111111111111111170ea.0p-119L */
P5 =  1.3888888888888888888888888894097559e-3L,		/*  0x16c16c16c16c16c16c16c16ebae13.0p-122L */
P6 =  1.9841269841269841269841269028212328e-4L,		/*  0x1a01a01a01a01a01a01a005649497.0p-125L */
P7 =  2.4801587301587301587301496436041811e-5L,		/*  0x1a01a01a01a01a01a0198e708bd2f.0p-128L */
P8 =  2.7557319223985890652565668610907848e-6L,		/*  0x171de3a556c7338faae2dbd716389.0p-131L */
P9 =  2.7557319223985890653389692780995245e-7L,		/*  0x127e4fb7789f5c72f94f52e6db855.0p-134L */
P10 =  2.5052108385441718730021176147351305e-8L,		/*  0x1ae64567f544e38c6b59f0b86d695.0p-138L */
P11 =  2.0876756987868094553775599688331605e-9L,		/*  0x11eed8eff8d896a3590b97ce89c87.0p-141L */
@ P12 =  1.6059043836821743738564862857627589e-10L,		/*  0x16124613a86d3b71f680b5698fe4d.0p-145L */
@ P13 =  1.1470745597743859829054288214163328e-11L,		/*  0x193974a8c09eca572389f431d83ef.0p-149L */
@ P14 =  7.6471637316522595640322188859803229e-13L,		/*  0x1ae7f3e7312feabddd5c9a9a658ec.0p-153L */
@ P15 =  4.7794773056701410114955405123070665e-14L,		/*  0x1ae7f3e4adbc63a0143e13d8c8b13.0p-157L */
@ P16 =  2.8114572392723202593376369659548583e-15L,		/*  0x1952c76de98f8b161d8d0e78ecd0a.0p-161L */
@ P17 =  1.5619481870532875016483016145454221e-16L,		/*  0x1682925a721faed3a1323d09bba02.0p-165L */
@ P18 =  8.2232942744013145419986831227112038e-18L,		/*  0x12f62d18a8dc6089b1fa968c8c030.0p-169L */
@ P19 =  3.9923442242776910307469064711618566e-19L,		/*  0x1d75533495953f6181159d629556a.0p-174L */
@ 
@   7.3e-38 "'''''x'''''''_''''''''''''''''''''''''''''''''''''''''''''''''_
@           :     :      ":                                  x       "     :
@           :     ::     ::                         ""        _      :     :
@           : "   :"     : :                                _ :     x:   x :
@           |::  : :    :  :                       "  "     : :     ::   ::|
@           |::  : :    :  x                          :    :   :    : :  ::|
@           |::  : :    _  :                      _    :   :   :    : :  ::|
@           |::  : :    :  :                      :    x   _   :    : :  ::|
@           |":  x  :   :  :       x             :     :   :   x   :  _  :x|
@           | :  :  :   :   :     "          x   x      : :    :   :  :  : |
@           ,,:,,:,,:,,,:,,,:,,,,,:,",,,,,,,","__,,,,,,,_,:,,,,:,,,:,,:,,:,,
@           |  : :  :  :    :    :  :      _              x    :   :  : :  |
@           |  : :  _  :    x    x   :     :             x      :  "  : :  |
@           |  : :  :  :    :    :   x    :                     :  :  : :  |
@           |  ::   :  :    :    :   :    x                     :  :   ::  |
@           |  ::   :  "     :  :     :   :                     _  :   ::  |
@           |  ::    : :     :  :     _  :                      : :    ::  |
@           |  ::    : :     :  "        "                      : :    ::  |
@           |  _:    ::      _  :      __                        ::    :"  |
@           |   :    ::      : :                                 :x    x   |
@           |   "    x:       :_                                 x         |
@ -7.32e-38 |........."......._............................................|
@           -0.288                                                     0.224
@ adj cp = 113 range ~[-7.4093e-38, 7.3114e-38] 2**-123.34

Next I will apply coefficient reduction to double.  This is a much large
rptimizations for ld128 (for the full optimizations, lower terms of
the poly (most of the ones using double coeffs) should also use the
arg reduced to double).  It will take a while to search for the best
number of double coeffs.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Tue Sep 25 20:22:36 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 961471065672
	for <numerics@freebsd.org>; Tue, 25 Sep 2012 20:22:36 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 5A7C78FC0C
	for <numerics@freebsd.org>; Tue, 25 Sep 2012 20:22:36 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id
	q8PKMSTs031409; Tue, 25 Sep 2012 13:22:28 -0700 (PDT)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id
	q8PKMSAD031408; Tue, 25 Sep 2012 13:22:28 -0700 (PDT)
	(envelope-from sgk)
Date: Tue, 25 Sep 2012 13:22:28 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20120925202228.GA31099@troutmask.apl.washington.edu>
References: <20120925154606.GA28919@troutmask.apl.washington.edu>
	<20120926021900.R2081@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120926021900.R2081@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: numerics@freebsd.org
Subject: Re: [PATCH] implementation for expm1l()
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 25 Sep 2012 20:22:36 -0000

On Wed, Sep 26, 2012 at 04:39:36AM +1000, Bruce Evans wrote:
> On Tue, 25 Sep 2012, Steve Kargl wrote:
> 
> > Now, the problem!  On i386 in the interval [-0.287682:0.223144],
> > I see up to 15.5 ULP for the ld80 patch.  Note, if I don't
>                     ulps or ULPs
> > use ENTERI and RETURNI in the ld80 code, I see ULP that are
> > consistent with the amd64 results.  This of course leads back
> > to the 53- versus 64-bit precision settings on i386.

My laptop (ie, i386) is at home, so I'm going from memory.  My
testl generates a report of the form

Interval tested: [-0.287682:0.223144]
    Max ULP: 15.4554
  x Max ULP: -2.some_set_of_digits_e-01
      Count: 10000000
> 
> I thought I fixed it, and you have all my patches, but...
> 

Perhaps, I do.  I'll go look through my email archive.

> It can't possibly work better without ENTERI.  Maybe a bug in the
> test code.

I could have been looking at the wrong xterm. I check tonight.

> You should test the internal values before they are added together,
> and expect an error of < ~1/256 ulps (or ~0.5 ulps in 72-bit precision).
> I thought I did that.  Maybe I only tested amd64.  Now that I look at
> it again, I see that it wasn't worth testing on i386 since it can't
> possibly work there...
> 
> > I have tried re-arranging the order of evaluation of the polynomial
> > and some of the intermediate results with little improvement.  I've
> > also increased the order of the polynomial.  Everything I've tried
> > seems to fail on i386.  So, anyone got a suggestion?
> >
> > PS: the relevant section of code is in the 'if (T1 < x && x < T2)'
> > block of code.
> >
> > Index: ld80/s_expl.c
> > ===================================================================
> > --- ld80/s_expl.c	(revision 240889)
> > +++ ld80/s_expl.c	(working copy)
> > @@ -301,3 +301,149 @@
> > ...
> > +#if 0
> > +static const long double
> > +T1 = -0x1.269621134db92786p-2L, /* log(1-1/4) = -2.876820724517809275e-01L */
> > +T2 =  0x1.c8ff7c79a9a21ac4p-3L; /* log(1+1/4) =  2.231435513142097558e-01L */
> > +#else
> > +static const long double
> > +T1 = -0.25L,
> > +T2 =  0.25L;
> > +#endif
> 
> Apparently these thresholds aren't critical (since fuzzy ones work
> on amd64). The fuzzy ones should be declared double.

The fuzzy ones are left over from an attempt to use a 2nd lookup
table on the range [-0.25:0.25].   This has the nice property that
T2 - T1 = 0.5 and dividing this up into 2**n segments gives a 
delete of the form 0x1.0p-7.

The log(1+-1/4) expressions are adopted from Tang's paper where he
gives an error analysis to establish an upper bound on the max
ULP.

> > +
> > +/*
> > + * Remes polynomial coefficients on the interval [T1:T2] with an error
> > + * estimate of log2(|expm1(x)-p(x)|) ~ -74.  These are the coefficients
> 
> It's expm1(x)/x that must be approximated by p(x)/x, not expm1(x) by
> p(x).  This combined with the nonstandard comment style confused me for
> a while.  (Plain expm1(x) is just as easy to approximate as exp(x) and
> doesn't require any long double coefficients.)

The actual function that I approximated was (expm1(x)-1-x-x**2/2)/x**3.
Tang's algorithm assumes that one compute x+x**2/2 with extra precision.
He does this by decomposing x into high and low parts.  I suppose I
should update the comment to reflect the function I approximate.

> > + * rounded to long double precision where B5 through B14 can be used with
> > + * a double type.
> > + */
> > +static const long double
> > +B3 = 0x1.5555555555555556p-3L,	/* 1.66666666666666666671e-01 */
> > +B4 = 0x1.5555555555555556p-5L;	/* 4.16666666666666666678e-02 */
> 
> These 2 aren't actually rounded to long double precision on i386, so the
> point of having them in more precision than the others is completely lost.
> gcc rounds them to nearest, so the result is the same as:
> 
> static const non_long double
> B3 = 0x1.5555555555555800p-3,	/* some wrong value */
> B4 = 0x1.5555555555555800p-5;	/* some wrong value */
> 
> This is the only obvious problem.
> 
> For a quick check that this is the only problem, try compiling with clang.
> I think it is incompatible with gcc here and rounds to 64 bits.

I'll try compiling with clang tonight.   I know that B3 and B4
cannot be double as this leads to problems on amd64.  I'll see
about using LD80C for B3 and B4 or simply split these into
high and low parts.

> LD80C must be used for most long double constants in ld80 (unless the
> constants can be double).
> 
> I generated the following polynomial.  Using only 2 full-precision coeffs
> limit the accuracy, so B14 doesn't improve accuracy significantly.

(coefficients deleted)

> 
> Note that my P12 is your B13 (different by 1 for dividing by x in
> expm1(x)/x).

I see if chaning to your set of coefficients is on any help tonight.

> > Index: ld128/s_expl.c
> > ===================================================================
> > --- ld128/s_expl.c	(revision 240889)
> > +++ ld128/s_expl.c	(working copy)
> > @@ -259,3 +259,132 @@
> >  		return (t * twopkp10000 * twom10000);
> >  	}
> >  }
> > +
> > +static const long double
> > +T1 = -2.87682072451780927439219005993827491e-01L, /* log(1-1/4) */
> > +T2 =  2.23143551314209755766295090309834524e-01L; /* log(1+1/4) */
> 
> Now I see that these aren't critical.  But isn't T1 = -0.25 not far
> enough negative?  Probably T2 being larger than necessary compensates.

The ld128 patch is hot off the keyboard as I wrote it on Sunday and
tested it yesterday.  I'm suspect the coefficients can use some tweaking.
I'll look over the coefficients that you sent later this week.

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Tue Sep 25 20:47:08 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BE99A106564A
	for <numerics@freebsd.org>; Tue, 25 Sep 2012 20:47:08 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
	[211.29.132.185])
	by mx1.freebsd.org (Postfix) with ESMTP id 56B418FC14
	for <numerics@freebsd.org>; Tue, 25 Sep 2012 20:47:07 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8PKksAF012163
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 26 Sep 2012 06:47:00 +1000
Date: Wed, 26 Sep 2012 06:46:54 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
In-Reply-To: <20120924231208.GA22960@troutmask.apl.washington.edu>
Message-ID: <20120926063517.N3207@besplex.bde.org>
References: <20120925035402.C1433@besplex.bde.org>
	<20120924195121.GA22138@troutmask.apl.washington.edu>
	<20120925073508.M2077@besplex.bde.org>
	<20120924231208.GA22960@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: numerics@freebsd.org
Subject: Re: LDBL_MAX broken on i386
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 25 Sep 2012 20:47:08 -0000

On Mon, 24 Sep 2012, Steve Kargl wrote:

> On Tue, Sep 25, 2012 at 08:28:34AM +1000, Bruce Evans wrote:
> ...
>> LDBL_MAX stopped working with this change to gcc.  From gcc/ChangeLog-
>> 2003:
>>
>> % 2003-07-01  Richard Henderson  <rth@redhat.com>
>> % 	    (blame to: Loren James Rittle  <ljrittle@acm.org>)
>> %
>> % 	* real.h (ieee_extended_intel_96_round_53_format): New.
>> % 	* real.c (ieee_extended_intel_96_round_53_format): New.
>> % 	* config/i386/freebsd.h (SUBTARGET_OVERRIDE_OPTIONS): Use it
>> % 	for XFmode and TFmode.
>>
>> This was well enough discussed in FreeBSD lists before it was committed
>> (or even developed?), and I agreed with it at the time, but didn't
>> understand its full awfulness.  gcc-4.2.1 still hard-configures the
>> flag that enables this: from gcc/config/freebsd.h:
>>
>> % /* FreeBSD sets the rounding precision of the FPU to 53 bits.  Let the
>> %    compiler get the contents of <float.h> and std::numeric_limits correct.  */
>> % #undef  TARGET_96_ROUND_53_LONG_DOUBLE
>> % #define TARGET_96_ROUND_53_LONG_DOUBLE (!TARGET_64BIT)
>
> I just checked the head of gcc, the above is still present.
> This suggests to me that i386 FreeBSD will never be free of
> the npx feature of setting the FPU to 53 bits.

We can always change it in our version.

>> so i386 always gets it and amd64 never does.  There seems to be no way to
>> get 53-bit rounding for expressions without getting it for literals.
>
> I think you're correct about literals.  gcc, since about version
> 4.5.x, uses MPFR to do constant-folding and it does this in the
> precision of literal constant as determined by gcc.  On the bright
> side, MPFR claims to correctly round the folding.

I thought it already used something sophisticated with correct rounding
to do constant folding.

> ...
>> So gcc's values are perfectly consistent, but are correct if no one
>> ever creates values with the lower 11 bits set.
>
> So, if I understand the above, should we try to correct float.h
> to have 21 (36) digits on ld80 (ld128)?  Doing we limit LDBL_MAX
> on i386 to 2**(emax - 11)?

17 digits are enough on i386.  Extras are harmless if correctly rounded.
ld128 has 1 extra but I think with correct rounding.  The most critical
change is the last one.  Except it is wrong for clang.  The predefines
can be used to agreee with the compiler, but I prefer to see the actual
values in float.h.  That means idefs are required for clang.  Several
other ifdefs are required for clang.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Tue Sep 25 22:16:12 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5786F10657D5
	for <numerics@FreeBSD.org>; Tue, 25 Sep 2012 22:16:12 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
	[211.29.132.185])
	by mx1.freebsd.org (Postfix) with ESMTP id D26B18FC16
	for <numerics@FreeBSD.org>; Tue, 25 Sep 2012 22:16:08 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8PMG5I7026222
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 26 Sep 2012 08:16:06 +1000
Date: Wed, 26 Sep 2012 08:16:05 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
In-Reply-To: <20120925202228.GA31099@troutmask.apl.washington.edu>
Message-ID: <20120926071841.T3491@besplex.bde.org>
References: <20120925154606.GA28919@troutmask.apl.washington.edu>
	<20120926021900.R2081@besplex.bde.org>
	<20120925202228.GA31099@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: numerics@FreeBSD.org
Subject: Re: [PATCH] implementation for expm1l()
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 25 Sep 2012 22:16:12 -0000

On Tue, 25 Sep 2012, Steve Kargl wrote:

> On Wed, Sep 26, 2012 at 04:39:36AM +1000, Bruce Evans wrote:
>> On Tue, 25 Sep 2012, Steve Kargl wrote:
> ...
>>> Index: ld80/s_expl.c
>>> ===================================================================
>>> --- ld80/s_expl.c	(revision 240889)
>>> +++ ld80/s_expl.c	(working copy)
>>> @@ -301,3 +301,149 @@
>>> ...
>>> +#if 0
>>> +static const long double
>>> +T1 = -0x1.269621134db92786p-2L, /* log(1-1/4) = -2.876820724517809275e-01L */
>>> +T2 =  0x1.c8ff7c79a9a21ac4p-3L; /* log(1+1/4) =  2.231435513142097558e-01L */
>>> +#else
>>> +static const long double
>>> +T1 = -0.25L,
>>> +T2 =  0.25L;
>>> +#endif
>>
>> Apparently these thresholds aren't critical (since fuzzy ones work
>> on amd64). The fuzzy ones should be declared double.
>
> The fuzzy ones are left over from an attempt to use a 2nd lookup
> table on the range [-0.25:0.25].   This has the nice property that
> T2 - T1 = 0.5 and dividing this up into 2**n segments gives a
> delete of the form 0x1.0p-7.
>
> The log(1+-1/4) expressions are adopted from Tang's paper where he
> gives an error analysis to establish an upper bound on the max
> ULP.

Do the intervals have anything to do with this polynomial?  As I
understand it (incompletely), we just use a special approximation
for |x| < ~0.25 where the one for exp(x) is not good enough, and
otherwise we use (exp(x) - 1) manually inlined.

Anyway, I regenerated the polynomial for Tang's range.  It is almost
1 bit more accurate.

>>> +
>>> +/*
>>> + * Remes polynomial coefficients on the interval [T1:T2] with an error
>>> + * estimate of log2(|expm1(x)-p(x)|) ~ -74.  These are the coefficients
>>
>> It's expm1(x)/x that must be approximated by p(x)/x, not expm1(x) by
>> p(x).  This combined with the nonstandard comment style confused me for
>> a while.  (Plain expm1(x) is just as easy to approximate as exp(x) and
>> doesn't require any long double coefficients.)
>
> The actual function that I approximated was (expm1(x)-1-x-x**2/2)/x**3.
> Tang's algorithm assumes that one compute x+x**2/2 with extra precision.
> He does this by decomposing x into high and low parts.  I suppose I
> should update the comment to reflect the function I approximate.

I should have done that too, but it complicates understanding and
calculating the relative error.  The result is near x, so the relative
error is obtained by dividing by nearly x.  This happens naturally for
(exp(x)-1)/x.  For your function (fixed) of (exp(x)-1-x-x**2/2)/x**3,
the first term is 1/6 and I think the error must be divided by 1/6 --
minimizing the error gives the same result, but this division is needed
to give the actual error.

What I actually do gives an approximation with first to terms exactly 1
and 0.5, so it gives the same result.  I check this manually.  My
algorithm might prefer to change these constants, so I use special code
to force them for several functions that want them for the first 2
coeffs.  I do this for exmp1 for the first coeff only, although I do
it for the second coeff for most functions where it applies.

>> I generated the following polynomial.  Using only 2 full-precision coeffs
>> limit the accuracy, so B14 doesn't improve accuracy significantly.
>
> (coefficients deleted)

Now using Tang's interval:

@ P0 =  1.00000000000000000000e0L,		/*  0x8000000000000000.0p-63L */
@ P1 =  5.00000000000000000000e-1L,		/*  0x8000000000000000.0p-64L */
@ P2 =  1.66666666666666666712e-1L,		/*  0xaaaaaaaaaaaaaaae.0p-66L */

The hex formats are designed for showing the mantissa bits unshifted,
but bits after the point would work equally well and would obfuscate
the exponent less.  E.g., 1.0 shouldn't be expressed as 0x1.00...p0L
since this shifts the bits, but it can be 0x0.8000000000000000p1L, which
can be read as 0.5*2**1.

@ P2hi =  1.6666666666666666e-1,		/*  0x15555555555555.0p-55 */
@ P2lo =  9.2970336290632005e-18,		/*  0x15700000000000.0p-109 */
@ P3 =  4.16666666666666666678e-2L,		/*  0xaaaaaaaaaaaaaaab.0p-68L */
@ P3hi =  4.1666666666666664e-2,		/*  0x15555555555555.0p-57 */
@ P3lo =  2.3140940118987485e-18,		/*  0x15580000000000.0p-111 */
@ P4 =  8.3333333333333211e-3,		/*  0x1111111111110a.0p-59 */
@ P5 =  1.3888888888888794e-3,		/*  0x16c16c16c16beb.0p-62 */
@ P6 =  1.9841269841376342e-4,		/*  0x1a01a01a023996.0p-65 */
@ P7 =  2.4801587302997418e-5,		/*  0x1a01a01a07f9db.0p-68 */
@ P8 =  2.7557318820808094e-6,		/*  0x171de39faa2d52.0p-71 */
@ P9 =  2.7557312008157410e-7,		/*  0x127e4f663a69b6.0p-74 */
@ P10 =  2.5052802881898768e-8,		/*  0x1ae67646e4deb3.0p-78 */
@ P11 =  2.0892043051363171e-9,		/*  0x11f235775d156c.0p-81 */
@ P12 =  1.5615878424451485e-10,		/*  0x15765948fa3f05.0p-85 */
@ 
@ 6.857e-23 |"''''''"''''''''''''''''''''''''''''''''''''''''x'''''''''''''"
@           |:       x                                      " _            :
@           |:     _ :           "_                           :            :
@           |:     :  :         "                          _   :       _   :
@           |:     :  :            _                       :   :       :   :
@           |::   :   _        _   :                      :    x       :"  :
@           |::   :   :        :    :                     :    :      : :  :
@           |::   :   :       :     :                     _    :      : :  :
@           |::   _    :      :     "                     :     :     " :  :
@           |:x   :    :      _     :                    :      :     :  : :
@           : :   :    :      :      :                   :      x     :  : :
@           :`:```:````"``````:``````x`````````""````````_``````:````:```::`
@           : :  :     :     :       :        "  "       :      :    :   ::|
@           : :  :     :     :        :      _    _     :        :   :   _:|
@           :  : :      :    "        _                 :        :   _   ::|
@           :  : :      :    :        :     _      _    "        _   :   ::|
@           :  : "      x   :          :                         :   :   ::|
@           :  : :      :   :          x   _        _  _         :  :     :|
@           :  : :       :  "                        _            : :     :|
@           :  ":        :              "_"           "           : _     :|
@           _   :        " _                                      "       :|
@ -6.64e-23 |..."........._........................................_......_|
@           -0.288                                                     0.224
@ adj cp = 64 range ~[-6.6356e-23, 6.8973e-23] 2**-73.618

When choosing endpoints, I just use decimal and round up the last place.
Experience shows that using a slightly wider than necessary interval
doesn't allow using fewer coeffs.  Even rounding down is usually harmless,
since the endpoints shouldn't be poles so everything is continuous and
the approximation doesn't diverge very far for a small rounding error.

Now for ld128:

# P0 =  1.0000000000000000000000000000000000e0L,		/*  0x10000000000000000000000000000.0p-112L */

Now 1.0 is well expressed as 0x1.0000000000000000000000000000.0p0L.
Similarly for d64 (the number of mantissa bits is 4n+1).

# P1 =  5.0000000000000000000000000000000000e-1L,		/*  0x10000000000000000000000000000.0p-113L */
# P2 =  1.6666666666666666666666666666666661e-1L,		/*  0x15555555555555555555555555553.0p-115L */
# P3 =  4.1666666666666666666666666666665461e-2L,		/*  0x1555555555555555555555555548d.0p-117L */
# P4 =  8.3333333333333333333333333333702524e-3L,		/*  0x111111111111111111111111170ea.0p-119L */
# P5 =  1.3888888888888888888888888894888658e-3L,		/*  0x16c16c16c16c16c16c16c16f2191f.0p-122L */
# P6 =  1.9841269841269841269841269038522581e-4L,		/*  0x1a01a01a01a01a01a01a005a77f70.0p-125L */
# P7 =  2.4801587301587301587301479770823380e-5L,		/*  0x1a01a01a01a01a01a01979505fa4a.0p-128L */
# P8 =  2.7557319223985890652565291171436921e-6L,		/*  0x171de3a556c7338faae15d1232a5c.0p-131L */
# P9 =  2.7557319223985890653527586372574888e-7L,		/*  0x127e4fb7789f5c72fb0453795a335.0p-134L */
# P10 =  2.5052108385441718734295029309253960e-8L,		/*  0x1ae64567f544e38cc000f34e989ac.0p-138L */
# P11 =  2.0876756987868093985249237137054359e-9L,		/*  0x11eed8eff8d896802862ae8d8dbd9.0p-141L */
# P12 =  1.6059043836821721373405224365590722e-10L,		/*  0x16124613a86d32cb06578b4f2c75d.0p-145L */
# P13 =  1.1470745597745053e-11,		/*  0x193974a8c0a1ad.0p-89 */
# P14 =  7.6471637317124909e-13,		/*  0x1ae7f3e73218ed.0p-93 */
# P15 =  4.7794773046283608e-14,		/*  0x1ae7f3e4948c15.0p-97 */
# P16 =  2.8114571580397259e-15,		/*  0x1952c761a306e5.0p-101 */
# P17 =  1.5619480336357893e-16,		/*  0x168292355a0a63.0p-105 */
# P18 =  8.2237292848800199e-18,		/*  0x12f66ed54ff940.0p-109 */
# P19 =  3.9981067521176347e-19,		/*  0x1d8035cafa761f.0p-114 */
# 
# 1.048e-37 _'''''_'''''''''''''''''''''''''''''''''''''''''''''''''''''''"|
#           : x   :                                                       :|
#           : :  ::                _                                      :|
#           : :  _ :                _                         "           :|
#           : :  : :     _x       x :                        ":           :|
#           : :: : _     ::       :  :              x_       : :         ::|
#           |: : : :     : :     :   x             x :      :  "     "   ::|
#           |: ::  :    :  _     :   :                :     :  :     :   ::|
#           |: ::  :    :  :     x    :           x   "     "  :    : :  ::|
#           |: ":  :    _  :     :    _               :     :   :   : x  ::|
#           |:  "   :   :   :    :    :     x"x  "     :    :   :   x :  x:|
#           `_``````:```:```:```:``````:```"```""``````_```:````:```:`:``:`:
#           |       :   :   :   :      x  _            :   :    "   : :  : :
#           |       :  :    "   x       __              :  "    :  :  :  : :
#           |       _  :    :   :                       :  :    :  :   : : :
#           |       :  :     : :                        " :      : :   : : :
#           |       :  _     : :                          x      : _   ::  :
#           |       :  :     _ _                         "       : :   ::  :
#           |        : :                                         ":    x:  :
#           |        ::       x                                   :    ::  :
#           |        ::                                           "     :  :
# -1.01e-37 |........"_................................................._.."
#           -0.288                                                     0.224
# adj cp = 113 range ~[-1.0660e-37, 1.0660e-37] 2**-122.82

ld128 is much more resistant to reducing to double precision -- it needs
more than half of the coeffs to have full precision.  Double precision
for P11 onwards is unusable.  Double precision for P12 onwards loses
a couple of bits.  There are a couple to spare, but I prefer not to
reduce if the result loses many bits compared with infinite precision.

Bruce

From owner-freebsd-numerics@FreeBSD.ORG  Wed Sep 26 03:14:09 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id C3A08106564A
	for <numerics@freebsd.org>; Wed, 26 Sep 2012 03:14:09 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 5FBD08FC08
	for <numerics@freebsd.org>; Wed, 26 Sep 2012 03:14:09 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id
	q8Q3E8eX033714; Tue, 25 Sep 2012 20:14:08 -0700 (PDT)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id
	q8Q3E8u9033713; Tue, 25 Sep 2012 20:14:08 -0700 (PDT)
	(envelope-from sgk)
Date: Tue, 25 Sep 2012 20:14:07 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <20120926031407.GA33693@troutmask.apl.washington.edu>
References: <20120925154606.GA28919@troutmask.apl.washington.edu>
	<20120926021900.R2081@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120926021900.R2081@besplex.bde.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: numerics@freebsd.org
Subject: Re: [PATCH] implementation for expm1l()
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 26 Sep 2012 03:14:09 -0000

On Wed, Sep 26, 2012 at 04:39:36AM +1000, Bruce Evans wrote:
> On Tue, 25 Sep 2012, Steve Kargl wrote:
> 
> It's expm1(x)/x that must be approximated by p(x)/x, not expm1(x) by
> p(x).  This combined with the nonstandard comment style confused me for
> a while.  (Plain expm1(x) is just as easy to approximate as exp(x) and
> doesn't require any long double coefficients.)
> 
> > + * rounded to long double precision where B5 through B14 can be used with
> > + * a double type.
> > + */
> > +static const long double
> > +B3 = 0x1.5555555555555556p-3L,	/* 1.66666666666666666671e-01 */
> > +B4 = 0x1.5555555555555556p-5L;	/* 4.16666666666666666678e-02 */
> 
> These 2 aren't actually rounded to long double precision on i386, so the
> point of having them in more precision than the others is completely lost.
> gcc rounds them to nearest, so the result is the same as:
> 
> static const non_long double
> B3 = 0x1.5555555555555800p-3,	/* some wrong value */
> B4 = 0x1.5555555555555800p-5;	/* some wrong value */
> 
> This is the only obvious problem.
> 

You nailed it on the head.  I haven't tried your coefficient, yet.
I just regenerated a new set and split B3 and B4 into high and 
low parts.  With these coefficients:

/*
 * Remes polynomial coefficients on the interval [T1:T2] with an error
 * estimate of log2(|f(x)-p(x)|) ~ -68 with f(x)=(expm1(x)-1-x-x**2/2)/x**3.
 */
static const long double
B3hi = 0x1.5555555555555p-3, B3lo = 0x1.5554d1fb0e211p-57,
B4hi = 0x1.5555555555555p-5, B4lo = 0x1.66bec322050bep-59,
B5 = 0x1.1111111111111p-7,
B6 = 0x1.6c16c16c16b69p-10,
B7 = 0x1.a01a01a019ee2p-13,
B8 = 0x1.a01a01a10d6f9p-16,
B9 = 0x1.71de3a568cf37p-19,
B10 = 0x1.27e4f2c784f73p-22,
B11 = 0x1.ae644762b882bp-26,
B12 = 0x1.1f3300cf9538ap-29,
B13 = 0x1.6181ba21eb17dp-33;

I needed to re-arrange the relevant section of code to add
terms in the correct order:

		x2 = x * x;
		t1 = (long double)B3lo + x * B4lo + x2 * B5 + x * B4hi + B3hi;

With these changes, I see

% ./testl -m -0.25 -M 0.25 
Interval tested: [-0.250000:0.250000]
    Max ULP: 0.54516
  x Max ULP: -2.45493495493495494e-01, -0x1.f6c54b3433c96bfap-3
      Count: 1000000

I test with coefficients in a bit.

-- 
Steve

From owner-freebsd-numerics@FreeBSD.ORG  Wed Sep 26 05:06:54 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 83F22106566B
	for <numerics@freebsd.org>; Wed, 26 Sep 2012 05:06:54 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail12.syd.optusnet.com.au (mail12.syd.optusnet.com.au
	[211.29.132.193])
	by mx1.freebsd.org (Postfix) with ESMTP id 1B1438FC0C
	for <numerics@freebsd.org>; Wed, 26 Sep 2012 05:06:53 +0000 (UTC)
Received: from c122-106-157-84.carlnfd1.nsw.optusnet.com.au
	(c122-106-157-84.carlnfd1.nsw.optusnet.com.au [122.106.157.84])
	by mail12.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q8Q56XuP010967
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 26 Sep 2012 15:06:46 +1000
Date: Wed, 26 Sep 2012 15:06:33 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
In-Reply-To: <20120926031407.GA33693@troutmask.apl.washington.edu>
Message-ID: <20120926140942.K4521@besplex.bde.org>
References: <20120925154606.GA28919@troutmask.apl.washington.edu>
	<20120926021900.R2081@besplex.bde.org>
	<20120926031407.GA33693@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: numerics@freebsd.org
Subject: Re: [PATCH] implementation for expm1l()
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
	<mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 26 Sep 2012 05:06:54 -0000

On Tue, 25 Sep 2012, Steve Kargl wrote:

> On Wed, Sep 26, 2012 at 04:39:36AM +1000, Bruce Evans wrote:
>> On Tue, 25 Sep 2012, Steve Kargl wrote:
>>> +static const long double
>>> +B3 = 0x1.5555555555555556p-3L,	/* 1.66666666666666666671e-01 */
>>> +B4 = 0x1.5555555555555556p-5L;	/* 4.16666666666666666678e-02 */
>>
>> These 2 aren't actually rounded to long double precision on i386, so the
>> point of having them in more precision than the others is completely lost.
>> gcc rounds them to nearest, so the result is the same as:
>>
>> static const non_long double
>> B3 = 0x1.5555555555555800p-3,	/* some wrong value */
>> B4 = 0x1.5555555555555800p-5;	/* some wrong value */
>>
>> This is the only obvious problem.
>
> You nailed it on the head.  I haven't tried your coefficient, yet.
> I just regenerated a new set and split B3 and B4 into high and
> low parts.  With these coefficients:

Don't do that.  Use LD80C() to simplify generation of the constants and
avoid bugs.

> /*
> * Remes polynomial coefficients on the interval [T1:T2] with an error
> * estimate of log2(|f(x)-p(x)|) ~ -68 with f(x)=(expm1(x)-1-x-x**2/2)/x**3.
> */
> static const long double
> B3hi = 0x1.5555555555555p-3, B3lo = 0x1.5554d1fb0e211p-57,
> B4hi = 0x1.5555555555555p-5, B4lo = 0x1.66bec322050bep-59,

> ...
> I needed to re-arrange the relevant section of code to add
> terms in the correct order:
>
> 		x2 = x * x;
> 		t1 = (long double)B3lo + x * B4lo + x2 * B5 + x * B4hi + B3hi;

This is slower and more complicated than using LD80C(), especially on
arches (all except i386) where LD80C() is not needed.  Before LD80C()
existed, this should have been written as:

 		t1 = B3 + x * B4;

(no change), where B3 and B4 give the constants with ifdefs:

#ifdef __i386__
static const double		/* split values */
B3hi = ...,			/* values in bits */
B4hi = ...;
static const volatile double	/* volatile for run-time addition */
B3lo = ...,
B4lo = ...;
#define	B3	((long double)B3hi + B3lo)
#define	B4	((long double)B4hi + B4lo)
#else
static const long double	/* unsplit values */
B3 = ...,			/* slightly different values in bits */
B4 = ...;
#endif

Copy this method from a nearby file, e.g. k_sinl.c, but omit the __amd64__
ifdef there (except for testing; it was intended as an optimization,
but made little difference even on non-Intel CPUs where loading long
doubles is slow, because the load latency is usually hidden by
scheduling).  But you shouldn't want to do this.  Use LD80C() so that
the declarations take 1 line per additional constant with no visible
ifdefs (they are in the implementation of LD80C(), instead of 5 lines
each like the above, with ifdefs.

See the nearby files invtrig.[ch] and the non-nearby files ../*/invtrig.c
for an alternative old not-so-good way of declaring these values.  The
other way produces long doubles directly using bits much like LD80C()
internally, but has the equivalent of even more convoluted ifdefs and
macros then the above.  For example, for the long double value pio2_hi:
- it is #defined as atanhi[3]
- for ld128, there no further layers (splitting is not supported); for
   ld80, there are the following complications:
- atanhi is #defined as _ItL_atanhi
- atanhi is declared as extern const LONGDOUBLE in ld*/invtrig.h
- LONGDOUBLE is long double for !i386 and a mispacked struct for i386.
   Mispacking gives slowness via misalignment, especially in arrays
   like atanhi, where 8-byte alignment of the first element ensures
   perfect misalignment for odd elements lik atanhi[3].  This strcuct
   just contains bits and the long double values are obtained by a
   type pun, so the compiler can't align things properly (alignment
   for the uint64_t in the struct is 4 bytes, but long doubles should
   have at least 8-byte alignment and preferably 16-byte alignment.
- for !i386, ld80/invtrig.c declares _ItL_atanhi as an array of long
   doubles with ordinary long double constants.  It has a lot of style
   bugs (mainly missing hex values in comments.  Another feature of
   LD80C() is that it doesn't permit omitting either the hex or the
   decimal value.  Both are (not quite) active code).
- for i386, i387/invtrig.c declares _ItL_atanhi as an array of
   LONGDOUBLE, where LONGDOUBLE is now a struct and the initializers
   give values in bits.

So to add a new constant using this method, you need to understand some
of above and perform the following edits:
- if not for invtrig, first duplicate invtrig's messes in the relevant
   subsystem (don't pollute invtrig files with !invtrig declarations).
   Then apply the following to the relevant files.
- add a macro and an extern declaration in ld80/invtrig.h.  (This complication
   is mostly related.  It is mostly to use extern constants.  There are
   namespace problems then...)
- add an extern declaration in ld128/invtrig.h
- add a normal const long double declaration to ld80/invtrig.c and
   ld128/invtrig.c
- add a const LONGDOUBLE declaration using bits to i387/invtrig.c.

You shouldn't want to do this.

I have only partly cleaned up invtrig.  I fixed the misalignment and
removed 1 layer of macros.  I don't use LD80C() there yet.

> With these changes, I see
>
> [... correct results]

Bruce