From owner-freebsd-numerics@freebsd.org  Fri Mar  8 20:53:33 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D8E011528C0D
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Fri,  8 Mar 2019 20:53:32 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id E25388B976
 for <freebsd-numerics@freebsd.org>; Fri,  8 Mar 2019 20:53:29 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 13F37433BA5;
 Sat,  9 Mar 2019 07:53:26 +1100 (AEDT)
Date: Sat, 9 Mar 2019 07:53:24 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: cexpl() (was: Re: Update ENTERI() macro)
In-Reply-To: <20190308191150.GA32980@troutmask.apl.washington.edu>
Message-ID: <20190309070413.G2539@besplex.bde.org>
References: <20190306184829.GA44023@troutmask.apl.washington.edu>
 <20190307061214.R4911@besplex.bde.org>
 <20190306214233.GA23159@troutmask.apl.washington.edu>
 <20190307144315.N932@besplex.bde.org>
 <20190307044447.GA16298@troutmask.apl.washington.edu>
 <20190307163958.V1293@besplex.bde.org>
 <20190307195436.GA20856@troutmask.apl.washington.edu>
 <20190308225807.D6410@besplex.bde.org>
 <20190308162432.GA31392@troutmask.apl.washington.edu>
 <20190309033820.J1443@besplex.bde.org>
 <20190308191150.GA32980@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=R02BefTfkBCJ0zfWrscA:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: E25388B976
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates
 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au
X-Spamd-Result: default: False [-6.23 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1];
 FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23];
 FREEMAIL_FROM(0.00)[optusnet.com.au];
 MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[];
 MX_GOOD(-0.01)[cached: extmail.optusnet.com.au];
 RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.97)[-0.968,0];
 IP_SCORE(-2.95)[ip: (-7.69), ipnet: 211.28.0.0/14(-3.90), asn: 4804(-3.11),
 country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[];
 FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 FREEMAIL_ENVFROM(0.00)[optusnet.com.au];
 ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU];
 MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Mar 2019 20:53:33 -0000

On Fri, 8 Mar 2019, Steve Kargl wrote:

> On Sat, Mar 09, 2019 at 05:02:57AM +1100, Bruce Evans wrote:
>> On Fri, 8 Mar 2019, Steve Kargl wrote:
>>> On Sat, Mar 09, 2019 at 12:51:07AM +1100, Bruce Evans wrote:
>>>> On Thu, 7 Mar 2019, Steve Kargl wrote:
>>>> ...
>>>> IIRC, that was done by extracting into a 32-bit lx.  This could be written
>>>> as (lx >> 32) > 0xb17217f7 (or a fuzzier value), and it might be better
>>>> to do that anyway.
>>>
>>> This is what I do in my WIP ccosh.  There are 3 thresholds
>>> of ~22.9, ~11356, and ~22755.  I'm casting the result to
>>> uint32_t, so something like (uint32_t)(lx >> 32) > 0xb17000000.
>>> Do I need the cast and/or do you prefer it?
>>
>> I think you can even round to a power of 2 and not look at any mantissa
>> bits to classify this, as for the 64 in coshl().
>
> You used your k_hexpl() and take advantage of the hi+lo splitting
> to use x < 64.

Or I designed my k_hexpl() to support this splitting up to x < 64.  Long
double precision is easier because it has (committed) support functions
like k_hexpl().

>> 22.9 here seems wrong.  It is approximately the threshold where cosh()
>> decays to exp().  The threshold for coshl() delay is much higher.
>> coshl() uses 64, which is high enough for both ld80 and ld128.  More
>> like 32 would work for ld80.
>
> I actually round up to 23 for ld80, and haven't thought too much
> about ld128.  A value of 22 is used for s_ccosh.c, which isn't
> quite large enough for long double. (Yes, I looked at bits)
>
> % ./gen 22
> coshl(22)    = 1.79245642306579578097e+09
> expl(22) / 2 = 1.79245642306579578086e+09
> sinhl(22)    = 1.79245642306579578074e+09
>
> % ./gen 23
> coshl(23)    = 4.87240172312445130013e+09
> expl(23) / 2 = 4.87240172312445130013e+09
> sinhl(23)    = 4.87240172312445130013e+09

Look at more bits :-).  For r + 1/r, ignoring the 1/r term gives an error
of about 1 ulp in ld80 precision when 1/r/r is about 2**-64.  This gives
a threshold of about log(2**(64/2)) = 22.18 for accuracy of about 1 ulp.
But we want more accuracy than that.  For 0.5 ulps, log(2**(65/2) = 22.52+.
We want more than that too.  expl() gives almost 10 extra bits of accuracy,
and the hyperbolic functions use the expl kernel to get all these extra
bits.  expl(x) itself uses a threshold of 2**-75 instead of 2**-65 for 
x being so tiny that the x**2/2 term and higher terms are negligible
compared with the x term.  So we should try for 11 extra bits here.  This
gives a threshold of log(2**(75/2)) = 25.99++.

ld128 expl() is missing the change corresponding to changing 65 to 75 for
the x vs x**2/2 threshold.  It still uses 114.  After fixing this to 124,
the corresponding threshold here is log(2**124/2) = 42.97+.

> in s_ccoshl.c I then have
>
> 	if (ix < 0x4003 ||      /* |x| < ~23: normal case */
> 	    (ix == 0x4003 && (uint32_t)(lx >> 32) < 0xb8000000))
> 		RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s));

k_hexpl() can't handle this case.  (But if you want efficiency or accuracy,
then you should implement sinhcoshl() or just use the kernel to evaluate
exp(x) only once.)

I think this formula works up to 32 or 64 or even up to the overflow
threshold.  coshl(x) and sinhl() decay to expl(x) starting a bit above
23, but, but these functions can optimize that almost as well as here
(they actually don't bother to until x reaches 64).

You can use this to eliminate the check of lx.

>
> 	/* |x| >= 23, so cosh(x) ~= exp(|x|) / 2 */
> 	if (ix < 0x400c ||
> 	    (ix == 0x400c && (uint32_t)(lx >> 32) <= 0xb1700000)) {
> 		/* x < 11356: exp(|x|) won't overflow */
> 		h = expl(fabsl(x)) / 2;
> 		RETURNI(CMPLXL(h * c, copysignl(h, x) * s));

Use a power of 2 for the upper threshold too if possible.

> With |x| < 23, we have 2 function calls for coshl() and sinhl()
> as opposed to the 1 function call for expl() in 23 <= x <= 32 range.
> I can write the first conditional as
>
> 	if (ix < 0x4004)     /* |x| < 32: normal case */
> 		RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s));

The general formula is indeed unecessarily slow in that range.  Also
for the more important range [1, 23].  coshl() in the lower range is
lo + 0.25/(hi + lo) + hi after h_expl() to get hi and lo.  Similarly
for sinhl().  These formulas are short enough to repeat in the above,
but I think this optimization is excessive.  It is better for accuracy
after some rearrangement.

> This then raises the question on whether we should change the
> limit to 32 in the double complex ccosh()?

Do you mean from 64 to 32 the non-complex cosh(), or from your current
limit to the above?

Bruce