From owner-freebsd-numerics@freebsd.org Fri Mar 8 20:53:33 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D8E011528C0D for ; Fri, 8 Mar 2019 20:53:32 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id E25388B976 for ; Fri, 8 Mar 2019 20:53:29 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 13F37433BA5; Sat, 9 Mar 2019 07:53:26 +1100 (AEDT) Date: Sat, 9 Mar 2019 07:53:24 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: cexpl() (was: Re: Update ENTERI() macro) In-Reply-To: <20190308191150.GA32980@troutmask.apl.washington.edu> Message-ID: <20190309070413.G2539@besplex.bde.org> References: <20190306184829.GA44023@troutmask.apl.washington.edu> <20190307061214.R4911@besplex.bde.org> <20190306214233.GA23159@troutmask.apl.washington.edu> <20190307144315.N932@besplex.bde.org> <20190307044447.GA16298@troutmask.apl.washington.edu> <20190307163958.V1293@besplex.bde.org> <20190307195436.GA20856@troutmask.apl.washington.edu> <20190308225807.D6410@besplex.bde.org> <20190308162432.GA31392@troutmask.apl.washington.edu> <20190309033820.J1443@besplex.bde.org> <20190308191150.GA32980@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=R02BefTfkBCJ0zfWrscA:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: E25388B976 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-6.23 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; FREEMAIL_FROM(0.00)[optusnet.com.au]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MX_GOOD(-0.01)[cached: extmail.optusnet.com.au]; RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.97)[-0.968,0]; IP_SCORE(-2.95)[ip: (-7.69), ipnet: 211.28.0.0/14(-3.90), asn: 4804(-3.11), country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Mar 2019 20:53:33 -0000 On Fri, 8 Mar 2019, Steve Kargl wrote: > On Sat, Mar 09, 2019 at 05:02:57AM +1100, Bruce Evans wrote: >> On Fri, 8 Mar 2019, Steve Kargl wrote: >>> On Sat, Mar 09, 2019 at 12:51:07AM +1100, Bruce Evans wrote: >>>> On Thu, 7 Mar 2019, Steve Kargl wrote: >>>> ... >>>> IIRC, that was done by extracting into a 32-bit lx. This could be written >>>> as (lx >> 32) > 0xb17217f7 (or a fuzzier value), and it might be better >>>> to do that anyway. >>> >>> This is what I do in my WIP ccosh. There are 3 thresholds >>> of ~22.9, ~11356, and ~22755. I'm casting the result to >>> uint32_t, so something like (uint32_t)(lx >> 32) > 0xb17000000. >>> Do I need the cast and/or do you prefer it? >> >> I think you can even round to a power of 2 and not look at any mantissa >> bits to classify this, as for the 64 in coshl(). > > You used your k_hexpl() and take advantage of the hi+lo splitting > to use x < 64. Or I designed my k_hexpl() to support this splitting up to x < 64. Long double precision is easier because it has (committed) support functions like k_hexpl(). >> 22.9 here seems wrong. It is approximately the threshold where cosh() >> decays to exp(). The threshold for coshl() delay is much higher. >> coshl() uses 64, which is high enough for both ld80 and ld128. More >> like 32 would work for ld80. > > I actually round up to 23 for ld80, and haven't thought too much > about ld128. A value of 22 is used for s_ccosh.c, which isn't > quite large enough for long double. (Yes, I looked at bits) > > % ./gen 22 > coshl(22) = 1.79245642306579578097e+09 > expl(22) / 2 = 1.79245642306579578086e+09 > sinhl(22) = 1.79245642306579578074e+09 > > % ./gen 23 > coshl(23) = 4.87240172312445130013e+09 > expl(23) / 2 = 4.87240172312445130013e+09 > sinhl(23) = 4.87240172312445130013e+09 Look at more bits :-). For r + 1/r, ignoring the 1/r term gives an error of about 1 ulp in ld80 precision when 1/r/r is about 2**-64. This gives a threshold of about log(2**(64/2)) = 22.18 for accuracy of about 1 ulp. But we want more accuracy than that. For 0.5 ulps, log(2**(65/2) = 22.52+. We want more than that too. expl() gives almost 10 extra bits of accuracy, and the hyperbolic functions use the expl kernel to get all these extra bits. expl(x) itself uses a threshold of 2**-75 instead of 2**-65 for x being so tiny that the x**2/2 term and higher terms are negligible compared with the x term. So we should try for 11 extra bits here. This gives a threshold of log(2**(75/2)) = 25.99++. ld128 expl() is missing the change corresponding to changing 65 to 75 for the x vs x**2/2 threshold. It still uses 114. After fixing this to 124, the corresponding threshold here is log(2**124/2) = 42.97+. > in s_ccoshl.c I then have > > if (ix < 0x4003 || /* |x| < ~23: normal case */ > (ix == 0x4003 && (uint32_t)(lx >> 32) < 0xb8000000)) > RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s)); k_hexpl() can't handle this case. (But if you want efficiency or accuracy, then you should implement sinhcoshl() or just use the kernel to evaluate exp(x) only once.) I think this formula works up to 32 or 64 or even up to the overflow threshold. coshl(x) and sinhl() decay to expl(x) starting a bit above 23, but, but these functions can optimize that almost as well as here (they actually don't bother to until x reaches 64). You can use this to eliminate the check of lx. > > /* |x| >= 23, so cosh(x) ~= exp(|x|) / 2 */ > if (ix < 0x400c || > (ix == 0x400c && (uint32_t)(lx >> 32) <= 0xb1700000)) { > /* x < 11356: exp(|x|) won't overflow */ > h = expl(fabsl(x)) / 2; > RETURNI(CMPLXL(h * c, copysignl(h, x) * s)); Use a power of 2 for the upper threshold too if possible. > With |x| < 23, we have 2 function calls for coshl() and sinhl() > as opposed to the 1 function call for expl() in 23 <= x <= 32 range. > I can write the first conditional as > > if (ix < 0x4004) /* |x| < 32: normal case */ > RETURNI(CMPLXL(coshl(x) * c, sinhl(x) * s)); The general formula is indeed unecessarily slow in that range. Also for the more important range [1, 23]. coshl() in the lower range is lo + 0.25/(hi + lo) + hi after h_expl() to get hi and lo. Similarly for sinhl(). These formulas are short enough to repeat in the above, but I think this optimization is excessive. It is better for accuracy after some rearrangement. > This then raises the question on whether we should change the > limit to 32 in the double complex ccosh()? Do you mean from 64 to 32 the non-complex cosh(), or from your current limit to the above? Bruce