From owner-freebsd-numerics@freebsd.org Thu Mar 7 19:54:46 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6B45615239FD for ; Thu, 7 Mar 2019 19:54:46 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 9331F70523 for ; Thu, 7 Mar 2019 19:54:44 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x27JsbOC021893 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Thu, 7 Mar 2019 11:54:37 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x27Jsa5Z021892; Thu, 7 Mar 2019 11:54:36 -0800 (PST) (envelope-from sgk) Date: Thu, 7 Mar 2019 11:54:36 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Update ENTERI() macro Message-ID: <20190307195436.GA20856@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20190304212159.GA12587@troutmask.apl.washington.edu> <20190305153243.Y1349@besplex.bde.org> <20190306055201.GA40298@troutmask.apl.washington.edu> <20190306225811.P2731@besplex.bde.org> <20190306184829.GA44023@troutmask.apl.washington.edu> <20190307061214.R4911@besplex.bde.org> <20190306214233.GA23159@troutmask.apl.washington.edu> <20190307144315.N932@besplex.bde.org> <20190307044447.GA16298@troutmask.apl.washington.edu> <20190307163958.V1293@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190307163958.V1293@besplex.bde.org> User-Agent: Mutt/1.11.2 (2019-01-07) X-Rspamd-Queue-Id: 9331F70523 X-Spamd-Bar: / Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [0.92 / 15.00]; ARC_NA(0.00)[]; HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; NEURAL_SPAM_SHORT(0.59)[0.591,0]; NEURAL_HAM_LONG(-0.58)[-0.582,0]; MIME_GOOD(-0.10)[text/plain]; RCVD_TLS_LAST(0.00)[]; DMARC_NA(0.00)[washington.edu]; AUTH_NA(1.00)[]; REPLYTO_ADDR_EQ_FROM(0.00)[]; RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2]; RCPT_COUNT_TWO(0.00)[2]; MX_GOOD(-0.01)[troutmask.apl.washington.edu]; NEURAL_SPAM_MEDIUM(0.17)[0.169,0]; R_SPF_NA(0.00)[]; FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US]; MID_RHS_MATCH_FROM(0.00)[]; IP_SCORE(0.05)[ip: (0.10), ipnet: 128.95.0.0/16(0.15), asn: 73(0.05), country: US(-0.07)] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2019 19:54:46 -0000 On Thu, Mar 07, 2019 at 05:36:22PM +1100, Bruce Evans wrote: > On Wed, 6 Mar 2019, Steve Kargl wrote: > > > On Thu, Mar 07, 2019 at 03:09:06PM +1100, Bruce Evans wrote: > >> On Wed, 6 Mar 2019, Steve Kargl wrote: > >> ... > >> But does it preserve NaN bits as much as possible, and produce the same > >> NaN bits as double precision after conversion to double precision? The > >> report spells NaN and Inf using C99's fuzzy bad names. Even the C99 > >> specification spells NaN and Inf correctly in Appendix F. > >> > >> The complex hyperbolic functions have better comments, so are almost > >> an easier place to start. > > > > I suppose it preserves the NaN bits as well as cexpf and cexp as > > it uses the same algorthim. > > > ... > > long double complex > > cexpl (long double complex z) > > { > > long double c, exp_x, s, x, y; > > uint64_t lx, ly; > > uint16_t hx, hy, ix, iy; > > > > ENTERI(); > > > > x = creall(z); > > y = cimagl(z); > > > > EXTRACT_LDBL80_WORDS(hx, lx, x); > > EXTRACT_LDBL80_WORDS(hy, ly, y); > > I think you recently changed to this from your new macro. Yes, the new macro is gone. > > > > ix = hx&0x7fff; > > iy = hy&0x7fff; > > Some style bugs. s_cexp.c doesn't use the fdlibm'ish style of sometimes > no spaces around '&'. It puts the ix and iy calculations immediately > after the extractions. They shouldn't be separated since they are > also extractions. c_cexp.c extracts in a different order, and doesn't > use iy. You reordered it a bit to use sincos(). I don't like the > reordering much. But old s_cexp.c has its own style bug of a separating > the extraction for y but not even following its usual style of a blank > line before comments near the extraction of x. I may unorder the re-ordering to the old order. If I understand one of the comments in the GCC bugzilla report, the optimization of converting a = cos(x); b = sin(x): to sincos(x, &b, &a). Actually, does "a = cos(x); b = sin(x)" goes to cexp(cmplx(0,x)) in the middle-end of the compiler. The backend then can recognize cexp(cmplx(0,x)) and converts this to sincos(x, &b, &a). I can't verify this for FreeBSD, because libm isn't C99 compliant. GCC requires a C99 compliant to activate the optimization. I fixed the extraction for hx, and hy. > > /* cexp(0 + I y) = cos(y) + I sin(y) */ > > if ((ix | lx) == 0) { > > sincosl(y, &s, &c); > > RETURNI(CMPLXL(c, s)); > > } > > > > /* cexp(x + I 0) = exp(x) + I 0 */ > > if ((iy | ly) == 0) > > RETURNI(CMPLXL(expl(x), y)); > > > > if (iy == 0x7fff) { > > s_cexp.c actually uses a clobbered hy here. Perhaps fix this too (compilers > mostly ignore program order for initializing variables like iy and optimize > to an expression involving the unclobbered hy here if that is optimal. The > extra variables are just for readability or unreadability). > > > if (ix < 0x7fff || > > (ix == 0x7fff && (lx & 0x7fffffffffffffffULL) != 0)) { > > /* cexp(finite|NaN +- I Inf|NaN) = NaN + I NaN */ > > RETURNI(CMPLXL(y - y, y - y)); > > Looks OK. > > I think ix and iy are only used here, so it is clearer to not have > variables for them -- just use (hx & 0x7fff), etc. > > y - y and similar expressions usually give the right NaN (y = Inf -> dNaN > and y = NaN -> d(y)). > > IIRC, the only differences for ld128 is that then normalization bit is > hidden so masking out the top bit in the mantissa is not needed, but the > mantissa is in 2 64-bit words. > > > } else if (hx & 0x8000) { > > This is clearest as it is with an explicit mask. > > > /* > > * exp_ovfl = 11356.5234062941439497, 0x400c, 0xb17217f7d1cf79ac > > * cexp_ovfl = 22755.3287906024445633, 0x400d, 0xb1c6a8573de9768c > > */ > > if ((hx == 0x400c && lx > 0xb17217f7d1cf79acULL) || > > (hx == 0x400d && lx < 0xb1c6a8573de9768cULL)) { > > Don't duplicate the hex constants in the comment. The limits can be fuzzy > so don't need so many decimal digits or nonzero nonzero bits or LD80C() > to round them (s_cexp.c uses only 1 decimal digit and doesn't test the > lower 32 bits in the mantissa; here we have more bits than we need in lx). Being fuzzy was the movitation for the new macro, I was only using the upper 32 bits of lx in the original cexpl() I posted. > Maybe not use bit-fiddling at all for this (in other precisions too). > The more important exp() functions check thresholds in FP. Here x can > be NaN so FP comparisons with it don't work right. That seems to be > the only reason to check in bits here. > > > /* > > * x is between exp_ovfl and cexp_ovfl, so we must scale to > > * avoid overflow in exp(x). > > */ > > RETURNI(__ldexp_cexpl(z, 1)); > > Did your tests cover this case, and does it use the fixed version of the > old __ldexp_cexpl() in k_expl.h? No. The test I reported was restricted to -11350 < x < 11350. Spot checked x outside that range. I had the conditional wrong for x < -exp_ovfl. Also, the comment is actaully a little optimistic. One cannot avoid overflow. For example, x = 11357 is just above exp_ovfl, and exp(x) = inf. __ldexp_cexpl() does scaling and exploits |cos(y)| <= 1 and |sin(y)| <= 1 to try to extend the range of cexpl(). ./testl -a 11357 0.1 libm: Re(cexpf) = inf mpfr: Re(cexpf) = 1.90658369228334884925e+4932 libm: Im(cexpf) = 1.91296449568717353572e+4931 mpfr: Im(cexpf) = 1.91296449568717353558e+4931 troutmask:kargl[430] ./testl -a 11357 1.57 libm: Re(cexpf) = 1.52588659766018377153e+4929 mpfr: Re(cexpf) = 1.52588659766018377147e+4929 libm: Im(cexpf) = inf mpfr: Im(cexpf) = 1.91615588587371861485e+4932 > > > } else { > > /* > > * Cases covered here: > > * - x < exp_ovfl and exp(x) won't overflow (common case) > > * - x > cexp_ovfl, so exp(x) * s overflows for all s > 0 > > * - x = +-Inf (generated by exp()) > > * - x = NaN (spurious inexact exception from y) > > */ > > exp_x = expl(x); > > sincosl(y, &s, &c); > > RETURNI(CMPLXL(exp_x * c, exp_x * s)); > > } > > } > > I think the last clause can be simplfied and even optimized by using the > kernel in all cases: > - the kernel produces exp_x in the form kexp_x * 2**k where kexp_x is near > 1. Just multiply kexp_x by c and s, and later multiply by 2**k. This > cost an extra multiplication by 2**k (for the real and imaginary parts > separately), but avoids other work so may be faster. > - hopefully kexp_x is actually >= 1 (and < 2). Otherwise we have to > worry about spurious underflow > - the above method seems to avoid spurious underflow automatically, and > doesn't even mention this (multiplying by exp_x may underflow, but not > spuriously since c and s are <= 1). You might be right about using the kernel. I'll leave that to others on freebsd-numerics@; although you and I seem to be the only subscribers. -- Steve