From owner-freebsd-numerics@freebsd.org Wed Feb 27 08:42:49 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 422E71515B9A for ; Wed, 27 Feb 2019 08:42:49 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 157F26D4DF for ; Wed, 27 Feb 2019 08:42:45 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 610EB4365DE; Wed, 27 Feb 2019 19:42:42 +1100 (AEDT) Date: Wed, 27 Feb 2019 19:42:41 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Implement cexpl In-Reply-To: <20190227020142.GA74833@troutmask.apl.washington.edu> Message-ID: <20190227170706.J907@besplex.bde.org> References: <20190203173056.GA42499@troutmask.apl.washington.edu> <20190227020142.GA74833@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=z-HYpUQ0eYvp4CnOBZ8A:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 157F26D4DF X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-6.32 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; FREEMAIL_FROM(0.00)[optusnet.com.au]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MX_GOOD(-0.01)[cached: extmail.optusnet.com.au]; RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.93)[-0.931,0]; IP_SCORE(-3.08)[ip: (-7.92), ipnet: 211.28.0.0/14(-4.14), asn: 4804(-3.30), country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Feb 2019 08:42:49 -0000 On Tue, 26 Feb 2019, Steve Kargl wrote: > On Sun, Feb 03, 2019 at 09:30:56AM -0800, Steve Kargl wrote: >> The following patch implements cexpl() for ld80 and ld128 architectures. > ... > Index: lib/msun/src/math_private.h > =================================================================== > --- lib/msun/src/math_private.h (revision 344600) > +++ lib/msun/src/math_private.h (working copy) > @@ -243,6 +243,20 @@ > } while (0) > > /* > + * Get expsign and mantissa as 16 bit and 2 32-bit ints from an 80 bit long > + * double. > + */ > + > +#define EXTRACT_LDBL80_WORDS2(ix0,ix1,ix2,d) \ > +do { \ > + union IEEEl2bits ew_u; \ > + ew_u.e = (d); \ > + (ix0) = ew_u.xbits.expsign; \ > + (ix1) = ew_u.bits.manh; \ > + (ix2) = ew_u.bits.manl; \ > +} while (0) Don't add more of these accessor functions (more were already added to support powl, since powl doesn't use the FreeBSD spelling in APIs). Use existing separate functions and let the compiler combine them. The one corresponding to this is EXTRACT_LDBL80_WORDS(). That extracts the mantissa into a single 64-bit word which is easier to use than the 2 32-bit words given by the above. It also doesn't ask for pessimal extraction from memory 32 bits at a time, but compilers usually combine into a single 64-bit word for the memory access and then split into 2 32-bit words for 32-bit arches. 3 words instead of 2 also enlarges diffs with the double precision case. > Index: lib/msun/src/s_cexp.c > =================================================================== > --- lib/msun/src/s_cexp.c (revision 344600) > +++ lib/msun/src/s_cexp.c (working copy) > @@ -30,6 +30,7 @@ > __FBSDID("$FreeBSD$"); > > #include > +#include > #include > > #include "math_private.h" > @@ -41,12 +42,19 @@ > double complex > cexp(double complex z) > { > - double x, y, exp_x; > + double c, exp_x, s, x, y; > uint32_t hx, hy, lx, ly; > > x = creal(z); > y = cimag(z); > > + EXTRACT_WORDS(hx, lx, x); > + /* cexp(0 + I y) = cos(y) + I sin(y) */ > + if (((hx & 0x7fffffff) | lx) == 0) { > + sincos(y, &s, &c); > + return (CMPLX(c, s)); > + } > + > EXTRACT_WORDS(hy, ly, y); > hy &= 0x7fffffff; > Another reason I don't like sincos*() is that if the compiler can very easily do the above rewrite automatically. The only large difficulty is to know if the sin(), cos() and sincos() are any good, or at least their relative badness. x86 compilers already know that the i387 sin() and cos() are no good, (or perhaps they just don't know that the FreeBSD implementation doesn't make the i387 functions non-conforming to C99 and/or POSIX (since they don't set errno and might do other side effects wrong)), so x86 compilers don't convert the above to i387 sin(), cos() or sincos(). It is even harder for compilers to know the quality of the library. > @@ -53,10 +61,6 @@ > /* cexp(x + I 0) = exp(x) + I 0 */ > if ((hy | ly) == 0) > return (CMPLX(exp(x), y)); The comments aren't as careful as for hyperbolic functions. The 0 here is actually -0 if y is -0. The code handles both +-0 by clearing the sign bit in hy for the check byt keeping it in y. > Index: lib/msun/ld128/k_cexpl.c > ... > +/* > + * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt. > + * They are intended for large arguments (real part >= ln(DBL_MAX)) > + * where care is needed to avoid overflow. > + * > + * The present implementation is narrowly tailored for our hyperbolic and > + * exponential functions. We assume expt is small (0 or -1), and the caller > + * has filtered out very large x, for which overflow would be inevitable. > + */ > + > +long double > +__ldexp_expl(long double x, int expt) > +{ > +#if 0 > + double exp_x, scale; > + int ex_expt; > + > + exp_x = __frexp_exp(x, &ex_expt); > + expt += ex_expt; > + INSERT_WORDS(scale, (0x3ff + expt) << 20, 0); > + return (exp_x * scale); > +#endif > + return (x - x) / (x - x); > +} This is less likely to work than the comment says :-). > + > +long double complex > +__ldexp_cexpl(long double complex z, int expt) > +{ > +#if 0 > + double x, y, exp_x, scale1, scale2; > + int ex_expt, half_expt; Same. The code that has a chance of working won't even be tested at compile time since it is ifdefed out. > Index: lib/msun/ld128/s_cexpl.c > =================================================================== > --- lib/msun/ld128/s_cexpl.c (nonexistent) > +++ lib/msun/ld128/s_cexpl.c (working copy) > @@ -0,0 +1,76 @@ > ... > + exp_x = expl(x); > + sincosl(y, &s, &c); > + return (CMPLXL(exp_x * c, exp_x * s)); This doesn't use the kernel, although one is written (but always returns NaN). For testing, the low quality kernel should at least use the above formula. > Index: lib/msun/ld80/s_cexpl.c > =================================================================== > --- lib/msun/ld80/s_cexpl.c (nonexistent) > +++ lib/msun/ld80/s_cexpl.c (working copy) > @@ -0,0 +1,105 @@ > ... > +long double complex > +cexpl (long double complex z) > +{ > + long double c, exp_x, s, x, y; > + uint32_t hwx, hwy, lwx, lwy; > + uint16_t hx, hy; > + > + ENTERI(z); > + > + x = creall(z); > + y = cimagl(z); > + > + EXTRACT_LDBL80_WORDS2(hx, hwx, lwx, x); > + EXTRACT_LDBL80_WORDS2(hy, hwy, lwy, y); This asks for slowness relative to the double precision method, by doing 2 slow extraction steps up front. This seems to be broken by not using the normal method of hx &= 0x7fff up front. > + > + /* cexp(0 + I y) = cos(y) + I sin(y) */ > + if ((hwx | lwx | (hx & 0x7fff)) == 0) { The 2 words for the mantissa are just harder to use. Write hwx | lwx normally as lx after extracting into 1 64-bit word. This handles -0 correctly by masking hx with 0x7fff. > + sincosl(y, &s, &c); > + RETURNI(CMPLXL(c, s)); > + } > + > + /* cexp(x + I 0) = exp(x) + I 0 */ > + if ((hwy | lwy | (hy & 0x7fff)) == 0) > + RETURNI(CMPLXL(expl(x), y)); > + > + if (hy >= 0x7fff) { Bugs from not masking h[xy] with 0x7fff up front start here. This misclassifies all negative y's except -0 as being NaNs. > + if (hx < 0x7fff || > + (hx == 0x7fff && (hwx & 0x7fffffff) != 0)) { Many bugs here: - not masking hx with 0x7fff - not checking the low bits of the mantissa. It is foot shooting to have these bits in a separate variable - not checking the normalization bit. Fix this by copying mostly logl() (expl() is too subtle to copy): lx = from correct extraction ix = hx & 0x7fff; /* up front, as in logl */ ... if (ix < 0x7fff || (hx == 0x7fff && lx != 0x8000000000000000)) The first line classifies finite x, and the second line classifies everyhing else except +-Inf. The other cases are: - normal NaN with top bit set - unnormal NaN with top bit set - unnormal +-Inf with top bit clear. This is slightly wrong too. x87 hardware treats unnormal finite values as NaNs, so this should treat them as NaNs too. Checking the normalization bit first works especially easily here. There are complications for normalized zeros and normalized denormals (sic), but we have already handled the zeros. IIRC, the hardware doesn't treat normalized denomals as NaNs, so we get the correct result by treating them as finite. This gives: ... if ((lx & 0x8000000000000000) == 0 || ix < 0x7fff || (ix == 0x7fff && (lx & 0x7fffffffffffffff) != 0)) > + /* cexp(finite|NaN +- I Inf|NaN) = NaN + I NaN */ > + RETURNI(CMPLXL(y - y, y - y)); > + } else if (hx & 0x8000) { cexp() doesn't mask the sign bit in hx or use ix, so as to test the sign via hx here. It clobbers the sign bit in hy. > + /* cexp(-Inf +- I Inf|NaN) = 0 + I 0 */ > + RETURNI(CMPLXL(0.0L, 0.0L)); Style bug: explcit long double constants. The float case risks using double precision by spelling the constants 0.0, the same as the double case. Plain 0 is a better spelling. There may be further bugs from misclassification of y using hy >= 0x7fff. This is supposed to classify Infs and NaNs. The masking bug makes it misclassify all negative values and -0. The further bugs are that it doesn't classify unnormals (except unnormal denormals) as NaNs. Only ld80 has unnormals. Bruce