From owner-freebsd-numerics@freebsd.org Tue Feb 26 19:18:29 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1D1ED15008AB for ; Tue, 26 Feb 2019 19:18:29 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4DD366EE8F for ; Tue, 26 Feb 2019 19:18:27 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1QJIP5H068514 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO) for ; Tue, 26 Feb 2019 11:18:25 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1QJIPcN068513 for freebsd-numerics@freebsd.org; Tue, 26 Feb 2019 11:18:25 -0800 (PST) (envelope-from sgk) Date: Tue, 26 Feb 2019 11:18:25 -0800 From: Steve Kargl To: freebsd-numerics@freebsd.org Subject: Update ENTERI() macro Message-ID: <20190226191825.GA68479@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.11.2 (2019-01-07) X-Rspamd-Queue-Id: 4DD366EE8F X-Spamd-Bar: + Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [1.35 / 15.00]; ARC_NA(0.00)[]; HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_HAM_LONG(-0.29)[-0.288,0]; MIME_GOOD(-0.10)[text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-numerics@freebsd.org]; TO_DN_NONE(0.00)[]; AUTH_NA(1.00)[]; RCPT_COUNT_ONE(0.00)[1]; RCVD_COUNT_THREE(0.00)[3]; RCVD_TLS_LAST(0.00)[]; NEURAL_SPAM_SHORT(0.42)[0.417,0]; RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2]; MX_GOOD(-0.01)[cached: troutmask.apl.washington.edu]; DMARC_NA(0.00)[washington.edu]; NEURAL_SPAM_MEDIUM(0.47)[0.475,0]; R_SPF_NA(0.00)[]; REPLYTO_ADDR_EQ_FROM(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US]; MID_RHS_MATCH_FROM(0.00)[]; IP_SCORE(0.05)[ip: (0.11), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06), country: US(-0.07)] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Feb 2019 19:18:29 -0000 I have submitted https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236063 Update the ENTERI() macro in math_private.h to take a parameter. This parameter is used by __typeof() to set the type for an intermediate variable that holds the return value of a function on i386 FreeBSD. The patch updates the consumers of ENTERI(). --- /data/kargl/freebsd/src/lib/msun/src/e_acoshl.c 2013-07-30 10:32:47.000000000 -0700 +++ ./src/e_acoshl.c 2019-02-25 16:36:08.532381000 -0800 @@ -68,7 +68,7 @@ long double t; int16_t hx; - ENTERI(); + ENTERI(x); GET_LDBL_EXPSIGN(hx, x); if (hx < 0x3fff) { /* x < 1, or misnormal */ RETURNI((x-x)/(x-x)); --- /data/kargl/freebsd/src/lib/msun/src/e_atanhl.c 2013-07-30 10:32:47.000000000 -0700 +++ ./src/e_atanhl.c 2019-02-25 16:36:24.414626000 -0800 @@ -57,7 +57,7 @@ long double t; uint16_t hx, ix; - ENTERI(); + ENTERI(x); GET_LDBL_EXPSIGN(hx, x); ix = hx & 0x7fff; if (ix >= 0x3fff) /* |x| >= 1, or NaN or misnormal */ --- /data/kargl/freebsd/src/lib/msun/src/e_coshl.c 2016-10-10 14:48:39.000000000 -0700 +++ ./src/e_coshl.c 2019-02-25 16:37:03.670060000 -0800 @@ -97,7 +97,7 @@ /* x is INF or NaN */ if(ix>=0x7fff) return x*x; - ENTERI(); + ENTERI(x); /* |x| < 1, return 1 or c(x) */ if(ix<0x3fff) { --- /data/kargl/freebsd/src/lib/msun/src/e_sinhl.c 2016-10-10 14:48:39.000000000 -0700 +++ ./src/e_sinhl.c 2019-02-25 16:37:14.623690000 -0800 @@ -97,7 +97,7 @@ /* x is INF or NaN */ if(ix>=0x7fff) return x+x; - ENTERI(); + ENTERI(x); s = 1; if (jx<0) s = -1; --- /data/kargl/freebsd/src/lib/msun/src/math_private.h 2018-10-31 16:15:21.809877000 -0700 +++ ./src/math_private.h 2019-02-25 16:54:26.897247000 -0800 @@ -335,15 +348,14 @@ /* Support switching the mode to FP_PE if necessary. */ #if defined(__i386__) && !defined(NO_FPSETPREC) -#define ENTERI() ENTERIT(long double) -#define ENTERIT(returntype) \ - returntype __retval; \ +#define ENTERI(a) \ + __typeof(a) __retval; \ fp_prec_t __oprec; \ \ if ((__oprec = fpgetprec()) != FP_PE) \ fpsetprec(FP_PE) -#define RETURNI(x) do { \ - __retval = (x); \ +#define RETURNI(a) do { \ + __retval = (a); \ if (__oprec != FP_PE) \ fpsetprec(__oprec); \ RETURNF(__retval); \ @@ -359,9 +371,8 @@ return; \ } while (0) #else -#define ENTERI() -#define ENTERIT(x) -#define RETURNI(x) RETURNF(x) +#define ENTERI(a) +#define RETURNI(a) RETURNF(a) #define ENTERV() #define RETURNV() return #endif --- /data/kargl/freebsd/src/lib/msun/src/s_asinhl.c 2013-07-30 10:32:47.000000000 -0700 +++ ./src/s_asinhl.c 2019-02-25 16:37:44.264923000 -0800 @@ -71,7 +71,7 @@ long double t, w; uint16_t hx, ix; - ENTERI(); + ENTERI(x); GET_LDBL_EXPSIGN(hx, x); ix = hx & 0x7fff; if (ix >= 0x7fff) RETURNI(x+x); /* x is inf, NaN or misnormal */ --- /data/kargl/freebsd/src/lib/msun/src/s_cbrtl.c 2012-07-30 14:58:28.000000000 -0700 +++ ./src/s_cbrtl.c 2019-02-25 16:43:03.606703000 -0800 @@ -53,7 +53,7 @@ if (k == BIAS + LDBL_MAX_EXP) return (x + x); - ENTERI(); + ENTERI(x); if (k == 0) { /* If x = +-0, then cbrt(x) = +-0. */ if ((u.bits.manh | u.bits.manl) == 0) --- /data/kargl/freebsd/src/lib/msun/src/s_clogl.c 2019-02-26 11:00:53.735653000 -0800 +++ ./src/s_clogl.c 2019-02-25 16:43:19.194615000 -0800 @@ -65,7 +65,7 @@ uint16_t hax, hay; int kx, ky; - ENTERIT(long double complex); + ENTERI(z); x = creall(z); y = cimagl(z); --- /data/kargl/freebsd/src/lib/msun/src/s_cosl.c 2017-12-01 13:29:36.000000000 -0800 +++ ./src/s_cosl.c 2019-02-25 16:43:29.546277000 -0800 @@ -68,7 +68,7 @@ if (z.bits.exp == 32767) return ((x - x) / (x - x)); - ENTERI(); + ENTERI(x); /* Optimize the case where x is already within range. */ if (z.e < M_PI_4) --- /data/kargl/freebsd/src/lib/msun/src/s_roundl.c 2017-12-01 13:29:36.000000000 -0800 +++ ./src/s_roundl.c 2019-02-25 16:43:42.814592000 -0800 @@ -48,7 +48,7 @@ if ((hx & 0x7fff) == 0x7fff) return (x + x); - ENTERI(); + ENTERI(x); if (!(hx & 0x8000)) { t = floorl(x); --- /data/kargl/freebsd/src/lib/msun/src/s_sinl.c 2017-12-01 13:29:36.000000000 -0800 +++ ./src/s_sinl.c 2019-02-25 16:43:49.574030000 -0800 @@ -64,7 +64,7 @@ if (z.bits.exp == 32767) return ((x - x) / (x - x)); - ENTERI(); + ENTERI(x); /* Optimize the case where x is already within range. */ if (z.e < M_PI_4) { --- /data/kargl/freebsd/src/lib/msun/src/s_tanhl.c 2016-10-10 14:48:39.000000000 -0700 +++ ./src/s_tanhl.c 2019-02-25 16:44:10.604476000 -0800 @@ -127,7 +127,7 @@ else return one/x-one; /* tanh(NaN) = NaN */ } - ENTERI(); + ENTERI(x); /* |x| < 40 */ if (ix < 0x4004 || fabsl(x) < 40) { /* |x|<40 */ --- /data/kargl/freebsd/src/lib/msun/src/s_tanl.c 2017-12-01 13:29:36.000000000 -0800 +++ ./src/s_tanl.c 2019-02-25 16:44:21.348406000 -0800 @@ -70,7 +70,7 @@ if (z.bits.exp == 32767) return ((x - x) / (x - x)); - ENTERI(); + ENTERI(x); /* Optimize the case where x is already within range. */ if (z.e < M_PI_4) { --- /data/kargl/freebsd/src/lib/msun/ld80/e_lgammal_r.c 2016-10-10 14:48:38.000000000 -0700 +++ ./ld80/e_lgammal_r.c 2019-02-26 11:04:45.992781000 -0800 @@ -261,7 +261,7 @@ ix = hx&0x7fff; if(ix==0x7fff) return x*x; - ENTERI(); + ENTERI(x); /* purge +-0 and tiny arguments */ *signgamp = 1-2*(hx>>15); --- /data/kargl/freebsd/src/lib/msun/ld80/s_erfl.c 2014-07-13 10:05:05.000000000 -0700 +++ ./ld80/s_erfl.c 2019-02-26 11:05:35.163816000 -0800 @@ -231,7 +231,7 @@ return (1-i)+one/x; /* erfl(+-inf)=+-1 */ } - ENTERI(); + ENTERI(x); ax = fabsl(x); if(ax < 0.84375) { @@ -284,7 +284,7 @@ return ((hx>>15)<<1)+one/x; } - ENTERI(); + ENTERI(x); ax = fabsl(x); if(ax < 0.84375L) { --- /data/kargl/freebsd/src/lib/msun/ld80/s_exp2l.c 2017-12-01 13:29:35.000000000 -0800 +++ ./ld80/s_exp2l.c 2019-02-25 16:31:42.300930000 -0800 @@ -242,7 +242,7 @@ return (1.0L + x); /* 1 with inexact */ } - ENTERI(); + ENTERI(x); /* * Reduce x, computing z, i0, and k. The low bits of x + redux --- /data/kargl/freebsd/src/lib/msun/ld80/s_expl.c 2018-10-31 16:15:21.293646000 -0700 +++ ./ld80/s_expl.c 2019-02-25 16:32:05.335182000 -0800 @@ -97,7 +97,7 @@ RETURN2P(1, x); /* 1 with inexact iff x != 0 */ } - ENTERI(); + ENTERI(x); twopk = 1; __k_expl(x, &hi, &lo, &k); @@ -193,7 +193,7 @@ RETURN2P(tiny, -1); /* good for x < -65ln2 - eps */ } - ENTERI(); + ENTERI(x); if (T1 < x && x < T2) { if (ix < BIAS - 74) { /* |x| < 0x1p-74 (includes pseudos) */ --- /data/kargl/freebsd/src/lib/msun/ld80/s_logl.c 2017-12-01 13:29:35.000000000 -0800 +++ ./ld80/s_logl.c 2019-02-25 16:34:17.938768000 -0800 @@ -481,7 +481,7 @@ /* log(pseudo-NaN) = qNaN */ /* log(unnormal) = qNaN */ #ifndef STRUCT_RETURN - ENTERI(); + ENTERI(x); #endif k += hx; ix = lx & 0x7fffffffffffffffULL; @@ -588,7 +588,7 @@ f_hi = x; f_lo = 0; /* avoid underflow of the P5 term */ } - ENTERI(); + ENTERI(x); x = f_hi + f_lo; f_lo = (f_hi - x) + f_lo; @@ -668,7 +668,7 @@ { struct ld r; - ENTERI(); + ENTERI(x); DOPRINT_START(&x); k_logl(x, &r); RETURNSPI(&r); @@ -686,7 +686,7 @@ struct ld r; long double hi, lo; - ENTERI(); + ENTERI(x); DOPRINT_START(&x); k_logl(x, &r); if (!r.lo_set) @@ -704,7 +704,7 @@ struct ld r; long double hi, lo; - ENTERI(); + ENTERI(x); DOPRINT_START(&x); k_logl(x, &r); if (!r.lo_set) --- /data/kargl/freebsd/src/lib/msun/ld128/s_expl.c 2018-10-31 16:15:21.240218000 -0700 +++ ./ld128/s_expl.c 2019-02-25 16:28:16.950376000 -0800 @@ -85,7 +85,7 @@ RETURN2P(1, x); /* 1 with inexact iff x != 0 */ } - ENTERI(); + ENTERI(x); twopk = 1; __k_expl(x, &hi, &lo, &k); @@ -232,7 +232,7 @@ RETURN2P(tiny, -1); /* good for x < -114ln2 - eps */ } - ENTERI(); + ENTERI(x); if (T1 < x && x < T2) { x2 = x * x; --- /data/kargl/freebsd/src/lib/msun/ld128/s_logl.c 2017-12-01 13:29:34.000000000 -0800 +++ ./ld128/s_logl.c 2019-02-25 16:30:18.608573000 -0800 @@ -478,7 +478,7 @@ } else if (hx >= 0x7fff) RETURN1(rp, x + x); /* log(Inf or NaN) = Inf or qNaN */ #ifndef STRUCT_RETURN - ENTERI(); + ENTERI(x); #endif k += hx; dk = k; @@ -597,7 +597,7 @@ f_hi = x; f_lo = 0; /* avoid underflow of the P3 term */ } - ENTERI(); + ENTERI(x); x = f_hi + f_lo; f_lo = (f_hi - x) + f_lo; @@ -678,7 +678,7 @@ { struct ld r; - ENTERI(); + ENTERI(x); DOPRINT_START(&x); k_logl(x, &r); RETURNSPI(&r); @@ -705,7 +705,7 @@ long double lo; float hi; - ENTERI(); + ENTERI(x); DOPRINT_START(&x); k_logl(x, &r); if (!r.lo_set) @@ -724,7 +724,7 @@ long double lo; float hi; - ENTERI(); + ENTERI(x); DOPRINT_START(&x); k_logl(x, &r); if (!r.lo_set) -- Steve From owner-freebsd-numerics@freebsd.org Wed Feb 27 02:01:46 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 94E81150BD45 for ; Wed, 27 Feb 2019 02:01:46 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 421CC86A1E for ; Wed, 27 Feb 2019 02:01:45 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1R21gEa075080 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO) for ; Tue, 26 Feb 2019 18:01:42 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1R21g2g075079 for freebsd-numerics@freebsd.org; Tue, 26 Feb 2019 18:01:42 -0800 (PST) (envelope-from sgk) Date: Tue, 26 Feb 2019 18:01:42 -0800 From: Steve Kargl To: freebsd-numerics@freebsd.org Subject: Re: Implement cexpl Message-ID: <20190227020142.GA74833@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20190203173056.GA42499@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190203173056.GA42499@troutmask.apl.washington.edu> User-Agent: Mutt/1.11.2 (2019-01-07) X-Rspamd-Queue-Id: 421CC86A1E X-Spamd-Bar: / Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [0.18 / 15.00]; ARC_NA(0.00)[]; HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_HAM_LONG(-0.25)[-0.254,0]; MIME_GOOD(-0.10)[text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-numerics@freebsd.org]; TO_DN_NONE(0.00)[]; AUTH_NA(1.00)[]; RCPT_COUNT_ONE(0.00)[1]; RCVD_COUNT_THREE(0.00)[3]; RCVD_TLS_LAST(0.00)[]; RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2]; MX_GOOD(-0.01)[cached: troutmask.apl.washington.edu]; DMARC_NA(0.00)[washington.edu]; NEURAL_HAM_SHORT(-0.79)[-0.790,0]; NEURAL_SPAM_MEDIUM(0.48)[0.483,0]; R_SPF_NA(0.00)[]; REPLYTO_ADDR_EQ_FROM(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US]; MID_RHS_MATCH_FROM(0.00)[]; IP_SCORE(0.05)[ip: (0.11), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06), country: US(-0.07)] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Feb 2019 02:01:46 -0000 On Sun, Feb 03, 2019 at 09:30:56AM -0800, Steve Kargl wrote: > The following patch implements cexpl() for ld80 and ld128 architectures. > > The kernel files for large x are named ld{80,128}/k_cexpl.c to avoid > confusion with the ld{80,128}/k_expl.h files. > > I do not have access to ld128, so the ld128 does not use bit twiddling. > I cannot test the ld128 implementation, but I believe that it works > (for some definition of work). I have attached a new patch to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235413 This new patch incorporates changes to cexp and cexpf to compute the z = x + I y with x == 0 case first. This is in preparation for fixing constanting folding with GCC. The implementation for ld128/*_cexpl.c have not been compiled, and hence, are untested. I don't have ld128 hardware. The specific changes are * lib/msun/src/math_private.h: . Add an EXTRACT_LDBL80_WORDS2() macro to get access to the high and low word of a 64-bit significand as well as the expsign. . Add prototype for __ldexp_expl(). . Add prototype for __ldexp_cexpl(). * lib/msun/src/s_cexp.c: . A float.h to get LDBL_MANT_DIG. . Add c and s to declaration, and sort. . Move x = 0 case to be the first case tested. This is in preparation for fixing constanting folding in GCC. . Use sincos() instead of a call to sin() and to cos(). . A week_refrence for LDBL_MANT_DIG == 53. * lib/msun/src/s_cexpf.c: . Add c and s to declaration, and sort. . Move x = 0 case to be the first case tested. This is in preparation for fixing constanting folding in GCC. . Use sincosf() instead of a call to sinf() and to cosf(). * lib/msun/src/k_exp.c: . Add c and s to declaration, and sort. . Use sincos() instead of a call to sin() and to cos(). * lib/msun/src/k_expf.c . Add c and s to declaration, and sort. . Use sincosf() instead of a call to sinf() and to cosf(). * lib/msun/ld128/k_cexpl.c: . Copy src/k_exp.c to here. #if 0 ... #endif all code and have functions return NaN. These functions are currently unused, and await someone who cares. . Issue a compile-time warning about the sloppiness. * lib/msun/ld128/s_cexpl.c: . Copy src/s_cexp.c to here. . Convert "double complex" to "long double complex" without use of bit-twiddling. . Add compile-time warning about the sloppiness. . Add run-time warning about the sloppiness. * lib/msun/ld80/k_cexpl.c: . Copy src/k_exp.c to here. . Convert "double complex" to "long double complex". Use bit-twiddling. * lib/msun/ld80/s_cexpl.c: . Copy src/s_cexp.c to here. . Convert "double complex" to "long double complex". Use bit-twiddling where bits are grabbed with new EXTRACT_LDBL80_WORDS2() macro. * lib/msun/man/cexp.3: . Document the addtion of cexpl. * include/complex.h: . Add prototype for cexpl(). Index: lib/msun/src/math_private.h =================================================================== --- lib/msun/src/math_private.h (revision 344600) +++ lib/msun/src/math_private.h (working copy) @@ -243,6 +243,20 @@ } while (0) /* + * Get expsign and mantissa as 16 bit and 2 32-bit ints from an 80 bit long + * double. + */ + +#define EXTRACT_LDBL80_WORDS2(ix0,ix1,ix2,d) \ +do { \ + union IEEEl2bits ew_u; \ + ew_u.e = (d); \ + (ix0) = ew_u.xbits.expsign; \ + (ix1) = ew_u.bits.manh; \ + (ix2) = ew_u.bits.manl; \ +} while (0) + +/* * Get expsign and mantissa as one 16 bit and two 64 bit ints from a 128 bit * long double. */ @@ -920,5 +934,9 @@ long double __kernel_sinl(long double, long double, int); long double __kernel_cosl(long double, long double); long double __kernel_tanl(long double, long double, int); +long double __ldexp_expl(long double, int); +#ifdef _COMPLEX_H +long double complex __ldexp_cexpl(long double complex, int); +#endif #endif /* !_MATH_PRIVATE_H_ */ Index: lib/msun/src/s_cexp.c =================================================================== --- lib/msun/src/s_cexp.c (revision 344600) +++ lib/msun/src/s_cexp.c (working copy) @@ -30,6 +30,7 @@ __FBSDID("$FreeBSD$"); #include +#include #include #include "math_private.h" @@ -41,12 +42,19 @@ double complex cexp(double complex z) { - double x, y, exp_x; + double c, exp_x, s, x, y; uint32_t hx, hy, lx, ly; x = creal(z); y = cimag(z); + EXTRACT_WORDS(hx, lx, x); + /* cexp(0 + I y) = cos(y) + I sin(y) */ + if (((hx & 0x7fffffff) | lx) == 0) { + sincos(y, &s, &c); + return (CMPLX(c, s)); + } + EXTRACT_WORDS(hy, ly, y); hy &= 0x7fffffff; @@ -53,10 +61,6 @@ /* cexp(x + I 0) = exp(x) + I 0 */ if ((hy | ly) == 0) return (CMPLX(exp(x), y)); - EXTRACT_WORDS(hx, lx, x); - /* cexp(0 + I y) = cos(y) + I sin(y) */ - if (((hx & 0x7fffffff) | lx) == 0) - return (CMPLX(cos(y), sin(y))); if (hy >= 0x7ff00000) { if (lx != 0 || (hx & 0x7fffffff) != 0x7ff00000) { @@ -86,6 +90,11 @@ * - x = NaN (spurious inexact exception from y) */ exp_x = exp(x); - return (CMPLX(exp_x * cos(y), exp_x * sin(y))); + sincos(y, &s, &c); + return (CMPLX(exp_x * c, exp_x * s)); } } + +#if (LDBL_MANT_DIG == 53) +__weak_reference(cexp, cexpl); +#endif Index: lib/msun/src/s_cexpf.c =================================================================== --- lib/msun/src/s_cexpf.c (revision 344600) +++ lib/msun/src/s_cexpf.c (working copy) @@ -41,12 +41,19 @@ float complex cexpf(float complex z) { - float x, y, exp_x; + float c, exp_x, s, x, y; uint32_t hx, hy; x = crealf(z); y = cimagf(z); + GET_FLOAT_WORD(hx, x); + /* cexp(0 + I y) = cos(y) + I sin(y) */ + if ((hx & 0x7fffffff) == 0) { + sincosf(y, &s, &c); + return (CMPLXF(c, s)); + } + GET_FLOAT_WORD(hy, y); hy &= 0x7fffffff; @@ -53,10 +60,6 @@ /* cexp(x + I 0) = exp(x) + I 0 */ if (hy == 0) return (CMPLXF(expf(x), y)); - GET_FLOAT_WORD(hx, x); - /* cexp(0 + I y) = cos(y) + I sin(y) */ - if ((hx & 0x7fffffff) == 0) - return (CMPLXF(cosf(y), sinf(y))); if (hy >= 0x7f800000) { if ((hx & 0x7fffffff) != 0x7f800000) { @@ -86,6 +89,7 @@ * - x = NaN (spurious inexact exception from y) */ exp_x = expf(x); - return (CMPLXF(exp_x * cosf(y), exp_x * sinf(y))); + sincosf(y, &s, &c); + return (CMPLXF(exp_x * c, exp_x * s)); } } Index: lib/msun/src/k_exp.c =================================================================== --- lib/msun/src/k_exp.c (revision 344600) +++ lib/msun/src/k_exp.c (working copy) @@ -88,7 +88,7 @@ double complex __ldexp_cexp(double complex z, int expt) { - double x, y, exp_x, scale1, scale2; + double c, exp_x, s, scale1, scale2, x, y; int ex_expt, half_expt; x = creal(z); @@ -105,6 +105,7 @@ half_expt = expt - half_expt; INSERT_WORDS(scale2, (0x3ff + half_expt) << 20, 0); - return (CMPLX(cos(y) * exp_x * scale1 * scale2, - sin(y) * exp_x * scale1 * scale2)); + sincos(y, &s, &c); + return (CMPLX(c * exp_x * scale1 * scale2, + s * exp_x * scale1 * scale2)); } Index: lib/msun/src/k_expf.c =================================================================== --- lib/msun/src/k_expf.c (revision 344600) +++ lib/msun/src/k_expf.c (working copy) @@ -71,7 +71,7 @@ float complex __ldexp_cexpf(float complex z, int expt) { - float x, y, exp_x, scale1, scale2; + float c, exp_x, s, scale1, scale2, x, y; int ex_expt, half_expt; x = crealf(z); @@ -84,6 +84,7 @@ half_expt = expt - half_expt; SET_FLOAT_WORD(scale2, (0x7f + half_expt) << 23); - return (CMPLXF(cosf(y) * exp_x * scale1 * scale2, - sinf(y) * exp_x * scale1 * scale2)); + sincosf(y, &s, &c); + return (CMPLXF(c * exp_x * scale1 * scale2, + s * exp_x * scale1 * scale2)); } Index: lib/msun/ld128/k_cexpl.c =================================================================== --- lib/msun/ld128/k_cexpl.c (nonexistent) +++ lib/msun/ld128/k_cexpl.c (working copy) @@ -0,0 +1,126 @@ +/*- + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD + * + * Copyright (c) 2011 David Schultz + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include +__FBSDID("$FreeBSD$"); + +#include + +#include "math.h" +#include "math_private.h" + +#warning These functions are broken on ld128 architectures. +#warning Functions defined here are currently unused. +#warning Someone who cares needs to convert src/k_exp.c. + +#if 0 +static const uint32_t k = 1799; /* constant for reduction */ +static const double kln2 = 1246.97177782734161156; /* k * ln2 */ +#endif + +/* + * Compute exp(x), scaled to avoid spurious overflow. An exponent is + * returned separately in 'expt'. + * + * Input: ln(DBL_MAX) <= x < ln(2 * DBL_MAX / DBL_MIN_DENORM) ~= 1454.91 + * Output: 2**1023 <= y < 2**1024 + */ +static long double +__frexp_expl(long double x, int *expt) +{ +#if 0 + double exp_x; + uint32_t hx; + + /* + * We use exp(x) = exp(x - kln2) * 2**k, carefully chosen to + * minimize |exp(kln2) - 2**k|. We also scale the exponent of + * exp_x to MAX_EXP so that the result can be multiplied by + * a tiny number without losing accuracy due to denormalization. + */ + exp_x = exp(x - kln2); + GET_HIGH_WORD(hx, exp_x); + *expt = (hx >> 20) - (0x3ff + 1023) + k; + SET_HIGH_WORD(exp_x, (hx & 0xfffff) | ((0x3ff + 1023) << 20)); + return (exp_x); +#endif + return (x - x) / (x - x); +} + +/* + * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt. + * They are intended for large arguments (real part >= ln(DBL_MAX)) + * where care is needed to avoid overflow. + * + * The present implementation is narrowly tailored for our hyperbolic and + * exponential functions. We assume expt is small (0 or -1), and the caller + * has filtered out very large x, for which overflow would be inevitable. + */ + +long double +__ldexp_expl(long double x, int expt) +{ +#if 0 + double exp_x, scale; + int ex_expt; + + exp_x = __frexp_exp(x, &ex_expt); + expt += ex_expt; + INSERT_WORDS(scale, (0x3ff + expt) << 20, 0); + return (exp_x * scale); +#endif + return (x - x) / (x - x); +} + +long double complex +__ldexp_cexpl(long double complex z, int expt) +{ +#if 0 + double x, y, exp_x, scale1, scale2; + int ex_expt, half_expt; + + x = creal(z); + y = cimag(z); + exp_x = __frexp_exp(x, &ex_expt); + expt += ex_expt; + + /* + * Arrange so that scale1 * scale2 == 2**expt. We use this to + * compensate for scalbn being horrendously slow. + */ + half_expt = expt / 2; + INSERT_WORDS(scale1, (0x3ff + half_expt) << 20, 0); + half_expt = expt - half_expt; + INSERT_WORDS(scale2, (0x3ff + half_expt) << 20, 0); + + sincos(y, &s, &c); + return (CMPLX(c) * exp_x * scale1 * scale2, + s * exp_x * scale1 * scale2)); +#endif + return (x - x) / (x - x); +} Property changes on: lib/msun/ld128/k_cexpl.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld128/s_cexpl.c =================================================================== --- lib/msun/ld128/s_cexpl.c (nonexistent) +++ lib/msun/ld128/s_cexpl.c (working copy) @@ -0,0 +1,76 @@ +/*- + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD + * + * Copyright (c) 2019 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include +__FBSDID("$FreeBSD$"); + +#include +#include +#include + +#include "math_private.h" + +#warning cexpl is likely broken on ld128 architectures. +#warning Someone who cares needs to convert src/s_cexp.c. + +long double complex +cexpl(long double complex z) +{ + long double c, exp_x, s, x, y; + + x = creall(z); + y = cimagl(z); + + /* cexp(0 + I y) = cos(y) + I sin(y) */ + if (x == 0) { + sincosl(y, &s, &c); + return (CMPLXL(c, s)); + } + + /* cexp(x + I 0) = exp(x) + I 0 */ + if (y == 0) + return (CMPLXL(expl(x), y)); + + if (!isfinite(y)) { + if (isfinite(x) || isnan(x)) { + /* cexp(finite|NaN +- I Inf|NaN) = NaN + I NaN */ + return (CMPLXL(y - y, y - y)); + } else if (isinf(x) && copysignl(1.L, x) < 0) { + /* cexp(-Inf +- I Inf|NaN) = 0 + I 0 */ + return (CMPLXL(0.0L, 0.0L)); + } else { + /* cexp(+Inf +- I Inf|NaN) = Inf + I NaN */ + return (CMPLXL(x, y - y)); + } + } + + exp_x = expl(x); + sincosl(y, &s, &c); + return (CMPLXL(exp_x * c, exp_x * s)); +} +__warn_references("Precision of cexpl may have lower than expected precision"); Property changes on: lib/msun/ld128/s_cexpl.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld80/k_cexpl.c =================================================================== --- lib/msun/ld80/k_cexpl.c (nonexistent) +++ lib/msun/ld80/k_cexpl.c (working copy) @@ -0,0 +1,118 @@ +/*- + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD + * + * Copyright (c) 2011 David Schultz + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * src/k_exp.c converted to long double complex by Steven G. Kargl + */ + +#include +__FBSDID("$FreeBSD$"); + +#include + +#include "fpmath.h" +#include "math.h" +#include "math_private.h" + +static const uint32_t k = 22956; /* constant for reduction */ +static const union IEEEl2bits +kln2u = LD80C(0xf89f8bf509c862ca, 13, 1.59118866769341045231e+04L); +#define kln2 (kln2u.e) +/* + * Compute exp(x), scaled to avoid spurious overflow. An exponent is + * returned separately in 'expt'. + * + * Input: ln(LDBL_MAX) <= x < ln(2 * LDBL_MAX / LDBL_MIN_DENORM) ~= 22756.0 + * Output: 2**16383 <= y < 2**16384 + */ +static long double +__frexp_expl(long double x, int *expt) +{ + union IEEEl2bits exp_x; + + /* + * We use exp(x) = exp(x - kln2) * 2**k, carefully chosen to + * minimize |exp(kln2) - 2**k|. We also scale the exponent of + * exp_x to MAX_EXP so that the result can be multiplied by + * a tiny number without losing accuracy due to denormalization. + */ + exp_x.e = expl(x - kln2); + *expt = exp_x.bits.exp - (0x3fff + 16383) + k - 1; + exp_x.bits.exp = 0x3fff + 16383; + return (exp_x.e); +} + +/* + * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt. + * They are intended for large arguments (real part >= ln(DBL_MAX)) + * where care is needed to avoid overflow. + * + * The present implementation is narrowly tailored for our hyperbolic and + * exponential functions. We assume expt is small (0 or -1), and the caller + * has filtered out very large x, for which overflow would be inevitable. + */ + +long double +__ldexp_expl(long double x, int expt) +{ + union IEEEl2bits scale; + long double exp_x; + int ex_expt; + + exp_x = __frexp_expl(x, &ex_expt); + expt += ex_expt; + scale.e = 1; + scale.bits.exp = 0x3fff + expt; + return (exp_x * scale.e); +} + +long double complex +__ldexp_cexpl(long double complex z, int expt) +{ + union IEEEl2bits scale1, scale2; + long double c, exp_x, s, x, y; + int ex_expt, half_expt; + + x = creall(z); + y = cimagl(z); + exp_x = __frexp_expl(x, &ex_expt); + expt += ex_expt; + + /* + * Arrange so that scale1 * scale2 == 2**expt. We use this to + * compensate for scalbn being horrendously slow. + */ + half_expt = expt / 2; + scale1.e = 1; + scale1.bits.exp = 0x3fff + half_expt; + half_expt = expt - half_expt; + scale2.e = 1; + scale2.bits.exp = 0x3fff + half_expt + 1; + + sincosl(y, &s, &c); + return (CMPLXL(c * exp_x * scale1.e * scale2.e, + s * exp_x * scale1.e * scale2.e)); +} Property changes on: lib/msun/ld80/k_cexpl.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/ld80/s_cexpl.c =================================================================== --- lib/msun/ld80/s_cexpl.c (nonexistent) +++ lib/msun/ld80/s_cexpl.c (working copy) @@ -0,0 +1,105 @@ +/*- + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD + * + * Copyright (c) 2011 David Schultz + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * src/s_cexp.c converted to long double complex by Steven G. Kargl + */ + +#include +__FBSDID("$FreeBSD$"); + +#include +#include +#ifdef __i386__ +#include +#endif +#include + +#include "fpmath.h" +#include "math_private.h" + +static const uint32_t +exp_ovfl = 0xb17217f7, /* high bits of MAX_EXP * ln2 ~= 11356 */ +cexp_ovfl = 0xb1c6a857; /* (MAX_EXP - MIN_DENORM_EXP) * ln2 */ + +long double complex +cexpl (long double complex z) +{ + long double c, exp_x, s, x, y; + uint32_t hwx, hwy, lwx, lwy; + uint16_t hx, hy; + + ENTERI(z); + + x = creall(z); + y = cimagl(z); + + EXTRACT_LDBL80_WORDS2(hx, hwx, lwx, x); + EXTRACT_LDBL80_WORDS2(hy, hwy, lwy, y); + + /* cexp(0 + I y) = cos(y) + I sin(y) */ + if ((hwx | lwx | (hx & 0x7fff)) == 0) { + sincosl(y, &s, &c); + RETURNI(CMPLXL(c, s)); + } + + /* cexp(x + I 0) = exp(x) + I 0 */ + if ((hwy | lwy | (hy & 0x7fff)) == 0) + RETURNI(CMPLXL(expl(x), y)); + + if (hy >= 0x7fff) { + if (hx < 0x7fff || + (hx == 0x7fff && (hwx & 0x7fffffff) != 0)) { + /* cexp(finite|NaN +- I Inf|NaN) = NaN + I NaN */ + RETURNI(CMPLXL(y - y, y - y)); + } else if (hx & 0x8000) { + /* cexp(-Inf +- I Inf|NaN) = 0 + I 0 */ + RETURNI(CMPLXL(0.0L, 0.0L)); + } else { + /* cexp(+Inf +- I Inf|NaN) = Inf + I NaN */ + RETURNI(CMPLXL(x, y - y)); + } + } + + if (hx >= 0x400c && hx <= 0x400d) { + /* + * x is between 11356 and 22755, so we must scale to avoid + * overflow in exp(x). + */ + RETURNI(__ldexp_cexpl(z, 0)); + } else { + /* + * Cases covered here: + * - x < exp_ovfl and exp(x) won't overflow (common case) + * - x > cexp_ovfl, so exp(x) * s overflows for all s > 0 + * - x = +-Inf (generated by exp()) + * - x = NaN (spurious inexact exception from y) + */ + exp_x = expl(x); + sincosl(y, &s, &c); + RETURNI(CMPLXL(exp_x * c, exp_x * s)); + } +} Property changes on: lib/msun/ld80/s_cexpl.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: lib/msun/man/cexp.3 =================================================================== --- lib/msun/man/cexp.3 (revision 344600) +++ lib/msun/man/cexp.3 (working copy) @@ -24,12 +24,13 @@ .\" .\" $FreeBSD$ .\" -.Dd March 6, 2011 +.Dd February 4, 2019 .Dt CEXP 3 .Os .Sh NAME .Nm cexp , -.Nm cexpf +.Nm cexpf , +.Nm cexpl .Nd complex exponential functions .Sh LIBRARY .Lb libm @@ -39,11 +40,14 @@ .Fn cexp "double complex z" .Ft float complex .Fn cexpf "float complex z" +.Ft long double complex +.Fn cexpl "long double complex z" .Sh DESCRIPTION The -.Fn cexp +.Fn cexp , +.Fn cexpf , and -.Fn cexpf +.Fn cexpl functions compute the complex exponential of .Fa z , also known as @@ -106,8 +110,9 @@ .Xr math 3 .Sh STANDARDS The -.Fn cexp +.Fn cexp , +.Fn cexpf , and -.Fn cexpf +.Fn cexpl functions conform to .St -isoC-99 . Index: include/complex.h =================================================================== --- include/complex.h (revision 344600) +++ include/complex.h (working copy) @@ -98,6 +98,8 @@ float complex ccoshf(float complex); double complex cexp(double complex); float complex cexpf(float complex); +long double complex + cexpl(long double complex); double cimag(double complex) __pure2; float cimagf(float complex) __pure2; long double cimagl(long double complex) __pure2; From owner-freebsd-numerics@freebsd.org Wed Feb 27 06:05:26 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 92A5615120A1 for ; Wed, 27 Feb 2019 06:05:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id D2F3F8EE73 for ; Wed, 27 Feb 2019 06:05:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 80B93435045; Wed, 27 Feb 2019 17:05:18 +1100 (AEDT) Date: Wed, 27 Feb 2019 17:05:15 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Update ENTERI() macro In-Reply-To: <20190226191825.GA68479@troutmask.apl.washington.edu> Message-ID: <20190227145002.P907@besplex.bde.org> References: <20190226191825.GA68479@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=UJetJGXy c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=D3kSSieMYeL5IMV4CcIA:9 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22 X-Rspamd-Queue-Id: D2F3F8EE73 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-6.35 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; FREEMAIL_FROM(0.00)[optusnet.com.au]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MX_GOOD(-0.01)[extmail.optusnet.com.au]; RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.98)[-0.978,0]; IP_SCORE(-3.06)[ip: (-7.84), ipnet: 211.28.0.0/14(-4.13), asn: 4804(-3.29), country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Feb 2019 06:05:26 -0000 On Tue, 26 Feb 2019, Steve Kargl wrote: > I have submitted > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236063 > > Update the ENTERI() macro in math_private.h to take a parameter. > This parameter is used by __typeof() to set the type for an > intermediate variable that holds the return value of a function > on i386 FreeBSD. The patch updates the consumers of ENTERI(). I don't like this. It churns and complicates all the simple cases that only need ENTERI(). It bogotifies the existence of ENTERIT(), whose reason for existence is to avoid this. (Actually, it removes ENTERIT().) __typeof() is unnecessarily unportable. The implementation uses literal typesin thousands of places so gains little or less than nothing by not spelling types literally for ENTERIT(). > ... > --- /data/kargl/freebsd/src/lib/msun/src/math_private.h 2018-10-31 16:15:21.809877000 -0700 > +++ ./src/math_private.h 2019-02-25 16:54:26.897247000 -0800 > @@ -335,15 +348,14 @@ > > /* Support switching the mode to FP_PE if necessary. */ > #if defined(__i386__) && !defined(NO_FPSETPREC) > -#define ENTERI() ENTERIT(long double) > -#define ENTERIT(returntype) \ > - returntype __retval; \ > +#define ENTERI(a) \ > + __typeof(a) __retval; \ > fp_prec_t __oprec; \ > \ > if ((__oprec = fpgetprec()) != FP_PE) \ > fpsetprec(FP_PE) > -#define RETURNI(x) do { \ > - __retval = (x); \ > +#define RETURNI(a) do { \ > + __retval = (a); \ > if (__oprec != FP_PE) \ > fpsetprec(__oprec); \ > RETURNF(__retval); \ > @@ -359,9 +371,8 @@ > return; \ > } while (0) > #else > -#define ENTERI() > -#define ENTERIT(x) > -#define RETURNI(x) RETURNF(x) > +#define ENTERI(a) > +#define RETURNI(a) RETURNF(a) > #define ENTERV() > #define RETURNV() return > #endif This also changes RETURNI(), by unimproving its parameter name. 'x' for ENTERI() wasn't a very good name for a type, but is good for a variable. 'x' for RETURNI() is slightly worse than 'r', but better than 'a' I had forgotten that ENTERIT() creates a local variable, and don't like this, but this is its main reason for existence. Now I see a better way, especially if if we accept unportabilities like __typeof(): ENTERIT() doesn't need to declare local variable with nearly function scope, since the variable is only needed for returning. Only the precision variable needs to have nearly function scope. The return variable can be created in block scope in RETURNI(). Something like: #define RETURNI(r) do { \ __typeof(r) __retval; \ \ __retval = (r); \ if ((__oprec = fpgetprec()) != FP_PE) \ fpsetprec(FP_PE) \ RETURNF(__retval); \ } while (0) You already added ENTERV() and RETURNV(). That seemed reasonable, but now I don't like it. With the above change (and of course also remove the variable from ENTERI()), ENTERV() becomes the same as ENTERI(). RETURNV() still needs to be different. Another bug in ENTERV() is its name. It is missing the 'I' to indicate that it is a complication for only i386. RETURNV() has the same naming problem, and more. It can complement ENTERI() perfectly since it doesn't need to return. It should be named LEAVEI(), and the return should be done by the caller. But I now see 3 more problems. The return in RETURNI() is not direct, but goes through the macro RETURNF(x). In the committed version, this is a default that just returns x, but in my version it returns hackdouble_t(x) or hackfloat_t(x) in some cases (no cases are needed for long doubles, so there is no interaction with ENTERI()/LEAVEI(), and I only do this in a few simple cases not including any with complex types). In the hack*_t() cases, x has type double_t or float_t, and we want to avoid C99's requirement to destroy efficiency and accuracy on return by removing any extra precision in x. (C99 only requires this destruction for careful code that uses double_t or float_t in the function. Then since the return type is different from the internal type, destruction is required. C99 also requires removing extra precision on assignment or casts, but most compilers with most CFLAGS don't do this since it is pessimal. Careful code uses double_t and float_t internally so as to not depend on compiler bugs for efficiency and to avoid unintended loss of accuracy on assignment ands casts for compilers without bugs. C11 adds further destruction for sloppy code that uses the return type internally.) hackdouble_t(x) and hackfloat_t(x) change from type xxx_t to type xxx using a type pun so that the destruction is not required in the C99 case. The 3 problems are: - the type pun interacts in a complicated way with changing the precision. This problem is null since type type pun is not needed for long doubles (unless there is a type with more precision than long double, but there is no such type for any supported arch, and C99 is missing support for it). If this were a problem, then we would have to be carefull about the order of the type pun and the precision change. Do the type pun first. - also, be careful to apply any __typeof() to an arg with the correct type. Normally do the type pun first, and use the return type for the local variable. However, for printing, the arg should be the type that has extra precision, and __typeof() on that doesn't give the return type in general. In simple cases, The modified RETURNF() expands to something like 'return (hackfloat_t(hi + lo));', where hi + lo is evalated in extra precision using float_t if possible, hackfloat_t puns the resulting float_t value to plain float without any destruction, and we finally return the value. In more complicated cases, the punned returned value is stored in a variable for printing using RETURNP(). I forget if the type of this variable is the return type or the arg type (printing in the return type shows what callers will normally see but printing in the arg type gives more details). The caller uses RETURNP*(), and RETURNP*() uses RETURNF() and RETURNF() might do the type pun. My code is not careful enough to handle all cases. For the common hi + lo case, RETURN2P[I]() is used to print hi and lo separately, so the extra precision in hi + lo can be checked even on arches where hi + lo is not evaluated in extra precision. - finally, it is necessary to go through a macro to reach the printing. Not many functions do this. But most long double functions get this at no extra cost (except complexity in the macros) by having to go though ENTERI()/RETURNI(). Currently, the caller must decide, but it is a trivaial edit to change RETURNI() to RETURNP() and a simple edit to change RETURNI(hi + lo) to RETURN2P(hi, lo). There is also SUM2P() to handle more complicated cases of hi + lo. The return value is not passed to RETURNV(), so it is harder to get automatic printing at no extra cost. In s_sincosf.c and s_sincos.c, the macros are just not used, the same as for s_sinf.c, etc. The float case double on all arches instead of double_t. On i386, this reduces to a slightly pessimized use of double_t, and on arches with no extra precision it sacrifices a little efficiency to get more accuracy. In s_sincosl.c, the values are calculated and stored in local variables, and then RETURNV() returns void. This reminds me of a reason why I don't like sincos*(). Its API requires destruction of efficiency and accuracy by returning the values indirectly. On i386 with not very old CPUs, this costs about 8 cycles per long double value. Float and double values cost about half as much. On amd64, the long double case is the same and the float and double cases are faster. sinf() and cosf() on small args take only 15-20 cycles (thoughput) on amd64 with not very old CPUs, so 2-8 extra cycles for the 2 indirect return values is a lot. sincosf() still ends up being slightly faster than separate sinf()/cosf(). Bruce From owner-freebsd-numerics@freebsd.org Wed Feb 27 07:48:21 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BB6381514694 for ; Wed, 27 Feb 2019 07:48:20 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7AE3A6B873 for ; Wed, 27 Feb 2019 07:48:19 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1R7mBxP076055 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Tue, 26 Feb 2019 23:48:11 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1R7mBAr076054; Tue, 26 Feb 2019 23:48:11 -0800 (PST) (envelope-from sgk) Date: Tue, 26 Feb 2019 23:48:11 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Update ENTERI() macro Message-ID: <20190227074811.GA75972@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20190226191825.GA68479@troutmask.apl.washington.edu> <20190227145002.P907@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190227145002.P907@besplex.bde.org> User-Agent: Mutt/1.11.2 (2019-01-07) X-Rspamd-Queue-Id: 7AE3A6B873 X-Spamd-Bar: + Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [1.06 / 15.00]; ARC_NA(0.00)[]; HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; NEURAL_SPAM_SHORT(0.49)[0.491,0]; NEURAL_HAM_LONG(-0.51)[-0.508,0]; MIME_GOOD(-0.10)[text/plain]; RCVD_TLS_LAST(0.00)[]; DMARC_NA(0.00)[washington.edu]; AUTH_NA(1.00)[]; REPLYTO_ADDR_EQ_FROM(0.00)[]; RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2]; RCPT_COUNT_TWO(0.00)[2]; MX_GOOD(-0.01)[troutmask.apl.washington.edu]; NEURAL_SPAM_MEDIUM(0.33)[0.331,0]; R_SPF_NA(0.00)[]; FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US]; MID_RHS_MATCH_FROM(0.00)[]; IP_SCORE(0.05)[ip: (0.11), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06), country: US(-0.07)] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Feb 2019 07:48:21 -0000 On Wed, Feb 27, 2019 at 05:05:15PM +1100, Bruce Evans wrote: > On Tue, 26 Feb 2019, Steve Kargl wrote: > > > I have submitted > > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236063 > > > > Update the ENTERI() macro in math_private.h to take a parameter. > > This parameter is used by __typeof() to set the type for an > > intermediate variable that holds the return value of a function > > on i386 FreeBSD. The patch updates the consumers of ENTERI(). > > I don't like this. It churns and complicates all the simple cases > that only need ENTERI(). It bogotifies the existence of ENTERIT(), > whose reason for existence is to avoid this. (Actually, it removes > ENTERIT().) __typeof() is unnecessarily unportable. The implementation > uses literal typesin thousands of places so gains little or less than > nothing by not spelling types literally for ENTERIT(). Okay. The other option is an ENTERC() and RETURNC() as we need to toggle FP_PE for long double complex functions. I suppose I could follow the one example currently in the tree that use ENTERIT(long double complex) I find it somewhat odd that we have ENTERI() /* Implicit declaration of __retval to long double. */ but must use directly ENTERIT(long double complex). > > ... > > --- /data/kargl/freebsd/src/lib/msun/src/math_private.h 2018-10-31 16:15:21.809877000 -0700 > > +++ ./src/math_private.h 2019-02-25 16:54:26.897247000 -0800 > > @@ -335,15 +348,14 @@ > > > > /* Support switching the mode to FP_PE if necessary. */ > > #if defined(__i386__) && !defined(NO_FPSETPREC) > > -#define ENTERI() ENTERIT(long double) > > -#define ENTERIT(returntype) \ > > - returntype __retval; \ > > +#define ENTERI(a) \ > > + __typeof(a) __retval; \ > > fp_prec_t __oprec; \ > > \ > > if ((__oprec = fpgetprec()) != FP_PE) \ > > fpsetprec(FP_PE) > > -#define RETURNI(x) do { \ > > - __retval = (x); \ > > +#define RETURNI(a) do { \ > > + __retval = (a); \ > > if (__oprec != FP_PE) \ > > fpsetprec(__oprec); \ > > RETURNF(__retval); \ > > @@ -359,9 +371,8 @@ > > return; \ > > } while (0) > > #else > > -#define ENTERI() > > -#define ENTERIT(x) > > -#define RETURNI(x) RETURNF(x) > > +#define ENTERI(a) > > +#define RETURNI(a) RETURNF(a) > > #define ENTERV() > > #define RETURNV() return > > #endif > > This also changes RETURNI(), by unimproving its parameter name. 'x' for > ENTERI() wasn't a very good name for a type, but is good for a variable. > 'x' for RETURNI() is slightly worse than 'r', but better than 'a' The renaming is for consistency. I can use 'r'. > I had forgotten that ENTERIT() creates a local variable, and don't like this, > but this is its main reason for existence. Now I see a better way, > especially if if we accept unportabilities like __typeof(): > > ENTERIT() doesn't need to declare local variable with nearly function scope, > since the variable is only needed for returning. Only the precision variable > needs to have nearly function scope. The return variable can be created in > block scope in RETURNI(). Something like: > > #define RETURNI(r) do { \ > __typeof(r) __retval; \ > \ > __retval = (r); \ > if ((__oprec = fpgetprec()) != FP_PE) \ > fpsetprec(FP_PE) \ > RETURNF(__retval); \ > } while (0) > > You already added ENTERV() and RETURNV(). That seemed reasonable, but now > I don't like it. With the above change (and of course also remove the > variable from ENTERI()), ENTERV() becomes the same as ENTERI(). RETURNV() > still needs to be different. > > Another bug in ENTERV() is its name. It is missing the 'I' to indicate that > it is a complication for only i386. You probably told at some point in the distant past what the 'I' meant, but that is long forgotten. I'm fine with you changing these names. ENTERIV() and RETURNIV() are needed for sincosl(), which is a void function. > RETURNV() has the same naming problem, and more. It can complement > ENTERI() perfectly since it doesn't need to return. It should be named > LEAVEI(), and the return should be done by the caller. > > But I now see 3 more problems. The return in RETURNI() is not direct, > but goes through the macro RETURNF(x). In the committed version, this > is a default that just returns x, but in my version it returns > hackdouble_t(x) or hackfloat_t(x) in some cases (no cases are needed > for long doubles, so there is no interaction with ENTERI()/LEAVEI(), > and I only do this in a few simple cases not including any with > complex types). I'm fine with making ENTERI() only toggle precision, and adding a LEAVEI() to reset precision. RETURNI(r) would then be #define RETURNI(r) \ do { \ LEAVEI(); \ return (r); \ } while (0) ENTERV and RETURNV can then go away as ENTERV is simply ENTERI and RETURNV is now spelled LEAVEI. > In the hack*_t() cases, x has type double_t or float_t, and we want > to avoid C99's requirement to destroy efficiency and accuracy on return > by removing any extra precision in x. (C99 only requires this > destruction for careful code that uses double_t or float_t in the > function. Then since the return type is different from the internal > type, destruction is required. C99 also requires removing extra > precision on assignment or casts, but most compilers with most CFLAGS > don't do this since it is pessimal. Careful code uses double_t and > float_t internally so as to not depend on compiler bugs for efficiency > and to avoid unintended loss of accuracy on assignment ands casts for > compilers without bugs. C11 adds further destruction for sloppy code > that uses the return type internally.) hackdouble_t(x) and > hackfloat_t(x) change from type xxx_t to type xxx using a type pun so > that the destruction is not required in the C99 case. > > The 3 problems are: > - the type pun interacts in a complicated way with changing the precision. > This problem is null since type type pun is not needed for long doubles > (unless there is a type with more precision than long double, but there > is no such type for any supported arch, and C99 is missing support for > it). If this were a problem, then we would have to be carefull about > the order of the type pun and the precision change. Do the type pun > first. > > - also, be careful to apply any __typeof() to an arg with the correct > type. Normally do the type pun first, and use the return type for > the local variable. However, for printing, the arg should be the > type that has extra precision, and __typeof() on that doesn't give > the return type in general. > > In simple cases, The modified RETURNF() expands to something like > 'return (hackfloat_t(hi + lo));', > where hi + lo is evalated in extra precision using float_t if possible, > hackfloat_t puns the resulting float_t value to plain float without > any destruction, and we finally return the value. > > In more complicated cases, the punned returned value is stored in a > variable for printing using RETURNP(). I forget if the type of this > variable is the return type or the arg type (printing in the return > type shows what callers will normally see but printing in the arg > type gives more details). The caller uses RETURNP*(), and RETURNP*() > uses RETURNF() and RETURNF() might do the type pun. My code is not > careful enough to handle all cases. For the common hi + lo case, > RETURN2P[I]() is used to print hi and lo separately, so the extra > precision in hi + lo can be checked even on arches where hi + lo is > not evaluated in extra precision. > > - finally, it is necessary to go through a macro to reach the printing. > Not many functions do this. But most long double functions get this > at no extra cost (except complexity in the macros) by having to go > though ENTERI()/RETURNI(). Currently, the caller must decide, but > it is a trivaial edit to change RETURNI() to RETURNP() and a simple > edit to change RETURNI(hi + lo) to RETURN2P(hi, lo). There is also > SUM2P() to handle more complicated cases of hi + lo. > > The return value is not passed to RETURNV(), so it is harder to get > automatic printing at no extra cost. In s_sincosf.c and s_sincos.c, > the macros are just not used, the same as for s_sinf.c, etc. The > float case double on all arches instead of double_t. On i386, this > reduces to a slightly pessimized use of double_t, and on arches with > no extra precision it sacrifices a little efficiency to get more > accuracy. In s_sincosl.c, the values are calculated and stored in > local variables, and then RETURNV() returns void. > > This reminds me of a reason why I don't like sincos*(). Its API > requires destruction of efficiency and accuracy by returning the values > indirectly. On i386 with not very old CPUs, this costs about 8 cycles per > long double value. Float and double values cost about half as much. On > amd64, the long double case is the same and the float and double cases > are faster. Not sure your efficiency claim holds. I've seen significant improves in cexp and cexpf where sin[f]() and cos[f]() are replaced by sincos[f]. On my core2 running i386 freebsd, I see 0.1779 usecs/call for cexpf with sinf and cosf and 0.12522 usecs/call for sincosf. Yes, that's a 29.6% improvement. For cexp the numbers are 0.2697 usecs/call for sin and cos and 0.20586 for sincos (ie, 23.7% improvement). This is for z = x + I y with x and y in the non-exceptable case. > sinf() and cosf() on small args take only 15-20 cycles (thoughput) on > amd64 with not very old CPUs, so 2-8 extra cycles for the 2 indirect > return values is a lot. sincosf() still ends up being slightly faster > than separate sinf()/cosf(). Seems to be much faster when used in other functions. -- Steve From owner-freebsd-numerics@freebsd.org Wed Feb 27 08:42:49 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 422E71515B9A for ; Wed, 27 Feb 2019 08:42:49 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 157F26D4DF for ; Wed, 27 Feb 2019 08:42:45 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 610EB4365DE; Wed, 27 Feb 2019 19:42:42 +1100 (AEDT) Date: Wed, 27 Feb 2019 19:42:41 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Implement cexpl In-Reply-To: <20190227020142.GA74833@troutmask.apl.washington.edu> Message-ID: <20190227170706.J907@besplex.bde.org> References: <20190203173056.GA42499@troutmask.apl.washington.edu> <20190227020142.GA74833@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=z-HYpUQ0eYvp4CnOBZ8A:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 157F26D4DF X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-6.32 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; FREEMAIL_FROM(0.00)[optusnet.com.au]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MX_GOOD(-0.01)[cached: extmail.optusnet.com.au]; RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.93)[-0.931,0]; IP_SCORE(-3.08)[ip: (-7.92), ipnet: 211.28.0.0/14(-4.14), asn: 4804(-3.30), country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Feb 2019 08:42:49 -0000 On Tue, 26 Feb 2019, Steve Kargl wrote: > On Sun, Feb 03, 2019 at 09:30:56AM -0800, Steve Kargl wrote: >> The following patch implements cexpl() for ld80 and ld128 architectures. > ... > Index: lib/msun/src/math_private.h > =================================================================== > --- lib/msun/src/math_private.h (revision 344600) > +++ lib/msun/src/math_private.h (working copy) > @@ -243,6 +243,20 @@ > } while (0) > > /* > + * Get expsign and mantissa as 16 bit and 2 32-bit ints from an 80 bit long > + * double. > + */ > + > +#define EXTRACT_LDBL80_WORDS2(ix0,ix1,ix2,d) \ > +do { \ > + union IEEEl2bits ew_u; \ > + ew_u.e = (d); \ > + (ix0) = ew_u.xbits.expsign; \ > + (ix1) = ew_u.bits.manh; \ > + (ix2) = ew_u.bits.manl; \ > +} while (0) Don't add more of these accessor functions (more were already added to support powl, since powl doesn't use the FreeBSD spelling in APIs). Use existing separate functions and let the compiler combine them. The one corresponding to this is EXTRACT_LDBL80_WORDS(). That extracts the mantissa into a single 64-bit word which is easier to use than the 2 32-bit words given by the above. It also doesn't ask for pessimal extraction from memory 32 bits at a time, but compilers usually combine into a single 64-bit word for the memory access and then split into 2 32-bit words for 32-bit arches. 3 words instead of 2 also enlarges diffs with the double precision case. > Index: lib/msun/src/s_cexp.c > =================================================================== > --- lib/msun/src/s_cexp.c (revision 344600) > +++ lib/msun/src/s_cexp.c (working copy) > @@ -30,6 +30,7 @@ > __FBSDID("$FreeBSD$"); > > #include > +#include > #include > > #include "math_private.h" > @@ -41,12 +42,19 @@ > double complex > cexp(double complex z) > { > - double x, y, exp_x; > + double c, exp_x, s, x, y; > uint32_t hx, hy, lx, ly; > > x = creal(z); > y = cimag(z); > > + EXTRACT_WORDS(hx, lx, x); > + /* cexp(0 + I y) = cos(y) + I sin(y) */ > + if (((hx & 0x7fffffff) | lx) == 0) { > + sincos(y, &s, &c); > + return (CMPLX(c, s)); > + } > + > EXTRACT_WORDS(hy, ly, y); > hy &= 0x7fffffff; > Another reason I don't like sincos*() is that if the compiler can very easily do the above rewrite automatically. The only large difficulty is to know if the sin(), cos() and sincos() are any good, or at least their relative badness. x86 compilers already know that the i387 sin() and cos() are no good, (or perhaps they just don't know that the FreeBSD implementation doesn't make the i387 functions non-conforming to C99 and/or POSIX (since they don't set errno and might do other side effects wrong)), so x86 compilers don't convert the above to i387 sin(), cos() or sincos(). It is even harder for compilers to know the quality of the library. > @@ -53,10 +61,6 @@ > /* cexp(x + I 0) = exp(x) + I 0 */ > if ((hy | ly) == 0) > return (CMPLX(exp(x), y)); The comments aren't as careful as for hyperbolic functions. The 0 here is actually -0 if y is -0. The code handles both +-0 by clearing the sign bit in hy for the check byt keeping it in y. > Index: lib/msun/ld128/k_cexpl.c > ... > +/* > + * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt. > + * They are intended for large arguments (real part >= ln(DBL_MAX)) > + * where care is needed to avoid overflow. > + * > + * The present implementation is narrowly tailored for our hyperbolic and > + * exponential functions. We assume expt is small (0 or -1), and the caller > + * has filtered out very large x, for which overflow would be inevitable. > + */ > + > +long double > +__ldexp_expl(long double x, int expt) > +{ > +#if 0 > + double exp_x, scale; > + int ex_expt; > + > + exp_x = __frexp_exp(x, &ex_expt); > + expt += ex_expt; > + INSERT_WORDS(scale, (0x3ff + expt) << 20, 0); > + return (exp_x * scale); > +#endif > + return (x - x) / (x - x); > +} This is less likely to work than the comment says :-). > + > +long double complex > +__ldexp_cexpl(long double complex z, int expt) > +{ > +#if 0 > + double x, y, exp_x, scale1, scale2; > + int ex_expt, half_expt; Same. The code that has a chance of working won't even be tested at compile time since it is ifdefed out. > Index: lib/msun/ld128/s_cexpl.c > =================================================================== > --- lib/msun/ld128/s_cexpl.c (nonexistent) > +++ lib/msun/ld128/s_cexpl.c (working copy) > @@ -0,0 +1,76 @@ > ... > + exp_x = expl(x); > + sincosl(y, &s, &c); > + return (CMPLXL(exp_x * c, exp_x * s)); This doesn't use the kernel, although one is written (but always returns NaN). For testing, the low quality kernel should at least use the above formula. > Index: lib/msun/ld80/s_cexpl.c > =================================================================== > --- lib/msun/ld80/s_cexpl.c (nonexistent) > +++ lib/msun/ld80/s_cexpl.c (working copy) > @@ -0,0 +1,105 @@ > ... > +long double complex > +cexpl (long double complex z) > +{ > + long double c, exp_x, s, x, y; > + uint32_t hwx, hwy, lwx, lwy; > + uint16_t hx, hy; > + > + ENTERI(z); > + > + x = creall(z); > + y = cimagl(z); > + > + EXTRACT_LDBL80_WORDS2(hx, hwx, lwx, x); > + EXTRACT_LDBL80_WORDS2(hy, hwy, lwy, y); This asks for slowness relative to the double precision method, by doing 2 slow extraction steps up front. This seems to be broken by not using the normal method of hx &= 0x7fff up front. > + > + /* cexp(0 + I y) = cos(y) + I sin(y) */ > + if ((hwx | lwx | (hx & 0x7fff)) == 0) { The 2 words for the mantissa are just harder to use. Write hwx | lwx normally as lx after extracting into 1 64-bit word. This handles -0 correctly by masking hx with 0x7fff. > + sincosl(y, &s, &c); > + RETURNI(CMPLXL(c, s)); > + } > + > + /* cexp(x + I 0) = exp(x) + I 0 */ > + if ((hwy | lwy | (hy & 0x7fff)) == 0) > + RETURNI(CMPLXL(expl(x), y)); > + > + if (hy >= 0x7fff) { Bugs from not masking h[xy] with 0x7fff up front start here. This misclassifies all negative y's except -0 as being NaNs. > + if (hx < 0x7fff || > + (hx == 0x7fff && (hwx & 0x7fffffff) != 0)) { Many bugs here: - not masking hx with 0x7fff - not checking the low bits of the mantissa. It is foot shooting to have these bits in a separate variable - not checking the normalization bit. Fix this by copying mostly logl() (expl() is too subtle to copy): lx = from correct extraction ix = hx & 0x7fff; /* up front, as in logl */ ... if (ix < 0x7fff || (hx == 0x7fff && lx != 0x8000000000000000)) The first line classifies finite x, and the second line classifies everyhing else except +-Inf. The other cases are: - normal NaN with top bit set - unnormal NaN with top bit set - unnormal +-Inf with top bit clear. This is slightly wrong too. x87 hardware treats unnormal finite values as NaNs, so this should treat them as NaNs too. Checking the normalization bit first works especially easily here. There are complications for normalized zeros and normalized denormals (sic), but we have already handled the zeros. IIRC, the hardware doesn't treat normalized denomals as NaNs, so we get the correct result by treating them as finite. This gives: ... if ((lx & 0x8000000000000000) == 0 || ix < 0x7fff || (ix == 0x7fff && (lx & 0x7fffffffffffffff) != 0)) > + /* cexp(finite|NaN +- I Inf|NaN) = NaN + I NaN */ > + RETURNI(CMPLXL(y - y, y - y)); > + } else if (hx & 0x8000) { cexp() doesn't mask the sign bit in hx or use ix, so as to test the sign via hx here. It clobbers the sign bit in hy. > + /* cexp(-Inf +- I Inf|NaN) = 0 + I 0 */ > + RETURNI(CMPLXL(0.0L, 0.0L)); Style bug: explcit long double constants. The float case risks using double precision by spelling the constants 0.0, the same as the double case. Plain 0 is a better spelling. There may be further bugs from misclassification of y using hy >= 0x7fff. This is supposed to classify Infs and NaNs. The masking bug makes it misclassify all negative values and -0. The further bugs are that it doesn't classify unnormals (except unnormal denormals) as NaNs. Only ld80 has unnormals. Bruce From owner-freebsd-numerics@freebsd.org Wed Feb 27 10:16:08 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 29EE51518470 for ; Wed, 27 Feb 2019 10:16:08 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 6316E70622 for ; Wed, 27 Feb 2019 10:16:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 752763D6714; Wed, 27 Feb 2019 21:15:54 +1100 (AEDT) Date: Wed, 27 Feb 2019 21:15:52 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Update ENTERI() macro In-Reply-To: <20190227074811.GA75972@troutmask.apl.washington.edu> Message-ID: <20190227201214.V1823@besplex.bde.org> References: <20190226191825.GA68479@troutmask.apl.washington.edu> <20190227145002.P907@besplex.bde.org> <20190227074811.GA75972@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=_NwNQ0rs8jac9D6pTMUA:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 6316E70622 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.42 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-6.44 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_IN_DNSWL_LOW(-0.10)[42.132.29.211.list.dnswl.org : 127.0.5.1]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; FREEMAIL_FROM(0.00)[optusnet.com.au]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MX_GOOD(-0.01)[cached: extmail.optusnet.com.au]; RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.88)[-0.884,0]; IP_SCORE(-3.25)[ip: (-8.75), ipnet: 211.28.0.0/14(-4.15), asn: 4804(-3.30), country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Feb 2019 10:16:08 -0000 On Tue, 26 Feb 2019, Steve Kargl wrote: > On Wed, Feb 27, 2019 at 05:05:15PM +1100, Bruce Evans wrote: >> On Tue, 26 Feb 2019, Steve Kargl wrote: >* ... >>> Update the ENTERI() macro in math_private.h to take a parameter. >> ... >> I don't like this. It churns and complicates all the simple cases >> that only need ENTERI(). It bogotifies the existence of ENTERIT(), > ... > Okay. The other option is an ENTERC() and RETURNC() as > we need to toggle FP_PE for long double complex functions. > I suppose I could follow the one example currently in the > tree that use > > ENTERIT(long double complex) > > I find it somewhat odd that we have > > ENTERI() /* Implicit declaration of __retval to long double. */ > > but must use directly ENTERIT(long double complex). ENTERI() hard-codes the long double for simplicity. Remember, it is only needed for long double precision on i386. But I forgot about long double complex types, and didn't dream about indirect long double types in sincosl(). > ... >>> -#define RETURNI(x) RETURNF(x) >>> +#define ENTERI(a) >>> +#define RETURNI(a) RETURNF(a) >>> #define ENTERV() >>> #define RETURNV() return >>> #endif >> >> This also changes RETURNI(), by unimproving its parameter name. 'x' for >> ENTERI() wasn't a very good name for a type, but is good for a variable. >> 'x' for RETURNI() is slightly worse than 'r', but better than 'a' > > The renaming is for consistency. I can use 'r'. 'r' is not quite right either, since the arg can be and is often an expression. 'a' is good for 'arg'. >> ... >> But I now see 3 more problems. The return in RETURNI() is not direct, >> but goes through the macro RETURNF(x). In the committed version, this >> is a default that just returns x, but in my version it returns >> hackdouble_t(x) or hackfloat_t(x) in some cases (no cases are needed >> for long doubles, so there is no interaction with ENTERI()/LEAVEI(), >> and I only do this in a few simple cases not including any with >> complex types). > > I'm fine with making ENTERI() only toggle precision, and adding > a LEAVEI() to reset precision. RETURNI(r) would then be > > #define RETURNI(r) \ > do { \ > LEAVEI(); \ > return (r); \ > } while (0) No, may be an expression, so it must be evaluated before LEAVEI(). This is the reason for existence of the variable to hold the result. >> [... about complications for the general case] >> This reminds me of a reason why I don't like sincos*(). Its API >> requires destruction of efficiency and accuracy by returning the values >> indirectly. On i386 with not very old CPUs, this costs about 8 cycles per >> long double value. Float and double values cost about half as much. On >> amd64, the long double case is the same and the float and double cases >> are faster. > > Not sure your efficiency claim holds. I've seen significant improves > in cexp and cexpf where sin[f]() and cos[f]() are replaced by > sincos[f]. On my core2 running i386 freebsd, I see 0.1779 usecs/call > for cexpf with sinf and cosf and 0.12522 usecs/call for sincosf. > Yes, that's a 29.6% improvement. For cexp the numbers are 0.2697 > usecs/call for sin and cos and 0.20586 for sincos (ie, 23.7% improvement). > This is for z = x + I y with x and y in the non-exceptable case. Combined sin and cos probably does work better outside of benchmarks for sin and cos alone, since it does less work so leaves more resources for the, more useful things. >> sinf() and cosf() on small args take only 15-20 cycles (thoughput) on >> amd64 with not very old CPUs, so 2-8 extra cycles for the 2 indirect >> return values is a lot. sincosf() still ends up being slightly faster >> than separate sinf()/cosf(). > > Seems to be much faster when used in other functions. It's hard tp be much faster than 15-20 cycles. The latency is more like 50 cycles, with 3 sinf()'s or cosf()'s running in parallel. sincos() is far from the best possible optimization for repeated calls on the same or nearby args. If sin() and cos() cached the arg reduction, then separate sin() and cos() on the same arg would run about as fast as sincos(), and repeated sin()'s on the same arg would run much faster than now. Caching the arg reduction may also be good when the arg changes slightly. However, caching is slower if the args are not close. Even a 1-entry cache takes a long time to look up relative to the 15-20 cycles taken by sinf() and cosf(). Caching is complicated by signal handlers and threads. Perhaps the right API one that has to ask for caching and provides the cache storage. Then sincos() could be: ... _dh_init(x, &dh); /* prefill 1-entry cache dh */ s = _sin_cache(x, &dh, 1); /* cache hit unless x is NaN /* cache misses update dh */ c = _cos_cache(x, &dh, 1); /* cache hit unless x is NaN ... and with everything inlined this is little different from the current sincos() except for NaNs. NaNs can be cache hits too if you compare them as bits, but the comparison should probably be x == dhp->dh_x for a 1-entry cache, so as to not to have to extract the bits of x. Bruce From owner-freebsd-numerics@freebsd.org Wed Feb 27 16:19:11 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BE29A1521000 for ; Wed, 27 Feb 2019 16:19:11 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id B8D9B84AFF for ; Wed, 27 Feb 2019 16:19:09 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1RGJ7J6077887 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Wed, 27 Feb 2019 08:19:07 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1RGJ6kv077886; Wed, 27 Feb 2019 08:19:06 -0800 (PST) (envelope-from sgk) Date: Wed, 27 Feb 2019 08:19:06 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Update ENTERI() macro Message-ID: <20190227161906.GA77785@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20190226191825.GA68479@troutmask.apl.washington.edu> <20190227145002.P907@besplex.bde.org> <20190227074811.GA75972@troutmask.apl.washington.edu> <20190227201214.V1823@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190227201214.V1823@besplex.bde.org> User-Agent: Mutt/1.11.2 (2019-01-07) X-Rspamd-Queue-Id: B8D9B84AFF X-Spamd-Bar: + Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [1.32 / 15.00]; ARC_NA(0.00)[]; HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu]; NEURAL_HAM_MEDIUM(-0.47)[-0.472,0]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; NEURAL_SPAM_SHORT(0.90)[0.899,0]; MIME_GOOD(-0.10)[text/plain]; RCVD_TLS_LAST(0.00)[]; DMARC_NA(0.00)[washington.edu]; AUTH_NA(1.00)[]; REPLYTO_ADDR_EQ_FROM(0.00)[]; RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2]; MX_GOOD(-0.01)[cached: troutmask.apl.washington.edu]; RCPT_COUNT_TWO(0.00)[2]; NEURAL_SPAM_LONG(0.15)[0.149,0]; R_SPF_NA(0.00)[]; FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US]; MID_RHS_MATCH_FROM(0.00)[]; IP_SCORE(0.05)[ip: (0.11), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06), country: US(-0.07)] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Feb 2019 16:19:12 -0000 On Wed, Feb 27, 2019 at 09:15:52PM +1100, Bruce Evans wrote: > On Tue, 26 Feb 2019, Steve Kargl wrote: > > > On Wed, Feb 27, 2019 at 05:05:15PM +1100, Bruce Evans wrote: > >> On Tue, 26 Feb 2019, Steve Kargl wrote: > >* ... > >>> Update the ENTERI() macro in math_private.h to take a parameter. > >> ... > >> I don't like this. It churns and complicates all the simple cases > >> that only need ENTERI(). It bogotifies the existence of ENTERIT(), > > ... > > Okay. The other option is an ENTERC() and RETURNC() as > > we need to toggle FP_PE for long double complex functions. > > I suppose I could follow the one example currently in the > > tree that use > > > > ENTERIT(long double complex) > > > > I find it somewhat odd that we have > > > > ENTERI() /* Implicit declaration of __retval to long double. */ > > > > but must use directly ENTERIT(long double complex). > > ENTERI() hard-codes the long double for simplicity. Remember, it is only > needed for long double precision on i386. But I forgot about long double > complex types, and didn't dream about indirect long double types in sincosl(). That simplicity does not work for long double complex. We will need either ENTERIC as in #define ENTERIC() ENTERIT(long double complex) or a direct use of ENTERIT as you have done s_clogl.c > > > ... > >>> -#define RETURNI(x) RETURNF(x) > >>> +#define ENTERI(a) > >>> +#define RETURNI(a) RETURNF(a) > >>> #define ENTERV() > >>> #define RETURNV() return > >>> #endif > >> > >> This also changes RETURNI(), by unimproving its parameter name. 'x' for > >> ENTERI() wasn't a very good name for a type, but is good for a variable. > >> 'x' for RETURNI() is slightly worse than 'r', but better than 'a' > > > > The renaming is for consistency. I can use 'r'. > > 'r' is not quite right either, since the arg can be and is often an > expression. 'a' is good for 'arg'. > > >> ... > >> But I now see 3 more problems. The return in RETURNI() is not direct, > >> but goes through the macro RETURNF(x). In the committed version, this > >> is a default that just returns x, but in my version it returns > >> hackdouble_t(x) or hackfloat_t(x) in some cases (no cases are needed > >> for long doubles, so there is no interaction with ENTERI()/LEAVEI(), > >> and I only do this in a few simple cases not including any with > >> complex types). > > > > I'm fine with making ENTERI() only toggle precision, and adding > > a LEAVEI() to reset precision. RETURNI(r) would then be > > > > #define RETURNI(r) \ > > do { \ > > LEAVEI(); \ > > return (r); \ > > } while (0) > > No, may be an expression, so it must be evaluated before LEAVEI(). This > is the reason for existence of the variable to hold the result. So, we'll need RETURNI for long double and one for long double complex. Or, we give RETURNI a second parameter, which is the input parameter of the function #define RETURNI(x, r) \ do { \ x = (r) \ LEAVEI(); \ return (r); \ } while (0) This will cause a lot of churn. So, it seems that ENTERIC is the way forward. > >> [... about complications for the general case] > > >> This reminds me of a reason why I don't like sincos*(). Its API > >> requires destruction of efficiency and accuracy by returning the values > >> indirectly. On i386 with not very old CPUs, this costs about 8 cycles per > >> long double value. Float and double values cost about half as much. On > >> amd64, the long double case is the same and the float and double cases > >> are faster. > > > > Not sure your efficiency claim holds. I've seen significant improves > > in cexp and cexpf where sin[f]() and cos[f]() are replaced by > > sincos[f]. On my core2 running i386 freebsd, I see 0.1779 usecs/call > > for cexpf with sinf and cosf and 0.12522 usecs/call for sincosf. > > Yes, that's a 29.6% improvement. For cexp the numbers are 0.2697 > > usecs/call for sin and cos and 0.20586 for sincos (ie, 23.7% improvement). > > This is for z = x + I y with x and y in the non-exceptable case. > > Combined sin and cos probably does work better outside of benchmarks for > sin and cos alone, since it does less work so leaves more resources for > the, more useful things. Exactly! I have a significant amount of Fortran code that does z = cmplx(cos(x), sin(x)) in modern C this is 'z = CMPLX(cos(x), sin(x))'. GCC with optimization enables will convert this to z = cexp(cmplx(0,x)) where it expects cexp to optimize this to sincos(). GCC on FreeBSD will not do this optimization because FreeBSD's libm is not C99 compliant. > >> sinf() and cosf() on small args take only 15-20 cycles (thoughput) on > >> amd64 with not very old CPUs, so 2-8 extra cycles for the 2 indirect > >> return values is a lot. sincosf() still ends up being slightly faster > >> than separate sinf()/cosf(). > > > > Seems to be much faster when used in other functions. > > It's hard tp be much faster than 15-20 cycles. The latency is more like > 50 cycles, with 3 sinf()'s or cosf()'s running in parallel. > > sincos() is far from the best possible optimization for repeated calls on > the same or nearby args. If sin() and cos() cached the arg reduction, then > separate sin() and cos() on the same arg would run about as fast as sincos(), > and repeated sin()'s on the same arg would run much faster than now. > Caching the arg reduction may also be good when the arg changes slightly. > However, caching is slower if the args are not close. Even a 1-entry cache > takes a long time to look up relative to the 15-20 cycles taken by sinf() > and cosf(). Caching is complicated by signal handlers and threads. Perhaps > the right API one that has to ask for caching and provides the cache storage. > Then sincos() could be: > > ... > _dh_init(x, &dh); /* prefill 1-entry cache dh */ > s = _sin_cache(x, &dh, 1); /* cache hit unless x is NaN > /* cache misses update dh */ > c = _cos_cache(x, &dh, 1); /* cache hit unless x is NaN > ... > > and with everything inlined this is little different from the current > sincos() except for NaNs. NaNs can be cache hits too if you compare > them as bits, but the comparison should probably be x == dhp->dh_x > for a 1-entry cache, so as to not to have to extract the bits of x. When I worked on sincos() I tried a few variations. This included the simpliest implementation: void sincos(double x, double *s, double *c) { *c = cos(x); *s= sin(x); } I tried argument reduction with kernels. void sincos(double x, double *s, double *c) { a = inline argument reduction done to set a. *c = k_cos(x); *s= k_sin(x); } And finally the version that was committed where k_cos and k_sin were manually inlined and re-arranged to reduce redundant computations. Never thought about some caching mechanism. It seems to be more complicated than it may be worth. -- Steve From owner-freebsd-numerics@freebsd.org Wed Feb 27 18:06:25 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B29111500CF1 for ; Wed, 27 Feb 2019 18:06:25 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 451558904D for ; Wed, 27 Feb 2019 18:06:24 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1RI6L2t078491 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Wed, 27 Feb 2019 10:06:22 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1RI6LJh078490; Wed, 27 Feb 2019 10:06:21 -0800 (PST) (envelope-from sgk) Date: Wed, 27 Feb 2019 10:06:21 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Implement cexpl Message-ID: <20190227180621.GA78303@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20190203173056.GA42499@troutmask.apl.washington.edu> <20190227020142.GA74833@troutmask.apl.washington.edu> <20190227170706.J907@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190227170706.J907@besplex.bde.org> User-Agent: Mutt/1.11.2 (2019-01-07) X-Rspamd-Queue-Id: 451558904D X-Spamd-Bar: / Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [0.91 / 15.00]; HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu]; TO_DN_SOME(0.00)[]; REPLYTO_ADDR_EQ_FROM(0.00)[]; RCVD_COUNT_THREE(0.00)[3]; RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2]; MX_GOOD(-0.01)[cached: troutmask.apl.washington.edu]; RCPT_COUNT_TWO(0.00)[2]; FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US]; MID_RHS_MATCH_FROM(0.00)[]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-0.51)[-0.512,0]; FROM_HAS_DN(0.00)[]; NEURAL_SPAM_SHORT(0.66)[0.656,0]; MIME_GOOD(-0.10)[text/plain]; RCVD_TLS_LAST(0.00)[]; DMARC_NA(0.00)[washington.edu]; AUTH_NA(1.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.03)[0.030,0]; R_SPF_NA(0.00)[]; IP_SCORE(0.05)[ip: (0.11), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06), country: US(-0.07)] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Feb 2019 18:06:25 -0000 On Wed, Feb 27, 2019 at 07:42:41PM +1100, Bruce Evans wrote: > On Tue, 26 Feb 2019, Steve Kargl wrote: > > > On Sun, Feb 03, 2019 at 09:30:56AM -0800, Steve Kargl wrote: > >> The following patch implements cexpl() for ld80 and ld128 architectures. > > ... > > Index: lib/msun/src/math_private.h > > =================================================================== > > --- lib/msun/src/math_private.h (revision 344600) > > +++ lib/msun/src/math_private.h (working copy) > > @@ -243,6 +243,20 @@ > > } while (0) > > > > /* > > + * Get expsign and mantissa as 16 bit and 2 32-bit ints from an 80 bit long > > + * double. > > + */ > > + > > +#define EXTRACT_LDBL80_WORDS2(ix0,ix1,ix2,d) \ > > +do { \ > > + union IEEEl2bits ew_u; \ > > + ew_u.e = (d); \ > > + (ix0) = ew_u.xbits.expsign; \ > > + (ix1) = ew_u.bits.manh; \ > > + (ix2) = ew_u.bits.manl; \ > > +} while (0) > > Don't add more of these accessor functions (more were already added to > support powl, since powl doesn't use the FreeBSD spelling in APIs). This response was not totally unexpected. :-) I haven't looked a bit twiddling details in awhile, so it can be considered an exercise in "Let's get something working". > Use existing separate functions and let the compiler combine them. > The one corresponding to this is EXTRACT_LDBL80_WORDS(). That extracts > the mantissa into a single 64-bit word which is easier to use than the > 2 32-bit words given by the above. It also doesn't ask for pessimal > extraction from memory 32 bits at a time, but compilers usually combine > into a single 64-bit word for the memory access and then split into > 2 32-bit words for 32-bit arches. > > 3 words instead of 2 also enlarges diffs with the double precision case. > > > Index: lib/msun/src/s_cexp.c > > =================================================================== > > --- lib/msun/src/s_cexp.c (revision 344600) > > +++ lib/msun/src/s_cexp.c (working copy) > > @@ -30,6 +30,7 @@ > > __FBSDID("$FreeBSD$"); > > > > #include > > +#include > > #include > > > > #include "math_private.h" > > @@ -41,12 +42,19 @@ > > double complex > > cexp(double complex z) > > { > > - double x, y, exp_x; > > + double c, exp_x, s, x, y; > > uint32_t hx, hy, lx, ly; > > > > x = creal(z); > > y = cimag(z); > > > > + EXTRACT_WORDS(hx, lx, x); > > + /* cexp(0 + I y) = cos(y) + I sin(y) */ > > + if (((hx & 0x7fffffff) | lx) == 0) { > > + sincos(y, &s, &c); > > + return (CMPLX(c, s)); > > + } > > + > > EXTRACT_WORDS(hy, ly, y); > > hy &= 0x7fffffff; > > > > Another reason I don't like sincos*() is that if the compiler can very > easily do the above rewrite automatically. The only large difficulty > is to know if the sin(), cos() and sincos() are any good, or at least > their relative badness. x86 compilers already know that the i387 sin() > and cos() are no good, (or perhaps they just don't know that the FreeBSD > implementation doesn't make the i387 functions non-conforming to C99 > and/or POSIX (since they don't set errno and might do other side effects > wrong)), so x86 compilers don't convert the above to i387 sin(), cos() or > sincos(). It is even harder for compilers to know the quality of the > library. Unfortunately, the compiler doesn't do the rewrite automatically. AFAICT, for gcc CMPLX(cos(x),sin(x)) is transformed into cexp(CMPLX(0,x)). It then assumes that cexp handles the special case. gcc doesn't do this optimization, yet. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89125 See comment #7. > > @@ -53,10 +61,6 @@ > > /* cexp(x + I 0) = exp(x) + I 0 */ > > if ((hy | ly) == 0) > > return (CMPLX(exp(x), y)); > > The comments aren't as careful as for hyperbolic functions. The 0 here > is actually -0 if y is -0. The code handles both +-0 by clearing the > sign bit in hy for the check byt keeping it in y. > > > Index: lib/msun/ld128/k_cexpl.c > > ... > > +/* > > + * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt. > > + * They are intended for large arguments (real part >= ln(DBL_MAX)) > > + * where care is needed to avoid overflow. > > + * > > + * The present implementation is narrowly tailored for our hyperbolic and > > + * exponential functions. We assume expt is small (0 or -1), and the caller > > + * has filtered out very large x, for which overflow would be inevitable. > > + */ > > + > > +long double > > +__ldexp_expl(long double x, int expt) > > +{ > > +#if 0 > > + double exp_x, scale; > > + int ex_expt; > > + > > + exp_x = __frexp_exp(x, &ex_expt); > > + expt += ex_expt; > > + INSERT_WORDS(scale, (0x3ff + expt) << 20, 0); > > + return (exp_x * scale); > > +#endif > > + return (x - x) / (x - x); > > +} > > This is less likely to work than the comment says :-). I don't have ld128 hardware. I can't write the functions. When this is compile, one should see warnings #warning These functions are broken on ld128 architectures. #warning Functions defined here are currently unused. #warning Someone who cares needs to convert src/k_exp.c. I suspect no one cares. > > Index: lib/msun/ld128/s_cexpl.c > > =================================================================== > > --- lib/msun/ld128/s_cexpl.c (nonexistent) > > +++ lib/msun/ld128/s_cexpl.c (working copy) > > @@ -0,0 +1,76 @@ > > ... > > + exp_x = expl(x); > > + sincosl(y, &s, &c); > > + return (CMPLXL(exp_x * c, exp_x * s)); > > This doesn't use the kernel, although one is written (but always returns > NaN). For testing, the low quality kernel should at least use the > above formula. I don't have ld128 hardware. I can't write the function in a manner similar to src/s_cexp.c. When this is compile, one should see warnings #warning cexpl is likely broken on ld128 architectures. #warning Someone who cares needs to convert src/s_cexp.c. I suspect no one cares. Furthermore, when one links in cexpl(), she should see __warn_references("Precision of cexpl may have lower than expected precision"); I suspect no one cares. > > Index: lib/msun/ld80/s_cexpl.c > > =================================================================== > > --- lib/msun/ld80/s_cexpl.c (nonexistent) > > +++ lib/msun/ld80/s_cexpl.c (working copy) > > @@ -0,0 +1,105 @@ > > ... > > +long double complex > > +cexpl (long double complex z) > > +{ > > + long double c, exp_x, s, x, y; > > + uint32_t hwx, hwy, lwx, lwy; > > + uint16_t hx, hy; > > + > > + ENTERI(z); > > + > > + x = creall(z); > > + y = cimagl(z); > > + > > + EXTRACT_LDBL80_WORDS2(hx, hwx, lwx, x); > > + EXTRACT_LDBL80_WORDS2(hy, hwy, lwy, y); > > This asks for slowness relative to the double precision method, by doing > 2 slow extraction steps up front. > > This seems to be broken by not using the normal method of hx &= 0x7fff > up front. > > > + > > + /* cexp(0 + I y) = cos(y) + I sin(y) */ > > + if ((hwx | lwx | (hx & 0x7fff)) == 0) { > > The 2 words for the mantissa are just harder to use. Write hwx | lwx > normally as lx after extracting into 1 64-bit word. > > This handles -0 correctly by masking hx with 0x7fff. > > > + sincosl(y, &s, &c); > > + RETURNI(CMPLXL(c, s)); > > + } > > + > > + /* cexp(x + I 0) = exp(x) + I 0 */ > > + if ((hwy | lwy | (hy & 0x7fff)) == 0) > > + RETURNI(CMPLXL(expl(x), y)); > > + > > + if (hy >= 0x7fff) { > > Bugs from not masking h[xy] with 0x7fff up front start here. This > misclassifies all negative y's except -0 as being NaNs. Whoops. I lost a month of testing due to the PAE vs non-PAE fallout; yes, I should have done more testing. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Wed Feb 27 20:15:27 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 45F3A1503C6C for ; Wed, 27 Feb 2019 20:15:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id D30AA8DAB2 for ; Wed, 27 Feb 2019 20:15:24 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 7426B105FEF3; Thu, 28 Feb 2019 07:15:14 +1100 (AEDT) Date: Thu, 28 Feb 2019 07:15:14 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Update ENTERI() macro In-Reply-To: <20190227161906.GA77785@troutmask.apl.washington.edu> Message-ID: <20190228060920.R4413@besplex.bde.org> References: <20190226191825.GA68479@troutmask.apl.washington.edu> <20190227145002.P907@besplex.bde.org> <20190227074811.GA75972@troutmask.apl.washington.edu> <20190227201214.V1823@besplex.bde.org> <20190227161906.GA77785@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=k8ySeul569u4Qwmp2EoA:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: D30AA8DAB2 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.249 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-6.41 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_IN_DNSWL_LOW(-0.10)[249.132.29.211.list.dnswl.org : 127.0.5.1]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; FREEMAIL_FROM(0.00)[optusnet.com.au]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MX_GOOD(-0.01)[cached: extmail.optusnet.com.au]; RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.95)[-0.947,0]; IP_SCORE(-3.15)[ip: (-8.24), ipnet: 211.28.0.0/14(-4.16), asn: 4804(-3.31), country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Feb 2019 20:15:27 -0000 On Wed, 27 Feb 2019, Steve Kargl wrote: > On Wed, Feb 27, 2019 at 09:15:52PM +1100, Bruce Evans wrote: >> >> ENTERI() hard-codes the long double for simplicity. Remember, it is only >> needed for long double precision on i386. But I forgot about long double >> complex types, and didn't dream about indirect long double types in sincosl(). > > That simplicity does not work for long double complex. We will > > need either ENTERIC as in > > #define ENTERIC() ENTERIT(long double complex) > > or a direct use of ENTERIT as you have done s_clogl.c I wrote ENTERIT() to work around this problem. >>> I'm fine with making ENTERI() only toggle precision, and adding >>> a LEAVEI() to reset precision. RETURNI(r) would then be >>> >>> #define RETURNI(r) \ >>> do { \ >>> LEAVEI(); \ >>> return (r); \ >>> } while (0) >> >> No, may be an expression, so it must be evaluated before LEAVEI(). This >> is the reason for existence of the variable to hold the result. > > So, we'll need RETURNI for long double and one for long double complex. > Or, we give RETURNI a second parameter, which is the input parameter of > the function I said to use your method of __typeof(). I tested this: XX --- /tmp/math_private.h Sun Nov 27 17:58:57 2005 XX +++ ./math_private.h Thu Feb 28 06:17:26 2019 XX @@ -474,21 +474,22 @@ XX /* Support switching the mode to FP_PE if necessary. */ XX #if defined(__i386__) && !defined(NO_FPSETPREC) XX -#define ENTERI() ENTERIT(long double) XX -#define ENTERIT(returntype) \ XX - returntype __retval; \ XX +#define ENTERI() \ XX fp_prec_t __oprec; \ XX \ XX if ((__oprec = fpgetprec()) != FP_PE) \ XX fpsetprec(FP_PE) XX -#define RETURNI(x) do { \ XX - __retval = (x); \ XX - if (__oprec != FP_PE) \ XX - fpsetprec(__oprec); \ XX +#define LEAVEI() \ XX + if ((__oprec = fpgetprec()) != FP_PE) \ XX + fpsetprec(FP_PE) XX +#define RETURNI(expr) do { \ XX + __typeof(expr) __retval = (expr); \ XX + \ XX + LEAVEI(); \ XX RETURNF(__retval); \ XX } while (0) XX #else XX #define ENTERI() XX -#define ENTERIT(x) XX -#define RETURNI(x) RETURNF(x) XX +#define LEAVEI() XX +#define RETURNI(expr) RETURNF(expr) XX #endif XX This compiles, but has minor problems. Note that the apparent style bug of initializing __retval in its declaration is needed in cases where __typeof() gives a const type. This happens in my code that uses RETURNI(1 + tiny) to set inexact. I think it would also happen for RETURNI(1). The type is then int instead of floating point, and I need to check that this is harmless. clogl() is the only user of ENTERIT(). Its size expands from 2302 bytes text to 2399 when compiled by gcc-3.3.3. I hope that this is just gcc not doing a very good job optimizing the returns (there are many RETURNI()s fpr clogl()). Repeating the return code instead of jumping to it might even be optimal. > #define RETURNI(x, r) \ > do { \ > x = (r) \ > LEAVEI(); \ > return (r); \ > } while (0) > > This will cause a lot of churn. Indeed. My version causes 1 line of churn: XX --- /tmp/s_clogl.c Fri Jul 20 16:00:11 2018 XX +++ ./s_clogl.c Thu Feb 28 05:58:05 2019 XX @@ -66,5 +66,5 @@ XX int kx, ky; XX XX - ENTERIT(long double complex); XX + ENTERI(); XX XX x = creall(z); >> Combined sin and cos probably does work better outside of benchmarks for >> sin and cos alone, since it does less work so leaves more resources for >> the, more useful things. > > Exactly! I have a significant amount of Fortran code that does > > z = cmplx(cos(x), sin(x)) > > in modern C this is 'z = CMPLX(cos(x), sin(x))'. GCC with optimization > enables will convert this to z = cexp(cmplx(0,x)) where it expects cexp > to optimize this to sincos(). This is an pessimization unless everything is inlined. An optimization would convert cexp(cmplx(0,x)) to sin(x) and cos(x) or sincos(x). > GCC on FreeBSD will not do this optimization > because FreeBSD's libm is not C99 compliant. It is more conformant than most for cexp(). I think old gcc just doesn't attempt such optimizations. > When I worked on sincos() I tried a few variations. This included > the simpliest implementation: > > void > sincos(double x, double *s, double *c) > { > *c = cos(x); > *s= sin(x); > } > > I tried argument reduction with kernels. > > void > sincos(double x, double *s, double *c) > { > a = inline argument reduction done to set a. > *c = k_cos(x); > *s= k_sin(x); > } You mean *c = s_cos(x), etc. That was good enough. > And finally the version that was committed where k_cos and k_sin > were manually inlined and re-arranged to reduce redundant computations. That has excessive manual inlining. It should have only inlined s_cos() and s_sin(), and changed k_cos() and k_sin() from extern to static inline. Someday the data for these inline functions should be deduplicated, but the data is small compared with that for the expl kernel. Bruce From owner-freebsd-numerics@freebsd.org Wed Feb 27 21:31:39 2019 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 859051505BCE for ; Wed, 27 Feb 2019 21:31:39 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 5C9B190DB1 for ; Wed, 27 Feb 2019 21:31:38 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1RLVYjN010710 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Wed, 27 Feb 2019 13:31:34 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1RLVY9L010709; Wed, 27 Feb 2019 13:31:34 -0800 (PST) (envelope-from sgk) Date: Wed, 27 Feb 2019 13:31:33 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Update ENTERI() macro Message-ID: <20190227213133.GA10233@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20190226191825.GA68479@troutmask.apl.washington.edu> <20190227145002.P907@besplex.bde.org> <20190227074811.GA75972@troutmask.apl.washington.edu> <20190227201214.V1823@besplex.bde.org> <20190227161906.GA77785@troutmask.apl.washington.edu> <20190228060920.R4413@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190228060920.R4413@besplex.bde.org> User-Agent: Mutt/1.11.2 (2019-01-07) X-Rspamd-Queue-Id: 5C9B190DB1 X-Spamd-Bar: / Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [0.56 / 15.00]; ARC_NA(0.00)[]; HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu]; NEURAL_HAM_MEDIUM(-0.11)[-0.106,0]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; NEURAL_SPAM_SHORT(0.17)[0.169,0]; NEURAL_HAM_LONG(-0.24)[-0.243,0]; MIME_GOOD(-0.10)[text/plain]; RCVD_TLS_LAST(0.00)[]; DMARC_NA(0.00)[washington.edu]; AUTH_NA(1.00)[]; REPLYTO_ADDR_EQ_FROM(0.00)[]; RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2]; MX_GOOD(-0.01)[cached: troutmask.apl.washington.edu]; RCPT_COUNT_TWO(0.00)[2]; R_SPF_NA(0.00)[]; FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US]; MID_RHS_MATCH_FROM(0.00)[]; IP_SCORE(0.05)[ip: (0.10), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06), country: US(-0.07)] X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Feb 2019 21:31:39 -0000 On Thu, Feb 28, 2019 at 07:15:14AM +1100, Bruce Evans wrote: > On Wed, 27 Feb 2019, Steve Kargl wrote: > > > On Wed, Feb 27, 2019 at 09:15:52PM +1100, Bruce Evans wrote: > >> > >> ENTERI() hard-codes the long double for simplicity. Remember, it is only > >> needed for long double precision on i386. But I forgot about long double > >> complex types, and didn't dream about indirect long double types in sincosl(). > > > > That simplicity does not work for long double complex. We will > > > > need either ENTERIC as in > > > > #define ENTERIC() ENTERIT(long double complex) > > > > or a direct use of ENTERIT as you have done s_clogl.c > > I wrote ENTERIT() to work around this problem. > > >>> I'm fine with making ENTERI() only toggle precision, and adding > >>> a LEAVEI() to reset precision. RETURNI(r) would then be > >>> > >>> #define RETURNI(r) \ > >>> do { \ > >>> LEAVEI(); \ > >>> return (r); \ > >>> } while (0) > >> > >> No, may be an expression, so it must be evaluated before LEAVEI(). This > >> is the reason for existence of the variable to hold the result. > > > > So, we'll need RETURNI for long double and one for long double complex. > > Or, we give RETURNI a second parameter, which is the input parameter of > > the function > > I said to use your method of __typeof(). I tested this: Oh, I misunderstood. I thought you meant that __typeof() may have portability issues, so you want to avoid it. > XX --- /tmp/math_private.h Sun Nov 27 17:58:57 2005 > XX +++ ./math_private.h Thu Feb 28 06:17:26 2019 (patched deleted to keep this short) > This compiles, but has minor problems. Note that the apparent style > bug of initializing __retval in its declaration is needed in cases > where __typeof() gives a const type. This happens in my code that > uses RETURNI(1 + tiny) to set inexact. I think it would also happen > for RETURNI(1). The type is then int instead of floating point, and > I need to check that this is harmless. I've not used __typeof() until now. Coming with my modern Fortran background, doing __typeof(expr) __retval = (expr) feel weirdly circular. In Fortran, the RHS of an expression is evaluate independently of the type of the LHS. It is only at the actual assignment that type conversion (if needed) occurs. > clogl() is the only user of ENTERIT(). Its size expands from 2302 > bytes text to 2399 when compiled by gcc-3.3.3. I hope that this is > just gcc not doing a very good job optimizing the returns (there are > many RETURNI()s fpr clogl()). Repeating the return code instead of > jumping to it might even be optimal. > > > #define RETURNI(x, r) \ > > do { \ > > x = (r) \ > > LEAVEI(); \ > > return (r); \ > > } while (0) > > > > This will cause a lot of churn. > > Indeed. > > My version causes 1 line of churn: > > XX --- /tmp/s_clogl.c Fri Jul 20 16:00:11 2018 > XX +++ ./s_clogl.c Thu Feb 28 05:58:05 2019 > XX @@ -66,5 +66,5 @@ > XX int kx, ky; > XX > XX - ENTERIT(long double complex); > XX + ENTERI(); > XX > XX x = creall(z); > > >> Combined sin and cos probably does work better outside of benchmarks for > >> sin and cos alone, since it does less work so leaves more resources for > >> the, more useful things. > > > > Exactly! I have a significant amount of Fortran code that does > > > > z = cmplx(cos(x), sin(x)) > > > > in modern C this is 'z = CMPLX(cos(x), sin(x))'. GCC with optimization > > enables will convert this to z = cexp(cmplx(0,x)) where it expects cexp > > to optimize this to sincos(). > > This is an pessimization unless everything is inlined. An optimization > would convert cexp(cmplx(0,x)) to sin(x) and cos(x) or sincos(x). > > > GCC on FreeBSD will not do this optimization > > because FreeBSD's libm is not C99 compliant. > > It is more conformant than most for cexp(). I think old gcc just doesn't > attempt such optimizations. > > > When I worked on sincos() I tried a few variations. This included > > the simpliest implementation: > > > > void > > sincos(double x, double *s, double *c) > > { > > *c = cos(x); > > *s= sin(x); > > } > > > > I tried argument reduction with kernels. > > > > void > > sincos(double x, double *s, double *c) > > { > > a = inline argument reduction done to set a. > > *c = k_cos(x); > > *s= k_sin(x); > > } > > You mean *c = s_cos(x), etc. That was good enough. > > > And finally the version that was committed where k_cos and k_sin > > were manually inlined and re-arranged to reduce redundant computations. > > That has excessive manual inlining. It should have only inlined s_cos() > and s_sin(), and changed k_cos() and k_sin() from extern to static inline. > Someday the data for these inline functions should be deduplicated, but > the data is small compared with that for the expl kernel. > I recall you preferred the "good enough" implementation, because it would reduce maintenance if either sin or cos kernel needed to change. Now, one needs to remember that there is a sincos kernel. I also remember that you felt that a good compiler could optimize the inlined sin and cos kernels. I suppose FreeBSD didn't have good compilers when I manually inlined the sincos kernel, because it was a significant measurable improvement. -- Steve