From owner-freebsd-numerics@freebsd.org  Tue Feb 26 19:18:29 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1D1ED15008AB
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Tue, 26 Feb 2019 19:18:29 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 4DD366EE8F
 for <freebsd-numerics@freebsd.org>; Tue, 26 Feb 2019 19:18:27 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1QJIP5H068514
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO)
 for <freebsd-numerics@freebsd.org>; Tue, 26 Feb 2019 11:18:25 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1QJIPcN068513
 for freebsd-numerics@freebsd.org; Tue, 26 Feb 2019 11:18:25 -0800 (PST)
 (envelope-from sgk)
Date: Tue, 26 Feb 2019 11:18:25 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: freebsd-numerics@freebsd.org
Subject: Update ENTERI() macro
Message-ID: <20190226191825.GA68479@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.11.2 (2019-01-07)
X-Rspamd-Queue-Id: 4DD366EE8F
X-Spamd-Bar: +
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [1.35 / 15.00]; ARC_NA(0.00)[];
 HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu];
 FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[];
 NEURAL_HAM_LONG(-0.29)[-0.288,0]; MIME_GOOD(-0.10)[text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-numerics@freebsd.org];
 TO_DN_NONE(0.00)[]; AUTH_NA(1.00)[]; RCPT_COUNT_ONE(0.00)[1];
 RCVD_COUNT_THREE(0.00)[3]; RCVD_TLS_LAST(0.00)[];
 NEURAL_SPAM_SHORT(0.42)[0.417,0];
 RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2];
 MX_GOOD(-0.01)[cached: troutmask.apl.washington.edu];
 DMARC_NA(0.00)[washington.edu];
 NEURAL_SPAM_MEDIUM(0.47)[0.475,0]; R_SPF_NA(0.00)[];
 REPLYTO_ADDR_EQ_FROM(0.00)[]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US];
 MID_RHS_MATCH_FROM(0.00)[];
 IP_SCORE(0.05)[ip: (0.11), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06),
 country: US(-0.07)]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 26 Feb 2019 19:18:29 -0000

I have submitted 

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236063

Update the ENTERI() macro in math_private.h to take a parameter.
This parameter is used by __typeof() to set the type for an
intermediate variable that holds the return value of a function
on i386 FreeBSD.  The patch updates the consumers of ENTERI().


--- /data/kargl/freebsd/src/lib/msun/src/e_acoshl.c	2013-07-30 10:32:47.000000000 -0700
+++ ./src/e_acoshl.c	2019-02-25 16:36:08.532381000 -0800
@@ -68,7 +68,7 @@
 	long double t;
 	int16_t hx;
 
-	ENTERI();
+	ENTERI(x);
 	GET_LDBL_EXPSIGN(hx, x);
 	if (hx < 0x3fff) {		/* x < 1, or misnormal */
 	    RETURNI((x-x)/(x-x));
--- /data/kargl/freebsd/src/lib/msun/src/e_atanhl.c	2013-07-30 10:32:47.000000000 -0700
+++ ./src/e_atanhl.c	2019-02-25 16:36:24.414626000 -0800
@@ -57,7 +57,7 @@
 	long double t;
 	uint16_t hx, ix;
 
-	ENTERI();
+	ENTERI(x);
 	GET_LDBL_EXPSIGN(hx, x);
 	ix = hx & 0x7fff;
 	if (ix >= 0x3fff)		/* |x| >= 1, or NaN or misnormal */
--- /data/kargl/freebsd/src/lib/msun/src/e_coshl.c	2016-10-10 14:48:39.000000000 -0700
+++ ./src/e_coshl.c	2019-02-25 16:37:03.670060000 -0800
@@ -97,7 +97,7 @@
     /* x is INF or NaN */
 	if(ix>=0x7fff) return x*x;
 
-	ENTERI();
+	ENTERI(x);
 
     /* |x| < 1, return 1 or c(x) */
 	if(ix<0x3fff) {
--- /data/kargl/freebsd/src/lib/msun/src/e_sinhl.c	2016-10-10 14:48:39.000000000 -0700
+++ ./src/e_sinhl.c	2019-02-25 16:37:14.623690000 -0800
@@ -97,7 +97,7 @@
     /* x is INF or NaN */
 	if(ix>=0x7fff) return x+x;
 
-	ENTERI();
+	ENTERI(x);
 
 	s = 1;
 	if (jx<0) s = -1;
--- /data/kargl/freebsd/src/lib/msun/src/math_private.h	2018-10-31 16:15:21.809877000 -0700
+++ ./src/math_private.h	2019-02-25 16:54:26.897247000 -0800
@@ -335,15 +348,14 @@
 
 /* Support switching the mode to FP_PE if necessary. */
 #if defined(__i386__) && !defined(NO_FPSETPREC)
-#define	ENTERI() ENTERIT(long double)
-#define	ENTERIT(returntype)			\
-	returntype __retval;			\
+#define	ENTERI(a)				\
+	__typeof(a) __retval;			\
 	fp_prec_t __oprec;			\
 						\
 	if ((__oprec = fpgetprec()) != FP_PE)	\
 		fpsetprec(FP_PE)
-#define	RETURNI(x) do {				\
-	__retval = (x);				\
+#define	RETURNI(a) do {				\
+	__retval = (a);				\
 	if (__oprec != FP_PE)			\
 		fpsetprec(__oprec);		\
 	RETURNF(__retval);			\
@@ -359,9 +371,8 @@
 	return;			\
 } while (0)
 #else
-#define	ENTERI()
-#define	ENTERIT(x)
-#define	RETURNI(x)	RETURNF(x)
+#define	ENTERI(a)
+#define	RETURNI(a)	RETURNF(a)
 #define	ENTERV()
 #define	RETURNV()	return
 #endif
--- /data/kargl/freebsd/src/lib/msun/src/s_asinhl.c	2013-07-30 10:32:47.000000000 -0700
+++ ./src/s_asinhl.c	2019-02-25 16:37:44.264923000 -0800
@@ -71,7 +71,7 @@
 	long double t, w;
 	uint16_t hx, ix;
 
-	ENTERI();
+	ENTERI(x);
 	GET_LDBL_EXPSIGN(hx, x);
 	ix = hx & 0x7fff;
 	if (ix >= 0x7fff) RETURNI(x+x);	/* x is inf, NaN or misnormal */
--- /data/kargl/freebsd/src/lib/msun/src/s_cbrtl.c	2012-07-30 14:58:28.000000000 -0700
+++ ./src/s_cbrtl.c	2019-02-25 16:43:03.606703000 -0800
@@ -53,7 +53,7 @@
 	if (k == BIAS + LDBL_MAX_EXP)
 		return (x + x);
 
-	ENTERI();
+	ENTERI(x);
 	if (k == 0) {
 		/* If x = +-0, then cbrt(x) = +-0. */
 		if ((u.bits.manh | u.bits.manl) == 0)
--- /data/kargl/freebsd/src/lib/msun/src/s_clogl.c	2019-02-26 11:00:53.735653000 -0800
+++ ./src/s_clogl.c	2019-02-25 16:43:19.194615000 -0800
@@ -65,7 +65,7 @@
 	uint16_t hax, hay;
 	int kx, ky;
 
-	ENTERIT(long double complex);
+	ENTERI(z);
 
 	x = creall(z);
 	y = cimagl(z);
--- /data/kargl/freebsd/src/lib/msun/src/s_cosl.c	2017-12-01 13:29:36.000000000 -0800
+++ ./src/s_cosl.c	2019-02-25 16:43:29.546277000 -0800
@@ -68,7 +68,7 @@
 	if (z.bits.exp == 32767)
 		return ((x - x) / (x - x));
 
-	ENTERI();
+	ENTERI(x);
 
 	/* Optimize the case where x is already within range. */
 	if (z.e < M_PI_4)
--- /data/kargl/freebsd/src/lib/msun/src/s_roundl.c	2017-12-01 13:29:36.000000000 -0800
+++ ./src/s_roundl.c	2019-02-25 16:43:42.814592000 -0800
@@ -48,7 +48,7 @@
 	if ((hx & 0x7fff) == 0x7fff)
 		return (x + x);
 
-	ENTERI();
+	ENTERI(x);
 
 	if (!(hx & 0x8000)) {
 		t = floorl(x);
--- /data/kargl/freebsd/src/lib/msun/src/s_sinl.c	2017-12-01 13:29:36.000000000 -0800
+++ ./src/s_sinl.c	2019-02-25 16:43:49.574030000 -0800
@@ -64,7 +64,7 @@
 	if (z.bits.exp == 32767)
 		return ((x - x) / (x - x));
 
-	ENTERI();
+	ENTERI(x);
 
 	/* Optimize the case where x is already within range. */
 	if (z.e < M_PI_4) {
--- /data/kargl/freebsd/src/lib/msun/src/s_tanhl.c	2016-10-10 14:48:39.000000000 -0700
+++ ./src/s_tanhl.c	2019-02-25 16:44:10.604476000 -0800
@@ -127,7 +127,7 @@
 	    else       return one/x-one;    /* tanh(NaN) = NaN */
 	}
 
-	ENTERI();
+	ENTERI(x);
 
     /* |x| < 40 */
 	if (ix < 0x4004 || fabsl(x) < 40) {	/* |x|<40 */
--- /data/kargl/freebsd/src/lib/msun/src/s_tanl.c	2017-12-01 13:29:36.000000000 -0800
+++ ./src/s_tanl.c	2019-02-25 16:44:21.348406000 -0800
@@ -70,7 +70,7 @@
 	if (z.bits.exp == 32767)
 		return ((x - x) / (x - x));
 
-	ENTERI();
+	ENTERI(x);
 
 	/* Optimize the case where x is already within range. */
 	if (z.e < M_PI_4) {
--- /data/kargl/freebsd/src/lib/msun/ld80/e_lgammal_r.c	2016-10-10 14:48:38.000000000 -0700
+++ ./ld80/e_lgammal_r.c	2019-02-26 11:04:45.992781000 -0800
@@ -261,7 +261,7 @@
 	ix = hx&0x7fff;
 	if(ix==0x7fff) return x*x;
 
-	ENTERI();
+	ENTERI(x);
 
     /* purge +-0 and tiny arguments */
 	*signgamp = 1-2*(hx>>15);
--- /data/kargl/freebsd/src/lib/msun/ld80/s_erfl.c	2014-07-13 10:05:05.000000000 -0700
+++ ./ld80/s_erfl.c	2019-02-26 11:05:35.163816000 -0800
@@ -231,7 +231,7 @@
 		return (1-i)+one/x;	/* erfl(+-inf)=+-1 */
 	}
 
-	ENTERI();
+	ENTERI(x);
 
 	ax = fabsl(x);
 	if(ax < 0.84375) {
@@ -284,7 +284,7 @@
 	    return ((hx>>15)<<1)+one/x;
 	}
 
-	ENTERI();
+	ENTERI(x);
 
 	ax = fabsl(x);
 	if(ax < 0.84375L) {
--- /data/kargl/freebsd/src/lib/msun/ld80/s_exp2l.c	2017-12-01 13:29:35.000000000 -0800
+++ ./ld80/s_exp2l.c	2019-02-25 16:31:42.300930000 -0800
@@ -242,7 +242,7 @@
 		return (1.0L + x);	/* 1 with inexact */
 	}
 
-	ENTERI();
+	ENTERI(x);
 
 	/*
 	 * Reduce x, computing z, i0, and k. The low bits of x + redux
--- /data/kargl/freebsd/src/lib/msun/ld80/s_expl.c	2018-10-31 16:15:21.293646000 -0700
+++ ./ld80/s_expl.c	2019-02-25 16:32:05.335182000 -0800
@@ -97,7 +97,7 @@
 		RETURN2P(1, x);		/* 1 with inexact iff x != 0 */
 	}
 
-	ENTERI();
+	ENTERI(x);
 
 	twopk = 1;
 	__k_expl(x, &hi, &lo, &k);
@@ -193,7 +193,7 @@
 			RETURN2P(tiny, -1);	/* good for x < -65ln2 - eps */
 	}
 
-	ENTERI();
+	ENTERI(x);
 
 	if (T1 < x && x < T2) {
 		if (ix < BIAS - 74) {	/* |x| < 0x1p-74 (includes pseudos) */
--- /data/kargl/freebsd/src/lib/msun/ld80/s_logl.c	2017-12-01 13:29:35.000000000 -0800
+++ ./ld80/s_logl.c	2019-02-25 16:34:17.938768000 -0800
@@ -481,7 +481,7 @@
 					/* log(pseudo-NaN) = qNaN */
 					/* log(unnormal) = qNaN */
 #ifndef STRUCT_RETURN
-	ENTERI();
+	ENTERI(x);
 #endif
 	k += hx;
 	ix = lx & 0x7fffffffffffffffULL;
@@ -588,7 +588,7 @@
 		f_hi = x;
 		f_lo = 0;		/* avoid underflow of the P5 term */
 	}
-	ENTERI();
+	ENTERI(x);
 	x = f_hi + f_lo;
 	f_lo = (f_hi - x) + f_lo;
 
@@ -668,7 +668,7 @@
 {
 	struct ld r;
 
-	ENTERI();
+	ENTERI(x);
 	DOPRINT_START(&x);
 	k_logl(x, &r);
 	RETURNSPI(&r);
@@ -686,7 +686,7 @@
 	struct ld r;
 	long double hi, lo;
 
-	ENTERI();
+	ENTERI(x);
 	DOPRINT_START(&x);
 	k_logl(x, &r);
 	if (!r.lo_set)
@@ -704,7 +704,7 @@
 	struct ld r;
 	long double hi, lo;
 
-	ENTERI();
+	ENTERI(x);
 	DOPRINT_START(&x);
 	k_logl(x, &r);
 	if (!r.lo_set)
--- /data/kargl/freebsd/src/lib/msun/ld128/s_expl.c	2018-10-31 16:15:21.240218000 -0700
+++ ./ld128/s_expl.c	2019-02-25 16:28:16.950376000 -0800
@@ -85,7 +85,7 @@
 		RETURN2P(1, x);		/* 1 with inexact iff x != 0 */
 	}
 
-	ENTERI();
+	ENTERI(x);
 
 	twopk = 1;
 	__k_expl(x, &hi, &lo, &k);
@@ -232,7 +232,7 @@
 			RETURN2P(tiny, -1);	/* good for x < -114ln2 - eps */
 	}
 
-	ENTERI();
+	ENTERI(x);
 
 	if (T1 < x && x < T2) {
 		x2 = x * x;
--- /data/kargl/freebsd/src/lib/msun/ld128/s_logl.c	2017-12-01 13:29:34.000000000 -0800
+++ ./ld128/s_logl.c	2019-02-25 16:30:18.608573000 -0800
@@ -478,7 +478,7 @@
 	} else if (hx >= 0x7fff)
 		RETURN1(rp, x + x);	/* log(Inf or NaN) = Inf or qNaN */
 #ifndef STRUCT_RETURN
-	ENTERI();
+	ENTERI(x);
 #endif
 	k += hx;
 	dk = k;
@@ -597,7 +597,7 @@
 		f_hi = x;
 		f_lo = 0;		/* avoid underflow of the P3 term */
 	}
-	ENTERI();
+	ENTERI(x);
 	x = f_hi + f_lo;
 	f_lo = (f_hi - x) + f_lo;
 
@@ -678,7 +678,7 @@
 {
 	struct ld r;
 
-	ENTERI();
+	ENTERI(x);
 	DOPRINT_START(&x);
 	k_logl(x, &r);
 	RETURNSPI(&r);
@@ -705,7 +705,7 @@
 	long double lo;
 	float hi;
 
-	ENTERI();
+	ENTERI(x);
 	DOPRINT_START(&x);
 	k_logl(x, &r);
 	if (!r.lo_set)
@@ -724,7 +724,7 @@
 	long double lo;
 	float hi;
 
-	ENTERI();
+	ENTERI(x);
 	DOPRINT_START(&x);
 	k_logl(x, &r);
 	if (!r.lo_set)

-- 
Steve

From owner-freebsd-numerics@freebsd.org  Wed Feb 27 02:01:46 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 94E81150BD45
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 27 Feb 2019 02:01:46 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 421CC86A1E
 for <freebsd-numerics@freebsd.org>; Wed, 27 Feb 2019 02:01:45 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1R21gEa075080
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO)
 for <freebsd-numerics@freebsd.org>; Tue, 26 Feb 2019 18:01:42 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1R21g2g075079
 for freebsd-numerics@freebsd.org; Tue, 26 Feb 2019 18:01:42 -0800 (PST)
 (envelope-from sgk)
Date: Tue, 26 Feb 2019 18:01:42 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: freebsd-numerics@freebsd.org
Subject: Re: Implement cexpl
Message-ID: <20190227020142.GA74833@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20190203173056.GA42499@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190203173056.GA42499@troutmask.apl.washington.edu>
User-Agent: Mutt/1.11.2 (2019-01-07)
X-Rspamd-Queue-Id: 421CC86A1E
X-Spamd-Bar: /
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [0.18 / 15.00]; ARC_NA(0.00)[];
 HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu];
 FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[];
 NEURAL_HAM_LONG(-0.25)[-0.254,0]; MIME_GOOD(-0.10)[text/plain];
 PREVIOUSLY_DELIVERED(0.00)[freebsd-numerics@freebsd.org];
 TO_DN_NONE(0.00)[]; AUTH_NA(1.00)[]; RCPT_COUNT_ONE(0.00)[1];
 RCVD_COUNT_THREE(0.00)[3]; RCVD_TLS_LAST(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2];
 MX_GOOD(-0.01)[cached: troutmask.apl.washington.edu];
 DMARC_NA(0.00)[washington.edu];
 NEURAL_HAM_SHORT(-0.79)[-0.790,0];
 NEURAL_SPAM_MEDIUM(0.48)[0.483,0]; R_SPF_NA(0.00)[];
 REPLYTO_ADDR_EQ_FROM(0.00)[]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US];
 MID_RHS_MATCH_FROM(0.00)[];
 IP_SCORE(0.05)[ip: (0.11), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06),
 country: US(-0.07)]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Feb 2019 02:01:46 -0000

On Sun, Feb 03, 2019 at 09:30:56AM -0800, Steve Kargl wrote:
> The following patch implements cexpl() for ld80 and ld128 architectures.
> 
> The kernel files for large x are named ld{80,128}/k_cexpl.c to avoid
> confusion with the ld{80,128}/k_expl.h files.
> 
> I do not have access to ld128, so the ld128 does not use bit twiddling.
> I cannot test the ld128 implementation, but I believe that it works
> (for some definition of work).

I have attached a new patch to 

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235413

This new patch incorporates changes to cexp and cexpf to
compute the z = x + I y with x == 0 case first.  This is
in preparation for fixing constanting folding with GCC.

The implementation for ld128/*_cexpl.c have not been 
compiled, and hence, are untested.  I don't have ld128
hardware.  The specific changes are

* lib/msun/src/math_private.h:
  . Add an EXTRACT_LDBL80_WORDS2() macro to get access to the high and
    low word of a 64-bit significand as well as the expsign.
  . Add prototype for __ldexp_expl().
  . Add prototype for __ldexp_cexpl().

* lib/msun/src/s_cexp.c:
  . A float.h to get LDBL_MANT_DIG.
  . Add c and s to declaration, and sort.
  . Move x = 0 case to be the first case tested.  This is in preparation
    for fixing constanting folding in GCC.
  . Use sincos() instead of a call to sin() and to cos().
  . A week_refrence for LDBL_MANT_DIG == 53.

* lib/msun/src/s_cexpf.c:
  . Add c and s to declaration, and sort.
  . Move x = 0 case to be the first case tested.  This is in preparation
    for fixing constanting folding in GCC.
  . Use sincosf() instead of a call to sinf() and to cosf().

* lib/msun/src/k_exp.c:
  . Add c and s to declaration, and sort.
  . Use sincos() instead of a call to sin() and to cos().
  
* lib/msun/src/k_expf.c
  . Add c and s to declaration, and sort.
  . Use sincosf() instead of a call to sinf() and to cosf().

* lib/msun/ld128/k_cexpl.c:
  . Copy src/k_exp.c to here.  #if 0 ... #endif all code and have
    functions return NaN.  These functions are currently unused,
    and await someone who cares.
  . Issue a compile-time warning about the sloppiness.

* lib/msun/ld128/s_cexpl.c:
  . Copy src/s_cexp.c to here.
  . Convert "double complex" to "long double complex" without use of
    bit-twiddling.
  . Add compile-time warning about the sloppiness.
  . Add run-time warning about the sloppiness.

* lib/msun/ld80/k_cexpl.c:
  . Copy src/k_exp.c to here.
  . Convert "double complex" to "long double complex".  Use bit-twiddling.

* lib/msun/ld80/s_cexpl.c:
  . Copy src/s_cexp.c to here.
  . Convert "double complex" to "long double complex".  Use bit-twiddling
    where bits are grabbed with new EXTRACT_LDBL80_WORDS2() macro.

* lib/msun/man/cexp.3:
  . Document the addtion of cexpl.

* include/complex.h:
  . Add prototype for cexpl().

Index: lib/msun/src/math_private.h
===================================================================
--- lib/msun/src/math_private.h	(revision 344600)
+++ lib/msun/src/math_private.h	(working copy)
@@ -243,6 +243,20 @@
 } while (0)
 
 /*
+ * Get expsign and mantissa as 16 bit and 2 32-bit ints from an 80 bit long
+ * double.
+ */
+
+#define	EXTRACT_LDBL80_WORDS2(ix0,ix1,ix2,d)				\
+do {								\
+  union IEEEl2bits ew_u;					\
+  ew_u.e = (d);							\
+  (ix0) = ew_u.xbits.expsign;					\
+  (ix1) = ew_u.bits.manh;					\
+  (ix2) = ew_u.bits.manl;					\
+} while (0)
+
+/*
  * Get expsign and mantissa as one 16 bit and two 64 bit ints from a 128 bit
  * long double.
  */
@@ -920,5 +934,9 @@
 long double __kernel_sinl(long double, long double, int);
 long double __kernel_cosl(long double, long double);
 long double __kernel_tanl(long double, long double, int);
+long double __ldexp_expl(long double, int);
+#ifdef _COMPLEX_H
+long double complex __ldexp_cexpl(long double complex, int);
+#endif
 
 #endif /* !_MATH_PRIVATE_H_ */
Index: lib/msun/src/s_cexp.c
===================================================================
--- lib/msun/src/s_cexp.c	(revision 344600)
+++ lib/msun/src/s_cexp.c	(working copy)
@@ -30,6 +30,7 @@
 __FBSDID("$FreeBSD$");
 
 #include <complex.h>
+#include <float.h>
 #include <math.h>
 
 #include "math_private.h"
@@ -41,12 +42,19 @@
 double complex
 cexp(double complex z)
 {
-	double x, y, exp_x;
+	double c, exp_x, s, x, y;
 	uint32_t hx, hy, lx, ly;
 
 	x = creal(z);
 	y = cimag(z);
 
+	EXTRACT_WORDS(hx, lx, x);
+	/* cexp(0 + I y) = cos(y) + I sin(y) */
+	if (((hx & 0x7fffffff) | lx) == 0) {
+		sincos(y, &s, &c);
+		return (CMPLX(c, s));
+	}
+
 	EXTRACT_WORDS(hy, ly, y);
 	hy &= 0x7fffffff;
 
@@ -53,10 +61,6 @@
 	/* cexp(x + I 0) = exp(x) + I 0 */
 	if ((hy | ly) == 0)
 		return (CMPLX(exp(x), y));
-	EXTRACT_WORDS(hx, lx, x);
-	/* cexp(0 + I y) = cos(y) + I sin(y) */
-	if (((hx & 0x7fffffff) | lx) == 0)
-		return (CMPLX(cos(y), sin(y)));
 
 	if (hy >= 0x7ff00000) {
 		if (lx != 0 || (hx & 0x7fffffff) != 0x7ff00000) {
@@ -86,6 +90,11 @@
 		 *  -  x = NaN (spurious inexact exception from y)
 		 */
 		exp_x = exp(x);
-		return (CMPLX(exp_x * cos(y), exp_x * sin(y)));
+		sincos(y, &s, &c);
+		return (CMPLX(exp_x * c, exp_x * s));
 	}
 }
+
+#if (LDBL_MANT_DIG == 53)
+__weak_reference(cexp, cexpl);
+#endif
Index: lib/msun/src/s_cexpf.c
===================================================================
--- lib/msun/src/s_cexpf.c	(revision 344600)
+++ lib/msun/src/s_cexpf.c	(working copy)
@@ -41,12 +41,19 @@
 float complex
 cexpf(float complex z)
 {
-	float x, y, exp_x;
+	float c, exp_x, s, x, y;
 	uint32_t hx, hy;
 
 	x = crealf(z);
 	y = cimagf(z);
 
+	GET_FLOAT_WORD(hx, x);
+	/* cexp(0 + I y) = cos(y) + I sin(y) */
+	if ((hx & 0x7fffffff) == 0) {
+		sincosf(y, &s, &c);
+		return (CMPLXF(c, s));
+	}
+
 	GET_FLOAT_WORD(hy, y);
 	hy &= 0x7fffffff;
 
@@ -53,10 +60,6 @@
 	/* cexp(x + I 0) = exp(x) + I 0 */
 	if (hy == 0)
 		return (CMPLXF(expf(x), y));
-	GET_FLOAT_WORD(hx, x);
-	/* cexp(0 + I y) = cos(y) + I sin(y) */
-	if ((hx & 0x7fffffff) == 0)
-		return (CMPLXF(cosf(y), sinf(y)));
 
 	if (hy >= 0x7f800000) {
 		if ((hx & 0x7fffffff) != 0x7f800000) {
@@ -86,6 +89,7 @@
 		 *  -  x = NaN (spurious inexact exception from y)
 		 */
 		exp_x = expf(x);
-		return (CMPLXF(exp_x * cosf(y), exp_x * sinf(y)));
+		sincosf(y, &s, &c);
+		return (CMPLXF(exp_x * c, exp_x * s));
 	}
 }
Index: lib/msun/src/k_exp.c
===================================================================
--- lib/msun/src/k_exp.c	(revision 344600)
+++ lib/msun/src/k_exp.c	(working copy)
@@ -88,7 +88,7 @@
 double complex
 __ldexp_cexp(double complex z, int expt)
 {
-	double x, y, exp_x, scale1, scale2;
+	double c, exp_x, s, scale1, scale2, x, y;
 	int ex_expt, half_expt;
 
 	x = creal(z);
@@ -105,6 +105,7 @@
 	half_expt = expt - half_expt;
 	INSERT_WORDS(scale2, (0x3ff + half_expt) << 20, 0);
 
-	return (CMPLX(cos(y) * exp_x * scale1 * scale2,
-	    sin(y) * exp_x * scale1 * scale2));
+	sincos(y, &s, &c);
+	return (CMPLX(c * exp_x * scale1 * scale2,
+	    s * exp_x * scale1 * scale2));
 }
Index: lib/msun/src/k_expf.c
===================================================================
--- lib/msun/src/k_expf.c	(revision 344600)
+++ lib/msun/src/k_expf.c	(working copy)
@@ -71,7 +71,7 @@
 float complex
 __ldexp_cexpf(float complex z, int expt)
 {
-	float x, y, exp_x, scale1, scale2;
+	float c, exp_x, s, scale1, scale2, x, y;
 	int ex_expt, half_expt;
 
 	x = crealf(z);
@@ -84,6 +84,7 @@
 	half_expt = expt - half_expt;
 	SET_FLOAT_WORD(scale2, (0x7f + half_expt) << 23);
 
-	return (CMPLXF(cosf(y) * exp_x * scale1 * scale2,
-	    sinf(y) * exp_x * scale1 * scale2));
+	sincosf(y, &s, &c);
+	return (CMPLXF(c * exp_x * scale1 * scale2,
+	    s * exp_x * scale1 * scale2));
 }
Index: lib/msun/ld128/k_cexpl.c
===================================================================
--- lib/msun/ld128/k_cexpl.c	(nonexistent)
+++ lib/msun/ld128/k_cexpl.c	(working copy)
@@ -0,0 +1,126 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
+ *
+ * Copyright (c) 2011 David Schultz <das@FreeBSD.ORG>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <complex.h>
+
+#include "math.h"
+#include "math_private.h"
+
+#warning These functions are broken on ld128 architectures.
+#warning Functions defined here are currently unused.
+#warning Someone who cares needs to convert src/k_exp.c.
+
+#if 0
+static const uint32_t k = 1799;		/* constant for reduction */
+static const double kln2 =  1246.97177782734161156;	/* k * ln2 */
+#endif
+
+/*
+ * Compute exp(x), scaled to avoid spurious overflow.  An exponent is
+ * returned separately in 'expt'.
+ *
+ * Input:  ln(DBL_MAX) <= x < ln(2 * DBL_MAX / DBL_MIN_DENORM) ~= 1454.91
+ * Output: 2**1023 <= y < 2**1024
+ */
+static long double
+__frexp_expl(long double x, int *expt)
+{
+#if 0
+	double exp_x;
+	uint32_t hx;
+
+	/*
+	 * We use exp(x) = exp(x - kln2) * 2**k, carefully chosen to
+	 * minimize |exp(kln2) - 2**k|.  We also scale the exponent of
+	 * exp_x to MAX_EXP so that the result can be multiplied by
+	 * a tiny number without losing accuracy due to denormalization.
+	 */
+	exp_x = exp(x - kln2);
+	GET_HIGH_WORD(hx, exp_x);
+	*expt = (hx >> 20) - (0x3ff + 1023) + k;
+	SET_HIGH_WORD(exp_x, (hx & 0xfffff) | ((0x3ff + 1023) << 20));
+	return (exp_x);
+#endif
+	return (x - x) / (x - x);
+}
+
+/*
+ * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt.
+ * They are intended for large arguments (real part >= ln(DBL_MAX))
+ * where care is needed to avoid overflow.
+ *
+ * The present implementation is narrowly tailored for our hyperbolic and
+ * exponential functions.  We assume expt is small (0 or -1), and the caller
+ * has filtered out very large x, for which overflow would be inevitable.
+ */
+
+long double
+__ldexp_expl(long double x, int expt)
+{
+#if 0
+	double exp_x, scale;
+	int ex_expt;
+
+	exp_x = __frexp_exp(x, &ex_expt);
+	expt += ex_expt;
+	INSERT_WORDS(scale, (0x3ff + expt) << 20, 0);
+	return (exp_x * scale);
+#endif
+	return (x - x) / (x - x);
+}
+
+long double complex
+__ldexp_cexpl(long double complex z, int expt)
+{
+#if 0
+	double x, y, exp_x, scale1, scale2;
+	int ex_expt, half_expt;
+
+	x = creal(z);
+	y = cimag(z);
+	exp_x = __frexp_exp(x, &ex_expt);
+	expt += ex_expt;
+
+	/*
+	 * Arrange so that scale1 * scale2 == 2**expt.  We use this to
+	 * compensate for scalbn being horrendously slow.
+	 */
+	half_expt = expt / 2;
+	INSERT_WORDS(scale1, (0x3ff + half_expt) << 20, 0);
+	half_expt = expt - half_expt;
+	INSERT_WORDS(scale2, (0x3ff + half_expt) << 20, 0);
+
+	sincos(y, &s, &c);
+	return (CMPLX(c) * exp_x * scale1 * scale2,
+	    s * exp_x * scale1 * scale2));
+#endif
+	return (x - x) / (x - x);
+}

Property changes on: lib/msun/ld128/k_cexpl.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld128/s_cexpl.c
===================================================================
--- lib/msun/ld128/s_cexpl.c	(nonexistent)
+++ lib/msun/ld128/s_cexpl.c	(working copy)
@@ -0,0 +1,76 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
+ *
+ * Copyright (c) 2019 Steven G. Kargl <kargl@FreeBSD.ORG>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <complex.h>
+#include <float.h>
+#include <math.h>
+
+#include "math_private.h"
+
+#warning cexpl is likely broken on ld128 architectures.
+#warning Someone who cares needs to convert src/s_cexp.c.
+
+long double complex
+cexpl(long double complex z)
+{
+	long double c, exp_x, s, x, y;
+
+	x = creall(z);
+	y = cimagl(z);
+
+	/* cexp(0 + I y) = cos(y) + I sin(y) */
+	if (x == 0) {
+		sincosl(y, &s, &c);
+		return (CMPLXL(c, s));
+	}
+
+	/* cexp(x + I 0) = exp(x) + I 0 */
+	if (y == 0)
+		return (CMPLXL(expl(x), y));
+
+	if (!isfinite(y)) {
+		if (isfinite(x) || isnan(x)) {
+			/* cexp(finite|NaN +- I Inf|NaN) = NaN + I NaN */
+			return (CMPLXL(y - y, y - y));
+		} else if (isinf(x) && copysignl(1.L, x) < 0) {
+			/* cexp(-Inf +- I Inf|NaN) = 0 + I 0 */
+			return (CMPLXL(0.0L, 0.0L));
+		} else {
+			/* cexp(+Inf +- I Inf|NaN) = Inf + I NaN */
+			return (CMPLXL(x, y - y));
+		}
+	}
+
+	exp_x = expl(x);
+	sincosl(y, &s, &c);
+	return (CMPLXL(exp_x * c, exp_x * s));
+}
+__warn_references("Precision of cexpl may have lower than expected precision");

Property changes on: lib/msun/ld128/s_cexpl.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld80/k_cexpl.c
===================================================================
--- lib/msun/ld80/k_cexpl.c	(nonexistent)
+++ lib/msun/ld80/k_cexpl.c	(working copy)
@@ -0,0 +1,118 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
+ *
+ * Copyright (c) 2011 David Schultz <das@FreeBSD.ORG>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * src/k_exp.c converted to long double complex by Steven G. Kargl
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <complex.h>
+
+#include "fpmath.h"
+#include "math.h"
+#include "math_private.h"
+
+static const uint32_t k = 22956;		/* constant for reduction */
+static const union IEEEl2bits
+kln2u = LD80C(0xf89f8bf509c862ca, 13,  1.59118866769341045231e+04L);
+#define	kln2 (kln2u.e)
+/*
+ * Compute exp(x), scaled to avoid spurious overflow.  An exponent is
+ * returned separately in 'expt'.
+ *
+ * Input:  ln(LDBL_MAX) <= x < ln(2 * LDBL_MAX / LDBL_MIN_DENORM) ~= 22756.0
+ * Output: 2**16383 <= y < 2**16384
+ */
+static long double
+__frexp_expl(long double x, int *expt)
+{
+	union IEEEl2bits exp_x;
+
+	/*
+	 * We use exp(x) = exp(x - kln2) * 2**k, carefully chosen to
+	 * minimize |exp(kln2) - 2**k|.  We also scale the exponent of
+	 * exp_x to MAX_EXP so that the result can be multiplied by
+	 * a tiny number without losing accuracy due to denormalization.
+	 */
+	exp_x.e = expl(x - kln2);
+	*expt = exp_x.bits.exp - (0x3fff + 16383) + k - 1;
+	exp_x.bits.exp = 0x3fff + 16383;
+	return (exp_x.e);
+}
+
+/*
+ * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt.
+ * They are intended for large arguments (real part >= ln(DBL_MAX))
+ * where care is needed to avoid overflow.
+ *
+ * The present implementation is narrowly tailored for our hyperbolic and
+ * exponential functions.  We assume expt is small (0 or -1), and the caller
+ * has filtered out very large x, for which overflow would be inevitable.
+ */
+
+long double
+__ldexp_expl(long double x, int expt)
+{
+	union IEEEl2bits scale;
+	long double exp_x;
+	int ex_expt;
+
+	exp_x = __frexp_expl(x, &ex_expt);
+	expt += ex_expt;
+	scale.e = 1;
+	scale.bits.exp = 0x3fff + expt;
+	return (exp_x * scale.e);
+}
+
+long double complex
+__ldexp_cexpl(long double complex z, int expt)
+{
+	union IEEEl2bits scale1, scale2;
+	long double c, exp_x, s, x, y;
+	int ex_expt, half_expt;
+
+	x = creall(z);
+	y = cimagl(z);
+	exp_x = __frexp_expl(x, &ex_expt);
+	expt += ex_expt;
+
+	/*
+	 * Arrange so that scale1 * scale2 == 2**expt.  We use this to
+	 * compensate for scalbn being horrendously slow.
+	 */
+	half_expt = expt / 2;
+	scale1.e = 1;
+	scale1.bits.exp = 0x3fff + half_expt;
+	half_expt = expt - half_expt;
+	scale2.e = 1;
+	scale2.bits.exp = 0x3fff + half_expt + 1;
+
+	sincosl(y, &s, &c);
+	return (CMPLXL(c * exp_x * scale1.e * scale2.e,
+	    s * exp_x * scale1.e * scale2.e));
+}

Property changes on: lib/msun/ld80/k_cexpl.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/ld80/s_cexpl.c
===================================================================
--- lib/msun/ld80/s_cexpl.c	(nonexistent)
+++ lib/msun/ld80/s_cexpl.c	(working copy)
@@ -0,0 +1,105 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
+ *
+ * Copyright (c) 2011 David Schultz <das@FreeBSD.ORG>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * src/s_cexp.c converted to long double complex by Steven G. Kargl
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <complex.h>
+#include <float.h>
+#ifdef __i386__
+#include <ieeefp.h>
+#endif
+#include <math.h>
+
+#include "fpmath.h"
+#include "math_private.h"
+
+static const uint32_t
+exp_ovfl  = 0xb17217f7,		/* high bits of MAX_EXP * ln2 ~= 11356 */
+cexp_ovfl = 0xb1c6a857;		/* (MAX_EXP - MIN_DENORM_EXP) * ln2 */
+
+long double complex
+cexpl (long double complex z)
+{
+	long double c, exp_x, s, x, y;
+	uint32_t hwx, hwy, lwx, lwy;
+	uint16_t hx, hy;
+
+	ENTERI(z);
+
+	x = creall(z);
+	y = cimagl(z);
+
+	EXTRACT_LDBL80_WORDS2(hx, hwx, lwx, x);
+	EXTRACT_LDBL80_WORDS2(hy, hwy, lwy, y);
+
+	/* cexp(0 + I y) = cos(y) + I sin(y) */
+	if ((hwx | lwx | (hx & 0x7fff)) == 0) {
+		sincosl(y, &s, &c);
+		RETURNI(CMPLXL(c, s));
+	}
+
+	/* cexp(x + I 0) = exp(x) + I 0 */
+	if ((hwy | lwy | (hy & 0x7fff)) == 0)
+		RETURNI(CMPLXL(expl(x), y));
+
+	if (hy >= 0x7fff) {
+		if (hx < 0x7fff ||
+		    (hx == 0x7fff && (hwx & 0x7fffffff) != 0)) {
+			/* cexp(finite|NaN +- I Inf|NaN) = NaN + I NaN */
+			RETURNI(CMPLXL(y - y, y - y));
+		} else if (hx & 0x8000) {
+			/* cexp(-Inf +- I Inf|NaN) = 0 + I 0 */
+			RETURNI(CMPLXL(0.0L, 0.0L));
+		} else {
+			/* cexp(+Inf +- I Inf|NaN) = Inf + I NaN */
+			RETURNI(CMPLXL(x, y - y));
+		}
+	}
+
+	if (hx >= 0x400c && hx <= 0x400d) {
+		/*
+		 * x is between 11356 and 22755, so we must scale to avoid
+		 * overflow in exp(x).
+		 */
+		RETURNI(__ldexp_cexpl(z, 0));
+	} else {
+		/*
+		 * Cases covered here:
+		 *  -  x < exp_ovfl and exp(x) won't overflow (common case)
+		 *  -  x > cexp_ovfl, so exp(x) * s overflows for all s > 0
+		 *  -  x = +-Inf (generated by exp())
+		 *  -  x = NaN (spurious inexact exception from y)
+		 */
+		exp_x = expl(x);
+		sincosl(y, &s, &c);
+		RETURNI(CMPLXL(exp_x * c, exp_x * s));
+	}
+}

Property changes on: lib/msun/ld80/s_cexpl.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: lib/msun/man/cexp.3
===================================================================
--- lib/msun/man/cexp.3	(revision 344600)
+++ lib/msun/man/cexp.3	(working copy)
@@ -24,12 +24,13 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd March 6, 2011
+.Dd February 4, 2019
 .Dt CEXP 3
 .Os
 .Sh NAME
 .Nm cexp ,
-.Nm cexpf
+.Nm cexpf ,
+.Nm cexpl
 .Nd complex exponential functions
 .Sh LIBRARY
 .Lb libm
@@ -39,11 +40,14 @@
 .Fn cexp "double complex z"
 .Ft float complex
 .Fn cexpf "float complex z"
+.Ft long double complex
+.Fn cexpl "long double complex z"
 .Sh DESCRIPTION
 The
-.Fn cexp
+.Fn cexp ,
+.Fn cexpf ,
 and
-.Fn cexpf
+.Fn cexpl
 functions compute the complex exponential of
 .Fa z ,
 also known as
@@ -106,8 +110,9 @@
 .Xr math 3
 .Sh STANDARDS
 The
-.Fn cexp
+.Fn cexp ,
+.Fn cexpf ,
 and
-.Fn cexpf
+.Fn cexpl
 functions conform to
 .St -isoC-99 .
Index: include/complex.h
===================================================================
--- include/complex.h	(revision 344600)
+++ include/complex.h	(working copy)
@@ -98,6 +98,8 @@
 float complex	ccoshf(float complex);
 double complex	cexp(double complex);
 float complex	cexpf(float complex);
+long double complex
+		cexpl(long double complex);
 double		cimag(double complex) __pure2;
 float		cimagf(float complex) __pure2;
 long double	cimagl(long double complex) __pure2;

From owner-freebsd-numerics@freebsd.org  Wed Feb 27 06:05:26 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 92A5615120A1
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 27 Feb 2019 06:05:26 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id D2F3F8EE73
 for <freebsd-numerics@freebsd.org>; Wed, 27 Feb 2019 06:05:21 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 80B93435045;
 Wed, 27 Feb 2019 17:05:18 +1100 (AEDT)
Date: Wed, 27 Feb 2019 17:05:15 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Update ENTERI() macro
In-Reply-To: <20190226191825.GA68479@troutmask.apl.washington.edu>
Message-ID: <20190227145002.P907@besplex.bde.org>
References: <20190226191825.GA68479@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=UJetJGXy c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=D3kSSieMYeL5IMV4CcIA:9
 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22
X-Rspamd-Queue-Id: D2F3F8EE73
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates
 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au
X-Spamd-Result: default: False [-6.35 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1];
 FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23];
 FREEMAIL_FROM(0.00)[optusnet.com.au];
 MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[];
 MX_GOOD(-0.01)[extmail.optusnet.com.au];
 RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.98)[-0.978,0];
 IP_SCORE(-3.06)[ip: (-7.84), ipnet: 211.28.0.0/14(-4.13), asn: 4804(-3.29),
 country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[];
 FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 FREEMAIL_ENVFROM(0.00)[optusnet.com.au];
 ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU];
 MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Feb 2019 06:05:26 -0000

On Tue, 26 Feb 2019, Steve Kargl wrote:

> I have submitted
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236063
>
> Update the ENTERI() macro in math_private.h to take a parameter.
> This parameter is used by __typeof() to set the type for an
> intermediate variable that holds the return value of a function
> on i386 FreeBSD.  The patch updates the consumers of ENTERI().

I don't like this.  It churns and complicates all the simple cases
that only need ENTERI().  It bogotifies the existence of ENTERIT(),
whose reason for existence is to avoid this.  (Actually, it removes
ENTERIT().)  __typeof() is unnecessarily unportable.  The implementation
uses literal typesin thousands of places so gains little or less than
nothing by not spelling types literally for ENTERIT().

> ...
> --- /data/kargl/freebsd/src/lib/msun/src/math_private.h	2018-10-31 16:15:21.809877000 -0700
> +++ ./src/math_private.h	2019-02-25 16:54:26.897247000 -0800
> @@ -335,15 +348,14 @@
>
> /* Support switching the mode to FP_PE if necessary. */
> #if defined(__i386__) && !defined(NO_FPSETPREC)
> -#define	ENTERI() ENTERIT(long double)
> -#define	ENTERIT(returntype)			\
> -	returntype __retval;			\
> +#define	ENTERI(a)				\
> +	__typeof(a) __retval;			\
> 	fp_prec_t __oprec;			\
> 						\
> 	if ((__oprec = fpgetprec()) != FP_PE)	\
> 		fpsetprec(FP_PE)
> -#define	RETURNI(x) do {				\
> -	__retval = (x);				\
> +#define	RETURNI(a) do {				\
> +	__retval = (a);				\
> 	if (__oprec != FP_PE)			\
> 		fpsetprec(__oprec);		\
> 	RETURNF(__retval);			\
> @@ -359,9 +371,8 @@
> 	return;			\
> } while (0)
> #else
> -#define	ENTERI()
> -#define	ENTERIT(x)
> -#define	RETURNI(x)	RETURNF(x)
> +#define	ENTERI(a)
> +#define	RETURNI(a)	RETURNF(a)
> #define	ENTERV()
> #define	RETURNV()	return
> #endif

This also changes RETURNI(), by unimproving its parameter name.  'x' for
ENTERI() wasn't a very good name for a type, but is good for a variable.
'x' for RETURNI() is slightly worse than 'r', but better than 'a'

I had forgotten that ENTERIT() creates a local variable, and don't like this,
but this is its main reason for existence.  Now I see a better way,
especially if if we accept unportabilities like __typeof():

ENTERIT() doesn't need to declare local variable with nearly function scope,
since the variable is only needed for returning.  Only the precision variable
needs to have nearly function scope.  The return variable can be created in
block scope in RETURNI().  Something like:

#define	RETURNI(r) do {				\
 	__typeof(r) __retval;			\
 						\
 	__retval = (r);				\
 	if ((__oprec = fpgetprec()) != FP_PE)	\
 		fpsetprec(FP_PE)		\
 	RETURNF(__retval);			\
} while (0)

You already added ENTERV() and RETURNV().  That seemed reasonable, but now
I don't like it.  With the above change (and of course also remove the
variable from ENTERI()), ENTERV() becomes the same as ENTERI().  RETURNV()
still needs to be different.

Another bug in ENTERV() is its name.  It is missing the 'I' to indicate that
it is a complication for only i386.

RETURNV() has the same naming problem, and more.  It can complement
ENTERI() perfectly since it doesn't need to return.  It should be named
LEAVEI(), and the return should be done by the caller.

But I now see 3 more problems.  The return in RETURNI() is not direct,
but goes through the macro RETURNF(x).  In the committed version, this
is a default that just returns x, but in my version it returns
hackdouble_t(x) or hackfloat_t(x) in some cases (no cases are needed
for long doubles, so there is no interaction with ENTERI()/LEAVEI(),
and I only do this in a few simple cases not including any with
complex types).

In the hack*_t() cases, x has type double_t or float_t, and we want
to avoid C99's requirement to destroy efficiency and accuracy on return
by removing any extra precision in x.  (C99 only requires this
destruction for careful code that uses double_t or float_t in the
function.  Then since the return type is different from the internal
type, destruction is required.  C99 also requires removing extra
precision on assignment or casts, but most compilers with most CFLAGS
don't do this since it is pessimal.  Careful code uses double_t and
float_t internally so as to not depend on compiler bugs for efficiency
and to avoid unintended loss of accuracy on assignment ands casts for
compilers without bugs.  C11 adds further destruction for sloppy code
that uses the return type internally.)  hackdouble_t(x) and
hackfloat_t(x) change from type xxx_t to type xxx using a type pun so
that the destruction is not required in the C99 case.

The 3 problems are:
- the type pun interacts in a complicated way with changing the precision.
   This problem is null since type type pun is not needed for long doubles
   (unless there is a type with more precision than long double, but there
   is no such type for any supported arch, and C99 is missing support for
   it).  If this were a problem, then we would have to be carefull about
   the order of the type pun and the precision change.  Do the type pun
   first.

- also, be careful to apply any __typeof() to an arg with the correct
   type.  Normally do the type pun first, and use the return type for
   the local variable.  However, for printing, the arg should be the
   type that has extra precision, and __typeof() on that doesn't give
   the return type in general.

   In simple cases, The modified RETURNF() expands to something like
     'return (hackfloat_t(hi + lo));',
   where hi + lo is evalated in extra precision using float_t if possible,
   hackfloat_t puns the resulting float_t value to plain float without
   any destruction, and we finally return the value.

   In more complicated cases, the punned returned value is stored in a
   variable for printing using RETURNP().  I forget if the type of this
   variable is the return type or the arg type (printing in the return
   type shows what callers will normally see but printing in the arg
   type gives more details).  The caller uses RETURNP*(), and RETURNP*()
   uses RETURNF() and RETURNF() might do the type pun.  My code is not
   careful enough to handle all cases.  For the common hi + lo case,
   RETURN2P[I]() is used to print hi and lo separately, so the extra
   precision in hi + lo can be checked even on arches where hi + lo is
   not evaluated in extra precision.

- finally, it is necessary to go through a macro to reach the printing.
   Not many functions do this.  But most long double functions get this
   at no extra cost (except complexity in the macros) by having to go
   though ENTERI()/RETURNI().  Currently, the caller must decide, but
   it is a trivaial edit to change RETURNI() to RETURNP() and a simple
   edit to change RETURNI(hi + lo) to RETURN2P(hi, lo).  There is also
   SUM2P() to handle more complicated cases of hi + lo.

The return value is not passed to RETURNV(), so it is harder to get
automatic printing at no extra cost.  In s_sincosf.c and s_sincos.c,
the macros are just not used, the same as for s_sinf.c, etc.  The
float case double on all arches instead of double_t.  On i386, this
reduces to a slightly pessimized use of double_t, and on arches with
no extra precision it sacrifices a little efficiency to get more
accuracy.  In s_sincosl.c, the values are calculated and stored in
local variables, and then RETURNV() returns void.

This reminds me of a reason why I don't like sincos*().  Its API
requires destruction of efficiency and accuracy by returning the values
indirectly.  On i386 with not very old CPUs, this costs about 8 cycles per
long double value.  Float and double values cost about half as much.  On
amd64, the long double case is the same and the float and double cases
are faster.

sinf() and cosf() on small args take only 15-20 cycles (thoughput) on
amd64 with not very old CPUs, so 2-8 extra cycles for the 2 indirect
return values is a lot.  sincosf() still ends up being slightly faster
than separate sinf()/cosf().

Bruce

From owner-freebsd-numerics@freebsd.org  Wed Feb 27 07:48:21 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id BB6381514694
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 27 Feb 2019 07:48:20 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 7AE3A6B873
 for <freebsd-numerics@freebsd.org>; Wed, 27 Feb 2019 07:48:19 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1R7mBxP076055
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO);
 Tue, 26 Feb 2019 23:48:11 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1R7mBAr076054;
 Tue, 26 Feb 2019 23:48:11 -0800 (PST) (envelope-from sgk)
Date: Tue, 26 Feb 2019 23:48:11 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Update ENTERI() macro
Message-ID: <20190227074811.GA75972@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20190226191825.GA68479@troutmask.apl.washington.edu>
 <20190227145002.P907@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190227145002.P907@besplex.bde.org>
User-Agent: Mutt/1.11.2 (2019-01-07)
X-Rspamd-Queue-Id: 7AE3A6B873
X-Spamd-Bar: +
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [1.06 / 15.00]; ARC_NA(0.00)[];
 HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu];
 FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];
 NEURAL_SPAM_SHORT(0.49)[0.491,0];
 NEURAL_HAM_LONG(-0.51)[-0.508,0]; MIME_GOOD(-0.10)[text/plain];
 RCVD_TLS_LAST(0.00)[]; DMARC_NA(0.00)[washington.edu];
 AUTH_NA(1.00)[]; REPLYTO_ADDR_EQ_FROM(0.00)[];
 RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2];
 RCPT_COUNT_TWO(0.00)[2];
 MX_GOOD(-0.01)[troutmask.apl.washington.edu];
 NEURAL_SPAM_MEDIUM(0.33)[0.331,0]; R_SPF_NA(0.00)[];
 FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US];
 MID_RHS_MATCH_FROM(0.00)[];
 IP_SCORE(0.05)[ip: (0.11), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06),
 country: US(-0.07)]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Feb 2019 07:48:21 -0000

On Wed, Feb 27, 2019 at 05:05:15PM +1100, Bruce Evans wrote:
> On Tue, 26 Feb 2019, Steve Kargl wrote:
> 
> > I have submitted
> >
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236063
> >
> > Update the ENTERI() macro in math_private.h to take a parameter.
> > This parameter is used by __typeof() to set the type for an
> > intermediate variable that holds the return value of a function
> > on i386 FreeBSD.  The patch updates the consumers of ENTERI().
> 
> I don't like this.  It churns and complicates all the simple cases
> that only need ENTERI().  It bogotifies the existence of ENTERIT(),
> whose reason for existence is to avoid this.  (Actually, it removes
> ENTERIT().)  __typeof() is unnecessarily unportable.  The implementation
> uses literal typesin thousands of places so gains little or less than
> nothing by not spelling types literally for ENTERIT().

Okay.  The other option is an ENTERC() and RETURNC() as
we need to toggle FP_PE for long double complex functions.
I suppose I could follow the one example currently in the
tree that use 

	ENTERIT(long double complex)

I find it somewhat odd that we have

	ENTERI() /* Implicit declaration of __retval to long double. */

but must use directly ENTERIT(long double complex).

> > ...
> > --- /data/kargl/freebsd/src/lib/msun/src/math_private.h	2018-10-31 16:15:21.809877000 -0700
> > +++ ./src/math_private.h	2019-02-25 16:54:26.897247000 -0800
> > @@ -335,15 +348,14 @@
> >
> > /* Support switching the mode to FP_PE if necessary. */
> > #if defined(__i386__) && !defined(NO_FPSETPREC)
> > -#define	ENTERI() ENTERIT(long double)
> > -#define	ENTERIT(returntype)			\
> > -	returntype __retval;			\
> > +#define	ENTERI(a)				\
> > +	__typeof(a) __retval;			\
> > 	fp_prec_t __oprec;			\
> > 						\
> > 	if ((__oprec = fpgetprec()) != FP_PE)	\
> > 		fpsetprec(FP_PE)
> > -#define	RETURNI(x) do {				\
> > -	__retval = (x);				\
> > +#define	RETURNI(a) do {				\
> > +	__retval = (a);				\
> > 	if (__oprec != FP_PE)			\
> > 		fpsetprec(__oprec);		\
> > 	RETURNF(__retval);			\
> > @@ -359,9 +371,8 @@
> > 	return;			\
> > } while (0)
> > #else
> > -#define	ENTERI()
> > -#define	ENTERIT(x)
> > -#define	RETURNI(x)	RETURNF(x)
> > +#define	ENTERI(a)
> > +#define	RETURNI(a)	RETURNF(a)
> > #define	ENTERV()
> > #define	RETURNV()	return
> > #endif
> 
> This also changes RETURNI(), by unimproving its parameter name.  'x' for
> ENTERI() wasn't a very good name for a type, but is good for a variable.
> 'x' for RETURNI() is slightly worse than 'r', but better than 'a'

The renaming is for consistency.  I can use 'r'.

> I had forgotten that ENTERIT() creates a local variable, and don't like this,
> but this is its main reason for existence.  Now I see a better way,
> especially if if we accept unportabilities like __typeof():
> 
> ENTERIT() doesn't need to declare local variable with nearly function scope,
> since the variable is only needed for returning.  Only the precision variable
> needs to have nearly function scope.  The return variable can be created in
> block scope in RETURNI().  Something like:
> 
> #define	RETURNI(r) do {				\
>  	__typeof(r) __retval;			\
>  						\
>  	__retval = (r);				\
>  	if ((__oprec = fpgetprec()) != FP_PE)	\
>  		fpsetprec(FP_PE)		\
>  	RETURNF(__retval);			\
> } while (0)
> 
> You already added ENTERV() and RETURNV().  That seemed reasonable, but now
> I don't like it.  With the above change (and of course also remove the
> variable from ENTERI()), ENTERV() becomes the same as ENTERI().  RETURNV()
> still needs to be different.
> 
> Another bug in ENTERV() is its name.  It is missing the 'I' to indicate that
> it is a complication for only i386.

You probably told at some point in the distant past what the
'I' meant, but that is long forgotten.  I'm fine with you
changing these names.  ENTERIV() and RETURNIV() are needed
for sincosl(), which is a void function.

> RETURNV() has the same naming problem, and more.  It can complement
> ENTERI() perfectly since it doesn't need to return.  It should be named
> LEAVEI(), and the return should be done by the caller.
> 
> But I now see 3 more problems.  The return in RETURNI() is not direct,
> but goes through the macro RETURNF(x).  In the committed version, this
> is a default that just returns x, but in my version it returns
> hackdouble_t(x) or hackfloat_t(x) in some cases (no cases are needed
> for long doubles, so there is no interaction with ENTERI()/LEAVEI(),
> and I only do this in a few simple cases not including any with
> complex types).

I'm fine with making ENTERI() only toggle precision, and adding
a LEAVEI() to reset precision.  RETURNI(r) would then be

#define RETURNI(r)	\
do {			\
   LEAVEI();		\
   return (r);		\
} while (0) 

ENTERV and RETURNV can then go away as ENTERV is simply
ENTERI and RETURNV is now spelled LEAVEI.

> In the hack*_t() cases, x has type double_t or float_t, and we want
> to avoid C99's requirement to destroy efficiency and accuracy on return
> by removing any extra precision in x.  (C99 only requires this
> destruction for careful code that uses double_t or float_t in the
> function.  Then since the return type is different from the internal
> type, destruction is required.  C99 also requires removing extra
> precision on assignment or casts, but most compilers with most CFLAGS
> don't do this since it is pessimal.  Careful code uses double_t and
> float_t internally so as to not depend on compiler bugs for efficiency
> and to avoid unintended loss of accuracy on assignment ands casts for
> compilers without bugs.  C11 adds further destruction for sloppy code
> that uses the return type internally.)  hackdouble_t(x) and
> hackfloat_t(x) change from type xxx_t to type xxx using a type pun so
> that the destruction is not required in the C99 case.
> 
> The 3 problems are:
> - the type pun interacts in a complicated way with changing the precision.
>    This problem is null since type type pun is not needed for long doubles
>    (unless there is a type with more precision than long double, but there
>    is no such type for any supported arch, and C99 is missing support for
>    it).  If this were a problem, then we would have to be carefull about
>    the order of the type pun and the precision change.  Do the type pun
>    first.
> 
> - also, be careful to apply any __typeof() to an arg with the correct
>    type.  Normally do the type pun first, and use the return type for
>    the local variable.  However, for printing, the arg should be the
>    type that has extra precision, and __typeof() on that doesn't give
>    the return type in general.
> 
>    In simple cases, The modified RETURNF() expands to something like
>      'return (hackfloat_t(hi + lo));',
>    where hi + lo is evalated in extra precision using float_t if possible,
>    hackfloat_t puns the resulting float_t value to plain float without
>    any destruction, and we finally return the value.
> 
>    In more complicated cases, the punned returned value is stored in a
>    variable for printing using RETURNP().  I forget if the type of this
>    variable is the return type or the arg type (printing in the return
>    type shows what callers will normally see but printing in the arg
>    type gives more details).  The caller uses RETURNP*(), and RETURNP*()
>    uses RETURNF() and RETURNF() might do the type pun.  My code is not
>    careful enough to handle all cases.  For the common hi + lo case,
>    RETURN2P[I]() is used to print hi and lo separately, so the extra
>    precision in hi + lo can be checked even on arches where hi + lo is
>    not evaluated in extra precision.
> 
> - finally, it is necessary to go through a macro to reach the printing.
>    Not many functions do this.  But most long double functions get this
>    at no extra cost (except complexity in the macros) by having to go
>    though ENTERI()/RETURNI().  Currently, the caller must decide, but
>    it is a trivaial edit to change RETURNI() to RETURNP() and a simple
>    edit to change RETURNI(hi + lo) to RETURN2P(hi, lo).  There is also
>    SUM2P() to handle more complicated cases of hi + lo.
> 
> The return value is not passed to RETURNV(), so it is harder to get
> automatic printing at no extra cost.  In s_sincosf.c and s_sincos.c,
> the macros are just not used, the same as for s_sinf.c, etc.  The
> float case double on all arches instead of double_t.  On i386, this
> reduces to a slightly pessimized use of double_t, and on arches with
> no extra precision it sacrifices a little efficiency to get more
> accuracy.  In s_sincosl.c, the values are calculated and stored in
> local variables, and then RETURNV() returns void.
> 
> This reminds me of a reason why I don't like sincos*().  Its API
> requires destruction of efficiency and accuracy by returning the values
> indirectly.  On i386 with not very old CPUs, this costs about 8 cycles per
> long double value.  Float and double values cost about half as much.  On
> amd64, the long double case is the same and the float and double cases
> are faster.

Not sure your efficiency claim holds.  I've seen significant improves 
in cexp and cexpf where sin[f]() and cos[f]() are replaced by 
sincos[f].  On my core2 running i386 freebsd, I see 0.1779 usecs/call
for cexpf with sinf and cosf and 0.12522 usecs/call for sincosf.
Yes, that's a 29.6% improvement.  For cexp the numbers are 0.2697
usecs/call for sin and cos and 0.20586 for sincos (ie, 23.7% improvement).
This is for z = x + I y with x and y in the non-exceptable case.

> sinf() and cosf() on small args take only 15-20 cycles (thoughput) on
> amd64 with not very old CPUs, so 2-8 extra cycles for the 2 indirect
> return values is a lot.  sincosf() still ends up being slightly faster
> than separate sinf()/cosf().

Seems to be much faster when used in other functions.

-- 
Steve

From owner-freebsd-numerics@freebsd.org  Wed Feb 27 08:42:49 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 422E71515B9A
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 27 Feb 2019 08:42:49 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 157F26D4DF
 for <freebsd-numerics@freebsd.org>; Wed, 27 Feb 2019 08:42:45 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 610EB4365DE;
 Wed, 27 Feb 2019 19:42:42 +1100 (AEDT)
Date: Wed, 27 Feb 2019 19:42:41 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Implement cexpl
In-Reply-To: <20190227020142.GA74833@troutmask.apl.washington.edu>
Message-ID: <20190227170706.J907@besplex.bde.org>
References: <20190203173056.GA42499@troutmask.apl.washington.edu>
 <20190227020142.GA74833@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=z-HYpUQ0eYvp4CnOBZ8A:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: 157F26D4DF
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates
 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au
X-Spamd-Result: default: False [-6.32 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1];
 FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23];
 FREEMAIL_FROM(0.00)[optusnet.com.au];
 MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[];
 MX_GOOD(-0.01)[cached: extmail.optusnet.com.au];
 RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.93)[-0.931,0];
 IP_SCORE(-3.08)[ip: (-7.92), ipnet: 211.28.0.0/14(-4.14), asn: 4804(-3.30),
 country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[];
 FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 FREEMAIL_ENVFROM(0.00)[optusnet.com.au];
 ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU];
 MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Feb 2019 08:42:49 -0000

On Tue, 26 Feb 2019, Steve Kargl wrote:

> On Sun, Feb 03, 2019 at 09:30:56AM -0800, Steve Kargl wrote:
>> The following patch implements cexpl() for ld80 and ld128 architectures.
> ...
> Index: lib/msun/src/math_private.h
> ===================================================================
> --- lib/msun/src/math_private.h	(revision 344600)
> +++ lib/msun/src/math_private.h	(working copy)
> @@ -243,6 +243,20 @@
> } while (0)
>
> /*
> + * Get expsign and mantissa as 16 bit and 2 32-bit ints from an 80 bit long
> + * double.
> + */
> +
> +#define	EXTRACT_LDBL80_WORDS2(ix0,ix1,ix2,d)				\
> +do {								\
> +  union IEEEl2bits ew_u;					\
> +  ew_u.e = (d);							\
> +  (ix0) = ew_u.xbits.expsign;					\
> +  (ix1) = ew_u.bits.manh;					\
> +  (ix2) = ew_u.bits.manl;					\
> +} while (0)

Don't add more of these accessor functions (more were already added to
support powl, since powl doesn't use the FreeBSD spelling in APIs).

Use existing separate functions and let the compiler combine them.
The one corresponding to this is EXTRACT_LDBL80_WORDS().  That extracts
the mantissa into a single 64-bit word which is easier to use than the
2 32-bit words given by the above.  It also doesn't ask for pessimal
extraction from memory 32 bits at a time, but compilers usually combine
into a single 64-bit word for the memory access and then split into
2 32-bit words for 32-bit arches.

3 words instead of 2 also enlarges diffs with the double precision case.

> Index: lib/msun/src/s_cexp.c
> ===================================================================
> --- lib/msun/src/s_cexp.c	(revision 344600)
> +++ lib/msun/src/s_cexp.c	(working copy)
> @@ -30,6 +30,7 @@
> __FBSDID("$FreeBSD$");
>
> #include <complex.h>
> +#include <float.h>
> #include <math.h>
>
> #include "math_private.h"
> @@ -41,12 +42,19 @@
> double complex
> cexp(double complex z)
> {
> -	double x, y, exp_x;
> +	double c, exp_x, s, x, y;
> 	uint32_t hx, hy, lx, ly;
>
> 	x = creal(z);
> 	y = cimag(z);
>
> +	EXTRACT_WORDS(hx, lx, x);
> +	/* cexp(0 + I y) = cos(y) + I sin(y) */
> +	if (((hx & 0x7fffffff) | lx) == 0) {
> +		sincos(y, &s, &c);
> +		return (CMPLX(c, s));
> +	}
> +
> 	EXTRACT_WORDS(hy, ly, y);
> 	hy &= 0x7fffffff;
>

Another reason I don't like sincos*() is that if the compiler can very
easily do the above rewrite automatically.  The only large difficulty
is to know if the sin(), cos() and sincos() are any good, or at least
their relative badness.  x86 compilers already know that the i387 sin()
and cos() are no good, (or perhaps they just don't know that the FreeBSD
implementation doesn't make the i387 functions non-conforming to C99
and/or POSIX (since they don't set errno and might do other side effects
wrong)), so x86 compilers don't convert the above to i387 sin(), cos() or
sincos().  It is even harder for compilers to know the quality of the
library.

> @@ -53,10 +61,6 @@
> 	/* cexp(x + I 0) = exp(x) + I 0 */
> 	if ((hy | ly) == 0)
> 		return (CMPLX(exp(x), y));

The comments aren't as careful as for hyperbolic functions.  The 0 here
is actually -0 if y is -0.  The code handles both +-0 by clearing the
sign bit in hy for the check byt keeping it in y.

> Index: lib/msun/ld128/k_cexpl.c
> ...
> +/*
> + * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt.
> + * They are intended for large arguments (real part >= ln(DBL_MAX))
> + * where care is needed to avoid overflow.
> + *
> + * The present implementation is narrowly tailored for our hyperbolic and
> + * exponential functions.  We assume expt is small (0 or -1), and the caller
> + * has filtered out very large x, for which overflow would be inevitable.
> + */
> +
> +long double
> +__ldexp_expl(long double x, int expt)
> +{
> +#if 0
> +	double exp_x, scale;
> +	int ex_expt;
> +
> +	exp_x = __frexp_exp(x, &ex_expt);
> +	expt += ex_expt;
> +	INSERT_WORDS(scale, (0x3ff + expt) << 20, 0);
> +	return (exp_x * scale);
> +#endif
> +	return (x - x) / (x - x);
> +}

This is less likely to work than the comment says :-).

> +
> +long double complex
> +__ldexp_cexpl(long double complex z, int expt)
> +{
> +#if 0
> +	double x, y, exp_x, scale1, scale2;
> +	int ex_expt, half_expt;

Same.  The code that has a chance of working won't even be tested at
compile time since it is ifdefed out.

> Index: lib/msun/ld128/s_cexpl.c
> ===================================================================
> --- lib/msun/ld128/s_cexpl.c	(nonexistent)
> +++ lib/msun/ld128/s_cexpl.c	(working copy)
> @@ -0,0 +1,76 @@
> ...
> +	exp_x = expl(x);
> +	sincosl(y, &s, &c);
> +	return (CMPLXL(exp_x * c, exp_x * s));

This doesn't use the kernel, although one is written (but always returns
NaN).  For testing, the low quality kernel should at least use the
above formula.

> Index: lib/msun/ld80/s_cexpl.c
> ===================================================================
> --- lib/msun/ld80/s_cexpl.c	(nonexistent)
> +++ lib/msun/ld80/s_cexpl.c	(working copy)
> @@ -0,0 +1,105 @@
> ...
> +long double complex
> +cexpl (long double complex z)
> +{
> +	long double c, exp_x, s, x, y;
> +	uint32_t hwx, hwy, lwx, lwy;
> +	uint16_t hx, hy;
> +
> +	ENTERI(z);
> +
> +	x = creall(z);
> +	y = cimagl(z);
> +
> +	EXTRACT_LDBL80_WORDS2(hx, hwx, lwx, x);
> +	EXTRACT_LDBL80_WORDS2(hy, hwy, lwy, y);

This asks for slowness relative to the double precision method, by doing
2 slow extraction steps up front.

This seems to be broken by not using the normal method of hx &= 0x7fff
up front.

> +
> +	/* cexp(0 + I y) = cos(y) + I sin(y) */
> +	if ((hwx | lwx | (hx & 0x7fff)) == 0) {

The 2 words for the mantissa are just harder to use.  Write hwx | lwx
normally as lx after extracting into 1 64-bit word.

This handles -0 correctly by masking hx with 0x7fff.

> +		sincosl(y, &s, &c);
> +		RETURNI(CMPLXL(c, s));
> +	}
> +
> +	/* cexp(x + I 0) = exp(x) + I 0 */
> +	if ((hwy | lwy | (hy & 0x7fff)) == 0)
> +		RETURNI(CMPLXL(expl(x), y));
> +
> +	if (hy >= 0x7fff) {

Bugs from not masking h[xy] with 0x7fff up front start here.  This
misclassifies all negative y's except -0 as being NaNs.

> +		if (hx < 0x7fff ||
> +		    (hx == 0x7fff && (hwx & 0x7fffffff) != 0)) {

Many bugs here:
- not masking hx with 0x7fff
- not checking the low bits of the mantissa.  It is foot shooting to
   have these bits in a separate variable
- not checking the normalization bit.

Fix this by copying mostly logl() (expl() is too subtle to copy):

 	lx = from correct extraction
 	ix = hx & 0x7fff;	/* up front, as in logl */
 	...
 		if (ix < 0x7fff ||
 		    (hx == 0x7fff && lx != 0x8000000000000000))

The first line classifies finite x, and the second line classifies
everyhing else except +-Inf.  The other cases are:
- normal NaN with top bit set
- unnormal NaN with top bit set
- unnormal +-Inf with top bit clear.

This is slightly wrong too.  x87 hardware treats unnormal finite values
as NaNs, so this should treat them as NaNs too.  Checking the normalization
bit first works especially easily here.  There are complications for
normalized zeros and normalized denormals (sic), but we have already
handled the zeros.  IIRC, the hardware doesn't treat normalized denomals
as NaNs, so we get the correct result by treating them as finite.  This
gives:

 	...
 		if ((lx & 0x8000000000000000) == 0 || ix < 0x7fff ||
 		    (ix == 0x7fff && (lx & 0x7fffffffffffffff) != 0))

> +			/* cexp(finite|NaN +- I Inf|NaN) = NaN + I NaN */
> +			RETURNI(CMPLXL(y - y, y - y));
> +		} else if (hx & 0x8000) {

cexp() doesn't mask the sign bit in hx or use ix, so as to test the sign
via hx here.  It clobbers the sign bit in hy.

> +			/* cexp(-Inf +- I Inf|NaN) = 0 + I 0 */
> +			RETURNI(CMPLXL(0.0L, 0.0L));

Style bug: explcit long double constants.  The float case risks using
double precision by spelling the constants 0.0, the same as the double
case.  Plain 0 is a better spelling.

There may be further bugs from misclassification of y using hy >= 0x7fff.
This is supposed to classify Infs and NaNs.  The masking bug makes it
misclassify all negative values and -0.  The further bugs are that it
doesn't classify unnormals (except unnormal denormals) as NaNs.  Only
ld80 has unnormals.

Bruce

From owner-freebsd-numerics@freebsd.org  Wed Feb 27 10:16:08 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 29EE51518470
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 27 Feb 2019 10:16:08 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 6316E70622
 for <freebsd-numerics@freebsd.org>; Wed, 27 Feb 2019 10:16:04 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 752763D6714;
 Wed, 27 Feb 2019 21:15:54 +1100 (AEDT)
Date: Wed, 27 Feb 2019 21:15:52 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Update ENTERI() macro
In-Reply-To: <20190227074811.GA75972@troutmask.apl.washington.edu>
Message-ID: <20190227201214.V1823@besplex.bde.org>
References: <20190226191825.GA68479@troutmask.apl.washington.edu>
 <20190227145002.P907@besplex.bde.org>
 <20190227074811.GA75972@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=_NwNQ0rs8jac9D6pTMUA:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: 6316E70622
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates
 211.29.132.42 as permitted sender) smtp.mailfrom=brde@optusnet.com.au
X-Spamd-Result: default: False [-6.44 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 RCVD_IN_DNSWL_LOW(-0.10)[42.132.29.211.list.dnswl.org : 127.0.5.1];
 FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23];
 FREEMAIL_FROM(0.00)[optusnet.com.au];
 MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[];
 MX_GOOD(-0.01)[cached: extmail.optusnet.com.au];
 RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.88)[-0.884,0];
 IP_SCORE(-3.25)[ip: (-8.75), ipnet: 211.28.0.0/14(-4.15), asn: 4804(-3.30),
 country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[];
 FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 FREEMAIL_ENVFROM(0.00)[optusnet.com.au];
 ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU];
 MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Feb 2019 10:16:08 -0000

On Tue, 26 Feb 2019, Steve Kargl wrote:

> On Wed, Feb 27, 2019 at 05:05:15PM +1100, Bruce Evans wrote:
>> On Tue, 26 Feb 2019, Steve Kargl wrote:
>* ...
>>> Update the ENTERI() macro in math_private.h to take a parameter.
>> ...
>> I don't like this.  It churns and complicates all the simple cases
>> that only need ENTERI().  It bogotifies the existence of ENTERIT(),
> ...
> Okay.  The other option is an ENTERC() and RETURNC() as
> we need to toggle FP_PE for long double complex functions.
> I suppose I could follow the one example currently in the
> tree that use
>
> 	ENTERIT(long double complex)
>
> I find it somewhat odd that we have
>
> 	ENTERI() /* Implicit declaration of __retval to long double. */
>
> but must use directly ENTERIT(long double complex).

ENTERI() hard-codes the long double for simplicity.  Remember, it is only
needed for long double precision on i386.  But I forgot about long double
complex types, and didn't dream about indirect long double types in sincosl().

> ...
>>> -#define	RETURNI(x)	RETURNF(x)
>>> +#define	ENTERI(a)
>>> +#define	RETURNI(a)	RETURNF(a)
>>> #define	ENTERV()
>>> #define	RETURNV()	return
>>> #endif
>>
>> This also changes RETURNI(), by unimproving its parameter name.  'x' for
>> ENTERI() wasn't a very good name for a type, but is good for a variable.
>> 'x' for RETURNI() is slightly worse than 'r', but better than 'a'
>
> The renaming is for consistency.  I can use 'r'.

'r' is not quite right either, since the arg can be and is often an
expression.  'a' is good for 'arg'.

>> ...
>> But I now see 3 more problems.  The return in RETURNI() is not direct,
>> but goes through the macro RETURNF(x).  In the committed version, this
>> is a default that just returns x, but in my version it returns
>> hackdouble_t(x) or hackfloat_t(x) in some cases (no cases are needed
>> for long doubles, so there is no interaction with ENTERI()/LEAVEI(),
>> and I only do this in a few simple cases not including any with
>> complex types).
>
> I'm fine with making ENTERI() only toggle precision, and adding
> a LEAVEI() to reset precision.  RETURNI(r) would then be
>
> #define RETURNI(r)	\
> do {			\
>   LEAVEI();		\
>   return (r);		\
> } while (0)

No, may be an expression, so it must be evaluated before LEAVEI().  This
is the reason for existence of the variable to hold the result.

>> [... about complications for the general case]

>> This reminds me of a reason why I don't like sincos*().  Its API
>> requires destruction of efficiency and accuracy by returning the values
>> indirectly.  On i386 with not very old CPUs, this costs about 8 cycles per
>> long double value.  Float and double values cost about half as much.  On
>> amd64, the long double case is the same and the float and double cases
>> are faster.
>
> Not sure your efficiency claim holds.  I've seen significant improves
> in cexp and cexpf where sin[f]() and cos[f]() are replaced by
> sincos[f].  On my core2 running i386 freebsd, I see 0.1779 usecs/call
> for cexpf with sinf and cosf and 0.12522 usecs/call for sincosf.
> Yes, that's a 29.6% improvement.  For cexp the numbers are 0.2697
> usecs/call for sin and cos and 0.20586 for sincos (ie, 23.7% improvement).
> This is for z = x + I y with x and y in the non-exceptable case.

Combined sin and cos probably does work better outside of benchmarks for
sin and cos alone, since it does less work so leaves more resources for
the, more useful things.

>> sinf() and cosf() on small args take only 15-20 cycles (thoughput) on
>> amd64 with not very old CPUs, so 2-8 extra cycles for the 2 indirect
>> return values is a lot.  sincosf() still ends up being slightly faster
>> than separate sinf()/cosf().
>
> Seems to be much faster when used in other functions.

It's hard tp be much faster than 15-20 cycles.  The latency is more like
50 cycles, with 3 sinf()'s or cosf()'s running in parallel.

sincos() is far from the best possible optimization for repeated calls on
the same or nearby args.  If sin() and cos() cached the arg reduction, then
separate sin() and cos() on the same arg would run about as fast as sincos(),
and repeated sin()'s on the same arg would run much faster than now.
Caching the arg reduction may also be good when the arg changes slightly.
However, caching is slower if the args are not close.  Even a 1-entry cache
takes a long time to look up relative to the 15-20 cycles taken by sinf()
and cosf().  Caching is complicated by signal handlers and threads.  Perhaps
the right API one that has to ask for caching and provides the cache storage.
Then sincos() could be:

 	...
 	_dh_init(x, &dh);		/* prefill 1-entry cache dh */
 	s = _sin_cache(x, &dh, 1);	/* cache hit unless x is NaN
 					/* cache misses update dh */
 	c = _cos_cache(x, &dh, 1);	/* cache hit unless x is NaN
 	...

and with everything inlined this is little different from the current
sincos() except for NaNs.  NaNs can be cache hits too if you compare
them as bits, but the comparison should probably be x == dhp->dh_x
for a 1-entry cache, so as to not to have to extract the bits of x.

Bruce

From owner-freebsd-numerics@freebsd.org  Wed Feb 27 16:19:11 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id BE29A1521000
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 27 Feb 2019 16:19:11 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id B8D9B84AFF
 for <freebsd-numerics@freebsd.org>; Wed, 27 Feb 2019 16:19:09 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1RGJ7J6077887
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO);
 Wed, 27 Feb 2019 08:19:07 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1RGJ6kv077886;
 Wed, 27 Feb 2019 08:19:06 -0800 (PST) (envelope-from sgk)
Date: Wed, 27 Feb 2019 08:19:06 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Update ENTERI() macro
Message-ID: <20190227161906.GA77785@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20190226191825.GA68479@troutmask.apl.washington.edu>
 <20190227145002.P907@besplex.bde.org>
 <20190227074811.GA75972@troutmask.apl.washington.edu>
 <20190227201214.V1823@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190227201214.V1823@besplex.bde.org>
User-Agent: Mutt/1.11.2 (2019-01-07)
X-Rspamd-Queue-Id: B8D9B84AFF
X-Spamd-Bar: +
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [1.32 / 15.00]; ARC_NA(0.00)[];
 HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu];
 NEURAL_HAM_MEDIUM(-0.47)[-0.472,0]; FROM_HAS_DN(0.00)[];
 TO_DN_SOME(0.00)[]; NEURAL_SPAM_SHORT(0.90)[0.899,0];
 MIME_GOOD(-0.10)[text/plain]; RCVD_TLS_LAST(0.00)[];
 DMARC_NA(0.00)[washington.edu]; AUTH_NA(1.00)[];
 REPLYTO_ADDR_EQ_FROM(0.00)[]; RCVD_COUNT_THREE(0.00)[3];
 TO_MATCH_ENVRCPT_SOME(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2];
 MX_GOOD(-0.01)[cached: troutmask.apl.washington.edu];
 RCPT_COUNT_TWO(0.00)[2]; NEURAL_SPAM_LONG(0.15)[0.149,0];
 R_SPF_NA(0.00)[]; FREEMAIL_TO(0.00)[optusnet.com.au];
 FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US];
 MID_RHS_MATCH_FROM(0.00)[];
 IP_SCORE(0.05)[ip: (0.11), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06),
 country: US(-0.07)]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Feb 2019 16:19:12 -0000

On Wed, Feb 27, 2019 at 09:15:52PM +1100, Bruce Evans wrote:
> On Tue, 26 Feb 2019, Steve Kargl wrote:
> 
> > On Wed, Feb 27, 2019 at 05:05:15PM +1100, Bruce Evans wrote:
> >> On Tue, 26 Feb 2019, Steve Kargl wrote:
> >* ...
> >>> Update the ENTERI() macro in math_private.h to take a parameter.
> >> ...
> >> I don't like this.  It churns and complicates all the simple cases
> >> that only need ENTERI().  It bogotifies the existence of ENTERIT(),
> > ...
> > Okay.  The other option is an ENTERC() and RETURNC() as
> > we need to toggle FP_PE for long double complex functions.
> > I suppose I could follow the one example currently in the
> > tree that use
> >
> > 	ENTERIT(long double complex)
> >
> > I find it somewhat odd that we have
> >
> > 	ENTERI() /* Implicit declaration of __retval to long double. */
> >
> > but must use directly ENTERIT(long double complex).
> 
> ENTERI() hard-codes the long double for simplicity.  Remember, it is only
> needed for long double precision on i386.  But I forgot about long double
> complex types, and didn't dream about indirect long double types in sincosl().

That simplicity does not work for long double complex.  We will
need either ENTERIC as in

#define ENTERIC() ENTERIT(long double complex)

or a direct use of ENTERIT as you have done s_clogl.c

> 
> > ...
> >>> -#define	RETURNI(x)	RETURNF(x)
> >>> +#define	ENTERI(a)
> >>> +#define	RETURNI(a)	RETURNF(a)
> >>> #define	ENTERV()
> >>> #define	RETURNV()	return
> >>> #endif
> >>
> >> This also changes RETURNI(), by unimproving its parameter name.  'x' for
> >> ENTERI() wasn't a very good name for a type, but is good for a variable.
> >> 'x' for RETURNI() is slightly worse than 'r', but better than 'a'
> >
> > The renaming is for consistency.  I can use 'r'.
> 
> 'r' is not quite right either, since the arg can be and is often an
> expression.  'a' is good for 'arg'.
> 
> >> ...
> >> But I now see 3 more problems.  The return in RETURNI() is not direct,
> >> but goes through the macro RETURNF(x).  In the committed version, this
> >> is a default that just returns x, but in my version it returns
> >> hackdouble_t(x) or hackfloat_t(x) in some cases (no cases are needed
> >> for long doubles, so there is no interaction with ENTERI()/LEAVEI(),
> >> and I only do this in a few simple cases not including any with
> >> complex types).
> >
> > I'm fine with making ENTERI() only toggle precision, and adding
> > a LEAVEI() to reset precision.  RETURNI(r) would then be
> >
> > #define RETURNI(r)	\
> > do {		\
> >   LEAVEI();		\
> >   return (r);	\
> > } while (0)
> 
> No, may be an expression, so it must be evaluated before LEAVEI().  This
> is the reason for existence of the variable to hold the result.

So, we'll need RETURNI for long double and one for long double complex.
Or, we give RETURNI a second parameter, which is the input parameter of
the function

#define RETURNI(x, r)	\
do {			\
   x = (r)		\
   LEAVEI();		\
   return (r);		\
 } while (0)
 
This will cause a lot of churn.

So, it seems that ENTERIC is the way forward.

> >> [... about complications for the general case]
> 
> >> This reminds me of a reason why I don't like sincos*().  Its API
> >> requires destruction of efficiency and accuracy by returning the values
> >> indirectly.  On i386 with not very old CPUs, this costs about 8 cycles per
> >> long double value.  Float and double values cost about half as much.  On
> >> amd64, the long double case is the same and the float and double cases
> >> are faster.
> >
> > Not sure your efficiency claim holds.  I've seen significant improves
> > in cexp and cexpf where sin[f]() and cos[f]() are replaced by
> > sincos[f].  On my core2 running i386 freebsd, I see 0.1779 usecs/call
> > for cexpf with sinf and cosf and 0.12522 usecs/call for sincosf.
> > Yes, that's a 29.6% improvement.  For cexp the numbers are 0.2697
> > usecs/call for sin and cos and 0.20586 for sincos (ie, 23.7% improvement).
> > This is for z = x + I y with x and y in the non-exceptable case.
> 
> Combined sin and cos probably does work better outside of benchmarks for
> sin and cos alone, since it does less work so leaves more resources for
> the, more useful things.

Exactly!  I have a significant amount of Fortran code that does

   z = cmplx(cos(x), sin(x))

in modern C this is 'z = CMPLX(cos(x), sin(x))'.  GCC with optimization
enables will convert this to z = cexp(cmplx(0,x)) where it expects cexp
to optimize this to sincos().  GCC on FreeBSD will not do this optimization
because FreeBSD's libm is not C99 compliant.

> >> sinf() and cosf() on small args take only 15-20 cycles (thoughput) on
> >> amd64 with not very old CPUs, so 2-8 extra cycles for the 2 indirect
> >> return values is a lot.  sincosf() still ends up being slightly faster
> >> than separate sinf()/cosf().
> >
> > Seems to be much faster when used in other functions.
> 
> It's hard tp be much faster than 15-20 cycles.  The latency is more like
> 50 cycles, with 3 sinf()'s or cosf()'s running in parallel.
> 
> sincos() is far from the best possible optimization for repeated calls on
> the same or nearby args.  If sin() and cos() cached the arg reduction, then
> separate sin() and cos() on the same arg would run about as fast as sincos(),
> and repeated sin()'s on the same arg would run much faster than now.
> Caching the arg reduction may also be good when the arg changes slightly.
> However, caching is slower if the args are not close.  Even a 1-entry cache
> takes a long time to look up relative to the 15-20 cycles taken by sinf()
> and cosf().  Caching is complicated by signal handlers and threads.  Perhaps
> the right API one that has to ask for caching and provides the cache storage.
> Then sincos() could be:
> 
>  	...
>  	_dh_init(x, &dh);		/* prefill 1-entry cache dh */
>  	s = _sin_cache(x, &dh, 1);	/* cache hit unless x is NaN
>  					/* cache misses update dh */
>  	c = _cos_cache(x, &dh, 1);	/* cache hit unless x is NaN
>  	...
> 
> and with everything inlined this is little different from the current
> sincos() except for NaNs.  NaNs can be cache hits too if you compare
> them as bits, but the comparison should probably be x == dhp->dh_x
> for a 1-entry cache, so as to not to have to extract the bits of x.

When I worked on sincos() I tried a few variations.  This included
the simpliest implementation:

void
sincos(double x, double *s, double *c)
{
  *c = cos(x);
  *s=  sin(x);
}

I tried argument reduction with kernels.

void
sincos(double x, double *s, double *c)
{
  a = inline argument reduction done to set a.
  *c = k_cos(x);
  *s=  k_sin(x);
}

And finally the version that was committed where k_cos and k_sin
were manually inlined and re-arranged to reduce redundant computations.

Never thought about some caching mechanism.  It seems to be more
complicated than it may be worth.

-- 
Steve

From owner-freebsd-numerics@freebsd.org  Wed Feb 27 18:06:25 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B29111500CF1
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 27 Feb 2019 18:06:25 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 451558904D
 for <freebsd-numerics@freebsd.org>; Wed, 27 Feb 2019 18:06:24 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1RI6L2t078491
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO);
 Wed, 27 Feb 2019 10:06:22 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1RI6LJh078490;
 Wed, 27 Feb 2019 10:06:21 -0800 (PST) (envelope-from sgk)
Date: Wed, 27 Feb 2019 10:06:21 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Implement cexpl
Message-ID: <20190227180621.GA78303@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20190203173056.GA42499@troutmask.apl.washington.edu>
 <20190227020142.GA74833@troutmask.apl.washington.edu>
 <20190227170706.J907@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190227170706.J907@besplex.bde.org>
User-Agent: Mutt/1.11.2 (2019-01-07)
X-Rspamd-Queue-Id: 451558904D
X-Spamd-Bar: /
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [0.91 / 15.00];
 HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu];
 TO_DN_SOME(0.00)[]; REPLYTO_ADDR_EQ_FROM(0.00)[];
 RCVD_COUNT_THREE(0.00)[3];
 RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2];
 MX_GOOD(-0.01)[cached: troutmask.apl.washington.edu];
 RCPT_COUNT_TWO(0.00)[2]; FREEMAIL_TO(0.00)[optusnet.com.au];
 FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US];
 MID_RHS_MATCH_FROM(0.00)[]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-0.51)[-0.512,0]; FROM_HAS_DN(0.00)[];
 NEURAL_SPAM_SHORT(0.66)[0.656,0]; MIME_GOOD(-0.10)[text/plain];
 RCVD_TLS_LAST(0.00)[]; DMARC_NA(0.00)[washington.edu];
 AUTH_NA(1.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 NEURAL_SPAM_LONG(0.03)[0.030,0]; R_SPF_NA(0.00)[];
 IP_SCORE(0.05)[ip: (0.11), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06),
 country: US(-0.07)]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Feb 2019 18:06:25 -0000

On Wed, Feb 27, 2019 at 07:42:41PM +1100, Bruce Evans wrote:
> On Tue, 26 Feb 2019, Steve Kargl wrote:
> 
> > On Sun, Feb 03, 2019 at 09:30:56AM -0800, Steve Kargl wrote:
> >> The following patch implements cexpl() for ld80 and ld128 architectures.
> > ...
> > Index: lib/msun/src/math_private.h
> > ===================================================================
> > --- lib/msun/src/math_private.h	(revision 344600)
> > +++ lib/msun/src/math_private.h	(working copy)
> > @@ -243,6 +243,20 @@
> > } while (0)
> >
> > /*
> > + * Get expsign and mantissa as 16 bit and 2 32-bit ints from an 80 bit long
> > + * double.
> > + */
> > +
> > +#define	EXTRACT_LDBL80_WORDS2(ix0,ix1,ix2,d)				\
> > +do {								\
> > +  union IEEEl2bits ew_u;					\
> > +  ew_u.e = (d);							\
> > +  (ix0) = ew_u.xbits.expsign;					\
> > +  (ix1) = ew_u.bits.manh;					\
> > +  (ix2) = ew_u.bits.manl;					\
> > +} while (0)
> 
> Don't add more of these accessor functions (more were already added to
> support powl, since powl doesn't use the FreeBSD spelling in APIs).

This response was not totally unexpected. :-)

I haven't looked a bit twiddling details in awhile, so it can
be considered an exercise in "Let's get something working".

> Use existing separate functions and let the compiler combine them.
> The one corresponding to this is EXTRACT_LDBL80_WORDS().  That extracts
> the mantissa into a single 64-bit word which is easier to use than the
> 2 32-bit words given by the above.  It also doesn't ask for pessimal
> extraction from memory 32 bits at a time, but compilers usually combine
> into a single 64-bit word for the memory access and then split into
> 2 32-bit words for 32-bit arches.
> 
> 3 words instead of 2 also enlarges diffs with the double precision case.
> 
> > Index: lib/msun/src/s_cexp.c
> > ===================================================================
> > --- lib/msun/src/s_cexp.c	(revision 344600)
> > +++ lib/msun/src/s_cexp.c	(working copy)
> > @@ -30,6 +30,7 @@
> > __FBSDID("$FreeBSD$");
> >
> > #include <complex.h>
> > +#include <float.h>
> > #include <math.h>
> >
> > #include "math_private.h"
> > @@ -41,12 +42,19 @@
> > double complex
> > cexp(double complex z)
> > {
> > -	double x, y, exp_x;
> > +	double c, exp_x, s, x, y;
> > 	uint32_t hx, hy, lx, ly;
> >
> > 	x = creal(z);
> > 	y = cimag(z);
> >
> > +	EXTRACT_WORDS(hx, lx, x);
> > +	/* cexp(0 + I y) = cos(y) + I sin(y) */
> > +	if (((hx & 0x7fffffff) | lx) == 0) {
> > +		sincos(y, &s, &c);
> > +		return (CMPLX(c, s));
> > +	}
> > +
> > 	EXTRACT_WORDS(hy, ly, y);
> > 	hy &= 0x7fffffff;
> >
> 
> Another reason I don't like sincos*() is that if the compiler can very
> easily do the above rewrite automatically.  The only large difficulty
> is to know if the sin(), cos() and sincos() are any good, or at least
> their relative badness.  x86 compilers already know that the i387 sin()
> and cos() are no good, (or perhaps they just don't know that the FreeBSD
> implementation doesn't make the i387 functions non-conforming to C99
> and/or POSIX (since they don't set errno and might do other side effects
> wrong)), so x86 compilers don't convert the above to i387 sin(), cos() or
> sincos().  It is even harder for compilers to know the quality of the
> library.
 
Unfortunately, the compiler doesn't do the rewrite automatically.
AFAICT, for gcc  CMPLX(cos(x),sin(x)) is transformed into
cexp(CMPLX(0,x)).  It then assumes that cexp handles the special 
case.  gcc doesn't do this optimization, yet.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89125

See comment #7.

> > @@ -53,10 +61,6 @@
> > 	/* cexp(x + I 0) = exp(x) + I 0 */
> > 	if ((hy | ly) == 0)
> > 		return (CMPLX(exp(x), y));
> 
> The comments aren't as careful as for hyperbolic functions.  The 0 here
> is actually -0 if y is -0.  The code handles both +-0 by clearing the
> sign bit in hy for the check byt keeping it in y.
> 
> > Index: lib/msun/ld128/k_cexpl.c
> > ...
> > +/*
> > + * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt.
> > + * They are intended for large arguments (real part >= ln(DBL_MAX))
> > + * where care is needed to avoid overflow.
> > + *
> > + * The present implementation is narrowly tailored for our hyperbolic and
> > + * exponential functions.  We assume expt is small (0 or -1), and the caller
> > + * has filtered out very large x, for which overflow would be inevitable.
> > + */
> > +
> > +long double
> > +__ldexp_expl(long double x, int expt)
> > +{
> > +#if 0
> > +	double exp_x, scale;
> > +	int ex_expt;
> > +
> > +	exp_x = __frexp_exp(x, &ex_expt);
> > +	expt += ex_expt;
> > +	INSERT_WORDS(scale, (0x3ff + expt) << 20, 0);
> > +	return (exp_x * scale);
> > +#endif
> > +	return (x - x) / (x - x);
> > +}
> 
> This is less likely to work than the comment says :-).

I don't have ld128 hardware.  I can't write the functions.
When this is compile, one should see warnings 

#warning These functions are broken on ld128 architectures.
#warning Functions defined here are currently unused.
#warning Someone who cares needs to convert src/k_exp.c.

I suspect no one cares.

> > Index: lib/msun/ld128/s_cexpl.c
> > ===================================================================
> > --- lib/msun/ld128/s_cexpl.c	(nonexistent)
> > +++ lib/msun/ld128/s_cexpl.c	(working copy)
> > @@ -0,0 +1,76 @@
> > ...
> > +	exp_x = expl(x);
> > +	sincosl(y, &s, &c);
> > +	return (CMPLXL(exp_x * c, exp_x * s));
> 
> This doesn't use the kernel, although one is written (but always returns
> NaN).  For testing, the low quality kernel should at least use the
> above formula.

I don't have ld128 hardware.  I can't write the function in a manner
similar to src/s_cexp.c.  When this is compile, one should see warnings 

#warning cexpl is likely broken on ld128 architectures.
#warning Someone who cares needs to convert src/s_cexp.c.

I suspect no one cares.

Furthermore, when one links in cexpl(), she should see

__warn_references("Precision of cexpl may have lower than expected precision");

I suspect no one cares.

> > Index: lib/msun/ld80/s_cexpl.c
> > ===================================================================
> > --- lib/msun/ld80/s_cexpl.c	(nonexistent)
> > +++ lib/msun/ld80/s_cexpl.c	(working copy)
> > @@ -0,0 +1,105 @@
> > ...
> > +long double complex
> > +cexpl (long double complex z)
> > +{
> > +	long double c, exp_x, s, x, y;
> > +	uint32_t hwx, hwy, lwx, lwy;
> > +	uint16_t hx, hy;
> > +
> > +	ENTERI(z);
> > +
> > +	x = creall(z);
> > +	y = cimagl(z);
> > +
> > +	EXTRACT_LDBL80_WORDS2(hx, hwx, lwx, x);
> > +	EXTRACT_LDBL80_WORDS2(hy, hwy, lwy, y);
> 
> This asks for slowness relative to the double precision method, by doing
> 2 slow extraction steps up front.
> 
> This seems to be broken by not using the normal method of hx &= 0x7fff
> up front.
> 
> > +
> > +	/* cexp(0 + I y) = cos(y) + I sin(y) */
> > +	if ((hwx | lwx | (hx & 0x7fff)) == 0) {
> 
> The 2 words for the mantissa are just harder to use.  Write hwx | lwx
> normally as lx after extracting into 1 64-bit word.
> 
> This handles -0 correctly by masking hx with 0x7fff.
> 
> > +		sincosl(y, &s, &c);
> > +		RETURNI(CMPLXL(c, s));
> > +	}
> > +
> > +	/* cexp(x + I 0) = exp(x) + I 0 */
> > +	if ((hwy | lwy | (hy & 0x7fff)) == 0)
> > +		RETURNI(CMPLXL(expl(x), y));
> > +
> > +	if (hy >= 0x7fff) {
> 
> Bugs from not masking h[xy] with 0x7fff up front start here.  This
> misclassifies all negative y's except -0 as being NaNs.

Whoops.  I lost a month of testing due to the PAE vs non-PAE
fallout; yes, I should have done more testing.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Wed Feb 27 20:15:27 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 45F3A1503C6C
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 27 Feb 2019 20:15:27 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au
 [211.29.132.249])
 by mx1.freebsd.org (Postfix) with ESMTP id D30AA8DAB2
 for <freebsd-numerics@freebsd.org>; Wed, 27 Feb 2019 20:15:24 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 7426B105FEF3;
 Thu, 28 Feb 2019 07:15:14 +1100 (AEDT)
Date: Thu, 28 Feb 2019 07:15:14 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Update ENTERI() macro
In-Reply-To: <20190227161906.GA77785@troutmask.apl.washington.edu>
Message-ID: <20190228060920.R4413@besplex.bde.org>
References: <20190226191825.GA68479@troutmask.apl.washington.edu>
 <20190227145002.P907@besplex.bde.org>
 <20190227074811.GA75972@troutmask.apl.washington.edu>
 <20190227201214.V1823@besplex.bde.org>
 <20190227161906.GA77785@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=k8ySeul569u4Qwmp2EoA:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: D30AA8DAB2
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates
 211.29.132.249 as permitted sender) smtp.mailfrom=brde@optusnet.com.au
X-Spamd-Result: default: False [-6.41 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 RCVD_IN_DNSWL_LOW(-0.10)[249.132.29.211.list.dnswl.org : 127.0.5.1];
 FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23];
 FREEMAIL_FROM(0.00)[optusnet.com.au];
 MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[];
 MX_GOOD(-0.01)[cached: extmail.optusnet.com.au];
 RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.95)[-0.947,0];
 IP_SCORE(-3.15)[ip: (-8.24), ipnet: 211.28.0.0/14(-4.16), asn: 4804(-3.31),
 country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[];
 FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 FREEMAIL_ENVFROM(0.00)[optusnet.com.au];
 ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU];
 MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Feb 2019 20:15:27 -0000

On Wed, 27 Feb 2019, Steve Kargl wrote:

> On Wed, Feb 27, 2019 at 09:15:52PM +1100, Bruce Evans wrote:
>>
>> ENTERI() hard-codes the long double for simplicity.  Remember, it is only
>> needed for long double precision on i386.  But I forgot about long double
>> complex types, and didn't dream about indirect long double types in sincosl().
>
> That simplicity does not work for long double complex.  We will
> 
> need either ENTERIC as in
>
> #define ENTERIC() ENTERIT(long double complex)
>
> or a direct use of ENTERIT as you have done s_clogl.c

I wrote ENTERIT() to work around this problem.

>>> I'm fine with making ENTERI() only toggle precision, and adding
>>> a LEAVEI() to reset precision.  RETURNI(r) would then be
>>>
>>> #define RETURNI(r)	\
>>> do {		\
>>>   LEAVEI();		\
>>>   return (r);	\
>>> } while (0)
>>
>> No, may be an expression, so it must be evaluated before LEAVEI().  This
>> is the reason for existence of the variable to hold the result.
>
> So, we'll need RETURNI for long double and one for long double complex.
> Or, we give RETURNI a second parameter, which is the input parameter of
> the function

I said to use your method of __typeof().  I tested this:

XX --- /tmp/math_private.h	Sun Nov 27 17:58:57 2005
XX +++ ./math_private.h	Thu Feb 28 06:17:26 2019
XX @@ -474,21 +474,22 @@
XX  /* Support switching the mode to FP_PE if necessary. */
XX  #if defined(__i386__) && !defined(NO_FPSETPREC)
XX -#define	ENTERI() ENTERIT(long double)
XX -#define	ENTERIT(returntype)			\
XX -	returntype __retval;			\
XX +#define	ENTERI()				\
XX  	fp_prec_t __oprec;			\
XX  						\
XX  	if ((__oprec = fpgetprec()) != FP_PE)	\
XX  		fpsetprec(FP_PE)
XX -#define	RETURNI(x) do {				\
XX -	__retval = (x);				\
XX -	if (__oprec != FP_PE)			\
XX -		fpsetprec(__oprec);		\
XX +#define	LEAVEI()				\
XX +	if ((__oprec = fpgetprec()) != FP_PE)	\
XX +		fpsetprec(FP_PE)
XX +#define	RETURNI(expr) do {			\
XX +	__typeof(expr) __retval = (expr);	\
XX +						\
XX +	LEAVEI();				\
XX  	RETURNF(__retval);			\
XX  } while (0)
XX  #else
XX  #define	ENTERI()
XX -#define	ENTERIT(x)
XX -#define	RETURNI(x)	RETURNF(x)
XX +#define	LEAVEI()
XX +#define	RETURNI(expr)	RETURNF(expr)
XX  #endif
XX

This compiles, but has minor problems.  Note that the apparent style
bug of initializing __retval in its declaration is needed in cases
where __typeof() gives a const type.  This happens in my code that
uses RETURNI(1 + tiny) to set inexact.  I think it would also happen
for RETURNI(1).  The type is then int instead of floating point, and
I need to check that this is harmless.

clogl() is the only user of ENTERIT().  Its size expands from 2302
bytes text to 2399 when compiled by gcc-3.3.3.  I hope that this is
just gcc not doing a very good job optimizing the returns (there are
many RETURNI()s fpr clogl()).  Repeating the return code instead of
jumping to it might even be optimal.

> #define RETURNI(x, r)	\
> do {			\
>   x = (r)		\
>   LEAVEI();		\
>   return (r);		\
> } while (0)
>
> This will cause a lot of churn.

Indeed.

My version causes 1 line of churn:

XX --- /tmp/s_clogl.c	Fri Jul 20 16:00:11 2018
XX +++ ./s_clogl.c	Thu Feb 28 05:58:05 2019
XX @@ -66,5 +66,5 @@
XX  	int kx, ky;
XX 
XX -	ENTERIT(long double complex);
XX +	ENTERI();
XX 
XX  	x = creall(z);

>> Combined sin and cos probably does work better outside of benchmarks for
>> sin and cos alone, since it does less work so leaves more resources for
>> the, more useful things.
>
> Exactly!  I have a significant amount of Fortran code that does
>
>   z = cmplx(cos(x), sin(x))
>
> in modern C this is 'z = CMPLX(cos(x), sin(x))'.  GCC with optimization
> enables will convert this to z = cexp(cmplx(0,x)) where it expects cexp
> to optimize this to sincos().

This is an pessimization unless everything is inlined.  An optimization
would convert cexp(cmplx(0,x)) to sin(x) and cos(x) or sincos(x).

> GCC on FreeBSD will not do this optimization
> because FreeBSD's libm is not C99 compliant.

It is more conformant than most for cexp().  I think old gcc just doesn't
attempt such optimizations.

> When I worked on sincos() I tried a few variations.  This included
> the simpliest implementation:
>
> void
> sincos(double x, double *s, double *c)
> {
>  *c = cos(x);
>  *s=  sin(x);
> }
>
> I tried argument reduction with kernels.
>
> void
> sincos(double x, double *s, double *c)
> {
>  a = inline argument reduction done to set a.
>  *c = k_cos(x);
>  *s=  k_sin(x);
> }

You mean *c = s_cos(x), etc.  That was good enough.

> And finally the version that was committed where k_cos and k_sin
> were manually inlined and re-arranged to reduce redundant computations.

That has excessive manual inlining.  It should have only inlined s_cos()
and s_sin(), and changed k_cos() and k_sin() from extern to static inline.
Someday the data for these inline functions should be deduplicated, but
the data is small compared with that for the expl kernel.

Bruce

From owner-freebsd-numerics@freebsd.org  Wed Feb 27 21:31:39 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 859051505BCE
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 27 Feb 2019 21:31:39 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 5C9B190DB1
 for <freebsd-numerics@freebsd.org>; Wed, 27 Feb 2019 21:31:38 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x1RLVYjN010710
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO);
 Wed, 27 Feb 2019 13:31:34 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x1RLVY9L010709;
 Wed, 27 Feb 2019 13:31:34 -0800 (PST) (envelope-from sgk)
Date: Wed, 27 Feb 2019 13:31:33 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Update ENTERI() macro
Message-ID: <20190227213133.GA10233@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20190226191825.GA68479@troutmask.apl.washington.edu>
 <20190227145002.P907@besplex.bde.org>
 <20190227074811.GA75972@troutmask.apl.washington.edu>
 <20190227201214.V1823@besplex.bde.org>
 <20190227161906.GA77785@troutmask.apl.washington.edu>
 <20190228060920.R4413@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190228060920.R4413@besplex.bde.org>
User-Agent: Mutt/1.11.2 (2019-01-07)
X-Rspamd-Queue-Id: 5C9B190DB1
X-Spamd-Bar: /
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [0.56 / 15.00]; ARC_NA(0.00)[];
 HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu];
 NEURAL_HAM_MEDIUM(-0.11)[-0.106,0]; FROM_HAS_DN(0.00)[];
 TO_DN_SOME(0.00)[]; NEURAL_SPAM_SHORT(0.17)[0.169,0];
 NEURAL_HAM_LONG(-0.24)[-0.243,0]; MIME_GOOD(-0.10)[text/plain];
 RCVD_TLS_LAST(0.00)[]; DMARC_NA(0.00)[washington.edu];
 AUTH_NA(1.00)[]; REPLYTO_ADDR_EQ_FROM(0.00)[];
 RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2];
 MX_GOOD(-0.01)[cached: troutmask.apl.washington.edu];
 RCPT_COUNT_TWO(0.00)[2]; R_SPF_NA(0.00)[];
 FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US];
 MID_RHS_MATCH_FROM(0.00)[];
 IP_SCORE(0.05)[ip: (0.10), ipnet: 128.95.0.0/16(0.16), asn: 73(0.06),
 country: US(-0.07)]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Feb 2019 21:31:39 -0000

On Thu, Feb 28, 2019 at 07:15:14AM +1100, Bruce Evans wrote:
> On Wed, 27 Feb 2019, Steve Kargl wrote:
> 
> > On Wed, Feb 27, 2019 at 09:15:52PM +1100, Bruce Evans wrote:
> >>
> >> ENTERI() hard-codes the long double for simplicity.  Remember, it is only
> >> needed for long double precision on i386.  But I forgot about long double
> >> complex types, and didn't dream about indirect long double types in sincosl().
> >
> > That simplicity does not work for long double complex.  We will
> > 
> > need either ENTERIC as in
> >
> > #define ENTERIC() ENTERIT(long double complex)
> >
> > or a direct use of ENTERIT as you have done s_clogl.c
> 
> I wrote ENTERIT() to work around this problem.
> 
> >>> I'm fine with making ENTERI() only toggle precision, and adding
> >>> a LEAVEI() to reset precision.  RETURNI(r) would then be
> >>>
> >>> #define RETURNI(r)	\
> >>> do {		\
> >>>   LEAVEI();		\
> >>>   return (r);	\
> >>> } while (0)
> >>
> >> No, may be an expression, so it must be evaluated before LEAVEI().  This
> >> is the reason for existence of the variable to hold the result.
> >
> > So, we'll need RETURNI for long double and one for long double complex.
> > Or, we give RETURNI a second parameter, which is the input parameter of
> > the function
> 
> I said to use your method of __typeof().  I tested this:

Oh, I misunderstood.  I thought you meant that __typeof()
may have portability issues, so you want to avoid it.

> XX --- /tmp/math_private.h	Sun Nov 27 17:58:57 2005
> XX +++ ./math_private.h	Thu Feb 28 06:17:26 2019

(patched deleted to keep this short)

> This compiles, but has minor problems.  Note that the apparent style
> bug of initializing __retval in its declaration is needed in cases
> where __typeof() gives a const type.  This happens in my code that
> uses RETURNI(1 + tiny) to set inexact.  I think it would also happen
> for RETURNI(1).  The type is then int instead of floating point, and
> I need to check that this is harmless.

I've not used __typeof() until now.  Coming with my modern Fortran
background, doing __typeof(expr) __retval = (expr) feel weirdly
circular.  In Fortran, the RHS of an expression is evaluate independently
of the type of the LHS.  It is only at the actual assignment that
type conversion (if needed) occurs.

> clogl() is the only user of ENTERIT().  Its size expands from 2302
> bytes text to 2399 when compiled by gcc-3.3.3.  I hope that this is
> just gcc not doing a very good job optimizing the returns (there are
> many RETURNI()s fpr clogl()).  Repeating the return code instead of
> jumping to it might even be optimal.
> 
> > #define RETURNI(x, r)	\
> > do {			\
> >   x = (r)		\
> >   LEAVEI();		\
> >   return (r);		\
> > } while (0)
> >
> > This will cause a lot of churn.
> 
> Indeed.
> 
> My version causes 1 line of churn:
> 
> XX --- /tmp/s_clogl.c	Fri Jul 20 16:00:11 2018
> XX +++ ./s_clogl.c	Thu Feb 28 05:58:05 2019
> XX @@ -66,5 +66,5 @@
> XX  	int kx, ky;
> XX 
> XX -	ENTERIT(long double complex);
> XX +	ENTERI();
> XX 
> XX  	x = creall(z);
> 
> >> Combined sin and cos probably does work better outside of benchmarks for
> >> sin and cos alone, since it does less work so leaves more resources for
> >> the, more useful things.
> >
> > Exactly!  I have a significant amount of Fortran code that does
> >
> >   z = cmplx(cos(x), sin(x))
> >
> > in modern C this is 'z = CMPLX(cos(x), sin(x))'.  GCC with optimization
> > enables will convert this to z = cexp(cmplx(0,x)) where it expects cexp
> > to optimize this to sincos().
> 
> This is an pessimization unless everything is inlined.  An optimization
> would convert cexp(cmplx(0,x)) to sin(x) and cos(x) or sincos(x).
> 
> > GCC on FreeBSD will not do this optimization
> > because FreeBSD's libm is not C99 compliant.
> 
> It is more conformant than most for cexp().  I think old gcc just doesn't
> attempt such optimizations.
> 
> > When I worked on sincos() I tried a few variations.  This included
> > the simpliest implementation:
> >
> > void
> > sincos(double x, double *s, double *c)
> > {
> >  *c = cos(x);
> >  *s=  sin(x);
> > }
> >
> > I tried argument reduction with kernels.
> >
> > void
> > sincos(double x, double *s, double *c)
> > {
> >  a = inline argument reduction done to set a.
> >  *c = k_cos(x);
> >  *s=  k_sin(x);
> > }
> 
> You mean *c = s_cos(x), etc.  That was good enough.
> 
> > And finally the version that was committed where k_cos and k_sin
> > were manually inlined and re-arranged to reduce redundant computations.
> 
> That has excessive manual inlining.  It should have only inlined s_cos()
> and s_sin(), and changed k_cos() and k_sin() from extern to static inline.
> Someday the data for these inline functions should be deduplicated, but
> the data is small compared with that for the expl kernel.
> 

I recall you preferred the "good enough" implementation,
because it would reduce maintenance if either sin or cos
kernel needed to change.  Now, one needs to remember that
there is a sincos kernel.  I also remember that you felt
that a good compiler could optimize the inlined sin and
cos kernels.  I suppose FreeBSD didn't have good compilers
when I manually inlined the sincos kernel, because it was
a significant measurable improvement.

-- 
Steve