From owner-freebsd-numerics@freebsd.org  Wed Feb 27 08:42:49 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 422E71515B9A
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed, 27 Feb 2019 08:42:49 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 157F26D4DF
 for <freebsd-numerics@freebsd.org>; Wed, 27 Feb 2019 08:42:45 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 610EB4365DE;
 Wed, 27 Feb 2019 19:42:42 +1100 (AEDT)
Date: Wed, 27 Feb 2019 19:42:41 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Implement cexpl
In-Reply-To: <20190227020142.GA74833@troutmask.apl.washington.edu>
Message-ID: <20190227170706.J907@besplex.bde.org>
References: <20190203173056.GA42499@troutmask.apl.washington.edu>
 <20190227020142.GA74833@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0
 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17
 a=kj9zAlcOel0A:10 a=z-HYpUQ0eYvp4CnOBZ8A:9 a=CjuIK1q_8ugA:10
X-Rspamd-Queue-Id: 157F26D4DF
X-Spamd-Bar: ------
Authentication-Results: mx1.freebsd.org;
 spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates
 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au
X-Spamd-Result: default: False [-6.32 / 15.00]; ARC_NA(0.00)[];
 NEURAL_HAM_MEDIUM(-1.00)[-1.000,0];
 RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1];
 FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];
 R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23];
 FREEMAIL_FROM(0.00)[optusnet.com.au];
 MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[];
 MX_GOOD(-0.01)[cached: extmail.optusnet.com.au];
 RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.93)[-0.931,0];
 IP_SCORE(-3.08)[ip: (-7.92), ipnet: 211.28.0.0/14(-4.14), asn: 4804(-3.30),
 country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[];
 FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[];
 FREEMAIL_ENVFROM(0.00)[optusnet.com.au];
 ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU];
 MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Feb 2019 08:42:49 -0000

On Tue, 26 Feb 2019, Steve Kargl wrote:

> On Sun, Feb 03, 2019 at 09:30:56AM -0800, Steve Kargl wrote:
>> The following patch implements cexpl() for ld80 and ld128 architectures.
> ...
> Index: lib/msun/src/math_private.h
> ===================================================================
> --- lib/msun/src/math_private.h	(revision 344600)
> +++ lib/msun/src/math_private.h	(working copy)
> @@ -243,6 +243,20 @@
> } while (0)
>
> /*
> + * Get expsign and mantissa as 16 bit and 2 32-bit ints from an 80 bit long
> + * double.
> + */
> +
> +#define	EXTRACT_LDBL80_WORDS2(ix0,ix1,ix2,d)				\
> +do {								\
> +  union IEEEl2bits ew_u;					\
> +  ew_u.e = (d);							\
> +  (ix0) = ew_u.xbits.expsign;					\
> +  (ix1) = ew_u.bits.manh;					\
> +  (ix2) = ew_u.bits.manl;					\
> +} while (0)

Don't add more of these accessor functions (more were already added to
support powl, since powl doesn't use the FreeBSD spelling in APIs).

Use existing separate functions and let the compiler combine them.
The one corresponding to this is EXTRACT_LDBL80_WORDS().  That extracts
the mantissa into a single 64-bit word which is easier to use than the
2 32-bit words given by the above.  It also doesn't ask for pessimal
extraction from memory 32 bits at a time, but compilers usually combine
into a single 64-bit word for the memory access and then split into
2 32-bit words for 32-bit arches.

3 words instead of 2 also enlarges diffs with the double precision case.

> Index: lib/msun/src/s_cexp.c
> ===================================================================
> --- lib/msun/src/s_cexp.c	(revision 344600)
> +++ lib/msun/src/s_cexp.c	(working copy)
> @@ -30,6 +30,7 @@
> __FBSDID("$FreeBSD$");
>
> #include <complex.h>
> +#include <float.h>
> #include <math.h>
>
> #include "math_private.h"
> @@ -41,12 +42,19 @@
> double complex
> cexp(double complex z)
> {
> -	double x, y, exp_x;
> +	double c, exp_x, s, x, y;
> 	uint32_t hx, hy, lx, ly;
>
> 	x = creal(z);
> 	y = cimag(z);
>
> +	EXTRACT_WORDS(hx, lx, x);
> +	/* cexp(0 + I y) = cos(y) + I sin(y) */
> +	if (((hx & 0x7fffffff) | lx) == 0) {
> +		sincos(y, &s, &c);
> +		return (CMPLX(c, s));
> +	}
> +
> 	EXTRACT_WORDS(hy, ly, y);
> 	hy &= 0x7fffffff;
>

Another reason I don't like sincos*() is that if the compiler can very
easily do the above rewrite automatically.  The only large difficulty
is to know if the sin(), cos() and sincos() are any good, or at least
their relative badness.  x86 compilers already know that the i387 sin()
and cos() are no good, (or perhaps they just don't know that the FreeBSD
implementation doesn't make the i387 functions non-conforming to C99
and/or POSIX (since they don't set errno and might do other side effects
wrong)), so x86 compilers don't convert the above to i387 sin(), cos() or
sincos().  It is even harder for compilers to know the quality of the
library.

> @@ -53,10 +61,6 @@
> 	/* cexp(x + I 0) = exp(x) + I 0 */
> 	if ((hy | ly) == 0)
> 		return (CMPLX(exp(x), y));

The comments aren't as careful as for hyperbolic functions.  The 0 here
is actually -0 if y is -0.  The code handles both +-0 by clearing the
sign bit in hy for the check byt keeping it in y.

> Index: lib/msun/ld128/k_cexpl.c
> ...
> +/*
> + * __ldexp_exp(x, expt) and __ldexp_cexp(x, expt) compute exp(x) * 2**expt.
> + * They are intended for large arguments (real part >= ln(DBL_MAX))
> + * where care is needed to avoid overflow.
> + *
> + * The present implementation is narrowly tailored for our hyperbolic and
> + * exponential functions.  We assume expt is small (0 or -1), and the caller
> + * has filtered out very large x, for which overflow would be inevitable.
> + */
> +
> +long double
> +__ldexp_expl(long double x, int expt)
> +{
> +#if 0
> +	double exp_x, scale;
> +	int ex_expt;
> +
> +	exp_x = __frexp_exp(x, &ex_expt);
> +	expt += ex_expt;
> +	INSERT_WORDS(scale, (0x3ff + expt) << 20, 0);
> +	return (exp_x * scale);
> +#endif
> +	return (x - x) / (x - x);
> +}

This is less likely to work than the comment says :-).

> +
> +long double complex
> +__ldexp_cexpl(long double complex z, int expt)
> +{
> +#if 0
> +	double x, y, exp_x, scale1, scale2;
> +	int ex_expt, half_expt;

Same.  The code that has a chance of working won't even be tested at
compile time since it is ifdefed out.

> Index: lib/msun/ld128/s_cexpl.c
> ===================================================================
> --- lib/msun/ld128/s_cexpl.c	(nonexistent)
> +++ lib/msun/ld128/s_cexpl.c	(working copy)
> @@ -0,0 +1,76 @@
> ...
> +	exp_x = expl(x);
> +	sincosl(y, &s, &c);
> +	return (CMPLXL(exp_x * c, exp_x * s));

This doesn't use the kernel, although one is written (but always returns
NaN).  For testing, the low quality kernel should at least use the
above formula.

> Index: lib/msun/ld80/s_cexpl.c
> ===================================================================
> --- lib/msun/ld80/s_cexpl.c	(nonexistent)
> +++ lib/msun/ld80/s_cexpl.c	(working copy)
> @@ -0,0 +1,105 @@
> ...
> +long double complex
> +cexpl (long double complex z)
> +{
> +	long double c, exp_x, s, x, y;
> +	uint32_t hwx, hwy, lwx, lwy;
> +	uint16_t hx, hy;
> +
> +	ENTERI(z);
> +
> +	x = creall(z);
> +	y = cimagl(z);
> +
> +	EXTRACT_LDBL80_WORDS2(hx, hwx, lwx, x);
> +	EXTRACT_LDBL80_WORDS2(hy, hwy, lwy, y);

This asks for slowness relative to the double precision method, by doing
2 slow extraction steps up front.

This seems to be broken by not using the normal method of hx &= 0x7fff
up front.

> +
> +	/* cexp(0 + I y) = cos(y) + I sin(y) */
> +	if ((hwx | lwx | (hx & 0x7fff)) == 0) {

The 2 words for the mantissa are just harder to use.  Write hwx | lwx
normally as lx after extracting into 1 64-bit word.

This handles -0 correctly by masking hx with 0x7fff.

> +		sincosl(y, &s, &c);
> +		RETURNI(CMPLXL(c, s));
> +	}
> +
> +	/* cexp(x + I 0) = exp(x) + I 0 */
> +	if ((hwy | lwy | (hy & 0x7fff)) == 0)
> +		RETURNI(CMPLXL(expl(x), y));
> +
> +	if (hy >= 0x7fff) {

Bugs from not masking h[xy] with 0x7fff up front start here.  This
misclassifies all negative y's except -0 as being NaNs.

> +		if (hx < 0x7fff ||
> +		    (hx == 0x7fff && (hwx & 0x7fffffff) != 0)) {

Many bugs here:
- not masking hx with 0x7fff
- not checking the low bits of the mantissa.  It is foot shooting to
   have these bits in a separate variable
- not checking the normalization bit.

Fix this by copying mostly logl() (expl() is too subtle to copy):

 	lx = from correct extraction
 	ix = hx & 0x7fff;	/* up front, as in logl */
 	...
 		if (ix < 0x7fff ||
 		    (hx == 0x7fff && lx != 0x8000000000000000))

The first line classifies finite x, and the second line classifies
everyhing else except +-Inf.  The other cases are:
- normal NaN with top bit set
- unnormal NaN with top bit set
- unnormal +-Inf with top bit clear.

This is slightly wrong too.  x87 hardware treats unnormal finite values
as NaNs, so this should treat them as NaNs too.  Checking the normalization
bit first works especially easily here.  There are complications for
normalized zeros and normalized denormals (sic), but we have already
handled the zeros.  IIRC, the hardware doesn't treat normalized denomals
as NaNs, so we get the correct result by treating them as finite.  This
gives:

 	...
 		if ((lx & 0x8000000000000000) == 0 || ix < 0x7fff ||
 		    (ix == 0x7fff && (lx & 0x7fffffffffffffff) != 0))

> +			/* cexp(finite|NaN +- I Inf|NaN) = NaN + I NaN */
> +			RETURNI(CMPLXL(y - y, y - y));
> +		} else if (hx & 0x8000) {

cexp() doesn't mask the sign bit in hx or use ix, so as to test the sign
via hx here.  It clobbers the sign bit in hy.

> +			/* cexp(-Inf +- I Inf|NaN) = 0 + I 0 */
> +			RETURNI(CMPLXL(0.0L, 0.0L));

Style bug: explcit long double constants.  The float case risks using
double precision by spelling the constants 0.0, the same as the double
case.  Plain 0 is a better spelling.

There may be further bugs from misclassification of y using hy >= 0x7fff.
This is supposed to classify Infs and NaNs.  The masking bug makes it
misclassify all negative values and -0.  The further bugs are that it
doesn't classify unnormals (except unnormal denormals) as NaNs.  Only
ld80 has unnormals.

Bruce