From owner-freebsd-numerics@freebsd.org  Thu Mar  7 19:54:46 2019
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6B45615239FD
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu,  7 Mar 2019 19:54:46 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 9331F70523
 for <freebsd-numerics@freebsd.org>; Thu,  7 Mar 2019 19:54:44 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id x27JsbOC021893
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO);
 Thu, 7 Mar 2019 11:54:37 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id x27Jsa5Z021892;
 Thu, 7 Mar 2019 11:54:36 -0800 (PST) (envelope-from sgk)
Date: Thu, 7 Mar 2019 11:54:36 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Update ENTERI() macro
Message-ID: <20190307195436.GA20856@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20190304212159.GA12587@troutmask.apl.washington.edu>
 <20190305153243.Y1349@besplex.bde.org>
 <20190306055201.GA40298@troutmask.apl.washington.edu>
 <20190306225811.P2731@besplex.bde.org>
 <20190306184829.GA44023@troutmask.apl.washington.edu>
 <20190307061214.R4911@besplex.bde.org>
 <20190306214233.GA23159@troutmask.apl.washington.edu>
 <20190307144315.N932@besplex.bde.org>
 <20190307044447.GA16298@troutmask.apl.washington.edu>
 <20190307163958.V1293@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190307163958.V1293@besplex.bde.org>
User-Agent: Mutt/1.11.2 (2019-01-07)
X-Rspamd-Queue-Id: 9331F70523
X-Spamd-Bar: /
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [0.92 / 15.00]; ARC_NA(0.00)[];
 HAS_REPLYTO(0.00)[sgk@troutmask.apl.washington.edu];
 FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[];
 NEURAL_SPAM_SHORT(0.59)[0.591,0];
 NEURAL_HAM_LONG(-0.58)[-0.582,0]; MIME_GOOD(-0.10)[text/plain];
 RCVD_TLS_LAST(0.00)[]; DMARC_NA(0.00)[washington.edu];
 AUTH_NA(1.00)[]; REPLYTO_ADDR_EQ_FROM(0.00)[];
 RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[];
 RCVD_IN_DNSWL_MED(-0.20)[21.76.95.128.list.dnswl.org : 127.0.11.2];
 RCPT_COUNT_TWO(0.00)[2];
 MX_GOOD(-0.01)[troutmask.apl.washington.edu];
 NEURAL_SPAM_MEDIUM(0.17)[0.169,0]; R_SPF_NA(0.00)[];
 FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:73, ipnet:128.95.0.0/16, country:US];
 MID_RHS_MATCH_FROM(0.00)[];
 IP_SCORE(0.05)[ip: (0.10), ipnet: 128.95.0.0/16(0.15), asn: 73(0.05),
 country: US(-0.07)]
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Mar 2019 19:54:46 -0000

On Thu, Mar 07, 2019 at 05:36:22PM +1100, Bruce Evans wrote:
> On Wed, 6 Mar 2019, Steve Kargl wrote:
> 
> > On Thu, Mar 07, 2019 at 03:09:06PM +1100, Bruce Evans wrote:
> >> On Wed, 6 Mar 2019, Steve Kargl wrote:
> >> ...
> >> But does it preserve NaN bits as much as possible, and produce the same
> >> NaN bits as double precision after conversion to double precision?  The
> >> report spells NaN and Inf using C99's fuzzy bad names.  Even the C99
> >> specification spells NaN and Inf correctly in Appendix F.
> >>
> >> The complex hyperbolic functions have better comments, so are almost
> >> an easier place to start.
> >
> > I suppose it preserves the NaN bits as well as cexpf and cexp as
> > it uses the same algorthim.
> 
> > ...
> > long double complex
> > cexpl (long double complex z)
> > {
> > 	long double c, exp_x, s, x, y;
> > 	uint64_t lx, ly;
> > 	uint16_t hx, hy, ix, iy;
> >
> > 	ENTERI();
> >
> > 	x = creall(z);
> > 	y = cimagl(z);
> >
> > 	EXTRACT_LDBL80_WORDS(hx, lx, x);
> > 	EXTRACT_LDBL80_WORDS(hy, ly, y);
> 
> I think you recently changed to this from your new macro.

Yes, the new macro is gone.

> >
> > 	ix = hx&0x7fff;
> > 	iy = hy&0x7fff;
> 
> Some style bugs.  s_cexp.c doesn't use the fdlibm'ish style of sometimes
> no spaces around '&'.  It puts the ix and iy calculations immediately
> after the extractions.  They shouldn't be separated since they are
> also extractions.  c_cexp.c extracts in a different order, and doesn't
> use iy.  You reordered it a bit to use sincos().  I don't like the
> reordering much.  But old s_cexp.c has its own style bug of a separating
> the extraction for y but not even following its usual style of a blank
> line before comments near the extraction of x.

I may unorder the re-ordering to the old order.  If I understand
one of the comments in the GCC bugzilla report, the optimization of 
converting

  a = cos(x);
  b = sin(x):

to 

  sincos(x, &b, &a).

Actually, does "a = cos(x); b = sin(x)" goes to cexp(cmplx(0,x)) in
the middle-end of the compiler.  The backend then can recognize
cexp(cmplx(0,x)) and converts this to sincos(x, &b, &a).  I can't 
verify this for FreeBSD, because libm isn't C99 compliant.  GCC
requires a C99 compliant to activate the optimization. 

I fixed the extraction for hx, and hy.

> > 	/* cexp(0 + I y) = cos(y) + I sin(y) */
> > 	if ((ix | lx) == 0) {
> > 		sincosl(y, &s, &c);
> > 		RETURNI(CMPLXL(c, s));
> > 	}
> >
> > 	/* cexp(x + I 0) = exp(x) + I 0 */
> > 	if ((iy | ly) == 0)
> > 		RETURNI(CMPLXL(expl(x), y));
> >
> > 	if (iy == 0x7fff) {
> 
> s_cexp.c actually uses a clobbered hy here.  Perhaps fix this too (compilers
> mostly ignore program order for initializing variables like iy and optimize
> to an expression involving the unclobbered hy here if that is optimal.  The
> extra variables are just for readability or unreadability).
> 
> > 		if (ix < 0x7fff ||
> > 		    (ix == 0x7fff && (lx & 0x7fffffffffffffffULL) != 0)) {
> > 			/* cexp(finite|NaN +- I Inf|NaN) = NaN + I NaN */
> > 			RETURNI(CMPLXL(y - y, y - y));
> 
> Looks OK.
> 
> I think ix and iy are only used here, so it is clearer to not have
> variables for them -- just use (hx & 0x7fff), etc.
> 
> y - y and similar expressions usually give the right NaN (y = Inf -> dNaN
> and y = NaN -> d(y)).
> 
> IIRC, the only differences for ld128 is that then normalization bit is
> hidden so masking out the top bit in the mantissa is not needed, but the
> mantissa is in 2 64-bit words.
> 
> > 		} else if (hx & 0x8000) {
> 
> This is clearest as it is with an explicit mask.
> 
> > 	/*
> > 	 *  exp_ovfl = 11356.5234062941439497, 0x400c, 0xb17217f7d1cf79ac
> > 	 * cexp_ovfl = 22755.3287906024445633, 0x400d, 0xb1c6a8573de9768c
> > 	 */
> > 	if ((hx == 0x400c && lx > 0xb17217f7d1cf79acULL) ||
> > 	    (hx == 0x400d && lx < 0xb1c6a8573de9768cULL)) {
> 
> Don't duplicate the hex constants in the comment.  The limits can be fuzzy
> so don't need so many decimal digits or nonzero nonzero bits or LD80C()
> to round them (s_cexp.c uses only 1 decimal digit and doesn't test the
> lower 32 bits in the mantissa; here we have more bits than we need in lx).

Being fuzzy was the movitation for the new macro, I was only using
the upper 32 bits of lx in the original cexpl() I posted.  

> Maybe not use bit-fiddling at all for this (in other precisions too).
> The more important exp() functions check thresholds in FP.  Here x can
> be NaN so FP comparisons with it don't work right.  That seems to be
> the only reason to check in bits here.
> 
> > 		/*
> > 		 * x is between exp_ovfl and cexp_ovfl, so we must scale to
> > 		 * avoid overflow in exp(x).
> > 		 */
> > 		RETURNI(__ldexp_cexpl(z, 1));
> 
> Did your tests cover this case, and does it use the fixed version of the
> old __ldexp_cexpl() in k_expl.h?

No.  The test I reported was restricted to -11350 < x < 11350.
Spot checked x outside that range.  I had the conditional wrong
for x < -exp_ovfl.  Also, the comment is actaully a little optimistic.
One cannot avoid overflow.  For example, x = 11357 is just above
exp_ovfl, and exp(x) = inf.  __ldexp_cexpl() does scaling and exploits
|cos(y)| <= 1 and |sin(y)| <= 1 to try to extend the range of cexpl().

 ./testl -a 11357 0.1
libm: Re(cexpf) =  inf
mpfr: Re(cexpf) =  1.90658369228334884925e+4932
libm: Im(cexpf) =  1.91296449568717353572e+4931
mpfr: Im(cexpf) =  1.91296449568717353558e+4931

troutmask:kargl[430] ./testl -a 11357 1.57
libm: Re(cexpf) =  1.52588659766018377153e+4929
mpfr: Re(cexpf) =  1.52588659766018377147e+4929
libm: Im(cexpf) =  inf
mpfr: Im(cexpf) =  1.91615588587371861485e+4932


> 
> > 	} else {
> > 		/*
> > 		 * Cases covered here:
> > 		 *  -  x < exp_ovfl and exp(x) won't overflow (common case)
> > 		 *  -  x > cexp_ovfl, so exp(x) * s overflows for all s > 0
> > 		 *  -  x = +-Inf (generated by exp())
> > 		 *  -  x = NaN (spurious inexact exception from y)
> > 		 */
> > 		exp_x = expl(x);
> > 		sincosl(y, &s, &c);
> > 		RETURNI(CMPLXL(exp_x * c, exp_x * s));
> > 	}
> > }
> 
> I think the last clause can be simplfied and even optimized by using the
> kernel in all cases:
> - the kernel produces exp_x in the form kexp_x * 2**k where kexp_x is near
>    1.  Just multiply kexp_x by c and s, and later multiply by 2**k.  This
>    cost an extra multiplication by 2**k (for the real and imaginary parts
>    separately), but avoids other work so may be faster.
> - hopefully kexp_x is actually >= 1 (and < 2).  Otherwise we have to
>    worry about spurious underflow
> - the above method seems to avoid spurious underflow automatically, and
>    doesn't even mention this (multiplying by exp_x may underflow, but not
>    spuriously since c and s are <= 1).

You might be right about using the kernel.  I'll leave that
to others on freebsd-numerics@; although you and I seem to be
the only subscribers.

-- 
Steve