From owner-freebsd-numerics@freebsd.org  Sat Apr 29 18:10:24 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8E065D55BE9;
 Sat, 29 Apr 2017 18:10:24 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 76320D7A;
 Sat, 29 Apr 2017 18:10:24 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v3TIAN63041484
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sat, 29 Apr 2017 11:10:23 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v3TIAMGF041483;
 Sat, 29 Apr 2017 11:10:22 -0700 (PDT) (envelope-from sgk)
Date: Sat, 29 Apr 2017 11:10:22 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-hackers@freebsd.org, freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170429181022.GA41420@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428010122.GA12814@troutmask.apl.washington.edu>
 <20170428183733.V1497@besplex.bde.org>
 <20170428165658.GA17560@troutmask.apl.washington.edu>
 <20170429035131.E3406@besplex.bde.org>
 <20170428201522.GA32785@troutmask.apl.washington.edu>
 <20170429070036.A4005@besplex.bde.org>
 <20170428233552.GA34580@troutmask.apl.washington.edu>
 <20170429005924.GA37947@troutmask.apl.washington.edu>
 <20170429151457.F809@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170429151457.F809@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 29 Apr 2017 18:10:24 -0000

On Sat, Apr 29, 2017 at 05:54:21PM +1000, Bruce Evans wrote:
> On Fri, 28 Apr 2017, Steve Kargl wrote:
> 
> > On Fri, Apr 28, 2017 at 04:35:52PM -0700, Steve Kargl wrote:
> >>
> >> I was just backtracking with __kernel_sinpi.  This gets a max ULP < 0.61.
> 
> Comments on this below.

Way too much information to digest in one reading.  I'll
answer the things that are easy ...

> >> 	x2 = x * x;
> >> 	p = x2 * (x2 * (x2 * s4 + s3) + s2) + s1;
> >> 	s = x * (x2 * (double)p + s0hi + s0lo);
> >> 	return ((float)s);
> >> }
> 
> I don't like the style being completely different.  Why lower-case constants?
> ...

Having teethed on musty FORTRAN code, I have an adversion to uppercase.
I also do not like camel case, which seems to plague modern languages.

> > Well, starting with above and playing the splitting game
> > with x, x2, and p.  I've manage to reduce the max ULP in
> > the region testd.  Testing against MPFR with sin(pi*x)
> > computed in 5*24-bit precision gives
> 
> It already cheats by using (double)p, but is limited by using float to
> caclulate p.

Yes, the above is cheating to the extent that the above code
started out as completely written in double.  This gives 
max ULP < 0.51.  I then started reducing the double to float
with the intent of finding when max ULP exceeds some chosen
threshold.  The above gave something like max ULP < 0.55.
I then moved the (double) cast to different terms/factors,
observed how the max ULP changed.  This then gives information
about what requires extra precision.

> > static inline float
> > __kernel_sinpif(float x)
> > {
> > 	float p, phi, x2, x2hi, x2lo, xhi, xlo;
> > 	uint32_t ix;
> >
> > 	x2 = x * x;
> > 	p = x2 * (x2 * (x2 * s4 + s3) + s2) + s1;
> >
> > 	GET_FLOAT_WORD(ix, p);
> > 	SET_FLOAT_WORD(phi, (ix >> 14) << 14);
> >
> > 	GET_FLOAT_WORD(ix, x2);
> > 	SET_FLOAT_WORD(x2hi, (ix >> 14) << 14);
> 
> I expect that these GET/SET's are the slowest part.  They are quite fast
> in float prec, but in double prec on old i386 CPUs compilers generate bad
> code which can have penalties of 20 cycles per GET/SET.
> 
> Why the strange reduction?  The double shift is just a manual optimization
> or pssimization (usually the latter) for clearing low bits.  Here it is
> used to clear 14 low bits instead of the usual 12.  This is normally
> written using just a mask of 0xffff0000, unless you want a different
> number of bits in the hi terms for technical reasons.  Double precision
> can benefit more from asymmetric splitting of terms since 53 is not
> divisible by 2; 1 hi term must have less than 26.5 bits and the other term
> can hold an extra bit.

Because I didn't think about using a mask.  :-)

It's easy to change 14 to 13 or 11 or ..., while I would 
need to write out zeros and one to come up with 0xffff8000,
etc.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow