From owner-freebsd-numerics@freebsd.org  Thu May 18 17:54:37 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9070DD70642
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu, 18 May 2017 17:54:37 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 6455FCF7
 for <freebsd-numerics@freebsd.org>; Thu, 18 May 2017 17:54:37 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4IHsZVu074686
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Thu, 18 May 2017 10:54:35 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4IHsZFV074685;
 Thu, 18 May 2017 10:54:35 -0700 (PDT) (envelope-from sgk)
Date: Thu, 18 May 2017 10:54:34 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170518175434.GA74453@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170409220809.GA25076@troutmask.apl.washington.edu>
 <20170427231411.GA11346@troutmask.apl.washington.edu>
 <20170428175851.A1386@besplex.bde.org>
 <20170516224618.GA40855@troutmask.apl.washington.edu>
 <20170517094848.A52247@besplex.bde.org>
 <20170517180927.GA54431@troutmask.apl.washington.edu>
 <20170518072636.E951@besplex.bde.org>
 <20170518014226.GA63819@troutmask.apl.washington.edu>
 <20170518154820.A8280@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170518154820.A8280@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 May 2017 17:54:37 -0000

On Thu, May 18, 2017 at 04:34:57PM +1000, Bruce Evans wrote:
> On Wed, 17 May 2017, Steve Kargl wrote:
> 
> > On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote:
> >> On Wed, 17 May 2017, Steve Kargl wrote:
> > ...
> >>> As such, I've added a comment
> >>>
> >>> /*
> >>> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi.
> >>> */
> >>
> >> Why 56?  53 + 56 = 109.
> >
> > This is ld128.
> 
> ld128 has precision 113.

Yes. I know.

> > static const long double
> > pi_hi = 3.14159265358979322702026593105983920e+00L,
> > pi_lo = 1.14423774522196636802434264184180742e-17L;
> >
> > pi_hi has 113/2 = 56 bits.
> > pi_lo has 113 bit.
> >
> > 56 + 113 = 169
> 
> But hi is set intentionally sloppily by converting to double, so it has
> 53 bits instead of the 57 that 56 would go with (57 = 113/2 rounded up).

A 53-bit entity times a 56-bit entity needs 109-bits for the result.
A 113-bit entity can hold a 109-bit entity.  Or am I missing something?

> >> My improvements to lgamma()'s sin_pi*() are buggy.  I forgot the original
> >> target of avoiding inexact better:
> >
> > I'll need to wait until the weekend to study your improved sin_pi.
> 
> I now have closer to production version.  sinpi() runs more than 2
> times faster that your version, without doing much different except
> not having so much EXTRACT/INSERT or special cases.  (66 cycles instead
> of 150 on Haswell-i386, and the 66 can be reduced to 58 by removing
> the STRICT_ASSIGN() which breaks only i386 with non-default extra
> precision.  The difference would be smaller on amd64 or better compilers
> , but Haswell-i386 has much the same slownesses as old Athlon from the
> EXTRACTs and INSERTs.
> 
> However, both versions fail tests.  Min has a max error of 2048 ulps
> and an everage error of 928 ulps and yours has a maximum error of many
> giga-ups for the sign bit wrong and an average error of 561 ulps.  The
> enormous errors are probably for denormals or NaNs.

You're correct that I did not give much consideration to subnormal
number.  For float this, can be fixed by

	 		if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
				if (x == 0)
					return (x);
				SET_FLOAT_WORD(hi, hx & 0xffff0000);
				if (ix <= 0x0b7f0001) { /* Avoid spurios underflow. */
					hi *= 0x1p23F;
					lo = x * 0x1p23F - hi;
					s = (pi_lo + pi_hi) * lo + pi_lo * hi +
					    pi_hi * hi;
					return (s * 0x1p-23F);
				}
				lo = x - hi;
				s = (pi_lo + pi_hi) * lo + pi_lo * hi +
				    pi_hi * hi;
				return (s);
			}
or if you want to avoid the extra if-statement.  Simply scale

	 		if (ix < 0x38800000) {	/* |x| < 0x1p-14 */
				if (x == 0)
					return (x);
				SET_FLOAT_WORD(hi, hx & 0xffff0000);
				hi *= 0x1p23F;
				lo = x * 0x1p23F - hi;
				s = (pi_lo + pi_hi) * lo + pi_lo * hi +
				    pi_hi * hi;
				return (s * 0x1p-23F);
			}

Starting with x = 0x1p-14  and using nextafterf(x,-1.f), the 
first value where underflow is signalled is

% ./testf -S -D
sinpif(3.74171353e-39) = 1.17549407e-38
x is subnormal
sinpif(x) is subnormal

Exhaustive testing of numbers in the domain [FLT_MIN, 0x1p-14] yields

% ./testf -S -m 1.17549435E-38F -M 0x1p-14F -ed
         MAX ULP: 0.58366567
    Total tested: 939524096

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow