From owner-freebsd-numerics@freebsd.org  Wed Mar  8 20:24:24 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9C3D4D030DE
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Wed,  8 Mar 2017 20:24:24 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 82D4419E7
 for <freebsd-numerics@freebsd.org>; Wed,  8 Mar 2017 20:24:24 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v28KOHwU023196
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO)
 for <freebsd-numerics@freebsd.org>; Wed, 8 Mar 2017 12:24:17 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v28KOHBY023195
 for freebsd-numerics@freebsd.org; Wed, 8 Mar 2017 12:24:17 -0800 (PST)
 (envelope-from sgk)
Date: Wed, 8 Mar 2017 12:24:17 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: freebsd-numerics@freebsd.org
Subject: Bit twiddling question
Message-ID: <20170308202417.GA23103@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Mar 2017 20:24:24 -0000

Suppose I have a float 'x' that I know is in the
range 1 <= x <= 0x1p23 and I know that 'x' is 
integral, e.g., x = 12.000.  If I use GET_FLOAT_WORD
from math_private.h, then x=12.000 maps to ix=0x41400000.
Is there a bit twiddling method that I can apply to ix to
unambiguously determine if x is even of odd?

Yes, I know I can do

float x;
int32_t ix;
ix = (int32_t)x;

and then test (ix & 1).  But, this does not generalize to 
the case of long double on a ld128 architecture.  That is,
if I have 1 <= x < 1xp112, then I would need to have

long double x;
int128_t ix;
ix = (int128_t)x;

and AFAICT sparc64 doesn't have an int128_t.

-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Thu Mar  9 06:59:02 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id F1D47D04409
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu,  9 Mar 2017 06:59:02 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au
 [211.29.132.249])
 by mx1.freebsd.org (Postfix) with ESMTP id BDA561A94
 for <freebsd-numerics@freebsd.org>; Thu,  9 Mar 2017 06:59:02 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 9578F1044931;
 Thu,  9 Mar 2017 17:58:53 +1100 (AEDT)
Date: Thu, 9 Mar 2017 17:58:52 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Bit twiddling question
In-Reply-To: <20170308202417.GA23103@troutmask.apl.washington.edu>
Message-ID: <20170309173152.F1269@besplex.bde.org>
References: <20170308202417.GA23103@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=WF-Rw4ERh08rOXangSMA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Mar 2017 06:59:03 -0000

On Wed, 8 Mar 2017, Steve Kargl wrote:

> Suppose I have a float 'x' that I know is in the
> range 1 <= x <= 0x1p23 and I know that 'x' is
> integral, e.g., x = 12.000.  If I use GET_FLOAT_WORD
> from math_private.h, then x=12.000 maps to ix=0x41400000.
> Is there a bit twiddling method that I can apply to ix to
> unambiguously determine if x is even of odd?

I don't know of any good method.

> Yes, I know I can do
>
> float x;
> int32_t ix;
> ix = (int32_t)x;
>
> and then test (ix & 1).  But, this does not generalize to

This isn't bit twiddling, and is also slow.

> the case of long double on a ld128 architecture.  That is,
> if I have 1 <= x < 1xp112, then I would need to have
>
> long double x;
> int128_t ix;
> ix = (int128_t)x;
>
> and AFAICT sparc64 doesn't have an int128_t.

If sparc64 has this, it would be even slower.  Sparc64 emulates
all 128-bit FP.  This makes 128-bit sparc64 ~10-100 times slower
than 64-bit sparc64 FP, and 100-1000 times slower than modern
x86 64 and 8-bit FP.  FP to integer conversions tend to be slower
than pure FP, and are especially tricky for integers with more
bits than FP mantissas.

Consider bit twiddling to classifiy oddness of 2.0F (0x40000000 in
bits) and 3.0F (0x40400000 in bits).  The following method seems
to be not so bad.  Calculates that the unbiased exponent for 3.0F
is 1.  This means that the 1's bit is (1 << (23 - 1)) = 0x00400000
where we see it for 3.0F but not for 2.0F.  All bits to the right
of this must be 0 for the value to be an integer.  I don't know
how you classified integers efficiently without already determining
the position of the "point" and looking at these bits.  There are
complications for powers of 2 and the normalization bit being implicit.

Bruce

From owner-freebsd-numerics@freebsd.org  Thu Mar  9 07:52:37 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id DA5C6D041B4
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu,  9 Mar 2017 07:52:37 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id B54321594
 for <freebsd-numerics@freebsd.org>; Thu,  9 Mar 2017 07:52:37 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v297qawX030230
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Wed, 8 Mar 2017 23:52:36 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v297qaxt030229;
 Wed, 8 Mar 2017 23:52:36 -0800 (PST) (envelope-from sgk)
Date: Wed, 8 Mar 2017 23:52:36 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Bit twiddling question
Message-ID: <20170309075236.GA30199@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170308202417.GA23103@troutmask.apl.washington.edu>
 <20170309173152.F1269@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170309173152.F1269@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Mar 2017 07:52:37 -0000

On Thu, Mar 09, 2017 at 05:58:52PM +1100, Bruce Evans wrote:
> On Wed, 8 Mar 2017, Steve Kargl wrote:
> 
> > Suppose I have a float 'x' that I know is in the
> > range 1 <= x <= 0x1p23 and I know that 'x' is
> > integral, e.g., x = 12.000.  If I use GET_FLOAT_WORD
> > from math_private.h, then x=12.000 maps to ix=0x41400000.
> > Is there a bit twiddling method that I can apply to ix to
> > unambiguously determine if x is even of odd?
> 
> I don't know of any good method.

Spent a day or so searching; hence, my question.

> > Yes, I know I can do
> >
> > float x;
> > int32_t ix;
> > ix = (int32_t)x;
> >
> > and then test (ix & 1).  But, this does not generalize to
> 
> This isn't bit twiddling, and is also slow.

Yes, I know it isn't bit twiddling, but it achieves want I 
need.  I was hoping that if a bit twiddling algorithm was 
available (and was faster), then I could change my algorithm.

> > the case of long double on a ld128 architecture.  That is,
> > if I have 1 <= x < 1xp112, then I would need to have
> >
> > long double x;
> > int128_t ix;
> > ix = (int128_t)x;
> >
> > and AFAICT sparc64 doesn't have an int128_t.
> 
> If sparc64 has this, it would be even slower.  Sparc64 emulates
> all 128-bit FP.  This makes 128-bit sparc64 ~10-100 times slower
> than 64-bit sparc64 FP, and 100-1000 times slower than modern
> x86 64 and 8-bit FP.  FP to integer conversions tend to be slower
> than pure FP, and are especially tricky for integers with more
> bits than FP mantissas.
> 
> Consider bit twiddling to classifiy oddness of 2.0F (0x40000000 in
> bits) and 3.0F (0x40400000 in bits).  The following method seems
> to be not so bad.  Calculates that the unbiased exponent for 3.0F
> is 1.  This means that the 1's bit is (1 << (23 - 1)) = 0x00400000
> where we see it for 3.0F but not for 2.0F.  All bits to the right
> of this must be 0 for the value to be an integer.  I don't know
> how you classified integers efficiently without already determining
> the position of the "point" and looking at these bits.  There are
> complications for powers of 2 and the normalization bit being implicit.

Thanks.  I'll look into the above to see if I can up with
something that does not require the cast.  

To give a hint at what I have been working on, I have implementations
for sinpif(x), sinpi(x), cospif(x), and cospi(x).  For sinpif(x) and
cospif(x) I have kernels that give correctly rounded results for 
FLT_MIN <= x < 0x1p23 (at least on x86_64).   I also have slightly
faster kernels that give max ULPs of about 0.5008.  For sinpi(x) and
cospi(x), my kernels currently give about 1.25 ULP for the worse case
error if I accumulate the polynomials in double precision.  If I 
accumulate the polynomials in long double precision, then I get 
correctly rounded results.  To complete the set, I was hoping to 
work out ld80 and ld128 versions.  `ld128 is going to be problematic
due to the absense of int128_t.

I'll send you what I have in a few days.

-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Thu Mar  9 15:23:09 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 89BD1D048BB
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu,  9 Mar 2017 15:23:09 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 653421532
 for <freebsd-numerics@freebsd.org>; Thu,  9 Mar 2017 15:23:09 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v29FN8AJ033321
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Thu, 9 Mar 2017 07:23:08 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v29FN7MU033320;
 Thu, 9 Mar 2017 07:23:07 -0800 (PST) (envelope-from sgk)
Date: Thu, 9 Mar 2017 07:23:07 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Bit twiddling question
Message-ID: <20170309152307.GA32781@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170308202417.GA23103@troutmask.apl.washington.edu>
 <20170309173152.F1269@besplex.bde.org>
 <20170309075236.GA30199@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170309075236.GA30199@troutmask.apl.washington.edu>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Mar 2017 15:23:09 -0000

On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote:
> To give a hint at what I have been working on, I have implementations
> for sinpif(x), sinpi(x), cospif(x), and cospi(x).  For sinpif(x) and
> cospif(x) I have kernels that give correctly rounded results for 
> FLT_MIN <= x < 0x1p23 (at least on x86_64).   I also have slightly
> faster kernels that give max ULPs of about 0.5008.  For sinpi(x) and
> cospi(x), my kernels currently give about 1.25 ULP for the worse case
> error if I accumulate the polynomials in double precision.  If I 
> accumulate the polynomials in long double precision, then I get 
> correctly rounded results.  To complete the set, I was hoping to 
> work out ld80 and ld128 versions.  `ld128 is going to be problematic
> due to the absense of int128_t.
> 
> I'll send you what I have in a few days.

It is sometime amazing what happens when I sleep.  For sinpi[fl](x)
and the method I came up with one needs to do argument reduction
where x = n + r.  Here n is an integer and 0 < r < 1.  If n is even
then one is in the positive half-cyle of sin(x) and if n is odd then
one has the negative half-cycle.  For sinpif(x) this looks like

        if (ix < 0x4b000000) {          /* 1 <= |x| < 0x1p23 */
/* 1 */         ix = (uint32_t)ax;
/* 2 */         ax = ax == ix ? zero : __compute_sinpif(ax - ix);
/* 3 */         if (ix & 1) ax = -ax;
/* 4 */         return ((hx & 0x80000000) ? -ax : ax);
        }

Line 1 determines n.  Line 2 computes either sinpi(n) exactly or
sinpi(r).  Line 3 uses n odd to set the sign for the half-cycle.
Line 4 is used to set the sign from sin(-x) = -sin(x).

For double and ld80 one can use int64_t instead of uint32_t.  Last
night I realized that I don't need to use int64_t, and more importantly
I don't need int128_t.  The argument reduction can use uint32_t
everywhere.  For double, I currently have

        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
                int64_t n;
                n = (int64_t)ax;
                ax = ax == n ? zero : __compute_sinpi(ax - n);
                if (n & 1) ax = -ax;
                return ((hx & 0x80000000) ? -ax : ax);
        }

which can be written (off the top-of-head and not tested)

        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
                double volatile vx;
                uint32_t n;
                vx = ax + 0x1p52;
                vx = vx - 0x1p52;
                if (vx == ax) return(0);
                if (vx > UINT32_MAX) {   /* ax = m + n + r with m + n */
                    vx -= UINT32_MAX;    /* m = UINT32_MAX.  n is in range */
                    ax -= UINT32_MAX;    
				}
                n = (uint32_t)vx;
                ax = __compute_sinpi(ax - n);
                if (n & 1) ax = -ax;
                return ((hx & 0x80000000) ? -ax : ax);
        }

Something similar can be applied to ld128, but reduction may take
two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX).

-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Thu Mar  9 15:35:25 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 20036D05175
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu,  9 Mar 2017 15:35:25 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 0673636E
 for <freebsd-numerics@freebsd.org>; Thu,  9 Mar 2017 15:35:25 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v29FZO8g033424
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Thu, 9 Mar 2017 07:35:24 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v29FZNpC033420;
 Thu, 9 Mar 2017 07:35:23 -0800 (PST) (envelope-from sgk)
Date: Thu, 9 Mar 2017 07:35:23 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Bit twiddling question
Message-ID: <20170309153523.GA33379@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170308202417.GA23103@troutmask.apl.washington.edu>
 <20170309173152.F1269@besplex.bde.org>
 <20170309075236.GA30199@troutmask.apl.washington.edu>
 <20170309152307.GA32781@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170309152307.GA32781@troutmask.apl.washington.edu>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Mar 2017 15:35:25 -0000

On Thu, Mar 09, 2017 at 07:23:07AM -0800, Steve Kargl wrote:
> 
>         if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
>                 double volatile vx;
>                 uint32_t n;
>                 vx = ax + 0x1p52;
>                 vx = vx - 0x1p52;
>                 if (vx == ax) return(0);
>                 if (vx > UINT32_MAX) {   /* ax = m + n + r with m + n */
>                     vx -= UINT32_MAX;    /* m = UINT32_MAX.  n is in range */
>                     ax -= UINT32_MAX;    
> 				}
>                 n = (uint32_t)vx;
>                 ax = __compute_sinpi(ax - n);
>                 if (n & 1) ax = -ax;
>                 return ((hx & 0x80000000) ? -ax : ax);
>         }
> 

We don't even need to use UINT32_MAX.  We can do something
like

if (vx > 0x1p30) {
   vx -= 0x1p30;
   ax -= 0x1p30;
}

so there is no conversion from uint32_t to double.  We also
probably want to minimize access to a volatile, so we would
have y = vx - 0x1p52 and use y afterwards.

-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Thu Mar  9 16:07:52 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 00EC1D05FE6
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu,  9 Mar 2017 16:07:51 +0000 (UTC)
 (envelope-from dennis.hamilton@acm.org)
Received: from nov-007-i623.relay.mailchannels.net
 (nov-007-i623.relay.mailchannels.net [46.232.183.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 6C4851D4C
 for <freebsd-numerics@freebsd.org>; Thu,  9 Mar 2017 16:07:49 +0000 (UTC)
 (envelope-from dennis.hamilton@acm.org)
X-Sender-Id: a2hosting|x-authuser|himself@orcmid.com
Received: from relay.mailchannels.net (localhost [127.0.0.1])
 by relay.mailchannels.net (Postfix) with ESMTP id CA4B9120959;
 Thu,  9 Mar 2017 16:07:39 +0000 (UTC)
Received: from a2s42.a2hosting.com (unknown [100.96.135.39])
 by relay.mailchannels.net (Postfix) with ESMTPA id 2C4CB12458D;
 Thu,  9 Mar 2017 16:07:39 +0000 (UTC)
X-Sender-Id: a2hosting|x-authuser|himself@orcmid.com
Received: from a2s42.a2hosting.com (a2s42.a2hosting.com [172.20.61.168])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA)
 by 0.0.0.0:2500 (trex/5.7.32); Thu, 09 Mar 2017 16:07:39 +0000
X-MC-Relay: Neutral
X-MailChannels-SenderId: a2hosting|x-authuser|himself@orcmid.com
X-MailChannels-Auth-Id: a2hosting
X-Madly-Fearful: 530ad3b634e42515_1489075659565_2445266865
X-MC-Loop-Signature: 1489075659564:3093905383
X-MC-Ingress-Time: 1489075659564
Received: from 75-172-39-61.tukw.qwest.net ([75.172.39.61]:56458
 helo=Astraendo2)
 by a2s42.a2hosting.com with esmtpsa (TLSv1:DHE-RSA-AES256-SHA:256)
 (Exim 4.87) (envelope-from <dennis.hamilton@acm.org>)
 id 1cm0bJ-003Rx9-HF; Thu, 09 Mar 2017 11:07:38 -0500
Reply-To: <dennis.hamilton@acm.org>
From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>
To: <freebsd-numerics@freebsd.org>
Cc: "'Bruce Evans'" <brde@optusnet.com.au>, <sgk@troutmask.apl.washington.edu>
References: <20170308202417.GA23103@troutmask.apl.washington.edu>
 <20170309173152.F1269@besplex.bde.org>
 <20170309075236.GA30199@troutmask.apl.washington.edu>
In-Reply-To: <20170309075236.GA30199@troutmask.apl.washington.edu>
Subject: RE: Bit twiddling question
Date: Thu, 9 Mar 2017 08:07:37 -0800
Organization: NuovoDoc
Message-ID: <000001d298ef$460c42a0$d224c7e0$@acm.org>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AQI9VYS4o4iHSjhMB3OB5vywod5c4QGeZGK/Ad8U0Yqgmz9bkA==
Content-Language: en-us
X-AuthUser: himself@orcmid.com
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Mar 2017 16:07:52 -0000


> -----Original Message-----
> From: owner-freebsd-numerics@freebsd.org [mailto:owner-freebsd-
> numerics@freebsd.org] On Behalf Of Steve Kargl
> Sent: Wednesday, March 8, 2017 23:53
> To: Bruce Evans <brde@optusnet.com.au>
> Cc: freebsd-numerics@freebsd.org
> Subject: Re: Bit twiddling question
>=20
> On Thu, Mar 09, 2017 at 05:58:52PM +1100, Bruce Evans wrote:
> > On Wed, 8 Mar 2017, Steve Kargl wrote:
> >
> > > Suppose I have a float 'x' that I know is in the
> > > range 1 <=3D x <=3D 0x1p23 and I know that 'x' is
> > > integral, e.g., x =3D 12.000.  If I use GET_FLOAT_WORD
> > > from math_private.h, then x=3D12.000 maps to ix=3D0x41400000.
> > > Is there a bit twiddling method that I can apply to ix to
> > > unambiguously determine if x is even of odd?
> >
> > I don't know of any good method.
[orcmid]=20

If you are certain about the pre-conditions holding absolutely, figure =
out the power of 2 *float* to add to your value, such that the binary =
point is forced to be at the low-order edge of the float word.  This =
might be a bit faster than casting your float to int directly.   =20

 - Dennis

[ ... ]


From owner-freebsd-numerics@freebsd.org  Thu Mar  9 16:48:07 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id EA4CBD05E57
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu,  9 Mar 2017 16:48:07 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id CE8BBE48
 for <freebsd-numerics@freebsd.org>; Thu,  9 Mar 2017 16:48:07 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v29GltjP034945
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Thu, 9 Mar 2017 08:47:55 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v29GltIP034944;
 Thu, 9 Mar 2017 08:47:55 -0800 (PST) (envelope-from sgk)
Date: Thu, 9 Mar 2017 08:47:55 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: "Dennis E. Hamilton" <dennis.hamilton@acm.org>
Cc: freebsd-numerics@freebsd.org, "'Bruce Evans'" <brde@optusnet.com.au>
Subject: Re: Bit twiddling question
Message-ID: <20170309164755.GA34821@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170308202417.GA23103@troutmask.apl.washington.edu>
 <20170309173152.F1269@besplex.bde.org>
 <20170309075236.GA30199@troutmask.apl.washington.edu>
 <000001d298ef$460c42a0$d224c7e0$@acm.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <000001d298ef$460c42a0$d224c7e0$@acm.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Mar 2017 16:48:08 -0000

On Thu, Mar 09, 2017 at 08:07:37AM -0800, Dennis E. Hamilton wrote:
> 
> > -----Original Message-----
> > From: owner-freebsd-numerics@freebsd.org [mailto:owner-freebsd-
> > numerics@freebsd.org] On Behalf Of Steve Kargl
> > Sent: Wednesday, March 8, 2017 23:53
> > To: Bruce Evans <brde@optusnet.com.au>
> > Cc: freebsd-numerics@freebsd.org
> > Subject: Re: Bit twiddling question
> > 
> > On Thu, Mar 09, 2017 at 05:58:52PM +1100, Bruce Evans wrote:
> > > On Wed, 8 Mar 2017, Steve Kargl wrote:
> > >
> > > > Suppose I have a float 'x' that I know is in the
> > > > range 1 <= x <= 0x1p23 and I know that 'x' is
> > > > integral, e.g., x = 12.000.  If I use GET_FLOAT_WORD
> > > > from math_private.h, then x=12.000 maps to ix=0x41400000.
> > > > Is there a bit twiddling method that I can apply to ix to
> > > > unambiguously determine if x is even of odd?
> > >
> > > I don't know of any good method.
> 
> If you are certain about the pre-conditions holding absolutely,
> figure out the power of 2 *float* to add to your value, such
> that the binary point is forced to be at the low-order edge of
> the float word.  This might be a bit faster than casting your
> float to int directly.    

Thanks, I'll look into doing the above.  Yes, I'm certain
of the conditions.  See my other posts from earlier today.
I have a way forward.  Optimizations can come later. 

-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Thu Mar  9 18:16:03 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id AC82CD05002
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu,  9 Mar 2017 18:16:03 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au
 [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 59CC612EB
 for <freebsd-numerics@freebsd.org>; Thu,  9 Mar 2017 18:16:02 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 984BDD55AE7;
 Fri, 10 Mar 2017 05:02:14 +1100 (AEDT)
Date: Fri, 10 Mar 2017 05:02:13 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: Bruce Evans <brde@optusnet.com.au>, freebsd-numerics@freebsd.org
Subject: Re: Bit twiddling question
In-Reply-To: <20170309152307.GA32781@troutmask.apl.washington.edu>
Message-ID: <20170310025417.U3723@besplex.bde.org>
References: <20170308202417.GA23103@troutmask.apl.washington.edu>
 <20170309173152.F1269@besplex.bde.org>
 <20170309075236.GA30199@troutmask.apl.washington.edu>
 <20170309152307.GA32781@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=AYLBJzfG c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=MW23G5SEgc8WnkJ-59oA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Mar 2017 18:16:03 -0000

On Thu, 9 Mar 2017, Steve Kargl wrote:

> On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote:
>> To give a hint at what I have been working on, I have implementations
>> for sinpif(x), sinpi(x), cospif(x), and cospi(x).  For sinpif(x) and
>> cospif(x) I have kernels that give correctly rounded results for
>> FLT_MIN <= x < 0x1p23 (at least on x86_64).   I also have slightly

I didn't see the original mail.

I'd prefer you to finish the long double versions of cosl(), sinl() and
tanl() (maybe das is responsible for the latter).  These are still too
slow and minor bugs for at least sinl() on small x, partly by not copying
the lower-precision versions enough.

>> faster kernels that give max ULPs of about 0.5008.  For sinpi(x) and

Similar to for sinf() and cosf().  The maximum for those is 0.5009, using
double precision calculations which I now know to be a mistake, and
a polynomial of degree higher than necessary for good enough rounding in
float precision and lower than necessary for perfect rounding in float
precision.

>> cospi(x), my kernels currently give about 1.25 ULP for the worse case
>> error if I accumulate the polynomials in double precision.  If I

Not too good.  I think the problem is that the coeffs for the lowest
terms are no longer exactly representable.  The linear function f(x) = a*x
is harder to evaluate accurately than exp(x) = 1 + x + ..., unless a
is exactly representable and also  has few digits (like 1 or -0.5 for
early coeffs of sin() and cos()).

>> accumulate the polynomials in long double precision, then I get
>> correctly rounded results.  To complete the set, I was hoping to

Don't do that.  It is much slower, and not available starting with
long double precision, or even starting with double precision on arches
without real long doubles.  It is also a mistake to use it for float
precision like I did for trig functions and an uncommitted version of
logf().  I managed to make the specialized logf() run 50% faster than
the less specialized log() (also uncommitted), by using double precision
and manually schedling for Athlon-XP, but modern CPUs and compilers handle
the less sprecialized log() so well that the specialized logf() is only
a couple of cycles faster.  Similarly but less pronounced for trig functions.

>> work out ld80 and ld128 versions.  `ld128 is going to be problematic
>> due to the absense of int128_t.
>>
>> I'll send you what I have in a few days.
>
> It is sometime amazing what happens when I sleep.  For sinpi[fl](x)
> and the method I came up with one needs to do argument reduction
> where x = n + r.  Here n is an integer and 0 < r < 1.  If n is even
> then one is in the positive half-cyle of sin(x) and if n is odd then
> one has the negative half-cycle.  For sinpif(x) this looks like
>        if (ix < 0x4b000000) {          /* 1 <= |x| < 0x1p23 */
> /* 1 */         ix = (uint32_t)ax;
> /* 2 */         ax = ax == ix ? zero : __compute_sinpif(ax - ix);
> /* 3 */         if (ix & 1) ax = -ax;
> /* 4 */         return ((hx & 0x80000000) ? -ax : ax);
>        }
>
> Line 1 determines n.  Line 2 computes either sinpi(n) exactly or
> sinpi(r).  Line 3 uses n odd to set the sign for the half-cycle.
> Line 4 is used to set the sign from sin(-x) = -sin(x).
>
> For double and ld80 one can use int64_t instead of uint32_t.  Last
> night I realized that I don't need to use int64_t, and more importantly
> I don't need int128_t.  The argument reduction can use uint32_t
> everywhere.  For double, I currently have
>
>        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
>                int64_t n;
>                n = (int64_t)ax;
>                ax = ax == n ? zero : __compute_sinpi(ax - n);
>                if (n & 1) ax = -ax;
>                return ((hx & 0x80000000) ? -ax : ax);
>        }
>
> which can be written (off the top-of-head and not tested)
>        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
>                double volatile vx;
>                uint32_t n;
>                vx = ax + 0x1p52;
>                vx = vx - 0x1p52;
>                if (vx == ax) return(0);
>                if (vx > UINT32_MAX) {   /* ax = m + n + r with m + n */
>                    vx -= UINT32_MAX;    /* m = UINT32_MAX.  n is in range */
>                    ax -= UINT32_MAX;
> 				}
>                n = (uint32_t)vx;
>                ax = __compute_sinpi(ax - n);
>                if (n & 1) ax = -ax;
>                return ((hx & 0x80000000) ? -ax : ax);
>        }
>
> Something similar can be applied to ld128, but reduction may take
> two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX).

I don't like that much.  The 0x1p52 magic is tricky and probably buggy.
s_rint.c uses the same magic but needs many complications to avoid
double rounding.  Similar-looking magic for trig reduction in radians
uses floating the magic value of 0x1.8p52 to bias rounding errors in
a safe way, and arranges so that rounding errors don't matter then.
Perhaps you can do the same here -- do a fixup of +-UINT32_MAX if
necessary.  This seem to be no harder than using rint(), round() or
floor().  Except for floor(), we it is hard to control whether the
rounded result is too high or too low.  Using floor:

 	prec_t n = floor(x);
 	r = x - n;

but we want to classify quadrants, and still don't have an n that will
fit in an integer, so first scale by 2 (4 quadrants of size 1/2):

 	y = x * 0.5 - floor(x * 0.5);	/* 4 quads in [2n, 2n+2) -> [0, 1) */
 	y = 4 * y;			/* -> [0, 4) */
 	int n = y;			/* quadrant */
 	r = (y - n) / 4;		/* offset in quadrant */

This is a bit slow, and too much like lgamma's method.  frexp() can be used
instead of floor() to slowly extract the fraction.  I think it can be done
using the 0x1p52 magic a bit like you say.  s_rint.c already uses this
magic, but has to work hard to handle errors in double rounding.  s_floor.c
uses it via bit fiddling.

If we replace the slow floor(z) by the 0x1p52 calculation on z, then we
get an approximation to floor(z) that is hopefully an integer (possibly
an integer + 0.5) within 1 or 2 of the correct result.  Adjust it
until it is correct.

Bruce

From owner-freebsd-numerics@freebsd.org  Thu Mar  9 19:51:35 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D77DBD0470F
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu,  9 Mar 2017 19:51:35 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id C16C9989
 for <freebsd-numerics@freebsd.org>; Thu,  9 Mar 2017 19:51:35 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v29JpYWr036829
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Thu, 9 Mar 2017 11:51:34 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v29JpYdT036828;
 Thu, 9 Mar 2017 11:51:34 -0800 (PST) (envelope-from sgk)
Date: Thu, 9 Mar 2017 11:51:34 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Bit twiddling question
Message-ID: <20170309195134.GA36213@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170308202417.GA23103@troutmask.apl.washington.edu>
 <20170309173152.F1269@besplex.bde.org>
 <20170309075236.GA30199@troutmask.apl.washington.edu>
 <20170309152307.GA32781@troutmask.apl.washington.edu>
 <20170310025417.U3723@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170310025417.U3723@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Mar 2017 19:51:35 -0000

On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote:
> On Thu, 9 Mar 2017, Steve Kargl wrote:
> 
> > On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote:
> >> To give a hint at what I have been working on, I have implementations
> >> for sinpif(x), sinpi(x), cospif(x), and cospi(x).  For sinpif(x) and
> >> cospif(x) I have kernels that give correctly rounded results for
> >> FLT_MIN <= x < 0x1p23 (at least on x86_64).   I also have slightly
> 
> I didn't see the original mail.

Odd.  Maybe munched by a spam filter.

> I'd prefer you to finish the long double versions of cosl(), sinl() and
> tanl() (maybe das is responsible for the latter).  These are still too
> slow and minor bugs for at least sinl() on small x, partly by not copying
> the lower-precision versions enough.

sinl, cosl, and tanl are good enough for my purposes at the moment.
tgammal and powl are higher on the list of things to fix.  But, 
sinpi[fl] and company are high on the list because: (1) I can use
those in my actual job; and (2),  the desirable property that
sinpi(x) = 0 whereas sin(pi*x) = 0.mumbleE-17 (or even E-16) for
integer x.

> >> faster kernels that give max ULPs of about 0.5008.  For sinpi(x) and
> 
> Similar to for sinf() and cosf().  The maximum for those is 0.5009, using
> double precision calculations which I now know to be a mistake, and
> a polynomial of degree higher than necessary for good enough rounding in
> float precision and lower than necessary for perfect rounding in float
> precision.

I have three polynomials for __kernel_sinpif() over x in [0,1/4].
All are accumulated in double precision.  With a 6th order polynomial
(12 double FOP), I get a max ULP of 0.5.  With a 5th order polynomial
(10 double FOP), I get a max ULP of 0.50000006.  With the 4th order
polynomial (8 double FOP), the max ULP is 0.50007051.  Note, in my
Remes algorithm I used f(x) = sin(pi*x) / x.  I still haven't found a
way to round 20*53-bit coefficients to a desired precision, which here
is 53 bits.  (Well, I have thought about a recursive Remes algorithm,
but it requires much more work/time that I can commit.)

I also have three polynomials for __kernel_cospif() over the
same interval.  6th order gives Max ULP of 0.5, the 5th order
gives 0.50000089, and the 4th order gives 0.5008257.

> >> cospi(x), my kernels currently give about 1.25 ULP for the worse case
> >> error if I accumulate the polynomials in double precision.  If I
> 
> Not too good.  I think the problem is that the coeffs for the lowest
> terms are no longer exactly representable.  The linear function f(x) = a*x
> is harder to evaluate accurately than exp(x) = 1 + x + ..., unless a
> is exactly representable and also  has few digits (like 1 or -0.5 for
> early coeffs of sin() and cos()).

I agree.  I'm still working on the double routines and starting to
contemplate the ld80 and ld128 ones.

> >> accumulate the polynomials in long double precision, then I get
> >> correctly rounded results.  To complete the set, I was hoping to
> 
> Don't do that.  It is much slower, and not available starting with
> long double precision, or even starting with double precision on arches
> without real long doubles.  It is also a mistake to use it for float
> precision like I did for trig functions and an uncommitted version of
> logf().  I managed to make the specialized logf() run 50% faster than
> the less specialized log() (also uncommitted), by using double precision
> and manually schedling for Athlon-XP, but modern CPUs and compilers handle
> the less sprecialized log() so well that the specialized logf() is only
> a couple of cycles faster.  Similarly but less pronounced for trig functions.

Don't worry.  I have no intention to use long double to accumulate the
polynomials.  I was just curious to see if the extra 11-bits would
reduce the ulp from 1.25 to something below 1.  I was surprised at
how well those 11 bits worked.  I also tried fma() in my Horner's
rule representation of the polynomial, but that did not help.

> >> work out ld80 and ld128 versions.  `ld128 is going to be problematic
> >> due to the absense of int128_t.
> >>
> >> I'll send you what I have in a few days.
> >
> > It is sometime amazing what happens when I sleep.  For sinpi[fl](x)
> > and the method I came up with one needs to do argument reduction
> > where x = n + r.  Here n is an integer and 0 < r < 1.  If n is even
> > then one is in the positive half-cyle of sin(x) and if n is odd then
> > one has the negative half-cycle.  For sinpif(x) this looks like
> >        if (ix < 0x4b000000) {          /* 1 <= |x| < 0x1p23 */
> > /* 1 */         ix = (uint32_t)ax;
> > /* 2 */         ax = ax == ix ? zero : __compute_sinpif(ax - ix);
> > /* 3 */         if (ix & 1) ax = -ax;
> > /* 4 */         return ((hx & 0x80000000) ? -ax : ax);
> >        }
> >
> > Line 1 determines n.  Line 2 computes either sinpi(n) exactly or
> > sinpi(r).  Line 3 uses n odd to set the sign for the half-cycle.
> > Line 4 is used to set the sign from sin(-x) = -sin(x).
> >
> > For double and ld80 one can use int64_t instead of uint32_t.  Last
> > night I realized that I don't need to use int64_t, and more importantly
> > I don't need int128_t.  The argument reduction can use uint32_t
> > everywhere.  For double, I currently have
> >
> >        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
> >                int64_t n;
> >                n = (int64_t)ax;
> >                ax = ax == n ? zero : __compute_sinpi(ax - n);
> >                if (n & 1) ax = -ax;
> >                return ((hx & 0x80000000) ? -ax : ax);
> >        }
> >
> > which can be written (off the top-of-head and not tested)
> >        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
> >                double volatile vx;
> >                uint32_t n;
> >                vx = ax + 0x1p52;
> >                vx = vx - 0x1p52;
> >                if (vx == ax) return(0);
> >                if (vx > UINT32_MAX) {   /* ax = m + n + r with m + n */
> >                    vx -= UINT32_MAX;    /* m = UINT32_MAX.  n is in range */
> >                    ax -= UINT32_MAX;
> > 				}
> >                n = (uint32_t)vx;
> >                ax = __compute_sinpi(ax - n);
> >                if (n & 1) ax = -ax;
> >                return ((hx & 0x80000000) ? -ax : ax);
> >        }
> >
> > Something similar can be applied to ld128, but reduction may take
> > two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX).
> 
> I don't like that much.  The 0x1p52 magic is tricky and probably buggy.
> s_rint.c uses the same magic but needs many complications to avoid
> double rounding.

Yep. Seems to have some issue.  From my working copy of s_sinpif(x),
I tried

#if 0
		ix = (uint32_t)ax;
		a = ax == ix ? zero : __compute_sinpif(ax - ix);
		if (ix & 1) ax = -ax;
		return ((hx & 0x80000000) ? -ax : ax);
#else
		volatile float vx;
		float y;
		vx = ax + 0x1p23F;
		y = vx - 0x1p23F;
		ix = (uint32_t)y;
		ax = ax == y ? zero : __compute_sinpif(ax - ix);
		if (ix & 1) ax = -ax;
		return ((hx & 0x80000000) ? -ax : ax);
#endif	

My max ULP went from 0.5 to 0.50398505 for exhaustive testing
in the interval [1,100].  If I grab the worse case x and use the
'#if 1' code, I see

./testf -S -a 1.50121641e+00f
a =  1.50121641e+00f, /* 0x3fc027dc */
sinpif(a) = -9.99992669e-01f, /* 0xbf7fff85 */
sin(pi*a) = -9.99992669e-01f, /* 0xbf7fff85 */  MPFR 96-bits rounded 24-bits.
ULP: 0.49601495

The '#if 0' code gives

 ./testf -S -a 1.50121641e+00f
a =  1.50121641e+00f, /* 0x3fc027dc */
sinpif(a) = -9.99992728e-01f, /* 0xbf7fff86 */
sin(pi*a) = -9.99992669e-01f, /* 0xbf7fff85 */  MPFR 96-bits rounded 24-bits.
ULP: 0.50398505

> Similar-looking magic for trig reduction in radians
> uses floating the magic value of 0x1.8p52 to bias rounding errors in
> a safe way, and arranges so that rounding errors don't matter then.
> Perhaps you can do the same here -- do a fixup of +-UINT32_MAX if
> necessary.  This seem to be no harder than using rint(), round() or
> floor().  Except for floor(), we it is hard to control whether the
> rounded result is too high or too low.  Using floor:
> 
>  	prec_t n = floor(x);
>  	r = x - n;
> 
> but we want to classify quadrants, and still don't have an n that will
> fit in an integer, so first scale by 2 (4 quadrants of size 1/2):
> 
>  	y = x * 0.5 - floor(x * 0.5);	/* 4 quads in [2n, 2n+2) -> [0, 1) */
>  	y = 4 * y;			/* -> [0, 4) */
>  	int n = y;			/* quadrant */
>  	r = (y - n) / 4;		/* offset in quadrant */
> 
> This is a bit slow, and too much like lgamma's method.

I did look over what you did in lgamma[f], but forgot the logic
behind the reduction into octants.  I suppose that I'm more
concern with 'make it work', 'make it correct', and then I
will/might worry about 'make it fast'.  But, I don't want it to be
too slow, so I'm trying to avoid function calls and reclassification
of x as +-inf or nan.  Heck, r = remquof(ax, vx, &n) should work,
but there's that speed issue.

-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Thu Mar  9 20:14:10 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B3BB1D04CEE
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Thu,  9 Mar 2017 20:14:10 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 7A53D1505
 for <freebsd-numerics@freebsd.org>; Thu,  9 Mar 2017 20:14:10 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v29KE9sg037304
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Thu, 9 Mar 2017 12:14:09 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v29KE9dt037303;
 Thu, 9 Mar 2017 12:14:09 -0800 (PST) (envelope-from sgk)
Date: Thu, 9 Mar 2017 12:14:09 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Bit twiddling question
Message-ID: <20170309201409.GA37219@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170308202417.GA23103@troutmask.apl.washington.edu>
 <20170309173152.F1269@besplex.bde.org>
 <20170309075236.GA30199@troutmask.apl.washington.edu>
 <20170309152307.GA32781@troutmask.apl.washington.edu>
 <20170310025417.U3723@besplex.bde.org>
 <20170309195134.GA36213@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170309195134.GA36213@troutmask.apl.washington.edu>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Mar 2017 20:14:10 -0000

On Thu, Mar 09, 2017 at 11:51:34AM -0800, Steve Kargl wrote:
> On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote:
> > >
> > > Something similar can be applied to ld128, but reduction may take
> > > two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX).
> > 
> > I don't like that much.  The 0x1p52 magic is tricky and probably buggy.
> > s_rint.c uses the same magic but needs many complications to avoid
> > double rounding.
> 
> Yep. Seems to have some issue.  From my working copy of s_sinpif(x),
> I tried
> 
> #if 0
> 		ix = (uint32_t)ax;
> 		a = ax == ix ? zero : __compute_sinpif(ax - ix);
> 		if (ix & 1) ax = -ax;
> 		return ((hx & 0x80000000) ? -ax : ax);
> #else
> 		volatile float vx;
> 		float y;
> 		vx = ax + 0x1p23F;
> 		y = vx - 0x1p23F;
> 		ix = (uint32_t)y;
> 		ax = ax == y ? zero : __compute_sinpif(ax - ix);
> 		if (ix & 1) ax = -ax;
> 		return ((hx & 0x80000000) ? -ax : ax);
> #endif	
> 
> My max ULP went from 0.5 to 0.50398505 for exhaustive testing
> in the interval [1,100].  If I grab the worse case x and use the
> '#if 1' code, I see
> 
> ./testf -S -a 1.50121641e+00f
> a =  1.50121641e+00f, /* 0x3fc027dc */
> sinpif(a) = -9.99992669e-01f, /* 0xbf7fff85 */
> sin(pi*a) = -9.99992669e-01f, /* 0xbf7fff85 */  MPFR 96-bits rounded 24-bits.
> ULP: 0.49601495
> 
> The '#if 0' code gives
> 
>  ./testf -S -a 1.50121641e+00f
> a =  1.50121641e+00f, /* 0x3fc027dc */
> sinpif(a) = -9.99992728e-01f, /* 0xbf7fff86 */
> sin(pi*a) = -9.99992669e-01f, /* 0xbf7fff85 */  MPFR 96-bits rounded 24-bits.
> ULP: 0.50398505
> 

Certainly looks like a double rounding issue.  An instrumented copy
of the code shows that y = 2 instead of 1 in the above.

a =  1.50121641e+00f, /* 0x3fc027dc */
1.50121641e+00
2.00000000e+00
sinpif(a) = -9.99992728e-01f, /* 0xbf7fff86 */
sin(pi*a) = -9.99992669e-01f, /* 0xbf7fff85 */
ULP: 0.50398505

-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Fri Mar 10 06:46:43 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8ACEFD05924
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Fri, 10 Mar 2017 06:46:43 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 380861C3A
 for <freebsd-numerics@freebsd.org>; Fri, 10 Mar 2017 06:46:42 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id B191E3C1536;
 Fri, 10 Mar 2017 17:46:35 +1100 (AEDT)
Date: Fri, 10 Mar 2017 17:46:34 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-numerics@freebsd.org
Subject: Re: Bit twiddling question
In-Reply-To: <20170309195134.GA36213@troutmask.apl.washington.edu>
Message-ID: <20170310165638.D1143@besplex.bde.org>
References: <20170308202417.GA23103@troutmask.apl.washington.edu>
 <20170309173152.F1269@besplex.bde.org>
 <20170309075236.GA30199@troutmask.apl.washington.edu>
 <20170309152307.GA32781@troutmask.apl.washington.edu>
 <20170310025417.U3723@besplex.bde.org>
 <20170309195134.GA36213@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=V3hBaD5_8KBfC0JK-0QA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 10 Mar 2017 06:46:43 -0000

On Thu, 9 Mar 2017, Steve Kargl wrote:

> On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote:
>> On Thu, 9 Mar 2017, Steve Kargl wrote:
>>
>>> On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote:
>>>> To give a hint at what I have been working on, I have implementations
>>>> for sinpif(x), sinpi(x), cospif(x), and cospi(x).  For sinpif(x) and
>>>> cospif(x) I have kernels that give correctly rounded results for
>>>> FLT_MIN <= x < 0x1p23 (at least on x86_64).   I also have slightly
>>
>> I didn't see the original mail.
>
> Odd.  Maybe munched by a spam filter.

It was delivered out of order.

>> I'd prefer you to finish the long double versions of cosl(), sinl() and
>> tanl() (maybe das is responsible for the latter).  These are still too
>> slow and minor bugs for at least sinl() on small x, partly by not copying
>> the lower-precision versions enough.
>
> sinl, cosl, and tanl are good enough for my purposes at the moment.
> tgammal and powl are higher on the list of things to fix.  But,

Only powl is very important, but everyone avoids it because it is harder.

>*  ...
>>> which can be written (off the top-of-head and not tested)
>>>        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
>>>                double volatile vx;
>>>                uint32_t n;
>>>                vx = ax + 0x1p52;
>>>                vx = vx - 0x1p52;
>>>                if (vx == ax) return(0);
>>>                if (vx > UINT32_MAX) {   /* ax = m + n + r with m + n */
>>>                    vx -= UINT32_MAX;    /* m = UINT32_MAX.  n is in range */
>>>                    ax -= UINT32_MAX;
>>> 				}
>>>                n = (uint32_t)vx;
>>>                ax = __compute_sinpi(ax - n);
>>>                if (n & 1) ax = -ax;
>>>                return ((hx & 0x80000000) ? -ax : ax);
>>>        }
>>>
>>> Something similar can be applied to ld128, but reduction may take
>>> two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX).
>>
>> I don't like that much.  The 0x1p52 magic is tricky and probably buggy.
>> s_rint.c uses the same magic but needs many complications to avoid
>> double rounding.
>
> Yep. Seems to have some issue.  From my working copy of s_sinpif(x),
> I tried
> ...
> My max ULP went from 0.5 to 0.50398505 for exhaustive testing
> in the interval [1,100].  If I grab the worse case x and use the
> '#if 1' code, I see
>
>> Similar-looking magic for trig reduction in radians
>> uses floating the magic value of 0x1.8p52 to bias rounding errors in
>> a safe way, and arranges so that rounding errors don't matter then.
>> ....
>>  	y = x * 0.5 - floor(x * 0.5);	/* 4 quads in [2n, 2n+2) -> [0, 1) */
>>  	y = 4 * y;			/* -> [0, 4) */
>>  	int n = y;			/* quadrant */
>>  	r = (y - n) / 4;		/* offset in quadrant */
>>
>> This is a bit slow, and too much like lgamma's method.
>
> I did look over what you did in lgamma[f], but forgot the logic
> behind the reduction into octants.  I suppose that I'm more
> concern with 'make it work', 'make it correct', and then I
> will/might worry about 'make it fast'.  But, I don't want it to be
> too slow, so I'm trying to avoid function calls and reclassification
> of x as +-inf or nan.  Heck, r = remquof(ax, vx, &n) should work,
> but there's that speed issue.

In fact, we already did the work to optimize this using the 0x1p52
magic for lgamma, and even committed it.

fdlibm lgamma used a messy combination of multiplications by 0.5 and
2.0, 2 floor()s, 1 half of the 0x1p52 magic and bit fiddling to do
the other half of the 0x1p52 magic.  You simplified this, and we
eventually optimized it to use no floors (but the equivalent of
2 rint()s on |x| and 4*|x| implemented using full 0x1p52 magic) and
different bit fiddling.  Now I fear that the 0x1p52 doesn't work
in double rounding cases.  Hopefully it works like it does for trig
functions -- at worst there is an off-by 1 error in the quadrant
but here is a compensating tiny error in the offset.  E.g., the
closest value to pi/4 might be represented as octant 0 with offset
pi/4 + epsilon or octant 1 with offset epsilon (the lowest level
must reduce to octants, not quadrants).  Then the poly much just
approximate a little above pi/4 in case epsilon > 0.  Higher levels
also see the off-by-1 error but everything is consistent.

This explains your larger error -- the accuracy of the poly breaks
down not far above pi/4, and even epsilon above pi/4 there may be
a near-halfway case that is relatively inaccurate.

I think the reduction for lgamma can be further improved by reducing
directly to octants.

sin_pi() in lgamma differs from a perfect general sin_pi() only in
the following respects:
- the general function has to handle large args, most are even integers
   so sin_pi() on them is 0.  Some are odd integers.  The 0x1p52 magic
   breaks down at about the same point as where all args are integers.
- the perfect general function has to use special polys and special
   poly evaluation methods since pi*x is too inaccurate without extra
   precision.

Reduction for sind() is interesting.  sind(x) = sin_pi(x/180) in
infinite precision, but the division is hard to do accurately and
efficiently.  So sind(x) is only easy for small x.

C11 doesn't have sincos*() or sind*() to make us more incomplete.

Bruce

From owner-freebsd-numerics@freebsd.org  Fri Mar 10 07:21:43 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id A0B2CD021AC
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Fri, 10 Mar 2017 07:21:43 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 896F61A36
 for <freebsd-numerics@freebsd.org>; Fri, 10 Mar 2017 07:21:43 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v2A7LbaU043765
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Thu, 9 Mar 2017 23:21:37 -0800 (PST)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v2A7LaDR043764;
 Thu, 9 Mar 2017 23:21:36 -0800 (PST) (envelope-from sgk)
Date: Thu, 9 Mar 2017 23:21:36 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: Bit twiddling question
Message-ID: <20170310072136.GA43725@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170308202417.GA23103@troutmask.apl.washington.edu>
 <20170309173152.F1269@besplex.bde.org>
 <20170309075236.GA30199@troutmask.apl.washington.edu>
 <20170309152307.GA32781@troutmask.apl.washington.edu>
 <20170310025417.U3723@besplex.bde.org>
 <20170309195134.GA36213@troutmask.apl.washington.edu>
 <20170310165638.D1143@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170310165638.D1143@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 10 Mar 2017 07:21:43 -0000

On Fri, Mar 10, 2017 at 05:46:34PM +1100, Bruce Evans wrote:
> On Thu, 9 Mar 2017, Steve Kargl wrote:
> 
> > On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote:
> >> On Thu, 9 Mar 2017, Steve Kargl wrote:
> >>
> >>> On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote:
> >>>> To give a hint at what I have been working on, I have implementations
> >>>> for sinpif(x), sinpi(x), cospif(x), and cospi(x).  For sinpif(x) and
> >>>> cospif(x) I have kernels that give correctly rounded results for
> >>>> FLT_MIN <= x < 0x1p23 (at least on x86_64).   I also have slightly
> >>
> >> I didn't see the original mail.
> >
> > Odd.  Maybe munched by a spam filter.
> 
> It was delivered out of order.
> 
> >> I'd prefer you to finish the long double versions of cosl(), sinl() and
> >> tanl() (maybe das is responsible for the latter).  These are still too
> >> slow and minor bugs for at least sinl() on small x, partly by not copying
> >> the lower-precision versions enough.
> >
> > sinl, cosl, and tanl are good enough for my purposes at the moment.
> > tgammal and powl are higher on the list of things to fix.  But,
> 
> Only powl is very important, but everyone avoids it because it is harder.

I have a partially written powl in my tree.  I'll probably finish
it someday.

> 
> >*  ...
> >>> which can be written (off the top-of-head and not tested)
> >>>        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
> >>>                double volatile vx;
> >>>                uint32_t n;
> >>>                vx = ax + 0x1p52;
> >>>                vx = vx - 0x1p52;
> >>>                if (vx == ax) return(0);
> >>>                if (vx > UINT32_MAX) {   /* ax = m + n + r with m + n */
> >>>                    vx -= UINT32_MAX;    /* m = UINT32_MAX.  n is in range */
> >>>                    ax -= UINT32_MAX;
> >>> 				}
> >>>                n = (uint32_t)vx;
> >>>                ax = __compute_sinpi(ax - n);
> >>>                if (n & 1) ax = -ax;
> >>>                return ((hx & 0x80000000) ? -ax : ax);
> >>>        }
> >>>
> >>> Something similar can be applied to ld128, but reduction may take
> >>> two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX).
> >>
> >> I don't like that much.  The 0x1p52 magic is tricky and probably buggy.
> >> s_rint.c uses the same magic but needs many complications to avoid
> >> double rounding.
> >
> > Yep. Seems to have some issue.  From my working copy of s_sinpif(x),
> > I tried
> > ...
> > My max ULP went from 0.5 to 0.50398505 for exhaustive testing
> > in the interval [1,100].  If I grab the worse case x and use the
> > '#if 1' code, I see
> >
> >> Similar-looking magic for trig reduction in radians
> >> uses floating the magic value of 0x1.8p52 to bias rounding errors in
> >> a safe way, and arranges so that rounding errors don't matter then.
> >> ....
> >>  	y = x * 0.5 - floor(x * 0.5);	/* 4 quads in [2n, 2n+2) -> [0, 1) */
> >>  	y = 4 * y;			/* -> [0, 4) */
> >>  	int n = y;			/* quadrant */
> >>  	r = (y - n) / 4;		/* offset in quadrant */
> >>
> >> This is a bit slow, and too much like lgamma's method.
> >
> > I did look over what you did in lgamma[f], but forgot the logic
> > behind the reduction into octants.  I suppose that I'm more
> > concern with 'make it work', 'make it correct', and then I
> > will/might worry about 'make it fast'.  But, I don't want it to be
> > too slow, so I'm trying to avoid function calls and reclassification
> > of x as +-inf or nan.  Heck, r = remquof(ax, vx, &n) should work,
> > but there's that speed issue.
> 
> In fact, we already did the work to optimize this using the 0x1p52
> magic for lgamma, and even committed it.
> 
> fdlibm lgamma used a messy combination of multiplications by 0.5 and
> 2.0, 2 floor()s, 1 half of the 0x1p52 magic and bit fiddling to do
> the other half of the 0x1p52 magic.  You simplified this, and we
> eventually optimized it to use no floors (but the equivalent of
> 2 rint()s on |x| and 4*|x| implemented using full 0x1p52 magic) and
> different bit fiddling.  Now I fear that the 0x1p52 doesn't work
> in double rounding cases.

At least, in my sinpif() the double rounding appears near x = n.5 
where I get n+1 instead of the intended n.  I'll need to recheck
lgammaf, but I recall that we are working with |x|.  This means
we can test for n < |x|.  If we get n+1 then double rounding
occurred. 

> Hopefully it works like it does for trig
> functions -- at worst there is an off-by 1 error in the quadrant
> but here is a compensating tiny error in the offset.  E.g., the
> closest value to pi/4 might be represented as octant 0 with offset
> pi/4 + epsilon or octant 1 with offset epsilon (the lowest level
> must reduce to octants, not quadrants).  Then the poly much just
> approximate a little above pi/4 in case epsilon > 0.  Higher levels
> also see the off-by-1 error but everything is consistent.
> 
> This explains your larger error -- the accuracy of the poly breaks
> down not far above pi/4, and even epsilon above pi/4 there may be
> a near-halfway case that is relatively inaccurate.
> 

Well, I stared at s_floorf.c and s_floor.c today.  For sinpi,
I'm interested in 1 <= x < 0x1p52 (or 23 or 63 or 112 for other
precisions), which significantly reduces the needed bit twiddling.
For double, I need to do

        if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
/* Reduced from s_floor.c from here ... */
                j0 = ((ix >> 20) & 0x7ff) - 0x3ff;
                if (j0 < 20) {
                        ix &= ~(0x000fffff >> j0);
                        lx = 0;
                } else {
                        i = ((uint32_t)(0xffffffff)) >> (j0 - 20);
                        lx &= ~i;
                }
                INSERT_WORDS(x,ix,lx);
/* ... to here. */
                ax -= x;        /* r = |x| - floor(|n|) */
                if (ax == 0) {
                        ax = zero;  /* Make sure sign of zero is +. */
                } else {
                        ax = __compute_sinpi(ax);
/* Here I determine if integer part of ax is even or odd. */
                        if (x > 0x1p30) x -= 0x1p30;
                        lx = (uint32_t)x;
                        if (lx & 1) ax = -ax;
                }
                return ((hx & 0x80000000) ? -ax : ax);
        }
 
-- 
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow