From owner-freebsd-numerics@freebsd.org Wed Mar 8 20:24:24 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9C3D4D030DE for ; Wed, 8 Mar 2017 20:24:24 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 82D4419E7 for ; Wed, 8 Mar 2017 20:24:24 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v28KOHwU023196 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Wed, 8 Mar 2017 12:24:17 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v28KOHBY023195 for freebsd-numerics@freebsd.org; Wed, 8 Mar 2017 12:24:17 -0800 (PST) (envelope-from sgk) Date: Wed, 8 Mar 2017 12:24:17 -0800 From: Steve Kargl To: freebsd-numerics@freebsd.org Subject: Bit twiddling question Message-ID: <20170308202417.GA23103@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Mar 2017 20:24:24 -0000 Suppose I have a float 'x' that I know is in the range 1 <= x <= 0x1p23 and I know that 'x' is integral, e.g., x = 12.000. If I use GET_FLOAT_WORD from math_private.h, then x=12.000 maps to ix=0x41400000. Is there a bit twiddling method that I can apply to ix to unambiguously determine if x is even of odd? Yes, I know I can do float x; int32_t ix; ix = (int32_t)x; and then test (ix & 1). But, this does not generalize to the case of long double on a ld128 architecture. That is, if I have 1 <= x < 1xp112, then I would need to have long double x; int128_t ix; ix = (int128_t)x; and AFAICT sparc64 doesn't have an int128_t. -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Thu Mar 9 06:59:02 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F1D47D04409 for ; Thu, 9 Mar 2017 06:59:02 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id BDA561A94 for ; Thu, 9 Mar 2017 06:59:02 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 9578F1044931; Thu, 9 Mar 2017 17:58:53 +1100 (AEDT) Date: Thu, 9 Mar 2017 17:58:52 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question In-Reply-To: <20170308202417.GA23103@troutmask.apl.washington.edu> Message-ID: <20170309173152.F1269@besplex.bde.org> References: <20170308202417.GA23103@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=WF-Rw4ERh08rOXangSMA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Mar 2017 06:59:03 -0000 On Wed, 8 Mar 2017, Steve Kargl wrote: > Suppose I have a float 'x' that I know is in the > range 1 <= x <= 0x1p23 and I know that 'x' is > integral, e.g., x = 12.000. If I use GET_FLOAT_WORD > from math_private.h, then x=12.000 maps to ix=0x41400000. > Is there a bit twiddling method that I can apply to ix to > unambiguously determine if x is even of odd? I don't know of any good method. > Yes, I know I can do > > float x; > int32_t ix; > ix = (int32_t)x; > > and then test (ix & 1). But, this does not generalize to This isn't bit twiddling, and is also slow. > the case of long double on a ld128 architecture. That is, > if I have 1 <= x < 1xp112, then I would need to have > > long double x; > int128_t ix; > ix = (int128_t)x; > > and AFAICT sparc64 doesn't have an int128_t. If sparc64 has this, it would be even slower. Sparc64 emulates all 128-bit FP. This makes 128-bit sparc64 ~10-100 times slower than 64-bit sparc64 FP, and 100-1000 times slower than modern x86 64 and 8-bit FP. FP to integer conversions tend to be slower than pure FP, and are especially tricky for integers with more bits than FP mantissas. Consider bit twiddling to classifiy oddness of 2.0F (0x40000000 in bits) and 3.0F (0x40400000 in bits). The following method seems to be not so bad. Calculates that the unbiased exponent for 3.0F is 1. This means that the 1's bit is (1 << (23 - 1)) = 0x00400000 where we see it for 3.0F but not for 2.0F. All bits to the right of this must be 0 for the value to be an integer. I don't know how you classified integers efficiently without already determining the position of the "point" and looking at these bits. There are complications for powers of 2 and the normalization bit being implicit. Bruce From owner-freebsd-numerics@freebsd.org Thu Mar 9 07:52:37 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DA5C6D041B4 for ; Thu, 9 Mar 2017 07:52:37 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id B54321594 for ; Thu, 9 Mar 2017 07:52:37 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v297qawX030230 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 8 Mar 2017 23:52:36 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v297qaxt030229; Wed, 8 Mar 2017 23:52:36 -0800 (PST) (envelope-from sgk) Date: Wed, 8 Mar 2017 23:52:36 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question Message-ID: <20170309075236.GA30199@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170309173152.F1269@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Mar 2017 07:52:37 -0000 On Thu, Mar 09, 2017 at 05:58:52PM +1100, Bruce Evans wrote: > On Wed, 8 Mar 2017, Steve Kargl wrote: > > > Suppose I have a float 'x' that I know is in the > > range 1 <= x <= 0x1p23 and I know that 'x' is > > integral, e.g., x = 12.000. If I use GET_FLOAT_WORD > > from math_private.h, then x=12.000 maps to ix=0x41400000. > > Is there a bit twiddling method that I can apply to ix to > > unambiguously determine if x is even of odd? > > I don't know of any good method. Spent a day or so searching; hence, my question. > > Yes, I know I can do > > > > float x; > > int32_t ix; > > ix = (int32_t)x; > > > > and then test (ix & 1). But, this does not generalize to > > This isn't bit twiddling, and is also slow. Yes, I know it isn't bit twiddling, but it achieves want I need. I was hoping that if a bit twiddling algorithm was available (and was faster), then I could change my algorithm. > > the case of long double on a ld128 architecture. That is, > > if I have 1 <= x < 1xp112, then I would need to have > > > > long double x; > > int128_t ix; > > ix = (int128_t)x; > > > > and AFAICT sparc64 doesn't have an int128_t. > > If sparc64 has this, it would be even slower. Sparc64 emulates > all 128-bit FP. This makes 128-bit sparc64 ~10-100 times slower > than 64-bit sparc64 FP, and 100-1000 times slower than modern > x86 64 and 8-bit FP. FP to integer conversions tend to be slower > than pure FP, and are especially tricky for integers with more > bits than FP mantissas. > > Consider bit twiddling to classifiy oddness of 2.0F (0x40000000 in > bits) and 3.0F (0x40400000 in bits). The following method seems > to be not so bad. Calculates that the unbiased exponent for 3.0F > is 1. This means that the 1's bit is (1 << (23 - 1)) = 0x00400000 > where we see it for 3.0F but not for 2.0F. All bits to the right > of this must be 0 for the value to be an integer. I don't know > how you classified integers efficiently without already determining > the position of the "point" and looking at these bits. There are > complications for powers of 2 and the normalization bit being implicit. Thanks. I'll look into the above to see if I can up with something that does not require the cast. To give a hint at what I have been working on, I have implementations for sinpif(x), sinpi(x), cospif(x), and cospi(x). For sinpif(x) and cospif(x) I have kernels that give correctly rounded results for FLT_MIN <= x < 0x1p23 (at least on x86_64). I also have slightly faster kernels that give max ULPs of about 0.5008. For sinpi(x) and cospi(x), my kernels currently give about 1.25 ULP for the worse case error if I accumulate the polynomials in double precision. If I accumulate the polynomials in long double precision, then I get correctly rounded results. To complete the set, I was hoping to work out ld80 and ld128 versions. `ld128 is going to be problematic due to the absense of int128_t. I'll send you what I have in a few days. -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Thu Mar 9 15:23:09 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 89BD1D048BB for ; Thu, 9 Mar 2017 15:23:09 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 653421532 for ; Thu, 9 Mar 2017 15:23:09 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v29FN8AJ033321 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 9 Mar 2017 07:23:08 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v29FN7MU033320; Thu, 9 Mar 2017 07:23:07 -0800 (PST) (envelope-from sgk) Date: Thu, 9 Mar 2017 07:23:07 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question Message-ID: <20170309152307.GA32781@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170309075236.GA30199@troutmask.apl.washington.edu> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Mar 2017 15:23:09 -0000 On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote: > To give a hint at what I have been working on, I have implementations > for sinpif(x), sinpi(x), cospif(x), and cospi(x). For sinpif(x) and > cospif(x) I have kernels that give correctly rounded results for > FLT_MIN <= x < 0x1p23 (at least on x86_64). I also have slightly > faster kernels that give max ULPs of about 0.5008. For sinpi(x) and > cospi(x), my kernels currently give about 1.25 ULP for the worse case > error if I accumulate the polynomials in double precision. If I > accumulate the polynomials in long double precision, then I get > correctly rounded results. To complete the set, I was hoping to > work out ld80 and ld128 versions. `ld128 is going to be problematic > due to the absense of int128_t. > > I'll send you what I have in a few days. It is sometime amazing what happens when I sleep. For sinpi[fl](x) and the method I came up with one needs to do argument reduction where x = n + r. Here n is an integer and 0 < r < 1. If n is even then one is in the positive half-cyle of sin(x) and if n is odd then one has the negative half-cycle. For sinpif(x) this looks like if (ix < 0x4b000000) { /* 1 <= |x| < 0x1p23 */ /* 1 */ ix = (uint32_t)ax; /* 2 */ ax = ax == ix ? zero : __compute_sinpif(ax - ix); /* 3 */ if (ix & 1) ax = -ax; /* 4 */ return ((hx & 0x80000000) ? -ax : ax); } Line 1 determines n. Line 2 computes either sinpi(n) exactly or sinpi(r). Line 3 uses n odd to set the sign for the half-cycle. Line 4 is used to set the sign from sin(-x) = -sin(x). For double and ld80 one can use int64_t instead of uint32_t. Last night I realized that I don't need to use int64_t, and more importantly I don't need int128_t. The argument reduction can use uint32_t everywhere. For double, I currently have if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ int64_t n; n = (int64_t)ax; ax = ax == n ? zero : __compute_sinpi(ax - n); if (n & 1) ax = -ax; return ((hx & 0x80000000) ? -ax : ax); } which can be written (off the top-of-head and not tested) if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ double volatile vx; uint32_t n; vx = ax + 0x1p52; vx = vx - 0x1p52; if (vx == ax) return(0); if (vx > UINT32_MAX) { /* ax = m + n + r with m + n */ vx -= UINT32_MAX; /* m = UINT32_MAX. n is in range */ ax -= UINT32_MAX; } n = (uint32_t)vx; ax = __compute_sinpi(ax - n); if (n & 1) ax = -ax; return ((hx & 0x80000000) ? -ax : ax); } Something similar can be applied to ld128, but reduction may take two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX). -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Thu Mar 9 15:35:25 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 20036D05175 for ; Thu, 9 Mar 2017 15:35:25 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 0673636E for ; Thu, 9 Mar 2017 15:35:25 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v29FZO8g033424 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 9 Mar 2017 07:35:24 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v29FZNpC033420; Thu, 9 Mar 2017 07:35:23 -0800 (PST) (envelope-from sgk) Date: Thu, 9 Mar 2017 07:35:23 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question Message-ID: <20170309153523.GA33379@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> <20170309152307.GA32781@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170309152307.GA32781@troutmask.apl.washington.edu> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Mar 2017 15:35:25 -0000 On Thu, Mar 09, 2017 at 07:23:07AM -0800, Steve Kargl wrote: > > if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ > double volatile vx; > uint32_t n; > vx = ax + 0x1p52; > vx = vx - 0x1p52; > if (vx == ax) return(0); > if (vx > UINT32_MAX) { /* ax = m + n + r with m + n */ > vx -= UINT32_MAX; /* m = UINT32_MAX. n is in range */ > ax -= UINT32_MAX; > } > n = (uint32_t)vx; > ax = __compute_sinpi(ax - n); > if (n & 1) ax = -ax; > return ((hx & 0x80000000) ? -ax : ax); > } > We don't even need to use UINT32_MAX. We can do something like if (vx > 0x1p30) { vx -= 0x1p30; ax -= 0x1p30; } so there is no conversion from uint32_t to double. We also probably want to minimize access to a volatile, so we would have y = vx - 0x1p52 and use y afterwards. -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Thu Mar 9 16:07:52 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 00EC1D05FE6 for ; Thu, 9 Mar 2017 16:07:51 +0000 (UTC) (envelope-from dennis.hamilton@acm.org) Received: from nov-007-i623.relay.mailchannels.net (nov-007-i623.relay.mailchannels.net [46.232.183.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6C4851D4C for ; Thu, 9 Mar 2017 16:07:49 +0000 (UTC) (envelope-from dennis.hamilton@acm.org) X-Sender-Id: a2hosting|x-authuser|himself@orcmid.com Received: from relay.mailchannels.net (localhost [127.0.0.1]) by relay.mailchannels.net (Postfix) with ESMTP id CA4B9120959; Thu, 9 Mar 2017 16:07:39 +0000 (UTC) Received: from a2s42.a2hosting.com (unknown [100.96.135.39]) by relay.mailchannels.net (Postfix) with ESMTPA id 2C4CB12458D; Thu, 9 Mar 2017 16:07:39 +0000 (UTC) X-Sender-Id: a2hosting|x-authuser|himself@orcmid.com Received: from a2s42.a2hosting.com (a2s42.a2hosting.com [172.20.61.168]) (using TLSv1 with cipher DHE-RSA-AES256-SHA) by 0.0.0.0:2500 (trex/5.7.32); Thu, 09 Mar 2017 16:07:39 +0000 X-MC-Relay: Neutral X-MailChannels-SenderId: a2hosting|x-authuser|himself@orcmid.com X-MailChannels-Auth-Id: a2hosting X-Madly-Fearful: 530ad3b634e42515_1489075659565_2445266865 X-MC-Loop-Signature: 1489075659564:3093905383 X-MC-Ingress-Time: 1489075659564 Received: from 75-172-39-61.tukw.qwest.net ([75.172.39.61]:56458 helo=Astraendo2) by a2s42.a2hosting.com with esmtpsa (TLSv1:DHE-RSA-AES256-SHA:256) (Exim 4.87) (envelope-from ) id 1cm0bJ-003Rx9-HF; Thu, 09 Mar 2017 11:07:38 -0500 Reply-To: From: "Dennis E. Hamilton" To: Cc: "'Bruce Evans'" , References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> In-Reply-To: <20170309075236.GA30199@troutmask.apl.washington.edu> Subject: RE: Bit twiddling question Date: Thu, 9 Mar 2017 08:07:37 -0800 Organization: NuovoDoc Message-ID: <000001d298ef$460c42a0$d224c7e0$@acm.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Outlook 16.0 Thread-Index: AQI9VYS4o4iHSjhMB3OB5vywod5c4QGeZGK/Ad8U0Yqgmz9bkA== Content-Language: en-us X-AuthUser: himself@orcmid.com X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Mar 2017 16:07:52 -0000 > -----Original Message----- > From: owner-freebsd-numerics@freebsd.org [mailto:owner-freebsd- > numerics@freebsd.org] On Behalf Of Steve Kargl > Sent: Wednesday, March 8, 2017 23:53 > To: Bruce Evans > Cc: freebsd-numerics@freebsd.org > Subject: Re: Bit twiddling question >=20 > On Thu, Mar 09, 2017 at 05:58:52PM +1100, Bruce Evans wrote: > > On Wed, 8 Mar 2017, Steve Kargl wrote: > > > > > Suppose I have a float 'x' that I know is in the > > > range 1 <=3D x <=3D 0x1p23 and I know that 'x' is > > > integral, e.g., x =3D 12.000. If I use GET_FLOAT_WORD > > > from math_private.h, then x=3D12.000 maps to ix=3D0x41400000. > > > Is there a bit twiddling method that I can apply to ix to > > > unambiguously determine if x is even of odd? > > > > I don't know of any good method. [orcmid]=20 If you are certain about the pre-conditions holding absolutely, figure = out the power of 2 *float* to add to your value, such that the binary = point is forced to be at the low-order edge of the float word. This = might be a bit faster than casting your float to int directly. =20 - Dennis [ ... ] From owner-freebsd-numerics@freebsd.org Thu Mar 9 16:48:07 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EA4CBD05E57 for ; Thu, 9 Mar 2017 16:48:07 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id CE8BBE48 for ; Thu, 9 Mar 2017 16:48:07 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v29GltjP034945 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 9 Mar 2017 08:47:55 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v29GltIP034944; Thu, 9 Mar 2017 08:47:55 -0800 (PST) (envelope-from sgk) Date: Thu, 9 Mar 2017 08:47:55 -0800 From: Steve Kargl To: "Dennis E. Hamilton" Cc: freebsd-numerics@freebsd.org, "'Bruce Evans'" Subject: Re: Bit twiddling question Message-ID: <20170309164755.GA34821@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> <000001d298ef$460c42a0$d224c7e0$@acm.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <000001d298ef$460c42a0$d224c7e0$@acm.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Mar 2017 16:48:08 -0000 On Thu, Mar 09, 2017 at 08:07:37AM -0800, Dennis E. Hamilton wrote: > > > -----Original Message----- > > From: owner-freebsd-numerics@freebsd.org [mailto:owner-freebsd- > > numerics@freebsd.org] On Behalf Of Steve Kargl > > Sent: Wednesday, March 8, 2017 23:53 > > To: Bruce Evans > > Cc: freebsd-numerics@freebsd.org > > Subject: Re: Bit twiddling question > > > > On Thu, Mar 09, 2017 at 05:58:52PM +1100, Bruce Evans wrote: > > > On Wed, 8 Mar 2017, Steve Kargl wrote: > > > > > > > Suppose I have a float 'x' that I know is in the > > > > range 1 <= x <= 0x1p23 and I know that 'x' is > > > > integral, e.g., x = 12.000. If I use GET_FLOAT_WORD > > > > from math_private.h, then x=12.000 maps to ix=0x41400000. > > > > Is there a bit twiddling method that I can apply to ix to > > > > unambiguously determine if x is even of odd? > > > > > > I don't know of any good method. > > If you are certain about the pre-conditions holding absolutely, > figure out the power of 2 *float* to add to your value, such > that the binary point is forced to be at the low-order edge of > the float word. This might be a bit faster than casting your > float to int directly. Thanks, I'll look into doing the above. Yes, I'm certain of the conditions. See my other posts from earlier today. I have a way forward. Optimizations can come later. -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Thu Mar 9 18:16:03 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AC82CD05002 for ; Thu, 9 Mar 2017 18:16:03 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 59CC612EB for ; Thu, 9 Mar 2017 18:16:02 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 984BDD55AE7; Fri, 10 Mar 2017 05:02:14 +1100 (AEDT) Date: Fri, 10 Mar 2017 05:02:13 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: Bruce Evans , freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question In-Reply-To: <20170309152307.GA32781@troutmask.apl.washington.edu> Message-ID: <20170310025417.U3723@besplex.bde.org> References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> <20170309152307.GA32781@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=AYLBJzfG c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=MW23G5SEgc8WnkJ-59oA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Mar 2017 18:16:03 -0000 On Thu, 9 Mar 2017, Steve Kargl wrote: > On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote: >> To give a hint at what I have been working on, I have implementations >> for sinpif(x), sinpi(x), cospif(x), and cospi(x). For sinpif(x) and >> cospif(x) I have kernels that give correctly rounded results for >> FLT_MIN <= x < 0x1p23 (at least on x86_64). I also have slightly I didn't see the original mail. I'd prefer you to finish the long double versions of cosl(), sinl() and tanl() (maybe das is responsible for the latter). These are still too slow and minor bugs for at least sinl() on small x, partly by not copying the lower-precision versions enough. >> faster kernels that give max ULPs of about 0.5008. For sinpi(x) and Similar to for sinf() and cosf(). The maximum for those is 0.5009, using double precision calculations which I now know to be a mistake, and a polynomial of degree higher than necessary for good enough rounding in float precision and lower than necessary for perfect rounding in float precision. >> cospi(x), my kernels currently give about 1.25 ULP for the worse case >> error if I accumulate the polynomials in double precision. If I Not too good. I think the problem is that the coeffs for the lowest terms are no longer exactly representable. The linear function f(x) = a*x is harder to evaluate accurately than exp(x) = 1 + x + ..., unless a is exactly representable and also has few digits (like 1 or -0.5 for early coeffs of sin() and cos()). >> accumulate the polynomials in long double precision, then I get >> correctly rounded results. To complete the set, I was hoping to Don't do that. It is much slower, and not available starting with long double precision, or even starting with double precision on arches without real long doubles. It is also a mistake to use it for float precision like I did for trig functions and an uncommitted version of logf(). I managed to make the specialized logf() run 50% faster than the less specialized log() (also uncommitted), by using double precision and manually schedling for Athlon-XP, but modern CPUs and compilers handle the less sprecialized log() so well that the specialized logf() is only a couple of cycles faster. Similarly but less pronounced for trig functions. >> work out ld80 and ld128 versions. `ld128 is going to be problematic >> due to the absense of int128_t. >> >> I'll send you what I have in a few days. > > It is sometime amazing what happens when I sleep. For sinpi[fl](x) > and the method I came up with one needs to do argument reduction > where x = n + r. Here n is an integer and 0 < r < 1. If n is even > then one is in the positive half-cyle of sin(x) and if n is odd then > one has the negative half-cycle. For sinpif(x) this looks like > if (ix < 0x4b000000) { /* 1 <= |x| < 0x1p23 */ > /* 1 */ ix = (uint32_t)ax; > /* 2 */ ax = ax == ix ? zero : __compute_sinpif(ax - ix); > /* 3 */ if (ix & 1) ax = -ax; > /* 4 */ return ((hx & 0x80000000) ? -ax : ax); > } > > Line 1 determines n. Line 2 computes either sinpi(n) exactly or > sinpi(r). Line 3 uses n odd to set the sign for the half-cycle. > Line 4 is used to set the sign from sin(-x) = -sin(x). > > For double and ld80 one can use int64_t instead of uint32_t. Last > night I realized that I don't need to use int64_t, and more importantly > I don't need int128_t. The argument reduction can use uint32_t > everywhere. For double, I currently have > > if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ > int64_t n; > n = (int64_t)ax; > ax = ax == n ? zero : __compute_sinpi(ax - n); > if (n & 1) ax = -ax; > return ((hx & 0x80000000) ? -ax : ax); > } > > which can be written (off the top-of-head and not tested) > if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ > double volatile vx; > uint32_t n; > vx = ax + 0x1p52; > vx = vx - 0x1p52; > if (vx == ax) return(0); > if (vx > UINT32_MAX) { /* ax = m + n + r with m + n */ > vx -= UINT32_MAX; /* m = UINT32_MAX. n is in range */ > ax -= UINT32_MAX; > } > n = (uint32_t)vx; > ax = __compute_sinpi(ax - n); > if (n & 1) ax = -ax; > return ((hx & 0x80000000) ? -ax : ax); > } > > Something similar can be applied to ld128, but reduction may take > two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX). I don't like that much. The 0x1p52 magic is tricky and probably buggy. s_rint.c uses the same magic but needs many complications to avoid double rounding. Similar-looking magic for trig reduction in radians uses floating the magic value of 0x1.8p52 to bias rounding errors in a safe way, and arranges so that rounding errors don't matter then. Perhaps you can do the same here -- do a fixup of +-UINT32_MAX if necessary. This seem to be no harder than using rint(), round() or floor(). Except for floor(), we it is hard to control whether the rounded result is too high or too low. Using floor: prec_t n = floor(x); r = x - n; but we want to classify quadrants, and still don't have an n that will fit in an integer, so first scale by 2 (4 quadrants of size 1/2): y = x * 0.5 - floor(x * 0.5); /* 4 quads in [2n, 2n+2) -> [0, 1) */ y = 4 * y; /* -> [0, 4) */ int n = y; /* quadrant */ r = (y - n) / 4; /* offset in quadrant */ This is a bit slow, and too much like lgamma's method. frexp() can be used instead of floor() to slowly extract the fraction. I think it can be done using the 0x1p52 magic a bit like you say. s_rint.c already uses this magic, but has to work hard to handle errors in double rounding. s_floor.c uses it via bit fiddling. If we replace the slow floor(z) by the 0x1p52 calculation on z, then we get an approximation to floor(z) that is hopefully an integer (possibly an integer + 0.5) within 1 or 2 of the correct result. Adjust it until it is correct. Bruce From owner-freebsd-numerics@freebsd.org Thu Mar 9 19:51:35 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D77DBD0470F for ; Thu, 9 Mar 2017 19:51:35 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id C16C9989 for ; Thu, 9 Mar 2017 19:51:35 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v29JpYWr036829 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 9 Mar 2017 11:51:34 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v29JpYdT036828; Thu, 9 Mar 2017 11:51:34 -0800 (PST) (envelope-from sgk) Date: Thu, 9 Mar 2017 11:51:34 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question Message-ID: <20170309195134.GA36213@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> <20170309152307.GA32781@troutmask.apl.washington.edu> <20170310025417.U3723@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170310025417.U3723@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Mar 2017 19:51:35 -0000 On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote: > On Thu, 9 Mar 2017, Steve Kargl wrote: > > > On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote: > >> To give a hint at what I have been working on, I have implementations > >> for sinpif(x), sinpi(x), cospif(x), and cospi(x). For sinpif(x) and > >> cospif(x) I have kernels that give correctly rounded results for > >> FLT_MIN <= x < 0x1p23 (at least on x86_64). I also have slightly > > I didn't see the original mail. Odd. Maybe munched by a spam filter. > I'd prefer you to finish the long double versions of cosl(), sinl() and > tanl() (maybe das is responsible for the latter). These are still too > slow and minor bugs for at least sinl() on small x, partly by not copying > the lower-precision versions enough. sinl, cosl, and tanl are good enough for my purposes at the moment. tgammal and powl are higher on the list of things to fix. But, sinpi[fl] and company are high on the list because: (1) I can use those in my actual job; and (2), the desirable property that sinpi(x) = 0 whereas sin(pi*x) = 0.mumbleE-17 (or even E-16) for integer x. > >> faster kernels that give max ULPs of about 0.5008. For sinpi(x) and > > Similar to for sinf() and cosf(). The maximum for those is 0.5009, using > double precision calculations which I now know to be a mistake, and > a polynomial of degree higher than necessary for good enough rounding in > float precision and lower than necessary for perfect rounding in float > precision. I have three polynomials for __kernel_sinpif() over x in [0,1/4]. All are accumulated in double precision. With a 6th order polynomial (12 double FOP), I get a max ULP of 0.5. With a 5th order polynomial (10 double FOP), I get a max ULP of 0.50000006. With the 4th order polynomial (8 double FOP), the max ULP is 0.50007051. Note, in my Remes algorithm I used f(x) = sin(pi*x) / x. I still haven't found a way to round 20*53-bit coefficients to a desired precision, which here is 53 bits. (Well, I have thought about a recursive Remes algorithm, but it requires much more work/time that I can commit.) I also have three polynomials for __kernel_cospif() over the same interval. 6th order gives Max ULP of 0.5, the 5th order gives 0.50000089, and the 4th order gives 0.5008257. > >> cospi(x), my kernels currently give about 1.25 ULP for the worse case > >> error if I accumulate the polynomials in double precision. If I > > Not too good. I think the problem is that the coeffs for the lowest > terms are no longer exactly representable. The linear function f(x) = a*x > is harder to evaluate accurately than exp(x) = 1 + x + ..., unless a > is exactly representable and also has few digits (like 1 or -0.5 for > early coeffs of sin() and cos()). I agree. I'm still working on the double routines and starting to contemplate the ld80 and ld128 ones. > >> accumulate the polynomials in long double precision, then I get > >> correctly rounded results. To complete the set, I was hoping to > > Don't do that. It is much slower, and not available starting with > long double precision, or even starting with double precision on arches > without real long doubles. It is also a mistake to use it for float > precision like I did for trig functions and an uncommitted version of > logf(). I managed to make the specialized logf() run 50% faster than > the less specialized log() (also uncommitted), by using double precision > and manually schedling for Athlon-XP, but modern CPUs and compilers handle > the less sprecialized log() so well that the specialized logf() is only > a couple of cycles faster. Similarly but less pronounced for trig functions. Don't worry. I have no intention to use long double to accumulate the polynomials. I was just curious to see if the extra 11-bits would reduce the ulp from 1.25 to something below 1. I was surprised at how well those 11 bits worked. I also tried fma() in my Horner's rule representation of the polynomial, but that did not help. > >> work out ld80 and ld128 versions. `ld128 is going to be problematic > >> due to the absense of int128_t. > >> > >> I'll send you what I have in a few days. > > > > It is sometime amazing what happens when I sleep. For sinpi[fl](x) > > and the method I came up with one needs to do argument reduction > > where x = n + r. Here n is an integer and 0 < r < 1. If n is even > > then one is in the positive half-cyle of sin(x) and if n is odd then > > one has the negative half-cycle. For sinpif(x) this looks like > > if (ix < 0x4b000000) { /* 1 <= |x| < 0x1p23 */ > > /* 1 */ ix = (uint32_t)ax; > > /* 2 */ ax = ax == ix ? zero : __compute_sinpif(ax - ix); > > /* 3 */ if (ix & 1) ax = -ax; > > /* 4 */ return ((hx & 0x80000000) ? -ax : ax); > > } > > > > Line 1 determines n. Line 2 computes either sinpi(n) exactly or > > sinpi(r). Line 3 uses n odd to set the sign for the half-cycle. > > Line 4 is used to set the sign from sin(-x) = -sin(x). > > > > For double and ld80 one can use int64_t instead of uint32_t. Last > > night I realized that I don't need to use int64_t, and more importantly > > I don't need int128_t. The argument reduction can use uint32_t > > everywhere. For double, I currently have > > > > if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ > > int64_t n; > > n = (int64_t)ax; > > ax = ax == n ? zero : __compute_sinpi(ax - n); > > if (n & 1) ax = -ax; > > return ((hx & 0x80000000) ? -ax : ax); > > } > > > > which can be written (off the top-of-head and not tested) > > if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ > > double volatile vx; > > uint32_t n; > > vx = ax + 0x1p52; > > vx = vx - 0x1p52; > > if (vx == ax) return(0); > > if (vx > UINT32_MAX) { /* ax = m + n + r with m + n */ > > vx -= UINT32_MAX; /* m = UINT32_MAX. n is in range */ > > ax -= UINT32_MAX; > > } > > n = (uint32_t)vx; > > ax = __compute_sinpi(ax - n); > > if (n & 1) ax = -ax; > > return ((hx & 0x80000000) ? -ax : ax); > > } > > > > Something similar can be applied to ld128, but reduction may take > > two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX). > > I don't like that much. The 0x1p52 magic is tricky and probably buggy. > s_rint.c uses the same magic but needs many complications to avoid > double rounding. Yep. Seems to have some issue. From my working copy of s_sinpif(x), I tried #if 0 ix = (uint32_t)ax; a = ax == ix ? zero : __compute_sinpif(ax - ix); if (ix & 1) ax = -ax; return ((hx & 0x80000000) ? -ax : ax); #else volatile float vx; float y; vx = ax + 0x1p23F; y = vx - 0x1p23F; ix = (uint32_t)y; ax = ax == y ? zero : __compute_sinpif(ax - ix); if (ix & 1) ax = -ax; return ((hx & 0x80000000) ? -ax : ax); #endif My max ULP went from 0.5 to 0.50398505 for exhaustive testing in the interval [1,100]. If I grab the worse case x and use the '#if 1' code, I see ./testf -S -a 1.50121641e+00f a = 1.50121641e+00f, /* 0x3fc027dc */ sinpif(a) = -9.99992669e-01f, /* 0xbf7fff85 */ sin(pi*a) = -9.99992669e-01f, /* 0xbf7fff85 */ MPFR 96-bits rounded 24-bits. ULP: 0.49601495 The '#if 0' code gives ./testf -S -a 1.50121641e+00f a = 1.50121641e+00f, /* 0x3fc027dc */ sinpif(a) = -9.99992728e-01f, /* 0xbf7fff86 */ sin(pi*a) = -9.99992669e-01f, /* 0xbf7fff85 */ MPFR 96-bits rounded 24-bits. ULP: 0.50398505 > Similar-looking magic for trig reduction in radians > uses floating the magic value of 0x1.8p52 to bias rounding errors in > a safe way, and arranges so that rounding errors don't matter then. > Perhaps you can do the same here -- do a fixup of +-UINT32_MAX if > necessary. This seem to be no harder than using rint(), round() or > floor(). Except for floor(), we it is hard to control whether the > rounded result is too high or too low. Using floor: > > prec_t n = floor(x); > r = x - n; > > but we want to classify quadrants, and still don't have an n that will > fit in an integer, so first scale by 2 (4 quadrants of size 1/2): > > y = x * 0.5 - floor(x * 0.5); /* 4 quads in [2n, 2n+2) -> [0, 1) */ > y = 4 * y; /* -> [0, 4) */ > int n = y; /* quadrant */ > r = (y - n) / 4; /* offset in quadrant */ > > This is a bit slow, and too much like lgamma's method. I did look over what you did in lgamma[f], but forgot the logic behind the reduction into octants. I suppose that I'm more concern with 'make it work', 'make it correct', and then I will/might worry about 'make it fast'. But, I don't want it to be too slow, so I'm trying to avoid function calls and reclassification of x as +-inf or nan. Heck, r = remquof(ax, vx, &n) should work, but there's that speed issue. -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Thu Mar 9 20:14:10 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B3BB1D04CEE for ; Thu, 9 Mar 2017 20:14:10 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7A53D1505 for ; Thu, 9 Mar 2017 20:14:10 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v29KE9sg037304 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 9 Mar 2017 12:14:09 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v29KE9dt037303; Thu, 9 Mar 2017 12:14:09 -0800 (PST) (envelope-from sgk) Date: Thu, 9 Mar 2017 12:14:09 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question Message-ID: <20170309201409.GA37219@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> <20170309152307.GA32781@troutmask.apl.washington.edu> <20170310025417.U3723@besplex.bde.org> <20170309195134.GA36213@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170309195134.GA36213@troutmask.apl.washington.edu> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Mar 2017 20:14:10 -0000 On Thu, Mar 09, 2017 at 11:51:34AM -0800, Steve Kargl wrote: > On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote: > > > > > > Something similar can be applied to ld128, but reduction may take > > > two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX). > > > > I don't like that much. The 0x1p52 magic is tricky and probably buggy. > > s_rint.c uses the same magic but needs many complications to avoid > > double rounding. > > Yep. Seems to have some issue. From my working copy of s_sinpif(x), > I tried > > #if 0 > ix = (uint32_t)ax; > a = ax == ix ? zero : __compute_sinpif(ax - ix); > if (ix & 1) ax = -ax; > return ((hx & 0x80000000) ? -ax : ax); > #else > volatile float vx; > float y; > vx = ax + 0x1p23F; > y = vx - 0x1p23F; > ix = (uint32_t)y; > ax = ax == y ? zero : __compute_sinpif(ax - ix); > if (ix & 1) ax = -ax; > return ((hx & 0x80000000) ? -ax : ax); > #endif > > My max ULP went from 0.5 to 0.50398505 for exhaustive testing > in the interval [1,100]. If I grab the worse case x and use the > '#if 1' code, I see > > ./testf -S -a 1.50121641e+00f > a = 1.50121641e+00f, /* 0x3fc027dc */ > sinpif(a) = -9.99992669e-01f, /* 0xbf7fff85 */ > sin(pi*a) = -9.99992669e-01f, /* 0xbf7fff85 */ MPFR 96-bits rounded 24-bits. > ULP: 0.49601495 > > The '#if 0' code gives > > ./testf -S -a 1.50121641e+00f > a = 1.50121641e+00f, /* 0x3fc027dc */ > sinpif(a) = -9.99992728e-01f, /* 0xbf7fff86 */ > sin(pi*a) = -9.99992669e-01f, /* 0xbf7fff85 */ MPFR 96-bits rounded 24-bits. > ULP: 0.50398505 > Certainly looks like a double rounding issue. An instrumented copy of the code shows that y = 2 instead of 1 in the above. a = 1.50121641e+00f, /* 0x3fc027dc */ 1.50121641e+00 2.00000000e+00 sinpif(a) = -9.99992728e-01f, /* 0xbf7fff86 */ sin(pi*a) = -9.99992669e-01f, /* 0xbf7fff85 */ ULP: 0.50398505 -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Fri Mar 10 06:46:43 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8ACEFD05924 for ; Fri, 10 Mar 2017 06:46:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 380861C3A for ; Fri, 10 Mar 2017 06:46:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id B191E3C1536; Fri, 10 Mar 2017 17:46:35 +1100 (AEDT) Date: Fri, 10 Mar 2017 17:46:34 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question In-Reply-To: <20170309195134.GA36213@troutmask.apl.washington.edu> Message-ID: <20170310165638.D1143@besplex.bde.org> References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> <20170309152307.GA32781@troutmask.apl.washington.edu> <20170310025417.U3723@besplex.bde.org> <20170309195134.GA36213@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=V3hBaD5_8KBfC0JK-0QA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Mar 2017 06:46:43 -0000 On Thu, 9 Mar 2017, Steve Kargl wrote: > On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote: >> On Thu, 9 Mar 2017, Steve Kargl wrote: >> >>> On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote: >>>> To give a hint at what I have been working on, I have implementations >>>> for sinpif(x), sinpi(x), cospif(x), and cospi(x). For sinpif(x) and >>>> cospif(x) I have kernels that give correctly rounded results for >>>> FLT_MIN <= x < 0x1p23 (at least on x86_64). I also have slightly >> >> I didn't see the original mail. > > Odd. Maybe munched by a spam filter. It was delivered out of order. >> I'd prefer you to finish the long double versions of cosl(), sinl() and >> tanl() (maybe das is responsible for the latter). These are still too >> slow and minor bugs for at least sinl() on small x, partly by not copying >> the lower-precision versions enough. > > sinl, cosl, and tanl are good enough for my purposes at the moment. > tgammal and powl are higher on the list of things to fix. But, Only powl is very important, but everyone avoids it because it is harder. >* ... >>> which can be written (off the top-of-head and not tested) >>> if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ >>> double volatile vx; >>> uint32_t n; >>> vx = ax + 0x1p52; >>> vx = vx - 0x1p52; >>> if (vx == ax) return(0); >>> if (vx > UINT32_MAX) { /* ax = m + n + r with m + n */ >>> vx -= UINT32_MAX; /* m = UINT32_MAX. n is in range */ >>> ax -= UINT32_MAX; >>> } >>> n = (uint32_t)vx; >>> ax = __compute_sinpi(ax - n); >>> if (n & 1) ax = -ax; >>> return ((hx & 0x80000000) ? -ax : ax); >>> } >>> >>> Something similar can be applied to ld128, but reduction may take >>> two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX). >> >> I don't like that much. The 0x1p52 magic is tricky and probably buggy. >> s_rint.c uses the same magic but needs many complications to avoid >> double rounding. > > Yep. Seems to have some issue. From my working copy of s_sinpif(x), > I tried > ... > My max ULP went from 0.5 to 0.50398505 for exhaustive testing > in the interval [1,100]. If I grab the worse case x and use the > '#if 1' code, I see > >> Similar-looking magic for trig reduction in radians >> uses floating the magic value of 0x1.8p52 to bias rounding errors in >> a safe way, and arranges so that rounding errors don't matter then. >> .... >> y = x * 0.5 - floor(x * 0.5); /* 4 quads in [2n, 2n+2) -> [0, 1) */ >> y = 4 * y; /* -> [0, 4) */ >> int n = y; /* quadrant */ >> r = (y - n) / 4; /* offset in quadrant */ >> >> This is a bit slow, and too much like lgamma's method. > > I did look over what you did in lgamma[f], but forgot the logic > behind the reduction into octants. I suppose that I'm more > concern with 'make it work', 'make it correct', and then I > will/might worry about 'make it fast'. But, I don't want it to be > too slow, so I'm trying to avoid function calls and reclassification > of x as +-inf or nan. Heck, r = remquof(ax, vx, &n) should work, > but there's that speed issue. In fact, we already did the work to optimize this using the 0x1p52 magic for lgamma, and even committed it. fdlibm lgamma used a messy combination of multiplications by 0.5 and 2.0, 2 floor()s, 1 half of the 0x1p52 magic and bit fiddling to do the other half of the 0x1p52 magic. You simplified this, and we eventually optimized it to use no floors (but the equivalent of 2 rint()s on |x| and 4*|x| implemented using full 0x1p52 magic) and different bit fiddling. Now I fear that the 0x1p52 doesn't work in double rounding cases. Hopefully it works like it does for trig functions -- at worst there is an off-by 1 error in the quadrant but here is a compensating tiny error in the offset. E.g., the closest value to pi/4 might be represented as octant 0 with offset pi/4 + epsilon or octant 1 with offset epsilon (the lowest level must reduce to octants, not quadrants). Then the poly much just approximate a little above pi/4 in case epsilon > 0. Higher levels also see the off-by-1 error but everything is consistent. This explains your larger error -- the accuracy of the poly breaks down not far above pi/4, and even epsilon above pi/4 there may be a near-halfway case that is relatively inaccurate. I think the reduction for lgamma can be further improved by reducing directly to octants. sin_pi() in lgamma differs from a perfect general sin_pi() only in the following respects: - the general function has to handle large args, most are even integers so sin_pi() on them is 0. Some are odd integers. The 0x1p52 magic breaks down at about the same point as where all args are integers. - the perfect general function has to use special polys and special poly evaluation methods since pi*x is too inaccurate without extra precision. Reduction for sind() is interesting. sind(x) = sin_pi(x/180) in infinite precision, but the division is hard to do accurately and efficiently. So sind(x) is only easy for small x. C11 doesn't have sincos*() or sind*() to make us more incomplete. Bruce From owner-freebsd-numerics@freebsd.org Fri Mar 10 07:21:43 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A0B2CD021AC for ; Fri, 10 Mar 2017 07:21:43 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 896F61A36 for ; Fri, 10 Mar 2017 07:21:43 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v2A7LbaU043765 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 9 Mar 2017 23:21:37 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v2A7LaDR043764; Thu, 9 Mar 2017 23:21:36 -0800 (PST) (envelope-from sgk) Date: Thu, 9 Mar 2017 23:21:36 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question Message-ID: <20170310072136.GA43725@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> <20170309152307.GA32781@troutmask.apl.washington.edu> <20170310025417.U3723@besplex.bde.org> <20170309195134.GA36213@troutmask.apl.washington.edu> <20170310165638.D1143@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170310165638.D1143@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Mar 2017 07:21:43 -0000 On Fri, Mar 10, 2017 at 05:46:34PM +1100, Bruce Evans wrote: > On Thu, 9 Mar 2017, Steve Kargl wrote: > > > On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote: > >> On Thu, 9 Mar 2017, Steve Kargl wrote: > >> > >>> On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote: > >>>> To give a hint at what I have been working on, I have implementations > >>>> for sinpif(x), sinpi(x), cospif(x), and cospi(x). For sinpif(x) and > >>>> cospif(x) I have kernels that give correctly rounded results for > >>>> FLT_MIN <= x < 0x1p23 (at least on x86_64). I also have slightly > >> > >> I didn't see the original mail. > > > > Odd. Maybe munched by a spam filter. > > It was delivered out of order. > > >> I'd prefer you to finish the long double versions of cosl(), sinl() and > >> tanl() (maybe das is responsible for the latter). These are still too > >> slow and minor bugs for at least sinl() on small x, partly by not copying > >> the lower-precision versions enough. > > > > sinl, cosl, and tanl are good enough for my purposes at the moment. > > tgammal and powl are higher on the list of things to fix. But, > > Only powl is very important, but everyone avoids it because it is harder. I have a partially written powl in my tree. I'll probably finish it someday. > > >* ... > >>> which can be written (off the top-of-head and not tested) > >>> if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ > >>> double volatile vx; > >>> uint32_t n; > >>> vx = ax + 0x1p52; > >>> vx = vx - 0x1p52; > >>> if (vx == ax) return(0); > >>> if (vx > UINT32_MAX) { /* ax = m + n + r with m + n */ > >>> vx -= UINT32_MAX; /* m = UINT32_MAX. n is in range */ > >>> ax -= UINT32_MAX; > >>> } > >>> n = (uint32_t)vx; > >>> ax = __compute_sinpi(ax - n); > >>> if (n & 1) ax = -ax; > >>> return ((hx & 0x80000000) ? -ax : ax); > >>> } > >>> > >>> Something similar can be applied to ld128, but reduction may take > >>> two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX). > >> > >> I don't like that much. The 0x1p52 magic is tricky and probably buggy. > >> s_rint.c uses the same magic but needs many complications to avoid > >> double rounding. > > > > Yep. Seems to have some issue. From my working copy of s_sinpif(x), > > I tried > > ... > > My max ULP went from 0.5 to 0.50398505 for exhaustive testing > > in the interval [1,100]. If I grab the worse case x and use the > > '#if 1' code, I see > > > >> Similar-looking magic for trig reduction in radians > >> uses floating the magic value of 0x1.8p52 to bias rounding errors in > >> a safe way, and arranges so that rounding errors don't matter then. > >> .... > >> y = x * 0.5 - floor(x * 0.5); /* 4 quads in [2n, 2n+2) -> [0, 1) */ > >> y = 4 * y; /* -> [0, 4) */ > >> int n = y; /* quadrant */ > >> r = (y - n) / 4; /* offset in quadrant */ > >> > >> This is a bit slow, and too much like lgamma's method. > > > > I did look over what you did in lgamma[f], but forgot the logic > > behind the reduction into octants. I suppose that I'm more > > concern with 'make it work', 'make it correct', and then I > > will/might worry about 'make it fast'. But, I don't want it to be > > too slow, so I'm trying to avoid function calls and reclassification > > of x as +-inf or nan. Heck, r = remquof(ax, vx, &n) should work, > > but there's that speed issue. > > In fact, we already did the work to optimize this using the 0x1p52 > magic for lgamma, and even committed it. > > fdlibm lgamma used a messy combination of multiplications by 0.5 and > 2.0, 2 floor()s, 1 half of the 0x1p52 magic and bit fiddling to do > the other half of the 0x1p52 magic. You simplified this, and we > eventually optimized it to use no floors (but the equivalent of > 2 rint()s on |x| and 4*|x| implemented using full 0x1p52 magic) and > different bit fiddling. Now I fear that the 0x1p52 doesn't work > in double rounding cases. At least, in my sinpif() the double rounding appears near x = n.5 where I get n+1 instead of the intended n. I'll need to recheck lgammaf, but I recall that we are working with |x|. This means we can test for n < |x|. If we get n+1 then double rounding occurred. > Hopefully it works like it does for trig > functions -- at worst there is an off-by 1 error in the quadrant > but here is a compensating tiny error in the offset. E.g., the > closest value to pi/4 might be represented as octant 0 with offset > pi/4 + epsilon or octant 1 with offset epsilon (the lowest level > must reduce to octants, not quadrants). Then the poly much just > approximate a little above pi/4 in case epsilon > 0. Higher levels > also see the off-by-1 error but everything is consistent. > > This explains your larger error -- the accuracy of the poly breaks > down not far above pi/4, and even epsilon above pi/4 there may be > a near-halfway case that is relatively inaccurate. > Well, I stared at s_floorf.c and s_floor.c today. For sinpi, I'm interested in 1 <= x < 0x1p52 (or 23 or 63 or 112 for other precisions), which significantly reduces the needed bit twiddling. For double, I need to do if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ /* Reduced from s_floor.c from here ... */ j0 = ((ix >> 20) & 0x7ff) - 0x3ff; if (j0 < 20) { ix &= ~(0x000fffff >> j0); lx = 0; } else { i = ((uint32_t)(0xffffffff)) >> (j0 - 20); lx &= ~i; } INSERT_WORDS(x,ix,lx); /* ... to here. */ ax -= x; /* r = |x| - floor(|n|) */ if (ax == 0) { ax = zero; /* Make sure sign of zero is +. */ } else { ax = __compute_sinpi(ax); /* Here I determine if integer part of ax is even or odd. */ if (x > 0x1p30) x -= 0x1p30; lx = (uint32_t)x; if (lx & 1) ax = -ax; } return ((hx & 0x80000000) ? -ax : ax); } -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow