From owner-freebsd-numerics@freebsd.org Thu Mar 9 15:23:09 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 89BD1D048BB for ; Thu, 9 Mar 2017 15:23:09 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 653421532 for ; Thu, 9 Mar 2017 15:23:09 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v29FN8AJ033321 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 9 Mar 2017 07:23:08 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v29FN7MU033320; Thu, 9 Mar 2017 07:23:07 -0800 (PST) (envelope-from sgk) Date: Thu, 9 Mar 2017 07:23:07 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question Message-ID: <20170309152307.GA32781@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170309075236.GA30199@troutmask.apl.washington.edu> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Mar 2017 15:23:09 -0000 On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote: > To give a hint at what I have been working on, I have implementations > for sinpif(x), sinpi(x), cospif(x), and cospi(x). For sinpif(x) and > cospif(x) I have kernels that give correctly rounded results for > FLT_MIN <= x < 0x1p23 (at least on x86_64). I also have slightly > faster kernels that give max ULPs of about 0.5008. For sinpi(x) and > cospi(x), my kernels currently give about 1.25 ULP for the worse case > error if I accumulate the polynomials in double precision. If I > accumulate the polynomials in long double precision, then I get > correctly rounded results. To complete the set, I was hoping to > work out ld80 and ld128 versions. `ld128 is going to be problematic > due to the absense of int128_t. > > I'll send you what I have in a few days. It is sometime amazing what happens when I sleep. For sinpi[fl](x) and the method I came up with one needs to do argument reduction where x = n + r. Here n is an integer and 0 < r < 1. If n is even then one is in the positive half-cyle of sin(x) and if n is odd then one has the negative half-cycle. For sinpif(x) this looks like if (ix < 0x4b000000) { /* 1 <= |x| < 0x1p23 */ /* 1 */ ix = (uint32_t)ax; /* 2 */ ax = ax == ix ? zero : __compute_sinpif(ax - ix); /* 3 */ if (ix & 1) ax = -ax; /* 4 */ return ((hx & 0x80000000) ? -ax : ax); } Line 1 determines n. Line 2 computes either sinpi(n) exactly or sinpi(r). Line 3 uses n odd to set the sign for the half-cycle. Line 4 is used to set the sign from sin(-x) = -sin(x). For double and ld80 one can use int64_t instead of uint32_t. Last night I realized that I don't need to use int64_t, and more importantly I don't need int128_t. The argument reduction can use uint32_t everywhere. For double, I currently have if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ int64_t n; n = (int64_t)ax; ax = ax == n ? zero : __compute_sinpi(ax - n); if (n & 1) ax = -ax; return ((hx & 0x80000000) ? -ax : ax); } which can be written (off the top-of-head and not tested) if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ double volatile vx; uint32_t n; vx = ax + 0x1p52; vx = vx - 0x1p52; if (vx == ax) return(0); if (vx > UINT32_MAX) { /* ax = m + n + r with m + n */ vx -= UINT32_MAX; /* m = UINT32_MAX. n is in range */ ax -= UINT32_MAX; } n = (uint32_t)vx; ax = __compute_sinpi(ax - n); if (n & 1) ax = -ax; return ((hx & 0x80000000) ? -ax : ax); } Something similar can be applied to ld128, but reduction may take two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX). -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow