From owner-freebsd-numerics@freebsd.org Fri Mar 10 07:21:43 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A0B2CD021AC for ; Fri, 10 Mar 2017 07:21:43 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 896F61A36 for ; Fri, 10 Mar 2017 07:21:43 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v2A7LbaU043765 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 9 Mar 2017 23:21:37 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v2A7LaDR043764; Thu, 9 Mar 2017 23:21:36 -0800 (PST) (envelope-from sgk) Date: Thu, 9 Mar 2017 23:21:36 -0800 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question Message-ID: <20170310072136.GA43725@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> <20170309152307.GA32781@troutmask.apl.washington.edu> <20170310025417.U3723@besplex.bde.org> <20170309195134.GA36213@troutmask.apl.washington.edu> <20170310165638.D1143@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170310165638.D1143@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Mar 2017 07:21:43 -0000 On Fri, Mar 10, 2017 at 05:46:34PM +1100, Bruce Evans wrote: > On Thu, 9 Mar 2017, Steve Kargl wrote: > > > On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote: > >> On Thu, 9 Mar 2017, Steve Kargl wrote: > >> > >>> On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote: > >>>> To give a hint at what I have been working on, I have implementations > >>>> for sinpif(x), sinpi(x), cospif(x), and cospi(x). For sinpif(x) and > >>>> cospif(x) I have kernels that give correctly rounded results for > >>>> FLT_MIN <= x < 0x1p23 (at least on x86_64). I also have slightly > >> > >> I didn't see the original mail. > > > > Odd. Maybe munched by a spam filter. > > It was delivered out of order. > > >> I'd prefer you to finish the long double versions of cosl(), sinl() and > >> tanl() (maybe das is responsible for the latter). These are still too > >> slow and minor bugs for at least sinl() on small x, partly by not copying > >> the lower-precision versions enough. > > > > sinl, cosl, and tanl are good enough for my purposes at the moment. > > tgammal and powl are higher on the list of things to fix. But, > > Only powl is very important, but everyone avoids it because it is harder. I have a partially written powl in my tree. I'll probably finish it someday. > > >* ... > >>> which can be written (off the top-of-head and not tested) > >>> if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ > >>> double volatile vx; > >>> uint32_t n; > >>> vx = ax + 0x1p52; > >>> vx = vx - 0x1p52; > >>> if (vx == ax) return(0); > >>> if (vx > UINT32_MAX) { /* ax = m + n + r with m + n */ > >>> vx -= UINT32_MAX; /* m = UINT32_MAX. n is in range */ > >>> ax -= UINT32_MAX; > >>> } > >>> n = (uint32_t)vx; > >>> ax = __compute_sinpi(ax - n); > >>> if (n & 1) ax = -ax; > >>> return ((hx & 0x80000000) ? -ax : ax); > >>> } > >>> > >>> Something similar can be applied to ld128, but reduction may take > >>> two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX). > >> > >> I don't like that much. The 0x1p52 magic is tricky and probably buggy. > >> s_rint.c uses the same magic but needs many complications to avoid > >> double rounding. > > > > Yep. Seems to have some issue. From my working copy of s_sinpif(x), > > I tried > > ... > > My max ULP went from 0.5 to 0.50398505 for exhaustive testing > > in the interval [1,100]. If I grab the worse case x and use the > > '#if 1' code, I see > > > >> Similar-looking magic for trig reduction in radians > >> uses floating the magic value of 0x1.8p52 to bias rounding errors in > >> a safe way, and arranges so that rounding errors don't matter then. > >> .... > >> y = x * 0.5 - floor(x * 0.5); /* 4 quads in [2n, 2n+2) -> [0, 1) */ > >> y = 4 * y; /* -> [0, 4) */ > >> int n = y; /* quadrant */ > >> r = (y - n) / 4; /* offset in quadrant */ > >> > >> This is a bit slow, and too much like lgamma's method. > > > > I did look over what you did in lgamma[f], but forgot the logic > > behind the reduction into octants. I suppose that I'm more > > concern with 'make it work', 'make it correct', and then I > > will/might worry about 'make it fast'. But, I don't want it to be > > too slow, so I'm trying to avoid function calls and reclassification > > of x as +-inf or nan. Heck, r = remquof(ax, vx, &n) should work, > > but there's that speed issue. > > In fact, we already did the work to optimize this using the 0x1p52 > magic for lgamma, and even committed it. > > fdlibm lgamma used a messy combination of multiplications by 0.5 and > 2.0, 2 floor()s, 1 half of the 0x1p52 magic and bit fiddling to do > the other half of the 0x1p52 magic. You simplified this, and we > eventually optimized it to use no floors (but the equivalent of > 2 rint()s on |x| and 4*|x| implemented using full 0x1p52 magic) and > different bit fiddling. Now I fear that the 0x1p52 doesn't work > in double rounding cases. At least, in my sinpif() the double rounding appears near x = n.5 where I get n+1 instead of the intended n. I'll need to recheck lgammaf, but I recall that we are working with |x|. This means we can test for n < |x|. If we get n+1 then double rounding occurred. > Hopefully it works like it does for trig > functions -- at worst there is an off-by 1 error in the quadrant > but here is a compensating tiny error in the offset. E.g., the > closest value to pi/4 might be represented as octant 0 with offset > pi/4 + epsilon or octant 1 with offset epsilon (the lowest level > must reduce to octants, not quadrants). Then the poly much just > approximate a little above pi/4 in case epsilon > 0. Higher levels > also see the off-by-1 error but everything is consistent. > > This explains your larger error -- the accuracy of the poly breaks > down not far above pi/4, and even epsilon above pi/4 there may be > a near-halfway case that is relatively inaccurate. > Well, I stared at s_floorf.c and s_floor.c today. For sinpi, I'm interested in 1 <= x < 0x1p52 (or 23 or 63 or 112 for other precisions), which significantly reduces the needed bit twiddling. For double, I need to do if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ /* Reduced from s_floor.c from here ... */ j0 = ((ix >> 20) & 0x7ff) - 0x3ff; if (j0 < 20) { ix &= ~(0x000fffff >> j0); lx = 0; } else { i = ((uint32_t)(0xffffffff)) >> (j0 - 20); lx &= ~i; } INSERT_WORDS(x,ix,lx); /* ... to here. */ ax -= x; /* r = |x| - floor(|n|) */ if (ax == 0) { ax = zero; /* Make sure sign of zero is +. */ } else { ax = __compute_sinpi(ax); /* Here I determine if integer part of ax is even or odd. */ if (x > 0x1p30) x -= 0x1p30; lx = (uint32_t)x; if (lx & 1) ax = -ax; } return ((hx & 0x80000000) ? -ax : ax); } -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow