Date: Thu, 9 Mar 2017 23:21:36 -0800 From: Steve Kargl <sgk@troutmask.apl.washington.edu> To: Bruce Evans <brde@optusnet.com.au> Cc: freebsd-numerics@freebsd.org Subject: Re: Bit twiddling question Message-ID: <20170310072136.GA43725@troutmask.apl.washington.edu> In-Reply-To: <20170310165638.D1143@besplex.bde.org> References: <20170308202417.GA23103@troutmask.apl.washington.edu> <20170309173152.F1269@besplex.bde.org> <20170309075236.GA30199@troutmask.apl.washington.edu> <20170309152307.GA32781@troutmask.apl.washington.edu> <20170310025417.U3723@besplex.bde.org> <20170309195134.GA36213@troutmask.apl.washington.edu> <20170310165638.D1143@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Mar 10, 2017 at 05:46:34PM +1100, Bruce Evans wrote: > On Thu, 9 Mar 2017, Steve Kargl wrote: > > > On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote: > >> On Thu, 9 Mar 2017, Steve Kargl wrote: > >> > >>> On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote: > >>>> To give a hint at what I have been working on, I have implementations > >>>> for sinpif(x), sinpi(x), cospif(x), and cospi(x). For sinpif(x) and > >>>> cospif(x) I have kernels that give correctly rounded results for > >>>> FLT_MIN <= x < 0x1p23 (at least on x86_64). I also have slightly > >> > >> I didn't see the original mail. > > > > Odd. Maybe munched by a spam filter. > > It was delivered out of order. > > >> I'd prefer you to finish the long double versions of cosl(), sinl() and > >> tanl() (maybe das is responsible for the latter). These are still too > >> slow and minor bugs for at least sinl() on small x, partly by not copying > >> the lower-precision versions enough. > > > > sinl, cosl, and tanl are good enough for my purposes at the moment. > > tgammal and powl are higher on the list of things to fix. But, > > Only powl is very important, but everyone avoids it because it is harder. I have a partially written powl in my tree. I'll probably finish it someday. > > >* ... > >>> which can be written (off the top-of-head and not tested) > >>> if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ > >>> double volatile vx; > >>> uint32_t n; > >>> vx = ax + 0x1p52; > >>> vx = vx - 0x1p52; > >>> if (vx == ax) return(0); > >>> if (vx > UINT32_MAX) { /* ax = m + n + r with m + n */ > >>> vx -= UINT32_MAX; /* m = UINT32_MAX. n is in range */ > >>> ax -= UINT32_MAX; > >>> } > >>> n = (uint32_t)vx; > >>> ax = __compute_sinpi(ax - n); > >>> if (n & 1) ax = -ax; > >>> return ((hx & 0x80000000) ? -ax : ax); > >>> } > >>> > >>> Something similar can be applied to ld128, but reduction may take > >>> two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX). > >> > >> I don't like that much. The 0x1p52 magic is tricky and probably buggy. > >> s_rint.c uses the same magic but needs many complications to avoid > >> double rounding. > > > > Yep. Seems to have some issue. From my working copy of s_sinpif(x), > > I tried > > ... > > My max ULP went from 0.5 to 0.50398505 for exhaustive testing > > in the interval [1,100]. If I grab the worse case x and use the > > '#if 1' code, I see > > > >> Similar-looking magic for trig reduction in radians > >> uses floating the magic value of 0x1.8p52 to bias rounding errors in > >> a safe way, and arranges so that rounding errors don't matter then. > >> .... > >> y = x * 0.5 - floor(x * 0.5); /* 4 quads in [2n, 2n+2) -> [0, 1) */ > >> y = 4 * y; /* -> [0, 4) */ > >> int n = y; /* quadrant */ > >> r = (y - n) / 4; /* offset in quadrant */ > >> > >> This is a bit slow, and too much like lgamma's method. > > > > I did look over what you did in lgamma[f], but forgot the logic > > behind the reduction into octants. I suppose that I'm more > > concern with 'make it work', 'make it correct', and then I > > will/might worry about 'make it fast'. But, I don't want it to be > > too slow, so I'm trying to avoid function calls and reclassification > > of x as +-inf or nan. Heck, r = remquof(ax, vx, &n) should work, > > but there's that speed issue. > > In fact, we already did the work to optimize this using the 0x1p52 > magic for lgamma, and even committed it. > > fdlibm lgamma used a messy combination of multiplications by 0.5 and > 2.0, 2 floor()s, 1 half of the 0x1p52 magic and bit fiddling to do > the other half of the 0x1p52 magic. You simplified this, and we > eventually optimized it to use no floors (but the equivalent of > 2 rint()s on |x| and 4*|x| implemented using full 0x1p52 magic) and > different bit fiddling. Now I fear that the 0x1p52 doesn't work > in double rounding cases. At least, in my sinpif() the double rounding appears near x = n.5 where I get n+1 instead of the intended n. I'll need to recheck lgammaf, but I recall that we are working with |x|. This means we can test for n < |x|. If we get n+1 then double rounding occurred. > Hopefully it works like it does for trig > functions -- at worst there is an off-by 1 error in the quadrant > but here is a compensating tiny error in the offset. E.g., the > closest value to pi/4 might be represented as octant 0 with offset > pi/4 + epsilon or octant 1 with offset epsilon (the lowest level > must reduce to octants, not quadrants). Then the poly much just > approximate a little above pi/4 in case epsilon > 0. Higher levels > also see the off-by-1 error but everything is consistent. > > This explains your larger error -- the accuracy of the poly breaks > down not far above pi/4, and even epsilon above pi/4 there may be > a near-halfway case that is relatively inaccurate. > Well, I stared at s_floorf.c and s_floor.c today. For sinpi, I'm interested in 1 <= x < 0x1p52 (or 23 or 63 or 112 for other precisions), which significantly reduces the needed bit twiddling. For double, I need to do if (ix < 0x43300000) { /* 1 <= |x| < 0x1p52 */ /* Reduced from s_floor.c from here ... */ j0 = ((ix >> 20) & 0x7ff) - 0x3ff; if (j0 < 20) { ix &= ~(0x000fffff >> j0); lx = 0; } else { i = ((uint32_t)(0xffffffff)) >> (j0 - 20); lx &= ~i; } INSERT_WORDS(x,ix,lx); /* ... to here. */ ax -= x; /* r = |x| - floor(|n|) */ if (ax == 0) { ax = zero; /* Make sure sign of zero is +. */ } else { ax = __compute_sinpi(ax); /* Here I determine if integer part of ax is even or odd. */ if (x > 0x1p30) x -= 0x1p30; lx = (uint32_t)x; if (lx & 1) ax = -ax; } return ((hx & 0x80000000) ? -ax : ax); } -- Steve 20161221 https://www.youtube.com/watch?v=IbCHE-hONow
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170310072136.GA43725>