Date: Sun, 6 Feb 2005 18:32:38 +1100 (EST) From: Bruce Evans <bde@zeta.org.au> To: David Schultz <das@freebsd.org> Cc: freebsd-i386@freebsd.org Subject: Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs Message-ID: <20050206162951.J14584@delplex.bde.org> In-Reply-To: <20050206000912.GA59372@VARK.MIT.EDU> References: <200406012251.i51MpkkU024224@VARK.homeunix.com> <20040602172105.T23521@gamplex.bde.org> <20050204215913.GA44598@VARK.MIT.EDU> <20050205181808.J10966@delplex.bde.org> <20050206000912.GA59372@VARK.MIT.EDU>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 5 Feb 2005, David Schultz wrote: > On Sat, Feb 05, 2005, Bruce Evans wrote: > > Better call the MI tan() to do all this. It won't take significantly > > longer, and shouldn't be reached in most cases anyway. > > Yeah, but s_tan.S overrides s_tan.c, so that would require extra > machinery. Also, if fptan had worked as advertised and correctly > identified the cases where range reduction was necessary, calling > rem_pio2() directly would have avoided superfluous range checking. I hacked the file (but not the machinery) easily enough for testing. fptan advertises to work? :-) At least the amd64 manual (instruction reference) only claims that it sets C2 if the arg is outside the range [-2^63,2^63]. > > I think it is because fptan is inherently inaccurate. [...] > > As you mention, it gets the wrong answer for M_PI. In general, it > seems to do pretty badly on numbers close to multiples of pi. > Granted, one could argue that the answer returned is going to be > even *further* from the answer the programmer expected (namely, 0), > so it won't make much difference. ;-) Too bad the correct answer is a little further from 0. > ... > - The Intel software developer's guide says that fptan has an error > of at most 1 ulp, but as we've seen, they're lying. The amd64 manual (application refrence) says "x87 computations are carried out in double extended precision format, so that the transcendental functions [are accurate to 1 ulp]". The "so that" part doesn't follow. We are seeing something like naive extended precision computations and how inaccurate they can be even when only a double precision result is wanted. > [... too much detail to reply to :-)] > Your suggestion of effectively special-casing large inputs and > inputs close to multiples of pi is probably the right way to fix > the inaccuracy. However, I'm worried that it would wipe away > any performance benefit of using fptan, if there is a benefit > at all. How about punting and removing s_tan.S instead? The problem affects at least sin() and cos() too, so I think throwing away the optimized versions is too extreme. Perhaps a single range check to let the MD version handle only values near 0 ([-pi/2+eps, pi/2-eps]? would be efficient enough. > Other unanswered questions (ENOTIME right now): > - What about sin() and cos()? > - What about the float versions of these? I tested sin(). It misbehaves similarly (with identical results for args (2 * n * M_PI). It's interesting that there is fptan and not ftan, but fsin and not fpsin. Both are partial. I don't run -current and hadn't seen your code for the MD float versions... They are buggier: (1) the exponent can be up to 127, so more than half of the representable values exceed 2^63 in magnitude and thus need range reduction. Results for tan(0x1p64) using various methods: buggy MD tanf: 0x1p+64 1.84467e+19 buggy MD tan: -0x1.8467b926c834bp-2 -0.379302 fdlibm MI tanf: -0x1.82bee6p-6 -0.0236051 bc s(x)/c(x) (scale=40): -.02360508353334969937000... bc s(x)/c(x) (default scale=20): -.02358521765210826916 It looks like fdlibm is perfect and scale=20 in bc is not quite enough. sinf(0x1p64) would be 0x1p64 too. This is preposterous for sin(). (2) they return extra bits of precision. The MD double precision versions have the same bug; the bug is just clearer for the float case, and cannot be fixed by FreeBSD's rounding precision hack. BTW, I don't trust the range reduction for floating point pi/2 but never finished debugging or fixing it. According to my old comment, it is off by pi/2 and very slow for many values between 32768 and 65535. However, I couldn't duplicate this misbehaviour last time I looked at it. I used to fix it using the double version. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20050206162951.J14584>