Date: Sun, 20 Feb 2005 23:00:36 GMT
From: David Schultz <das@FreeBSD.ORG>
To: freebsd-i386@FreeBSD.org
Subject: Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs
Message-ID: <200502202300.j1KN0a5q005345@freefall.freebsd.org>
The following reply was made to PR i386/67469; it has been noted by GNATS.

From: David Schultz <das@FreeBSD.ORG>
To: Bruce Evans <bde@zeta.org.au>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG, freebsd-i386@FreeBSD.ORG
Subject: Re: i386/67469: src/lib/msun/i387/s_tan.S gives incorrect results for large inputs
Date: Sun, 20 Feb 2005 17:52:01 -0500

DONE:
- Remove i387 float trig functions.
- Remove i387 acos() and asin().

TODO:
- Figure out what to do about logf(), logbf(), and log10f().
- Figure out what to do about atan(), atan2(), and atan2f().
- Figure out what to do with sin(), cos(), and tan().  (I suggest
  removing the i387 double trig functions now and worrying about the
  speed of the fdlibm routines later.)

On Sun, Feb 20, 2005, Bruce Evans wrote:
> I would adjust the following due to these results:
> Think about deleting the exp and log i387 float functions.

I didn't add NetBSD's e_expf.S in the first place because my tests
showed that it was slower. :-P  As for log{,b,10}f, your tests show
that the asm versions are faster on my Pentium 4:

asmlogf:    nsec per call: 40 41 40 40 40 40 40 40 40 40 40 40 40 40 40 40
fdllogf:    nsec per call: 76 77 77 78 76 78 78 78 77 75 78 78 78 78 78 78
asmlogbf:   nsec per call: 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
fdllogbf:   nsec per call: 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
asmlog10f:  nsec per call: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
fdllog10f:  nsec per call: 80 80 71 88 71 71 95 84 71 71 71 71 72 112 96 71

> - delete all inverse trig i387 functions.

This is a clear win for asin() and acos().  It's not so clear for
atan() or atan2():

asmatan:    nsec per call: 68 68 68 68 68 68 68 69 69 68 68 68 68 68 68 68
fdlatan:    nsec per call: 92 92 92 92 92 94 97 70 70 97 95 92 92 92 92 92
fdlatanf:   nsec per call: 70 70 70 70 70 71 72 58 58 72 70 69 70 75 71 69

This is for the same Pentium 4 as above.  Do you get different results
for a saner processor, like an Athlon?
IIRC, atan2f() was faster in assembly according to my naive tests, or
I wouldn't have imported it.  I don't remember what inputs I tried or
why I left out atanf().

> Think about optimizing the fdlibm versions.
> They could use double or extended precision, and then they might not
> need range reduction for a much larger range (hopefully [-2pi, 2pi])
> and/or might not need a correction term for a much larger range.
[...]
> - think about optimizing the trig fdlibm double functions until they are
>   faster than the trig i387 double functions on a larger range than
>   [-pi/4, pi/4].  They could use extended precision, but only on
>   some arches, so there would be a negative reduction in complications
>   for replacing the MD functions by "MI" ones optimized in this way.
>   Does the polynomial approximation for sin start to fail between pi/4
>   and pi/2, or are there other technical reasons to reduce to the
>   current ranges?

Yeah, the float versions of just about everything would benefit from
storing the extra bits in a double.  Replacing two split doubles with
a long double is harder, as you mention; it works (a) never, for
64-bit long doubles, (b) sometimes, for 80-bit long doubles, and (c)
always, for 128-bit long doubles.  Some of the lookup tables for the
float routines are a bit braindead, too.

It's impossible to use a polynomial approximation on [0,pi/2] or a
larger range for tan(), since tan() grows faster than any polynomial
as it approaches pi/2.  There may be a rational approximation that
works well, but I doubt it.  It is possible to find polynomial
approximations for sin() and cos() on [0,pi/2], but assuming that
Dr. Ng knows what he's doing (a reasonable assumption), the degree of
any such polynomial would likely be very large.

By the way, Dr. Ng's paper on argument reduction mentions that the
not-so-freely-distributable libm uses table lookup for medium-size
arguments that would otherwise require reduction:

    http://www.ucbtest.org/arg.ps

So in summary, there's a lot of low-hanging fruit with the float
routines, but I think that doing a better job for double requires
multiple versions that use the appropriate kind of extended precision,
table lookup, or consulting a numerical analyst.

By the way, the CEPHES library (netlib/cephes or
http://www.moshier.net/) has different versions of many of these
routines.  The trig functions are also approximated on [0,pi/4], but
accurate argument reduction is not used.  I have the licensing issues
worked out with the author and core@ if we want to use any of these.
However, my experience with exp2() and expl() from CEPHES showed that
there are some significant inaccuracies, places where the
approximating polynomial can overflow, etc.
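The "storing the extra bits in a double" idea can be sketched as
follows.  sinf_sketch() is a hypothetical name, the coefficients are
the first four from fdlibm's __kernel_sin(), and the claim that this
truncated polynomial suffices for float accuracy on [-pi/4, pi/4] is
my own back-of-the-envelope estimate, not anything from fdlibm:

```c
/* Sketch: evaluate a truncated __kernel_sin()-style polynomial for
 * sin(x) on [-pi/4, pi/4] entirely in double, then round once to
 * float.  The double intermediates carry ~29 extra bits, so the
 * split-argument correction terms that the double kernel needs can
 * be dropped for the float version. */
#include <math.h>

static const double
S1 = -1.66666666666666324348e-01,	/* fdlibm __kernel_sin S1..S4 */
S2 =  8.33333333332248946124e-03,
S3 = -1.98412698298579493134e-04,
S4 =  2.75573137070700676789e-06;

float
sinf_sketch(float x)			/* assumes |x| <= pi/4 */
{
	double z = (double)x * x;
	double r = S2 + z * (S3 + z * S4);

	return ((float)((double)x + (double)x * z * (S1 + z * r)));
}
```

The dropped S5 and S6 terms contribute on the order of 1e-9 on this
interval, well below half a float ulp, which is why a single final
rounding to float should be enough here.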
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200502202300.j1KN0a5q005345>