Date: Sat, 26 Feb 2011 10:30:44 -0800 From: Steve Kargl <sgk@troutmask.apl.washington.edu> To: freebsd-hackers@freebsd.org, freebsd-standards@freebsd.org Subject: [PATCH] improve accuracy and speed of tanh[f] for x in [0:1) Message-ID: <20110226183044.GA92706@troutmask.apl.washington.edu>
next in thread | raw e-mail | index | archive | help
The patch that follows my signature improves both the accuracy and speed of tanhf and tanh in the interval [0,1). The testing was done on an Intel Core2 Duo CPU T7250 and everything is compiled with gcc. The improvements are accomplished by using a trancated continued fraction instead of relying on expm1[f] and a rewritten definition of tanh(x). The results can be summarized by | fdlibm | Con. frac. ------------------------------- | ulp time | ulp time tanhf | 1.66 1.36 | 0.56 1.31 tanh | 2.16 5.65 | 1.53 5.33 tanhl | 2.36 4.72 | 1.53 5.12 ulp in the above table is the maximum ulp observed for 2 million evenly distributed values with the interval. The timing for tanhf is for 5 million calls and the timing for tanh is for 20 million calls, where again the values are evenly distributed across the interval. The timings are the total time in seconds for a tight loop. For those paying attention, yes, the last entry is for tanhl, which libm does not currently implement. I have an implementation that uses expm1l and fdlibm's simple rewritten tanh(x) definition. I, of course, have a truncated continued fraction implementation as well. Now for those that are really paying attention, libm does not contain an expm1l. :) -- Steve Index: src/s_tanh.c =================================================================== --- src/s_tanh.c (revision 219046) +++ src/s_tanh.c (working copy) @@ -66,8 +66,33 @@ t = expm1(two*fabs(x)); z = one - two/(t+two); } else { +#if 0 + /* + * In the interval, one has a max ulp of 2.156 for 2M + * evenly distributed values. For timing, 20M call + * gives 5.65 s. + */ t = expm1(-two*fabs(x)); z= -t/(t+two); +#else + /* + * This is a continued fraction representation that + * was truncated at the 10th level and then refractored + * to get rid of the divisions. In the interval, one + * has a max ulp of 1.532 for 2M evenly distributed + * values. For timing, 20M calls give 5.33 s. + */ + double x2, t1, t2, t3, t4, t5; + x2 = x * x; + t1 = 399 + 20 * x2; + t3 = 17 * t1 + (21 + x2) * x2; + t4 = 15 * t3 + t1 * x2; + t5 = 13 * t4 + t3 * x2; + t2 = 99 * t5 + (9 * t4 + t5) * x2; + t3 = 7 * t2 + (11 * t5 + t4 * x2) * x2; + z = x / (1 + x2 / (3 + t3 * x2 / (5 * t3 + t2 * x2))); + return (z); +#endif } /* |x| >= 22, return +-1 */ } else { Index: src/s_tanhf.c =================================================================== --- src/s_tanhf.c (revision 219046) +++ src/s_tanhf.c (working copy) @@ -44,8 +44,29 @@ t = expm1f(two*fabsf(x)); z = one - two/(t+two); } else { +#if 0 + /* + * In the interval, one has a max ulp of 1.66 for 2M + * evenly distributed values. For timing, 5M calls + * give 1.36 s. + */ t = expm1f(-two*fabsf(x)); z= -t/(t+two); +#else + /* + * This is a continued fraction representation that + * was truncated at the 6th level and then refractored + * to get rid of the divisions. In the interval, one + * has a max ulp of 0.559 for 2M evenly distributed + * values. For timing, 5M calls give 1.306 s. + */ + float x2, t1, t2, t3; + x2 = x * x; + t1 = 11 + x2; + t2 = 9 * t1 + x2; + t3 = 7 * t2 + t1 * x2; + z = x / (1 + x2 / (3 + t3 * x2 / (5 * t3 + t2 * x2 ))); +#endif } /* |x| >= 9, return +-1 */ } else {
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110226183044.GA92706>