From owner-freebsd-hackers@FreeBSD.ORG Sat Feb 26 18:30:44 2011 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CD108106566B; Sat, 26 Feb 2011 18:30:44 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id B08778FC08; Sat, 26 Feb 2011 18:30:44 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.4/8.14.4) with ESMTP id p1QIUige092782; Sat, 26 Feb 2011 10:30:44 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.4/8.14.4/Submit) id p1QIUila092781; Sat, 26 Feb 2011 10:30:44 -0800 (PST) (envelope-from sgk) Date: Sat, 26 Feb 2011 10:30:44 -0800 From: Steve Kargl To: freebsd-hackers@freebsd.org, freebsd-standards@freebsd.org Message-ID: <20110226183044.GA92706@troutmask.apl.washington.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i X-Mailman-Approved-At: Sat, 26 Feb 2011 20:56:20 +0000 Cc: Subject: [PATCH] improve accuracy and speed of tanh[f] for x in [0:1) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 26 Feb 2011 18:30:45 -0000 The patch that follows my signature improves both the accuracy and speed of tanhf and tanh in the interval [0,1). The testing was done on an Intel Core2 Duo CPU T7250 and everything is compiled with gcc. The improvements are accomplished by using a trancated continued fraction instead of relying on expm1[f] and a rewritten definition of tanh(x). The results can be summarized by | fdlibm | Con. frac. ------------------------------- | ulp time | ulp time tanhf | 1.66 1.36 | 0.56 1.31 tanh | 2.16 5.65 | 1.53 5.33 tanhl | 2.36 4.72 | 1.53 5.12 ulp in the above table is the maximum ulp observed for 2 million evenly distributed values with the interval. The timing for tanhf is for 5 million calls and the timing for tanh is for 20 million calls, where again the values are evenly distributed across the interval. The timings are the total time in seconds for a tight loop. For those paying attention, yes, the last entry is for tanhl, which libm does not currently implement. I have an implementation that uses expm1l and fdlibm's simple rewritten tanh(x) definition. I, of course, have a truncated continued fraction implementation as well. Now for those that are really paying attention, libm does not contain an expm1l. :) -- Steve Index: src/s_tanh.c =================================================================== --- src/s_tanh.c (revision 219046) +++ src/s_tanh.c (working copy) @@ -66,8 +66,33 @@ t = expm1(two*fabs(x)); z = one - two/(t+two); } else { +#if 0 + /* + * In the interval, one has a max ulp of 2.156 for 2M + * evenly distributed values. For timing, 20M call + * gives 5.65 s. + */ t = expm1(-two*fabs(x)); z= -t/(t+two); +#else + /* + * This is a continued fraction representation that + * was truncated at the 10th level and then refractored + * to get rid of the divisions. In the interval, one + * has a max ulp of 1.532 for 2M evenly distributed + * values. For timing, 20M calls give 5.33 s. + */ + double x2, t1, t2, t3, t4, t5; + x2 = x * x; + t1 = 399 + 20 * x2; + t3 = 17 * t1 + (21 + x2) * x2; + t4 = 15 * t3 + t1 * x2; + t5 = 13 * t4 + t3 * x2; + t2 = 99 * t5 + (9 * t4 + t5) * x2; + t3 = 7 * t2 + (11 * t5 + t4 * x2) * x2; + z = x / (1 + x2 / (3 + t3 * x2 / (5 * t3 + t2 * x2))); + return (z); +#endif } /* |x| >= 22, return +-1 */ } else { Index: src/s_tanhf.c =================================================================== --- src/s_tanhf.c (revision 219046) +++ src/s_tanhf.c (working copy) @@ -44,8 +44,29 @@ t = expm1f(two*fabsf(x)); z = one - two/(t+two); } else { +#if 0 + /* + * In the interval, one has a max ulp of 1.66 for 2M + * evenly distributed values. For timing, 5M calls + * give 1.36 s. + */ t = expm1f(-two*fabsf(x)); z= -t/(t+two); +#else + /* + * This is a continued fraction representation that + * was truncated at the 6th level and then refractored + * to get rid of the divisions. In the interval, one + * has a max ulp of 0.559 for 2M evenly distributed + * values. For timing, 5M calls give 1.306 s. + */ + float x2, t1, t2, t3; + x2 = x * x; + t1 = 11 + x2; + t2 = 9 * t1 + x2; + t3 = 7 * t2 + t1 * x2; + z = x / (1 + x2 / (3 + t3 * x2 / (5 * t3 + t2 * x2 ))); +#endif } /* |x| >= 9, return +-1 */ } else {