From owner-svn-src-all@FreeBSD.ORG Sat Oct 20 07:18:34 2012 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C17926D4; Sat, 20 Oct 2012 07:18:34 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id DEE238FC12; Sat, 20 Oct 2012 07:18:33 +0000 (UTC) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q9K7BjlW015608; Sat, 20 Oct 2012 18:11:45 +1100 Received: from c122-106-175-26.carlnfd1.nsw.optusnet.com.au (c122-106-175-26.carlnfd1.nsw.optusnet.com.au [122.106.175.26]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q9K7BZSx028185 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 20 Oct 2012 18:11:36 +1100 Date: Sat, 20 Oct 2012 18:11:35 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Warner Losh Subject: Re: svn commit: r241756 - head/lib/msun/src In-Reply-To: <201210192247.q9JMljL4093098@svn.freebsd.org> Message-ID: <20121020164315.Q1095@besplex.bde.org> References: <201210192247.q9JMljL4093098@svn.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-Cloudmark-Score: 0 X-Optus-Cloudmark-Analysis: v=2.0 cv=fLlhRume c=1 sm=1 a=hNMMERRCccEA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Aet6fyW9sl8A:10 a=cYv7yatJ9UkkMbrVvNAA:9 a=CjuIK1q_8ugA:10 a=bxQHXO5Py4tHmhUgaywp5w==:117 Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 07:18:34 -0000 On Fri, 19 Oct 2012, Warner Losh wrote: > Log: > Document the method used to compute expf. Taken from exp, with > changes to reflect differences in computation between the two. Please back this out, as for logf. We are a bit further from replacing fdlibm exp* by better versions than for log*(). We have a better expl() that is faster and much more accuract than exp() and almost faster than expf(), but lower-precision of our expl() implementation are incomplete. > Modified: head/lib/msun/src/e_expf.c > ============================================================================== > --- head/lib/msun/src/e_expf.c Fri Oct 19 22:46:48 2012 (r241755) > +++ head/lib/msun/src/e_expf.c Fri Oct 19 22:47:44 2012 (r241756) > @@ -21,6 +21,68 @@ __FBSDID("$FreeBSD$"); > #include "math.h" > #include "math_private.h" > > +/* __ieee754_expf Here the comment matches the code, unlike for logf. > + * Returns the exponential of x. Banal comment which does less than echo the code. The function name tells us that it returns the exponential of x in float precision. > ... > + * 2. Approximation of exp(r) by a special rational function on > + * the interval [0,0.34658]: > + * Write > + * R(r**2) = r*(exp(r)+1)/(exp(r)-1) = 2 + r*r/6 - r**4/360 + ... > + * We use a special Remes algorithm on [0,0.34658] to generate > + * a polynomial of degree 2 to approximate R. The maximum error > + * of this polynomial approximation is bounded by 2**-27. In > + * other words, The last 3 sentences are missing the spelling and punctuation errors in log*. The magic 2**-27 is better decomented as 2**-27.74 in my changes, except possibly for the rounding. > + * R(z) ~ 2.0 + P1*z + P2*z*z > + * (where z=r*r, and the values of P1 and P2 are listed below) > + * and > + * | 2 | -27 > + * | 2.0+P1*z+P2*z - R(z) | <= 2 > + * | | -27 duplicated again. This part of the approximation is routine and shouldn't be documented using verbose pretty-printing. > + * The computation of expf(r) thus becomes > + * 2*r > + * expf(r) = 1 + ------- > + * R - r > + * r*R1(r) > + * = 1 + r + ----------- (for better accuracy) > + * 2 - R1(r) > + * where > + * 2 4 > + * R1(r) = r - (P1*r + P2*r) This part is not routine, so it is good to verbosely pretty-print it. But only once. > + * Accuracy: > + * according to an error analysis, the error is always less than > + * 0.5013 ulp (unit in the last place). Just wrong. An error of 2**-27 can't possibly get close to 0.5013. It gives a theoretical error of at least 0.5 + 2**(-27 - -24) = 0.625. We intentional don't try form more since the algorithm has even larger fundamental errors elsewhere. 2**-27 wouldn't be good enough if we couldn't do exhaustive testing to check that the other errors really do dominate. Exhaustive testing shows the following errors: - i386 (old version with gcc -O): 0.5092 - i386 (cur version with clang): <0.5092 - i386 (cur version with gcc -O): 0.5807 - i386 (cur version with clang): <0.5807 - i386 (with gcc -O0 and/or -ffloat-store): larger, much like amd64 - amd64 (al versions): 0.9101 With amd64 or -O0, we just get the larger errors elsewhere. With the old version, the errors were smaller due to a bug that normally gave extra precision accidentally, but which could just give wrong results in theory, so I fixed it. The fix unfortunately lost some of the accidental extra precision. The bug showed up as clang giving different accidental extra precision. It still gives some, but not as much as before. The sources could be modified to the more than the accidental extra precision non-accidentally by using float_t in float precision and double_t plus FP_PE rounding precision in double precision. This requires care with types mainly at the place where non-accidental extra precision gives bugs unless handled carefully. Otherwise, the changes are simply s/float/float_t/ for local variables, etc. plus fixing float_t for clang... > + * > + * Misc. info. > + * For IEEE float > + * if x > 8.8721679688e+01 then exp(x) overflow > + * if x < -1.0397208405e+02 then exp(x) underflow These constants are better documented in the code that declares them as o_threshold and u_threshold. This gives the values, so the values shouldn't be repeated here. There are perhaps not quite enough comments attached to the code. In s_expl.c, there are larger comments giving the expressions for calculating these constants. But the trickiest details for using these constants are not commented on at all. For efficiency, classification involves a mixture of fuzzy classification using bits and precise classification using these constants > + * > + * Constants: > + * The hexadecimal values are the intended ones for the following > + * constants. The decimal values may be used, provided that the > + * compiler will convert from decimal to binary accurately enough > + * to produce the hexadecimal values shown. > + */ Boilerplate FUD for pre-C90 duplicated ad nauseum more than before. Bruce