From: Bruce Evans
Date: Tue, 23 Oct 2012 03:09:25 +1100 (EST)
To: Warner Losh
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Warner Losh, Bruce Evans
Subject: Re: svn commit: r241755 - head/lib/msun/src
Message-ID: <20121023011459.G2088@besplex.bde.org>
In-Reply-To: <80A9038D-C84B-47D3-A137-F281449F4D88@bsdimp.com>

On Mon, 22 Oct 2012, Warner Losh wrote:

> On Oct 22, 2012, at 5:14 AM, Bruce Evans wrote:
>
>> On Sun, 21 Oct 2012, Warner Losh wrote:
>>
>>> Feel free to fix them however. I added the comments because the algorithms weren't quite the same... If you have a better way, feel free to back my stuff out on the way to it.
>>
>> But the algorithms are identical to a fault. Inside the functions, all
>> lines except 1 in each correspond exactly, and the exception is a style
>> bug. Only about 30 lines in each are not lexically identical. The
>> non-lexical differences are for things like different magic numbers.
>
> Except that's not true. For expf, only the first two terms of the 5 are computed in Remes expansion. That's why I bothered to document the difference. For logf, there are at least two additional terms in the intermediate form.

No, they both use a Remez-type polynomial approximation. The only
differences are that the approximation for logf needs _fewer_ terms,
and in fact has 3 fewer (except in the Cygnus version it had the same
number of terms with the lower ones useless at best), and that the
grouping of the terms is different for efficiency.

I say "Remez-type" because it is unclear if anyone actually uses pure
Remez. I use my own idea of what it is, without having looked at
documentation of the original algorithm. I probably have different
tweaks than normal. fdlibm's approximations often seem to be too
inaccurate to have been generated by full Remez. The ones here in the
Cygnus version were generated by starting with the double precision
coeffs and blindly rounding them to float precision. This is a bad
non-Remez method (but pure Remez doesn't understand limited precision
AFAIK). In practice it gave 6 extra bits using Lg1-Lg7, where my
coeffs generated for float precision give 10 extra bits using only
Lg1-Lg4.
The extra coeffs tend to be actively harmful (and IIRC they were
here), because when Remez generates the lower coeffs it tunes them for
the lower coeffs actually working, all in infinite precision, and after
rounding to finite precision the delicate tuning for this just doesn't
work.

The comments actually say "special Remes". Who knows what that is? :-)
Even running the same algorithm, there is a lot of noise in the lower
bits for all except the first few coeffs, so it is hard to duplicate
coeffs produced by other special versions.

> The differences here are:
> /* logf */
> t1= w*(Lg2+w*Lg4);
> t2= z*(Lg1+w*Lg3);
> /* log */
> t1= w*(Lg2+w*(Lg4+w*Lg6));
> t2= z*(Lg1+w*(Lg3+w*(Lg5+w*Lg7)));
>
> which to even the most casual observer are clearly different.

Too casual :-). Both use a moderately efficient grouping of terms.
Both are more optimal than the simple Horner's method grouping which
would look similar even to a too-casual observer:

    R = s*s*(Lg1+s*(Lg2+s*(...+Lg[4or7])...))

Neither is highly optimized, but the logf one is better, mainly because
a power of 2 number of terms is easiest to get right and a small number
of terms is especially easy to get right. In fact, the logf one is
just the log one with higher terms omitted. Simply omitting higher
terms is usually too simple, but here it happens to improve the
grouping.

The grouping is a general polynomial evaluation technique and isn't
part of Remez. The details of it aren't documented at all. In fact,
the pseudo-code doesn't even mention z completely: it says

    R(z) ~= Lg1*s**2 + ... +Lg7*s**14

(using superscripts for the powers) where the code says

    t2 = z*(Lg1+w*(Lg3+...)... R = t2+t1.

Many other details involve unmentioned parts of general polynomial
evaluation technique (mainly, add terms from high to low for accuracy,
but violate this rule as much as possible for efficiency, and avoid
underflow by filtering out small args early and by not calculating high
powers directly).
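
As a rough illustration of the grouping point above, here is a minimal C sketch that evaluates the same polynomial in z = s*s three ways: a plain Horner chain, the split into even and odd terms in w = z*z as in the quoted log code, and the 4-term variant as in the quoted logf code. It is a sketch under stated assumptions, not the msun source: the coefficient table holds the Taylor values 2/(2k+1) as placeholders rather than the tuned Lg1..Lg7, everything is done in double for simplicity, and the names r_horner, r_split, r_split4 and check_agree are hypothetical.

```c
#include <assert.h>
#include <math.h>

/*
 * Placeholder coefficients (assumption): the Taylor values 2/(2k+1),
 * NOT the tuned Lg1..Lg7 from e_log.c or e_logf.c.  Index 0 is unused
 * so that Lg[k] lines up with the LgK names in the quoted code.
 */
static const double Lg[8] = {
    0.0, 2.0 / 3, 2.0 / 5, 2.0 / 7, 2.0 / 9, 2.0 / 11, 2.0 / 13, 2.0 / 15
};

/* Plain Horner in z = s*s: a single long chain of dependent operations. */
static double
r_horner(double s)
{
    double z = s * s;

    return (z * (Lg[1] + z * (Lg[2] + z * (Lg[3] + z * (Lg[4] +
        z * (Lg[5] + z * (Lg[6] + z * Lg[7])))))));
}

/*
 * Split grouping with the same shape as the quoted log code: t1 collects
 * the even-index terms and t2 the odd-index terms, each a shorter Horner
 * chain in w = z*z, so the two chains are independent of each other.
 */
static double
r_split(double s)
{
    double z = s * s;
    double w = z * z;
    double t1 = w * (Lg[2] + w * (Lg[4] + w * Lg[6]));
    double t2 = z * (Lg[1] + w * (Lg[3] + w * (Lg[5] + w * Lg[7])));

    return (t2 + t1);
}

/* The same grouping with the terms above Lg4 omitted, as in the logf code. */
static double
r_split4(double s)
{
    double z = s * s;
    double w = z * z;
    double t1 = w * (Lg[2] + w * Lg[4]);
    double t2 = z * (Lg[1] + w * Lg[3]);

    return (t2 + t1);
}

/* Self-check: the two 7-term groupings should agree to within rounding. */
static void
check_agree(double s)
{
    assert(fabs(r_horner(s) - r_split(s)) < 1e-15);
}
```

Calling check_agree() for a few small values of s (say 0.05 to 0.17) should pass, since the groupings differ only in the order of the additions and multiplications. The split form is the same polynomial, not a different algorithm; only the dependency chains are shorter.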

>> The old fdlibm comments aren't too careful about keeping magic numbers
>> out of the algorithm description so that the algorithm description is
>> as general as possible. The precise magic numbers are often critical
>> to the details of the implementation of the algorithm but not really
>> to the algorithm itself.
>
> The current code isn't careful at all about magic numbers. Which ones are magic hex constants for IEEE stuff, and which ones are hex numbers for the exact representation of important constants?

None for the hex FP constants. Hex FP constants are for the
programmer's convenience, but are just style bugs where I put them in
e_expf.c. Decimal FP constants work with any C compiler. They are
only pseudocode in comments in fdlibm, and no one ever used their
values from there AFAIK, since C compilers were common when fdlibm was
released. The style bug is to mix C99 hex FP constants in old fdlibm
(Cygnus) code. Though only in comments, it looks strange since the
active code still laboriously spells out (float) casts to get float
constants. Even C90 has float constants via the F suffix.

But I recently decided that many of the FP constants are style bugs.
When you just want to add 2 as in e_log*.c you should spell 2 as 2 and
not as 2.0 for the double case, 2.0F or (float)2.0 for the float case,
and maybe 2.0L or (long double)2.0 for the long double case. C's
promotion rules work correctly, so the 2 in (2+f) is promoted to the
same type as f in all cases. By depending on this, the code becomes
independent of the precision. A more interesting case is multiplying
by 0.5 or (float)0.5 or 0.5F or ... This can be written as division by
integer 2, since the compiler is permitted to turn this into
multiplication by 0.5 with the correct type. gcc has done this for
many years, so there is no need to manually optimize this.

> Anyway, it is clear you guys don't want them in there so I've reverted them.
> I'm totally baffled by this request because these algorithms are similar not identical, but since you guys are the active maintainers, I'll accede to your wishes after expressing my utter bafflement at the justifications.

Well, we (I) wrote the changes in the polynomial evaluation for both
e_expf.c and e_logf.c, and was careful to a fault not to change the
algorithm, so I know that it is more identical in these files than in
others.

Bruce