From owner-svn-src-all@FreeBSD.ORG  Mon Oct 22 16:09:39 2012
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 78F98F40;
 Mon, 22 Oct 2012 16:09:39 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail36.syd.optusnet.com.au (mail36.syd.optusnet.com.au
 [211.29.133.76])
 by mx1.freebsd.org (Postfix) with ESMTP id E98008FC08;
 Mon, 22 Oct 2012 16:09:38 +0000 (UTC)
Received: from c122-106-175-26.carlnfd1.nsw.optusnet.com.au
 (c122-106-175-26.carlnfd1.nsw.optusnet.com.au [122.106.175.26])
 by mail36.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q9MG9Pv5023992
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Tue, 23 Oct 2012 03:09:26 +1100
Date: Tue, 23 Oct 2012 03:09:25 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Warner Losh <imp@bsdimp.com>
Subject: Re: svn commit: r241755 - head/lib/msun/src
In-Reply-To: <80A9038D-C84B-47D3-A137-F281449F4D88@bsdimp.com>
Message-ID: <20121023011459.G2088@besplex.bde.org>
References: <201210192246.q9JMkm4R092929@svn.freebsd.org>
 <20121020150917.I1095@besplex.bde.org>
 <18177777-6EE0-4103-98B0-272EFF98FE96@bsdimp.com>
 <20121022213348.T1373@besplex.bde.org>
 <80A9038D-C84B-47D3-A137-F281449F4D88@bsdimp.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-Cloudmark-Score: 0
X-Optus-Cloudmark-Analysis: v=2.0 cv=Up07rJMB c=1 sm=1 a=xI64QrV9ptIA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Aet6fyW9sl8A:10
 a=IjXKO9rnGdwczpQha3EA:9 a=CjuIK1q_8ugA:10 a=d3D4cFlDitsIMNaX:21
 a=4zsjEq_Luspm44fH:21 a=bxQHXO5Py4tHmhUgaywp5w==:117
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org,
 src-committers@FreeBSD.org, Warner Losh <imp@FreeBSD.org>,
 Bruce Evans <brde@optusnet.com.au>
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
 user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 22 Oct 2012 16:09:39 -0000

On Mon, 22 Oct 2012, Warner Losh wrote:

> On Oct 22, 2012, at 5:14 AM, Bruce Evans wrote:
>
>> On Sun, 21 Oct 2012, Warner Losh wrote:
>>
>>> Feel free to fix them however.  I added the comments because the algorithms weren't quite the same...  If you have a better way, feel free to back my stuff out on the way to it.
>>
>> But the algorithms are identical to a fault.  Inside the functions, all
>> lines except 1 in each correspond exactly, and the exception is a style
>> bug.  Only about 30 lines in each are not lexically identical.  The
>> non-lexical differences are for things like different magic numbers.
>
> Except that's not true.  For expf, only the first two terms of the 5 are computed in Remes expansion.  That's why I bothered to document the difference.  For logf, there are at least two additional terms in the intermediate form.

No, they both use a Remez-type polynomial approximation.  The only
differences are that the approximation for logf needs _fewer_ terms,
and in fact has 3 fewer (except in the Cygnus version it had the
same number of terms with the lower ones useless at best), and that
the grouping of the terms is different for efficiency.

I say "Remez-type" because it is unclear if anyone actually uses pure
Remez.  I use my own idea of what it is, without having looked at
documentation of the original algorithm.  I probably have different
tweaks than normal.  fdlibm's approximations often seem to be too
inaccurate to have been generated by full Remez.  The ones here in
the Cygnus version were generated by starting with the double precision
coeffs and blindly rounding them to float precision.   This is a bad
non-Remez method (but pure Remez doesn't understand limited precision
AFAIK).  In practice it gave 6 extra bits using Lg1-Lg7, where my
coeffs generated for float precision give 10 extra bits using only
Lg1-Lg4.  The extra coeffs tend to be actively harmful (and IIRC they
were here), because when Remez generates the lower coeffs it tunes
them for the lower coeffs actually working, all in infinite precision,
and after rounding to finite precision the delicate tuning for this
just doesn't work.

The comments actually say "special Remes".  Who knows what that is? :-)
Even running same algorithm, there is a lot of noise in the lower bits
for all except the first few coeffs, so it is hard to duplicate coeffs
produced by other special versions.

> The differences here are:
> /* logf */
>        t1= w*(Lg2+w*Lg4);
>        t2= z*(Lg1+w*Lg3);
> /* log */
>        t1= w*(Lg2+w*(Lg4+w*Lg6));
>        t2= z*(Lg1+w*(Lg3+w*(Lg5+w*Lg7)));
>
> which to even the most casual observer are clearly different.

Too casual :-).  Both use a moderately efficient grouping of terms for
efficiency.  Both are more optimal than the simple Horner's method
grouping which would look similar even to a too-casual observer:

 	R = s*s*(Lg1+s*(Lg2+s*(...+Lg[4or7])...))

Neither is highly optimized, but the logf one is is better, mainly because
a power of 2 number of terms is easiest to get right and a small number of
terms is especially easy to get right.  In fact, the logf one is just the
log one with higher terms omitted.  Simply omitting higher terms is usually
too simple, but here it happens to improve the grouping.

The grouping is a general polynomial evaluation technique and isn't
part of Remez.  The details of it aren't documented at all.  In fact,
the pseudo-code doesn't even mention z completely: it says R(z) ~=
Lg1*s**2 + ... +Lg7*s**14 (using superscripts for the powers) where
the code says t2 = z*(Lg1+w*(Lg3+...)... R = t2+t1.

Many other details involve unmentioned parts of general polynymonal
evaluation technique (mainly, add terms from high to low for accuracy,
but violate this rule as much as possible for efficiency, and avoid
underflow by filtering out small args early and by not calculating
high powers directly).

>> The old fdlibm comments aren't too careful about keeping magic numbers
>> out of the algorithm description so that the algorithm description is
>> as general as possible.  The precise magic numbers are often critical
>> to the details of the implementation of the algorithm but not really
>> to the algorithm itself.
>
> The current code isn't careful at all about magic numbers.  Which ones are magic hex constants for IEEE stuff, and which ones are hex numbers for the exact representation of important constants?

None for the hex FP constants.  Hex FP constants are for the programmer's
convenience, but I are just style bugs where I use put them in e_expf.c.
Decimal FP constants work with any C compiler.  They are only pseudocode
in comments in fdlibm, and no one every used their values from there AFAIK,
since C compilers were common when fdlibm was released.  The style bug is
to mix C99 hex FP constants in old fdlibm (Cygnus) code.  Though only in
comments, it looks strange since the active code still laboriously spells
out (float) casts to get float constants.  Even C90 has float constants
via the F suffix.  But I recently decided that many of the FP constants
are style bugs.  When you just want to add 2 as in e_log*.c you should
spell 2 as 2 and not as 2.0 for the double case, 2.0F or (float)2.0 for
the float case, and maybe 2.0L or (long double)2.0 for the long double
case.  C's promotion rules work correctly, so the 2 in (2+f) is promoted
to the same type as f in all cases.  By depending in this, the code becomes
independent of the precision.  A more interesting case is multiplying by
0.5 or (float)0.5 or 0.5F or ...  This can be written as division byi
integer 2, since the compiler is permitted to turn this into multiplication
by 0.5 with the correct type.  gcc has done this for many uears, so there
is no need to manually optimize this.

> Anyway, it is clear you guys don't want them in there so I've reverted them.  I'm totally baffled by this request because these algorithms are similar not identical, but since you guys are the active maintainers, I'll accede to your wishes after expressing my utter bafflement at the justifications.

Well, we (I) wrote the changes in the polynomial evaluation for both
e_expf.c and e_logf.c, and was careful to a fault not to change the
algorithm, so I know that it is more identical in these files than in
others.

Bruce