From owner-svn-src-head@FreeBSD.ORG  Sat Oct 20 05:43:20 2012
Return-Path: <owner-svn-src-head@FreeBSD.ORG>
Delivered-To: svn-src-head@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 7C14E58B;
 Sat, 20 Oct 2012 05:43:20 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail13.syd.optusnet.com.au (mail13.syd.optusnet.com.au
 [211.29.132.194])
 by mx1.freebsd.org (Postfix) with ESMTP id 24F498FC08;
 Sat, 20 Oct 2012 05:43:18 +0000 (UTC)
Received: from c122-106-175-26.carlnfd1.nsw.optusnet.com.au
 (c122-106-175-26.carlnfd1.nsw.optusnet.com.au [122.106.175.26])
 by mail13.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q9K5h6C9027918
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Sat, 20 Oct 2012 16:43:09 +1100
Date: Sat, 20 Oct 2012 16:43:06 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Warner Losh <imp@freebsd.org>
Subject: Re: svn commit: r241755 - head/lib/msun/src
In-Reply-To: <201210192246.q9JMkm4R092929@svn.freebsd.org>
Message-ID: <20121020150917.I1095@besplex.bde.org>
References: <201210192246.q9JMkm4R092929@svn.freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-Cloudmark-Score: 0
X-Optus-Cloudmark-Analysis: v=2.0 cv=eLtLFwV1 c=1 sm=1 a=xI64QrV9ptIA:10
 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Aet6fyW9sl8A:10
 a=cvw0RnjcknDQsnrTvV4A:9 a=CjuIK1q_8ugA:10 a=V4IsmBEFW7KwAuKY:21
 a=eyMMwmjJ-5K38dzR:21 a=bxQHXO5Py4tHmhUgaywp5w==:117
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
 src-committers@freebsd.org
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
 <svn-src-head.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Oct 2012 05:43:20 -0000

On Fri, 19 Oct 2012, Warner Losh wrote:

> Log:
>  Document the methods used to compute logf. Taken and edited from the
>  double version, with adaptations for the differences between it and
>  the float version.

Please back this out.

This was intentionally left out.  It is painful to maintain large
comments that should be identical for all precisions.  Double precision
is primary, and only comments that depend on the precision should be
given in other precisions, except for short comments to to right of
coulde whose non-duplication would just give larger diffs.  If you
want to duplicate them, then they would often have to be quadruplicated
in 4 files for the 4 supported precisions:
- double precision
- float precision
- long double with MANT_DIG = 64 and 16 other bits
- long double with MANT_DIG = 113 and 15 other bits
Only 4 now, but there could easily be variations for long doubles.

log* will be replaced by my version soon.  One of the organizational
things that I'm currently struggling with is how not to duplicate the
comments in them.  Currently I duplicate then in 4 files, and don't
quite duplicate then in a 5th file for optimized float precision.
Maintaining them has been very painful.  They are about 3 times as large
as the comments here, and have more and more-delicate precision-dependent
details, so trimming them is harder than here.  They do cover all of
log*(), log1p*(), log2*() and log10*() in the same file, so there is
not as much duplication as for the different e_log*.c files.

Another organizational problem is how to organize the functions.  I
currently have 4 primary interfaces per file (should have a couple
more for kernels), and 1 file for each precision, and currently use
inlines and ifdefs to avoid these inlines, but the inlines are
not properly optimized for all cases of interest, and break debugging
in most cases, so I plan to switch to more macro hackery (like multiple
#include __FILE__'s with ifdefs to select what is compiled for each
#include) .  Even more macro hackery would let me put support for all
precisions in the same files, so duplicated comments would be even less
useful, but the precision-dependent ones would be even harder to write
and read.

We are developing inverse trig functions, and handle the comments and
other things by generating code for all precisions from the double
precision case.  Large comments are stripped as part of the generation.
This is easier to write but not so good to read.  This is only feasible
because the code is not very precision-dependent.

> Modified: head/lib/msun/src/e_logf.c
> ==============================================================================
> --- head/lib/msun/src/e_logf.c	Fri Oct 19 22:21:01 2012	(r241754)
> +++ head/lib/msun/src/e_logf.c	Fri Oct 19 22:46:48 2012	(r241755)
> @@ -19,6 +19,57 @@ __FBSDID("$FreeBSD$");
> #include "math.h"
> #include "math_private.h"
>
> +/* __ieee754_log(x)

Here the comment doesn't even match the code.  This is __ieee754_logf(x),
not __ieee754_log().  Technically, this is a comment that depends on the
precision, so it should be given separately, but the differences for type
suffixes are especially painful to maintain in comments and especially
useless to document in comments.

> + * Return the logrithm of x

This duplicates e_log.c to a fault.  Fixing this spelling error would
require touching multiple files.

> + *
> + * Method :
> + *   1. Argument Reduction: find k and f such that
> + *                      x = 2^k * (1+f),
> + *         where  sqrt(2)/2 < 1+f < sqrt(2) .
> + *
> + *   2. Approximation of log(1+f).
> + *      Let s = f/(2+f) ; based on log(1+f) = log(1+s) - log(1-s)
> + *               = 2s + 2/3 s**3 + 2/5 s**5 + .....,
> + *               = 2s + s*R

Nothing new here.  I don't want to see it again.

> + *      We use a special Reme algorithm on [0,0.1716] to generate
> + *      a polynomial of degree 8 to approximate R The maximum error
> + *      of this polynomial approximation is bounded by 2**-34.24. In
> + *      other words,

Lots more spelling and grammar errors.  I'm not sure if the correct spelling
is Remez or Remes, but I'm sure it isn't Reme.  One sentence break is
missing a "." and the others are consistently too short.

The 2**-34.24 number is precision-dependent and is documented in more
completely below using my automatically generated comment that gives a
consistent format.  (I haven't merged this improvement back into e_log.c.)
Now there is duplication even within the same file :-(.

> + *                      2      4      6      8
> + *          R(z) ~ Lg1*s +Lg2*s +Lg3*s +Lg4*s

The number of coeffs is indeed precision-dependent, but uninteresting.
My constent format for poly coeffs doesn't mention them in comments.
It is enough to spell the constants for them in a consistent way
(consistent across all sources, not just log*).

> + *      (the values of Lg1 to Lg7 are listed in the program)

Here the comment doesn't even match the code.  There is no Lg5 to
Lg7 for float precision.

> + *      and
> + *          |      2          8           |     -34.24
> + *          | Lg1*s +...+Lg4*s    -  R(z) | <= 2
> + *          |                             |

Another excessively verbose comment duplicated.  It has the magic number
-34.24 again, so this number is now triplicated.

> + *      Note that 2s = f - s*f = f - hfsq + s*hfsq, where hfsq = f*f/2.
> + *      In order to guarantee error in log below 1ulp, we compute log
> + *      by
> + *              log(1+f) = f - s*(f - R)        (if f is not too large)
> + *              log(1+f) = f - (hfsq - s*(hfsq+R)).     (better accuracy)
> + *
> + *      3. Finally,  log(x) = k*ln2 + log(1+f).
> + *                          = k*ln2_hi+(f-(hfsq-(s*(hfsq+R)+k*ln2_lo)))
> + *         Here ln2 is split into two floating point number:
> + *                      ln2_hi + ln2_lo,

Nothing new here.  I don't want to see it again.

> + *         where n*ln2_hi is always exact for |n| < 2000.

Now just wrong.  2000 is for double precision.  2048 would be better there.
It is what we arrange for, to go slightly above 1024.)  Here we need to
go slightly above 128, and arrange for < 256.  This is one of the points
documented better in my log*.  Its documentation belongs closer to the
declaration of ln2_hi where you can see the bits being left out of it.
But my comments for this grew too large.  They describe complications like
using the same splitting for double precision as for long double precision
(go slightly above 16384 instead of slightly abive 1024, at an insignificant
cost of precision), and sharing ln2_hi/lo with a table value where even
more bits sometimes have to be left out of ln2_hi, depending on complicated
options and the precision.  They described the details of why the number
of bits chosen is needed, and gave the acceptable ranges but have been
reduced a bit closer to saying something like "ln2_hi is rounded to a
number of bits that works [to satisfy constraints explained elsewhere".

> + *
> + * Special cases:
> + *      log(x) is NaN with signal if x < 0 (including -INF) ;
> + *      log(+INF) is +INF; log(0) is -INF with signal;
> + *      log(NaN) is that NaN with no signal.

Nothing new here.  I don't want to see it again.

> + *
> + * Accuracy:
> + *      according to an error analysis, the error is always less than
> + *      1 ulp (unit in the last place).

Actually, expf() has been exhaustively tested, so its error is known
much more precisely than the error analysis can give.  The details are
too MD and unimportant to give here.

e_log.c is not the place to describe what an ulp is.  Garbage should
be removed before duplicating things.

> + *
> + * Constants:
> + * The hexadecimal values are the intended ones for the following
> + * constants. The decimal values may be used, provided that the
> + * compiler will convert from decimal to binary accurately enough
> + * to produce the hexadecimal values shown.
> + */

More non-useful boilerplate.  This comment is FUD about pre-C90 compilers,
since compilers couldn't be trusted to round correctly long ago.  After
C99, we could use hex constants to reduce rounding problems, but I don't
want to churn the sources for this, and prefer decimal constants in most
cases, with hex values in comments (mainly for debugging -- hex constants
tend to be more usefull iff you are looking at hex values in a register).

> +
> static const float
> ln2_hi =   6.9313812256e-01,	/* 0x3f317180 */
> ln2_lo =   9.0580006145e-06,	/* 0x3717f7d1 */
>

This still uses fdlibm's old format in which the hex values are given as
pseudo-literals in hex in comments.  The couldn't use C99 hex literals
because C99 didn't exist.  They are more readable in this form, since
C99 hex literals handle exponents poorly, so it is hard to see what they
are in bits and you just can't express the special cases where hex is
most needed (Infs and NaNs) as C99 hex literals.  But when I optimized
and otherwise improved the LgN* contants, I had not settled on a format
and didn't even translate to the above format like I did in some other
places, so my LgN* declarations are of the form '<C99 literal in hex>;
/* <decimal constant> */'.

I was also sloppy with the -34.24 constant when I did that.  It seems
to be rounded in the wrong direction.  I now get -34.22 (rounded to
nearest) and round it in a safe direction to -34.2.  Note that this
constant doesn't quite allow the accuracy in the poly to be read off
as 34.2 bits.  It is scaled relative to s, but the accuracy in bits
should be scaled relative to |log(1+s)-log(1-s)| rounded down to a
power of 2.  This is ~2*s.  Normally for such estimates, scaling by
s is off by a factor of at most 2 so we can read off an accuracy of
33.2 bits, but here the factor of 2 in 2*s means that we can read off
an accuracy of 33.2 +- 1 bits.  Anyway, the -34.24 has lots of spare
bits, so its minor inaccuracies don't matter and can be be found by
exhaustive checking.  But it is strictly wrong when its value is given
in constants.  e_log.c gives the related bound of 2**-58.45.  This is
scaled by s, so it takes minor interpretation too.  It seems to be
correctly rounded.

Bruce