From owner-freebsd-numerics@FreeBSD.ORG Mon Sep 24 19:05:48 2012
Date: Tue, 25 Sep 2012 05:05:42 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: numerics@freebsd.org
Message-ID: <20120925035402.C1433@besplex.bde.org>
Subject: LDBL_MAX broken on i386
List-Id: "Discussions of high quality implementation of libm functions."

I just noticed that LDBL_MAX is completely broken on i386.  gcc supports
i386's default of not-really-long-double precision to a fault: it rounds
all constant expressions, including literal constants, in the same way
that they would be rounded at runtime under the FreeBSD default rounding
precision of 53 bits.  The definition of LDBL_MAX knows nothing of this.
It asks for a value that is significantly larger than is representable,
so the value is rounded to infinity:

    #define LDBL_MAX 1.1897314953572317650E+4932L

It is hard to see what this is, but in hex it asks for the value that
would work with 64-bit precision, 0x0.ffffffffffffffffp16384L (the
decimal constant is just this value rounded to 20 decimal digits).

(This 20, or perhaps DECIMAL_DIG = 21, is another bug.  All the long
double constants in x86 float.h are rounded to 20 decimal digits, but
DECIMAL_DIG says that 21 should be used.  Of course, 21 is very rarely
needed, and the small set of constants in float.h is very unlikely to
need it or even 20, but they should set a better example.  DECIMAL_DIG
is new in C99, and the number of digits in the constants wasn't
increased from 20 to 21 when it was added.  OTOH, sparc64 has
off-by-1-or-2 errors in the opposite direction:
- its DECIMAL_DIG is 36
- it uses 37 decimal digits
- its DECIMAL_DIG used to be off by 1 in the opposite direction (it was 35)
Using 37 digits might be the result of trying to use the correct number
(36) all along, but printing the values using "%.36Le" format.  That
format gives 36 digits after the decimal point, i.e., 1 more significant
digit than the number in the format.)

Rounding the decimal constant to nearest in 53-bit precision fairly
obviously rounds up, so it overflows.  The half-way case is
0x0.fffffffffffffc00p16384L.  This rounds up too, to round to even.  The
largest value that rounds down is thus 0x0.fffffffffffffbffp16384L, and
the largest representable 53-bit value is 0x0.fffffffffffff800p16384L.

clang is not compatible to a fault here.  This gives bugfeatures like
the following:
- the buggy definition of LDBL_MAX actually works: it gives the 64-bit
  value 0x0.ffffffffffffffffp16384L
- other constant literals work similarly
- compile-time evaluation is broken.  For example, let L be half of the
  64-bit LDBL_MAX (0x0.ffffffffffffffffp16383L).
Then:
- L+L = Inf at runtime (with not-really-long-double precision -- just
  extra exponent range)
- L+L = 64-bit LDBL_MAX at compile time

But with gcc:
- L is rounded to nearest, so it becomes 0x1p16383L
- L+L = Inf at runtime (with either not-really-long-double precision or
  full long double precision)
- L+L = Inf at compile time

clang warns about overflow in too-large constant literals for LDBL_MAX.
This was useful for debugging.

I think the technically correct handling for gcc is to round constant
literals to full binary precision (64 bits for i386 long doubles), and
to evaluate FP constant expressions at compile time only if the result
doesn't depend on the rounding mode and doesn't lose precision in the
default rounding precision (whatever that may be -- even in full
precision, the calculation should be deferred to runtime if it would set
inexact), or at least to have an option for this and/or warn if the
calculation loses precision.  Decimal constants cause a problem here:
0.3 cannot be represented without losing precision, but warning for
that would be unreasonable.  An option to calculate this at runtime as
3.0/10.0 might be useful, but it wouldn't work for longer decimal
constants like 1.1897314953572317650E+4932L (this must have more digits
than can be represented, so that it gives the desired value after
rounding).

Bruce