From owner-freebsd-standards@FreeBSD.ORG Mon Oct 1 09:56:38 2007 Return-Path: Delivered-To: freebsd-standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9020D16A469 for ; Mon, 1 Oct 2007 09:56:38 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail15.syd.optusnet.com.au (mail15.syd.optusnet.com.au [211.29.132.196]) by mx1.freebsd.org (Postfix) with ESMTP id 0AD2B13C45A for ; Mon, 1 Oct 2007 09:56:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c220-239-235-248.carlnfd3.nsw.optusnet.com.au [220.239.235.248]) by mail15.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id l919uS9W023309 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 1 Oct 2007 19:56:34 +1000 Date: Mon, 1 Oct 2007 19:56:28 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl In-Reply-To: <20070928152227.GA39233@troutmask.apl.washington.edu> Message-ID: <20071001173736.U1985@besplex.bde.org> References: <20070928152227.GA39233@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-standards@freebsd.org Subject: Re: long double broken on i386? X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 Oct 2007 09:56:38 -0000 On Fri, 28 Sep 2007, Steve Kargl wrote: > So, in my never ending crusade to implement missing > C99 math functions, I decided to tackle sinl, cosl, > and tanl. I have implementations that use argument > reduction from k_rem_pio2.c and Chebyshev interpolation > for the trig functions over the range [0,pi/4]. > > On amd64, these routines give <= 1/2 ULP for the various > values I've tested in the reduced range. Now, if I test I doubt this :-). Getting <= 1 ULP is hard; getting <= 1/2 + 0.001 ULPs is harder, and getting <= 1/2 ULPs without using an enormous amount of extra precision is a research problem. > these on i386, I see between 400 and 1200 ULP. I spent > a week or so trying to understand the descrepancy and "fix" > the problem using a double-double precision arithemetic. > > I finally found the problem! /usr/include/float.h on i386 > is lying about its numerical representation of long double. This problem is well known. I must have referred to it many times in old mail to you. FreeBSD uses my old workaround for compiler bugs in double precision, of using a rounding precision of double for everything. Unfortunately, this is still necessary because the compiler bugs are still there. This mostly breaks long double precision, and it doesn't fix float precision, but double precision is more important. libm, in particular k_rem_pio2f.c (which is no longer used) needs tricky fixes to work around the bugs for float precision. The fixes have mostly not been merged to the double precision functions -- we just depend on double precision mostly working. > In particular, at least these 3 values appear to be wrong > > #define LDBL_MANT_DIG 64 > #define LDBL_EPSILON 1.0842021724855044340E-19L > #define LDBL_DIG 18 These were broken in rev.1.9 in 2002. The "slight rounding issues with long doubles" referred to in the log message are that long doubles are rounded to double precision when they are operated on, unless you change the rounding precision to long double using fpsetprec() or fesetprec(). Even if you change the rounding precision, long doubles can only appear to work, but not actually work, unless you fix the compiler bugs or work around them in libraries and in your application. It isn't clear if the "appear to work" in the log messages is with or without the change to the rounding precision. As mentioned in the log message, printf() in 2002 didn't support printing long doubles at all, so it was hard to see loss of precision in printf output. Now printf supports long doubles and %a, so it is easy to see lost precision using it. E.g.: %%% #include #include int main(void) { printf(" sin(1) = %La\n", (long double)(double)sin(1)); printf("2 * sin(1) = %La\n", (long double)(double)(2 * sin(1))); return (0); } %%% Output on i386: %%% sin(1) = 0xd.76aa47848677021p-4 2 * sin(1) = 0xd.76aa47848677p-3 %%% This exhibits a library bug, several compiler bugs, and the lost precision from my rounding precision hack: o library bug: sin() returns extra precision, despite its prototype saying that it returns double. Well, I used to think that this is a bug. I only leared a few months ago that it is a feature, since it matches the corresponding feature for a function implemented in C. On i386, sin() is implemented in asm. It just uses the hardware to evaluate sin() in long double precision, and returns the long double result of that. The corresponding hardware feature is that in a C function that does similar calculations is permitted to return extra precision: double x, y; long double foo() { return x+y; } This is permitted to do the following: - evaluate x+y in extra precision - return the extra precision, since unlike casts and assignments, the return operation is _not_ required to discard any extra precision. For gcc on i386, this is what it would do without my rounding precision hack, except possibly when compiled with -O0 and/or if the result of x+y is spilled to memory (-O0 makes spilling quite likely). The return statement must be written as "return (double)(x+y);" to ensure removing any extra precision _before_ returning. Now I wonder if the implicit casts for function args with prototyped functions are permitted to be as magic as the non-casts for return statments. Keeping extra precision in intermediate calculations is good, and you don't really want to lose it unconditionally at function call boundaries, and it's confusing to lose it on calls but not on returns. I'm using sin(1) mainly to generate a long double with nonzero in its 11 least significant mantissa bits. Note that initializing a long double to apparently-the-same value at compile time would not work: long double x = 0xd.76aa47848677021p-4L; /* loses extra precision */ This is because gcc on FreeBSD supports my precision hack to a fault, and has bugs in the fault. gcc now tries to do compile-time arithmetic in the same precision as the runtime arithmetic, i.e., in only double precision on FreeBSD. This is wrong if there is no arithmetic involved, as in the above initializer. The runtime rounding precision only applies to operations -- it doesn't prevent loading, storing or non-arithmetic (function) operations on long doubles with full precision. The above initializer corresponds to a store of a value that has somehow been calculated in full precision. Note that this bug makes it more difficult to implement functions in long double precision -- you cannot use tables of long double constants. OTOH, loads of long double constants are so slow on i386 that such constants should almost never be used. Instead represent long doubles as the sum of 2 doubles if possible. Loading 2 doubles and adding them is faster than loading 1 long double, at least if the 2 extra operations can be pipelined effectively. o compiler bug (?). Casts are required to remove any extra precision. I cast sin(1) to double to do this. gcc elides this cast since it thinks that sin(1) has no extra precision since it has type double, and it knows that sin(1) is in a long double register, so it thinks that nothing needs to be done for the combined cast (long double)(double). However, having type double doesn't imply no extra precision in general, and after a return statement is one class of cases where it never does. A compiler can only tell if there is extra precision by looking inside the function near its return statement, or perhaps using an attribute. o compiler bug. The casts in "(long double)(double)(2 * sin(1)))" are elided similarly. This is not visible in the printf output, since my rounding precision hack actually makes a difference here: ... o ... lost precision. The hack results in the multiplication not producing any extra precision, so the printf output is correct despite the compiler bug. Variations on the above: - with 1*sin(1) instead of 2*sin(1), optimizing away the multiplication is valid and done with -O, so the extra precision is retained. - with sin(1) == sin(1) instead of 2*sin(1), and %d for the format specifier, the precision bugs combined with spilling bugs result in the value 0. Similarly, sin(x) != sin(x) for almost all x. This behaviour is probably permitted, but it is surprising, and it isn't intentional. C99 apparently permits sin(x) to be evaluated _and_ returned in extra precision, and I don't know of any requirement that the return value always has the same amount of extra precision, and there are no casts in sight to clip the precision. Different amounts always result in practice, accidentally as follows: gcc doesn't really understand exztra precision, especially for spilling intermediate values. It just spills one of the sin(1)'s and thus loses the extra precision for only one of them. Loss of precision from spilling is normal near function calls because of the ABI, but uncommon (but much harder to predict) otherwise. > #include > > int > main (void) > { > int i; > long double a; > a = 1.L; > for (i = 1; i < 64; i++) > { > a /= 2; > if (1.L - a == 1.L) break; > } > printf("%d %Le\n", i, a); > return 0; > } > > amd64:sgk[206] ./z > 64 1.084202e-19 > > i386:kargl[205] ./z > 54 5.551115e-17 > > The above results for i386 are consistent with DBL_MANT_DIG > and DBL_EPSILON. Is this intentional, and should float.h be > fixed? LBDL_MANT_DIG, LDBL_EPSILON and LDBL_EPSILON were broken in 2002. Note that the compiler bugs make use of the epsilons fragile. The delicate casts or assignments needed to give the lower-precision epsilons a chance of working don't actually work unless you use hacks like -ffloat store and volatile variables. LDBL_MIN_* and LDBL_MAX_* were fixed in 2002. My precision hack only affects precision (only of +-*/ operations). Long doubles still have extra exponent range, so LDBL_MAX != DBL_MAX, etc. This was wrong before 2002. Bruce