From owner-freebsd-stable@FreeBSD.ORG Sat Nov 17 10:01:41 2007 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E1A3B16A417 for ; Sat, 17 Nov 2007 10:01:41 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx01.syd.optusnet.com.au (fallbackmx01.syd.optusnet.com.au [211.29.132.93]) by mx1.freebsd.org (Postfix) with ESMTP id 1FF8713C474 for ; Sat, 17 Nov 2007 10:01:33 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail34.syd.optusnet.com.au (mail34.syd.optusnet.com.au [211.29.133.218]) by fallbackmx01.syd.optusnet.com.au (8.12.11.20060308/8.12.11) with ESMTP id lAGHrT3H027319 for ; Sat, 17 Nov 2007 04:53:36 +1100 Received: from c211-30-219-213.carlnfd3.nsw.optusnet.com.au (c211-30-219-213.carlnfd3.nsw.optusnet.com.au [211.30.219.213]) by mail34.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id lAGHrMii020019 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 17 Nov 2007 04:53:24 +1100 Date: Sat, 17 Nov 2007 04:53:22 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Peter Jeremy In-Reply-To: <20071115184615.GL20992@server.vk2pj.dyndns.org> Message-ID: <20071117025943.P64582@delplex.bde.org> References: <20071115062002.GK89746@server.vk2pj.dyndns.org> <20071115184615.GL20992@server.vk2pj.dyndns.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: stable@freebsd.org, bde@freebsd.org, Pete French Subject: Re: Float problen running i386 inary on amd64 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 17 Nov 2007 10:01:42 -0000 On Fri, 16 Nov 2007, Peter Jeremy wrote: > I've Cc'd bde@ because this relates to the FPU initialisation - which > he is the expert on. > > On Thu, Nov 15, 2007 at 12:54:29PM +0000, Pete French wrote: >>> On Fri, Nov 02, 2007 at 10:04:48PM +0000, Pete French wrote: >>>> int >>>> main(int argc, char *argv[]) >>>> { >>>> if(atof("3.2") == atof("3.200")) >>>> puts("They are equal"); >>>> else >>>> puts("They are NOT equal!"); >>>> return 0; >>>> } >>> >>> Since the program as defined above does not include any prototype for >>> atof(), its return value is assumed to be int. The i386 code for the >>> comparison is therefore: >> >> Sorry, I didn't bother sticking the include lines in when I sent it >> to the mailing list as I assumed it would be ovious that you need >> to include the prototypes! > > OK, sorry for the confusion. > >> Interestingly, if you recode like this: >> >> double x = atof("3.2"); >> double y = atof("3.200"); >> if(x == y) >> puts("They are equal"); >> else >> puts("They are NOT equal!"); >> >> Then the problem goes away! Glancing at the assembly code they both appear to >> be doing the same thing as regards the comparison. Glance more closely. Behaviour like this should be expected on i386 but not on amd64. It gives the well-known property of the sin() function, that sin(x) != sin(x) for almost all x (!). It happens because expressions _may_ be evaluated in extra precision (this is perfectly standard), so identical expressions may sometimes be evaluated in different precisions even, or especially, if they are on the same line. atof(s) and sin(x) are expressions, so they may or may not be evaluated in extra precision. Certainly they may be evaluated in extra precision internally. Then when they return a result, C99 doesn't require discarding any extra precision. (It only requires a conversion if the type of the expression being returned is different from the return type. Then it requires a conversion as if by assignment, and such conversions _are_ required to discard any extra precision. This gives the bizarre behaviour that, if a functon returning double uses long double internally until the return statement so as to get extra precision, then it can only return double precision, since the return statement discards the extra precision, while if it uses double precision internally then it may return extra precision and the extra bits may even be correct.) The actual behaviour depends on implementation details and bugs. Programmers are supposed to be get almost deterministic behaviour (with no _may_'s) by using casts or assignments to discard any extra precision. E.g., in functions that are declared as double, to actually return only double precision, use "return ((double)(x + y))" instead of "return (x + y)", or assign the result to a double (maybe "x += y; return (x);"). However, this is completely broken for gcc on i386's. For gcc on i386's, casts and assignments _may_ actually work as required by C99. The -ffloat-store hack is often recommended for fixing problems in this area, but it only works for assignments; casts remain broken, and the results of expressions remain unpredictable and dependent on the optimization level because intermediate values _may_ retain extra precision depending on whether they are spilled to memory and perhaps on other things (spilling certainly removes extra precision). This has been intentionally broken for about 20 years now. It is hard to fix without pessimizing almost everything in much the same way as -ffloat-store. The pessimization is larger than it was 20 years ago since memory is relatively slower (though the stores now normally go to L1 caches which are very fast, they add a relatvely large amount to pipeline latency) and register allocation is better. It is hard to write code that avoids the pessimization, since only code that uses very long expressions with no assignments to even register variables can avoid the stores. (Store+load to discard the extra precision is another implementation detail. It is the fastest way, even if a value with extra precision is in a register.) To work around the gcc bugs, something like *(volatile double *)&x must be used to reduce "double x;" to actually be a double. The actual behaviour is fairly easy to describe for (f(x) == f(x)): amd64: if f() returns float, then the value is returned in the low quarter of an XMM register, so extra precision is automatically discarded and the results are equal except in exceptional cases (if f(x) is a NaN or varies due to internals in the function). Assignment of the result(s) to variables of any type work correctly and don't change the values since float is the lowest precision. if f() returns double, similarly except the value is returned in the low half of an XMM register, and assignment of the result(s) to variable(s) of type float would work correctly and reduce to float precision. if f() returns long double, then the value is returned on the top of the FPU stack. Since the result has maximal precision, neither return statment may discard precision, and since the declared type is the actual type, the compiler cannot discard any extra precision accidentally. Assignment of the result(s) to variable(s) of type float or double would work correctly and reduce to the precision of the variable(s). i386: The value is always returned on the top of the FPU stack. It _may_ have extra precision if the return type is float or double. If f() returns float, then the value is very likely to have extra precision from something like "return (x + y)" at the end of f(). Then since the ABI requires spilling the result of the first f(x) before calling f() again, and since gcc doesn't know that f(x) has extra precision, the first f(x) is spilled to a termporary variable of type float and thus loses all of its extra precision. Then the second f(x) is very likely to return extra precision. With optimization, the second f(x) is never spilled -- it is just compared with the spilled first f(x) so it is very likely to compare unequal. Without optimization, or maybe with -ffloat-store, the second f(x) _may_ be stored to memory too, so the results may compare equal. Explicit assignments to variables of type float have no effect on any of this due to the compiler bugs, unless the variables are volatile. If f() returns double, then under FreeBSD extra precision is normally discarded before returning, since FreeBSD still uses double precision for evaluating expressions so as to avoid bugs in this area. ("return (x + y)" returns double precision, possibly with extra exponent range, since "x + y" is rounded to double precision.) However, functions like sqrt() and sin() return long double precision due to the implementation detail that they are evaluated in hardware in extra precision and the bugfeature that this extra precision is not discarded.) Then the behaviour is similar to the float case -- due to the compiler bugs, the extra precision cannot be discarded by assigning the results to variables of type double unless the variables are volatile, etc. I don't know how atof() could return extra precision. If f() returns long double, then the behaviour is the same as on amd64, except long double is almost completely useless, since most expressions are evaluated in double precision. > The underlying problem is that the amd64 FPU is initialised to 64-bit > precision mode, whilst the i386 FPU is initialised to 53-bit precision > mode (__INITIAL_FPUCW__ in amd64/include/fpu.h vs __INITIAL_NPXCW__ in > i386/include/npx.h). This is more of a feature than a problem, especially on amd64 where 64-bit precision works and doesn't interfere with 53-bit precision. (However, 64-bit precision is very slow, especially for vectors since it cannot use SSE, and the amd64 ABI defeats the main point of having extra precision, which is to automatically evaluate most expressions in extra precision so as to protect naive programmers from most precision bugs and make the precision bugs very hard to understand and/or work around when they occur. This ABI is actually more of a problem for floats than for doubles -- for doubles, the behaviour is almost the same as with FreeBSD-i386's reduced-precision hack, and problems are rare because double precision is enough for most things, but float precision isn't enough for most things.) > It looks like the FPU is initialised during the > machine-dependent CPU initialisation and then inherited by subsequent > processes as they are fork()d. The fix is probably to explicitly > initialise the FPU for legacy mode processes on the amd64. > > A work-around would be to call fpsetprec(FP_PD) (see ) > at the start of main(). This would avoid most non-bugs in this area in the same way as FreeBSD-i386's reduced precision hack does accidentally (the hack is only intended to avoid the compiler bugs, at least now). I still don't know how atof() could return extra precision on amd64. Long-term changes to the FP environment are dangerous for various reasons: - setjmp()/longjmp() still don't support SSE (mxcsr is missing in jmp_buf; IIRC, this affects fpsetround() but not fpsetprec()). - library functions have not been tested much in non-default environments. Reducing the precision on amd64 should work unsurprisingly. Increasing the precision on i386 probably breaks at least trig arg reduction (for double and maybe even for float precision) due to the compiler bugs. Breaking atof("3.2") might be expected too, since 0.2 is not exactly representable and, although atof (gdtoa) is careful about precision, it is careful code that is most broken by the compiler bugs. However, I would expect gdtoa to just work with the default precision on amd64. Bruce