Date:      Sun, 22 Nov 2015 13:01:57 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
Cc:        freebsd-bugs@freebsd.org
Subject:   Re: [Bug 204671] clang floating point wrong around Inf (i386)
Message-ID:  <20151122112921.P1083@besplex.bde.org>
In-Reply-To: <bug-204671-8-kMVwFWIOER@https.bugs.freebsd.org/bugzilla/>
References:  <bug-204671-8@https.bugs.freebsd.org/bugzilla/> <bug-204671-8-kMVwFWIOER@https.bugs.freebsd.org/bugzilla/>

On Sat, 21 Nov 2015 a bug that suppresses replies in mail wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204671
>
> Jilles Tjoelker <jilles@FreeBSD.org> changed:
>
>           What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                 CC|                            |jilles@FreeBSD.org
>
> --- Comment #2 from Jilles Tjoelker <jilles@FreeBSD.org> ---
> This is related to the strangeness that is the x87 FPU. Internally, the x87
> performs calculations in extended precision. Even if the precision control is
> set to double precision, like FreeBSD and Windows do by default but Linux and
> Solaris do not, the x87 registers still have greater range than double
> precision.

Which versions of Windows do it?  I only have Windows/DOS compilers
from 1995 or earlier, and they do it.  I think Visual Studio (?) does
it for compatibility.  Does Windows actually require this as an ABI?
If so, it should also disallow clang's bug of using SSE on 32-bit systems.

> As a result, the addition 1e308 + 1e308 does not overflow, but produces a
> result of approximately 2e308 in an x87 register. When this result is stored to
> memory in double precision format, overflow or rounding will occur.

For C (C90 and later) compilers, also when this result is assigned to a
variable of type double or cast to double.  This sometimes loses precision,
is always slow (typically 2-4 times slower) and is rarely needed, so it is broken
by default in gcc and clang on i386 with x87.  Recent versions of gcc can
be turned into C compilers in this respect using -fexcess-precision=standard.
Standards directives like -std=c99 but not -std=gnu99 also give this
perfectly correct slowness for unsuspecting users that don't want the
slowness but want a C compiler in other respects.  clang now knows that
-fexcess-precision exists, but doesn't support it.  It also doesn't support
this implicitly for -std=c99.

For C11 compilers, also when this result is returned.  This gives further
destruction of precision and slowness and is broken by default.  IIRC,
-std=c99 gives this bug even for C99 mode in gcc.  clang doesn't support
this even with -std=c11.
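
To see which case applies to a given compiler and flag set, it helps to
print what the implementation itself claims.  A minimal sketch, assuming
C99 <float.h> and <math.h>; the names eval_method_name and
report_eval_method are made up for illustration:

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: describe a FLT_EVAL_METHOD value (C99 5.2.4.2.2). */
static const char *
eval_method_name(int method)
{
	switch (method) {
	case 0:
		return "each type evaluated in its own range and precision";
	case 1:
		return "float and double evaluated as double";
	case 2:
		return "everything evaluated as long double";
	default:
		return "indeterminable or implementation-defined";
	}
}

/* Hypothetical helper: print what the compiler claims about excess precision. */
static void
report_eval_method(void)
{
	printf("FLT_EVAL_METHOD = %d (%s)\n", (int)FLT_EVAL_METHOD,
	    eval_method_name((int)FLT_EVAL_METHOD));
	printf("sizeof(double) = %zu, sizeof(double_t) = %zu\n",
	    sizeof(double), sizeof(double_t));
}
```

Calling report_eval_method() from main() only shows the claim.  Whether
assignments, casts and returns actually behave that way on i386 with the
x87 is the separate question discussed above.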


> What happens in t1.c is that the conversion from extended to double precision
> happens two times. The conversion for printing the bytes happens directly after
> the calculation and therefore uses the modified rounding mode. The conversion
> for printf happens during the inlined fesetround() call, after setting the x87
> rounding mode and before calling a function __test_sse to check whether SSE is
> available. (After that, the value is stored and loaded again a few times.)
> Therefore, the conversion for printf uses an incorrect rounding mode.

Both conversions are done after the fesetround() call in program order.
This is asking for trouble.  But since there is an assignment before the
call, there is no problem if the compiler is a C compiler.  clang is far
from being a C compiler and does unnatural ordering that gives trouble:

program order:              runtime order:
add                         add
assign                      assign (to memory var) for printing in hex
restore rounding mode       restore rounding mode
print as double             assign (to memory var) for printing as double
print as hex                print as double
                            print as hex

> Global variables force the compiler to store values to memory more often and
> may therefore reduce x87 weirdnesses.

-ffloat-store is often recommended for causing the slow store.  Before
-fexcess-precision, there was no similar hack for fixing casts.

But it is an easier and more controllable hack to use a volatile variable.
See STRICT_ASSIGN() in FreeBSD libm.  Even minimised use of this gives
slowness and loses precision.  So in some functions I have started using
double_t to avoid the slowness (especially if the compiler is a C compiler)
and keep the extra precision intentionally.  Some hacks are needed to
avoid destroying the extra precision on return.  (Since the extra precision
is intentional, it doesn't take the C11 bug to require destroying it on
return.)
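
The volatile hack can be sketched roughly as follows.  This is a
simplified illustration of the idea and not the macro from libm: the
names STRICT_ASSIGN_SKETCH and strict_mul are hypothetical, and the
sizeof test is an assumption about how to skip the store for types that
cannot carry extra precision.

```c
/*
 * Simplified sketch of the volatile-assignment idea; NOT the libm macro.
 * Types at least as wide as long double carry no extra precision, so
 * they are assigned directly.  Narrower types go through a volatile
 * temporary so that the value really is narrowed.
 */
#define	STRICT_ASSIGN_SKETCH(type, lval, rval) do {		\
	if (sizeof(type) >= sizeof(long double))		\
		(lval) = (rval);				\
	else {							\
		volatile type tmp_ = (rval);			\
		(lval) = tmp_;					\
	}							\
} while (0)

/* Hypothetical use: a product that has been narrowed to double. */
static double
strict_mul(double a, double b)
{
	double r;

	STRICT_ASSIGN_SKETCH(double, r, a * b);
	return (r);
}
```

With this, strict_mul(1e200, 1e200) should be +Inf even when the multiply
itself was done with extra range, at the cost of the store and reload.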

The expression huge*huge is used often in FreeBSD libm to raise the overflow
flag and return +Inf.  It doesn't actually work for that.  Some broken
compilers invalidly optimize it and similar expressions for raising
underflow to just returning a value; the value is then correct but the
flags are not.  But the code is buggy.  With extra precision, it asks
for and should get a value larger than DBL_MAX and no exception.  The
C11 bug breaks this.  This gives a wrong value for use in
expressions, but the use is often to store to a value of type double;
then if the compiler is a C compiler or due to some accident like
storing to memory, the value is sometimes converted to double.

A special case test program for comparing functions does rounding
mode flipping almost exactly the same as t1.c and differs only in
care taken with assignments:

X 		fpsetprec(RPREF);
X 		STRICT_ASSIGN(flref_t, vref, FUNCREF(x));
X 		fpsetprec(RPTEST);
X 		STRICT_ASSIGN(fl_t, v, FUNCTEST(x));
X 		fpsetprec(RPDEF);

Here flref_t might be long double and fl_t double.  FUNCREF might be
expl and FUNCTEST exp.  Oops, this actually modifies the rounding
precision.  The rounding mode is the same for the reference function
and the test function.  It is still important to get the order right.

Old versions of this use explicit volatile variables.  This version
uses STRICT_ASSIGN which uses volatile for double but not for long
double.  The volatile variables accidentally ensure the ordering of
the fpset* calls.  I'm not sure if fp* and fenv* calls have sufficient
ordering.  Function calls are supposed to give sequence points, but
compilers can see too far into inline ones.

> Following the C standard, you would have to use  #pragma STDC FENV_ACCESS on
> to make this work reliably.

This shouldn't be needed in practice.  Anyway, it is not required to affect
the compiler bug of not reducing to double precision in assignments and
casts.

> However, neither gcc nor clang support this pragma.
> They follow an ad hoc approach to floating point exceptions and modes. In gcc
> you can use -frounding-math to prevent some problematic optimizations but clang
> doesn't even support that. Clang has a bug about the pragma,
> https://llvm.org/bugs/show_bug.cgi?id=8100, which has been open for five years
> with various duplicates but no other significant action.

gcc does support this, even in 10+ year old versions (4.2.1), to the extent
of having documentation about it: from gcc.info:

X    * `The default state for the `FENV_ACCESS' pragma (C99 7.6.1).'
X 
X      This pragma is not implemented, but the default is to "off" unless
X      `-frounding-math' is used in which case it is "on".

This gives enough control for a simple test program, and I think the option
that keeps the flag always on gives strict standards conformance for the
pragma.

> You will generally have fewer problems with weirdly changing floating point
> results if you use SSE instead of the x87 FPU, assuming your CPUs are new
> enough. SSE performs calculations in the precision specified by the program
> (single or double), so it does not matter when or if a value is spilled to
> memory. As noted above, GCC and clang are still ignorant about the side effects
> with the floating point exceptions and modes, though.

Spilling of intermediate x87 values is one thing that works right in all or
most versions of clang but not in old versions of gcc.

The test program seems to be looking for bugs, not workarounds.  Its
description says to not use high -march since this makes the bug go away.
High -march exposes the bug that clang starts using SSE on i386.  FreeBSD
doesn't support this.  The non-support includes:
- setjmp()/longjmp() don't support SSE
- double_t is still long double.  This seems to give only pessimizations.

Bruce


