Date: Sun, 13 Mar 2016 21:03:57 +0100
From: Dimitry Andric <dim@FreeBSD.org>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-toolchain@freebsd.org
Subject: Re: clang gets numerical underflow wrong, please fix.
Message-ID: <74970883-FE44-47C0-BDA0-92DB0723398A@FreeBSD.org>
In-Reply-To: <20160313182521.GA25361@troutmask.apl.washington.edu>
References: <20160313182521.GA25361@troutmask.apl.washington.edu>

On 13 Mar 2016, at 19:25, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>
> Consider this small piece of code:
>
> #include <fenv.h>
> #include <stdio.h>
>
> float
> foo()
> {
>     static const volatile float tiny = 1.e-30f;
>     return (tiny * tiny);
> }
>
> int
> main(void)
> {
>     float x;
>     feclearexcept(FE_ALL_EXCEPT);
>     x = foo();
>     if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
>     printf("x = %e\n", x);
>     return 0;
> }
>
> clang seems to get the underflow condition wrong.
>
> % cc -o z a.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00
>
> % cc -O -o z a.c -lm && ./z
> x = 1.000000e-60 <--- This is not a possible value!
>
> % gcc -o z a.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00
>
> % gcc -O -o z a.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00

Hmm, this is an interesting one. On amd64, it works as expected with
clang, but there it always uses SSE for floating point, so the x87 code
path is never involved:
$ ./underflow-amd64
FE_UNDERFLOW: x = 0.000000e+00
The problem seems to be caused by the intermediate result being stored
using fstpl (a 64-bit double store) instead of fstps (a 32-bit float
store). This shows up when simplifying the sample program to read the
x87 status word directly, which gets rid of all the SSE stuff the fe*()
macros insert:
int main(void)
{
    float x;
    __uint16_t status;
    __fnclex();
    x = foo();
    __fnstsw(&status);
    printf("status: %#x\n", (unsigned)status);
    printf("x = %e\n", x);
    return 0;
}
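As a rough aid for reading the status value this program prints, here is a minimal sketch that decodes the underflow bit. The bit values are the architectural x87 status word exception flags; the enum names and the helper x87_status_has_underflow() are hypothetical and not part of any FreeBSD header.

```c
/* x87 status word exception flag bits (architectural values).
 * The names below are made up for this sketch. */
enum {
    X87_SW_INVALID   = 0x01,  /* IE: invalid operation */
    X87_SW_DENORMAL  = 0x02,  /* DE: denormal operand */
    X87_SW_ZERODIV   = 0x04,  /* ZE: divide by zero */
    X87_SW_OVERFLOW  = 0x08,  /* OE: overflow */
    X87_SW_UNDERFLOW = 0x10,  /* UE: underflow */
    X87_SW_PRECISION = 0x20   /* PE: inexact result */
};

/* Hypothetical helper: nonzero if the underflow flag is set in a
 * status word as stored by fnstsw. */
static int
x87_status_has_underflow(unsigned status)
{
    return (status & X87_SW_UNDERFLOW) != 0;
}
```

Passing the value stored by __fnstsw() to this helper tells whether an x87 instruction has raised underflow since the last fnclex.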
With gcc, the assembly becomes:
foo:
        flds    tiny.1853
        flds    tiny.1853
        fmulp   %st, %st(1)
        ret
[...]
main:
[...]
        fnclex
        call    foo
        fstps   12(%esp)
        fnstsw  %ax
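In this case, fmulp does not generate an underflow, because the product
still fits in the wider x87 register format, but the fstps will when it
narrows the value to a 32-bit float.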
With clang, the assembly becomes:
foo:
        flds    foo.tiny
        fmuls   foo.tiny
        retl
[...]
main:
        subl    $24, %esp
        fnclex
        calll   foo
        fstpl   12(%esp)        # 8-byte Folded Spill
        fnstsw  22(%esp)
So it's storing the intermediate result in a double, for some reason.
The product is still representable as a double, so the fstpl does not
underflow, and the fnstsw will then store a status of zero, since there
was no underflow at that point.
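To illustrate why the width of the store matters, here is a small sketch in plain C. It is not the compiler's actual code path: the helper names widened_product() and narrow_to_float() are hypothetical, and the volatile stores are only there to force the value through a 64-bit or 32-bit memory slot, roughly like fstpl and fstps do.

```c
#include <float.h>

/* Hypothetical helper: multiply two floats and keep the result as a
 * double. The volatile store forces it through a 64-bit slot. */
static double
widened_product(float a, float b)
{
    volatile double wide = (double)a * (double)b;
    return wide;
}

/* Hypothetical helper: round a double to float. The volatile store
 * forces it through a 32-bit slot, so the value really is narrowed. */
static float
narrow_to_float(double wide)
{
    volatile float narrow = (float)wide;
    return narrow;
}
```

With 1.e-30f for both arguments, widened_product() returns about 1e-60, which is a normal double (not below DBL_MIN) but far below FLT_MIN, and narrow_to_float() turns it into 0.0f. Only that narrowing step can underflow.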
I will submit a bug for this upstream, thanks for the report.
-Dimitry
