Date: Sun, 13 Mar 2016 21:03:57 +0100
From: Dimitry Andric <dim@FreeBSD.org>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-toolchain@freebsd.org
Subject: Re: clang gets numerical underflow wrong, please fix.
Message-ID: <74970883-FE44-47C0-BDA0-92DB0723398A@FreeBSD.org>
In-Reply-To: <20160313182521.GA25361@troutmask.apl.washington.edu>
References: <20160313182521.GA25361@troutmask.apl.washington.edu>

On 13 Mar 2016, at 19:25, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>
> Consider this small piece of code:
>
> #include <fenv.h>
> #include <stdio.h>
>
> float
> foo()
> {
>    static const volatile float tiny = 1.e-30f;
>    return (tiny * tiny);
> }
>
> int
> main(void)
> {
>    float x;
>    feclearexcept(FE_ALL_EXCEPT);
>    x = foo();
>    if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
>    printf("x = %e\n", x);
>    return 0;
> }
>
> clang seems to get the underflow condition wrong.
>
> % cc -o z a.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00
>
> % cc -O -o z a.c -lm && ./z
> x = 1.000000e-60     <--- This is not a possible value!
>
> % gcc -o z a.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00
>
> % gcc -O -o z a.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00

Hmm, this is an interesting one. On amd64, it works as expected with
clang, but there it always uses SSE, obviously:

$ ./underflow-amd64
FE_UNDERFLOW: x = 0.000000e+00

The problem seems to be caused by the intermediate result being stored
using fstpl instead of fstps, e.g. simplifying the sample program (to
get rid of all the SSE stuff the fexxx() macros insert):

int main(void)
{
  float x;
  __uint16_t status;

  __fnclex();
  x = foo();
  __fnstsw(&status);
  printf("status: %#x\n", (unsigned)status);
  printf("x = %e\n", x);
  return 0;
}

With gcc, the assembly becomes:

foo:
        flds    tiny.1853
        flds    tiny.1853
        fmulp   %st, %st(1)
        ret
[...]
main:
[...]
        fnclex
        call    foo
        fstps   12(%esp)
        fnstsw  %ax

In this case, fmulp does not generate an underflow, but the fstps will.

With clang, the assembly becomes:

foo:
        flds    foo.tiny
        fmuls   foo.tiny
        retl
[...]
main:
        subl    $24, %esp
        fnclex
        calll   foo
        fstpl   12(%esp)                # 8-byte Folded Spill
        fnstsw  22(%esp)

So it's storing the intermediate result in a double, for some reason.
The fnstsw will then result in zero, since there was no underflow at
that point.

I will submit a bug for this upstream, thanks for the report.

-Dimitry
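
A minimal sketch of why the width of the store matters, assuming IEEE 754
float and double. It is not taken from the thread, and the function names
product_as_double, product_as_float and show_products are hypothetical.
The product of two 1.e-30f values, roughly 1e-60, is an ordinary normal
number for a double but lies far below FLT_MIN, so it only collapses to
zero once it is rounded to float. That rounding is the step fstps performs
and fstpl skips. The volatile qualifiers are only there to keep the
compiler from folding the arithmetic at compile time.

```c
#include <float.h>
#include <stdio.h>

/*
 * Product kept at double width, comparable to the 8-byte spill (fstpl)
 * in the clang output. Roughly 1e-60 is well inside double's range.
 */
double
product_as_double(void)
{
	static const volatile float tiny = 1.e-30f;

	return ((double)tiny * (double)tiny);
}

/*
 * Product rounded to float width, comparable to the 4-byte store (fstps)
 * in the gcc output. The value is too small for any float, so the
 * rounding yields zero.
 */
float
product_as_float(void)
{
	/* volatile forces an actual store at float width */
	volatile float narrow = (float)product_as_double();

	return (narrow);
}

/* Hypothetical helper: print both results next to FLT_MIN. */
void
show_products(void)
{
	printf("as double: %e\n", product_as_double());
	printf("as float:  %e\n", (double)product_as_float());
	printf("FLT_MIN:   %e\n", (double)FLT_MIN);
}
```

Calling show_products() from main() should show a nonzero value only for
the double-width result, which matches the 1.000000e-60 printed by the
optimized clang build in the report.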