Date:      Sun, 13 Mar 2016 21:03:57 +0100
From:      Dimitry Andric <dim@FreeBSD.org>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        freebsd-toolchain@freebsd.org
Subject:   Re: clang gets numerical underflow wrong, please fix.
Message-ID:  <74970883-FE44-47C0-BDA0-92DB0723398A@FreeBSD.org>
In-Reply-To: <20160313182521.GA25361@troutmask.apl.washington.edu>
References:  <20160313182521.GA25361@troutmask.apl.washington.edu>

On 13 Mar 2016, at 19:25, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>
> Consider this small piece of code:
>
> #include <fenv.h>
> #include <stdio.h>
>
> float
> foo()
> {
> 	static const volatile float tiny = 1.e-30f;
> 	return (tiny * tiny);
> }
>
> int
> main(void)
> {
>   float x;
>   feclearexcept(FE_ALL_EXCEPT);
>   x = foo();
>   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
>   printf("x = %e\n", x);
>   return 0;
> }
>
> clang seems to get the underflow condition wrong.
>
> % cc -o z a.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00
>
> % cc -O -o z a.c -lm && ./z
> x = 1.000000e-60             <--- This is not a possible value!
>
> % gcc -o z a.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00
>
> % gcc -O -o z a.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00

Hmm, this is an interesting one.  On amd64, it works as expected with
clang, but there it always uses SSE, obviously:

$ ./underflow-amd64
FE_UNDERFLOW: x = 0.000000e+00

The problem seems to be caused by the intermediate result being stored
using fstpl instead of fstps.  This is easier to see after simplifying
the sample program to read the x87 status word directly (which gets rid
of all the SSE stuff the fe*() macros insert):

int main(void)
{
  float x;
  __uint16_t status;
  __fnclex();
  x = foo();
  __fnstsw(&status);
  printf("status: %#x\n", (unsigned)status);
  printf("x = %e\n", x);
  return 0;
}

With gcc, the assembly becomes:

foo:
        flds    tiny.1853
        flds    tiny.1853
        fmulp   %st, %st(1)
        ret
[...]
main:
[...]
        fnclex
        call    foo
        fstps   12(%esp)
        fnstsw  %ax

In this case, fmulp does not generate an underflow, since 1e-60 still
fits in the x87 register, but the fstps will, because it narrows the
result to a float.  With clang, the assembly becomes:

foo:
        flds    foo.tiny
        fmuls   foo.tiny
        retl
[...]
main:
        subl    $24, %esp
        fnclex
        calll   foo
        fstpl   12(%esp)                # 8-byte Folded Spill
        fnstsw  22(%esp)

So it's storing the intermediate result in a double, for some reason.
Since 1e-60 is well within the range of a double, the fstpl does not
underflow, so the fnstsw that follows will result in zero.
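
To make the width argument concrete, here is a minimal sketch.  It is
not taken from either compiler's output, and the helper names
square_as_double(), square_as_float() and below_float_range() are made
up for illustration.  It assumes IEEE 754 float and double, and uses
volatile to force each result through memory at the stated width,
which is roughly what fstpl and fstps do:

```c
#include <float.h>

/* Hypothetical helper: keep the product at double width (like fstpl). */
double
square_as_double(float v)
{
	volatile double wide = (double)v * (double)v;

	return (wide);
}

/* Hypothetical helper: narrow the product to float (like fstps). */
float
square_as_float(float v)
{
	volatile float narrow = (float)((double)v * (double)v);

	return (narrow);
}

/* Nonzero if d is nonzero but smaller in magnitude than any normal float. */
int
below_float_range(double d)
{
	return (d != 0.0 && d < FLT_MIN && d > -FLT_MIN);
}
```

With v = 1.e-30f, square_as_double() returns roughly 1e-60, which
below_float_range() flags, while square_as_float() returns 0.  Only the
narrowing store loses the value, and that is the step where the
underflow would be signalled.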

I will submit a bug for this upstream, thanks for the report.

-Dimitry

