Date: Sun, 13 Mar 2016 13:10:04 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Dimitry Andric <dim@FreeBSD.org>
Cc: freebsd-toolchain@freebsd.org
Subject: Re: clang gets numerical underflow wrong, please fix.
Message-ID: <20160313201004.GA26343@troutmask.apl.washington.edu>
In-Reply-To: <74970883-FE44-47C0-BDA0-92DB0723398A@FreeBSD.org>
References: <20160313182521.GA25361@troutmask.apl.washington.edu> <74970883-FE44-47C0-BDA0-92DB0723398A@FreeBSD.org>

On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
> On 13 Mar 2016, at 19:25, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> >
> > Consider this small piece of code:
> >
> > #include <fenv.h>
> > #include <stdio.h>
> >
> > float
> > foo()
> > {
> >   static const volatile float tiny = 1.e-30f;
> >   return (tiny * tiny);
> > }
> >
> > int
> > main(void)
> > {
> >   float x;
> >   feclearexcept(FE_ALL_EXCEPT);
> >   x = foo();
> >   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
> >   printf("x = %e\n", x);
> >   return 0;
> > }
> >
> > clang seems to get the underflow condition wrong.
> >
> > % cc -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.000000e+00
> >
> > % cc -O -o z a.c -lm && ./z
> > x = 1.000000e-60     <--- This is not a possible value!
> >
> > % gcc -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.000000e+00
> >
> > % gcc -O -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.000000e+00
>
> Hmm, this is an interesting one.  On amd64, it works as expected with
> clang, but there it always uses SSE, obviously:
>
> $ ./underflow-amd64
> FE_UNDERFLOW: x = 0.000000e+00
>
> The problem seems to be caused by the intermediate result being stored
> using fstpl instead of fstps, e.g. simplifying the sample program (to
> get rid of all the SSE stuff the fexxx() macros insert):
>
> int main(void)
> {
>   float x;
>   __uint16_t status;
>   __fnclex();
>   x = foo();
>   __fnstsw(&status);
>   printf("status: %#x\n", (unsigned)status);
>   printf("x = %e\n", x);
>   return 0;
> }
>
> With gcc, the assembly becomes:
>
> foo:
>         flds    tiny.1853
>         flds    tiny.1853
>         fmulp   %st, %st(1)
>         ret
> [...]
> main:
> [...]
>         fnclex
>         call    foo
>         fstps   12(%esp)
>         fnstsw  %ax
>
> In this case, fmulp does not generate an underflow, but the fstps will.
> With clang, the assembly becomes:
>
> foo:
>         flds    foo.tiny
>         fmuls   foo.tiny
>         retl
> [...]
> main:
>         subl    $24, %esp
>         fnclex
>         calll   foo
>         fstpl   12(%esp)        # 8-byte Folded Spill
>         fnstsw  22(%esp)
>
> So it's storing the intermediate result in a double, for some reason.
> The fnstsw will then result in zero, since there was no underflow at
> that point.
>
> I will submit a bug for this upstream, thanks for the report.
>

Thanks for the quick reply.  But, it must be using an 80-bit
extended double instead of a double for storage.  This variation

#include <fenv.h>
#include <stdio.h>

int
main(void)
{
  int i;
//  float x = 1.f;
  double x = 1.;
  i = 0;
  feclearexcept(FE_ALL_EXCEPT);
  do {
    x /= 2;
    i++;
  } while(!fetestexcept(FE_UNDERFLOW));
  if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
  printf("x = %e after %d iterations\n", x, i);
  return 0;
}

yields

% cc -O -o z b.c -lm && ./z
FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations

It should be 1075 iterations.  Note, there is a similar issue
with OVERFLOW.  The upshot is that clang on current is probably
miscompiling libm.

-- 
Steve