From owner-freebsd-toolchain@freebsd.org  Mon Mar 14 00:02:31 2016
Subject: Re: clang gets numerical underflow wrong, please fix.
Mime-Version: 1.0 (Mac OS X Mail 9.2 \(3112\))
Content-Type: multipart/signed;
 boundary="Apple-Mail=_40B5429D-BCD2-4684-8E3A-55F296B73BBE";
 protocol="application/pgp-signature"; micalg=pgp-sha1
X-Pgp-Agent: GPGMail 2.6b2 (ebbf3ef)
From: Dimitry Andric <dim@FreeBSD.org>
In-Reply-To: <20160313201004.GA26343@troutmask.apl.washington.edu>
Date: Mon, 14 Mar 2016 01:02:20 +0100
Cc: freebsd-toolchain@freebsd.org
Message-Id: <A70D119A-514A-4949-9BCB-CA344650BDB5@FreeBSD.org>
References: <20160313182521.GA25361@troutmask.apl.washington.edu>
 <74970883-FE44-47C0-BDA0-92DB0723398A@FreeBSD.org>
 <20160313201004.GA26343@troutmask.apl.washington.edu>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
X-Mailer: Apple Mail (2.3112)
X-BeenThere: freebsd-toolchain@freebsd.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: Maintenance of FreeBSD's integrated toolchain
 <freebsd-toolchain.freebsd.org>
X-List-Received-Date: Mon, 14 Mar 2016 00:02:32 -0000


--Apple-Mail=_40B5429D-BCD2-4684-8E3A-55F296B73BBE
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

On 13 Mar 2016, at 21:10, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
...
>> So it's storing the intermediate result in a double, for some reason.
>> The fnstsw will then result in zero, since there was no underflow at
>> that point.
>>
>> I will submit a bug for this upstream, thanks for the report.

Submitted upstream as: https://llvm.org/bugs/show_bug.cgi?id=26931


> Thanks for the quick reply.  But, it must be using an 80-bit
> extended double instead of a double for storage.  This variation
>
> #include <fenv.h>
> #include <stdio.h>
>
> int
> main(void)
> {
>   int i;
> //   float x = 1.f;
>   double x = 1.;
>   i = 0;
>   feclearexcept(FE_ALL_EXCEPT);
>   do {
>      x /= 2;
>      i++;
>   } while(!fetestexcept(FE_UNDERFLOW));
>   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
>   printf("x = %e after %d iterations\n", x, i);
>
>   return 0;
> }
>
> yields
>
> % cc -O -o z b.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations
>
> It should be 1075 iterations.
>
> Note, there is a similar issue with OVERFLOW.  The upshot is
> that clang on current is probably miscompiling libm.

With this example, I also get different results from gcc (4.8.5),
depending on the optimization level:

$ gcc -O underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 1075 iterations
$ gcc -O2 underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations

Similar for the overflow case:

$ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 1024 iterations
$ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 16384 iterations
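
The four iteration counts above can be cross-checked with a little
exponent arithmetic.  The following is only a sketch and is not taken
from either test program: the helper names and the X87_* constants are
made up for illustration, and it assumes the x87 precision control is
set to a 53-bit significand (the usual FreeBSD/i386 default) while the
register keeps the 15-bit extended exponent range.

```c
#include <float.h>

/* Assumed exponent limits of the x87 extended format, written in the
 * same convention as the <float.h> *_MIN_EXP / *_MAX_EXP macros.
 * Hard-coded because long double differs between platforms. */
#define X87_MIN_EXP (-16381)
#define X87_MAX_EXP 16384

/* Number of halvings of 1.0 until the quotient can no longer be
 * represented exactly.  The smallest subnormal is 2^(min_exp - mant_dig),
 * reached after (mant_dig - min_exp) exact halvings; the next halving
 * is tiny and inexact, which is when underflow is signalled. */
int halvings_until_underflow(int min_exp, int mant_dig)
{
    return mant_dig - min_exp + 1;
}

/* Number of doublings of 1.0 until the product overflows.  The largest
 * finite value lies just below 2^max_exp, so 2^max_exp itself is the
 * first power of two that overflows. */
int doublings_until_overflow(int max_exp)
{
    return max_exp;
}
```

With the <float.h> values for double (DBL_MIN_EXP, DBL_MANT_DIG,
DBL_MAX_EXP) this gives 1075 and 1024, matching the -O runs.  With the
assumed x87 limits and a 53-bit significand it gives 16435 and 16384,
matching the -O2 runs.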

Are we depending on some sort of subtle undefined behavior here?  With
-O, the 'main loop' of the overflow test becomes:

.L3:
	fld1
	fstpl	24(%esp)
	movl	$0, %ebx
.L8:
	fldl	24(%esp)
	fld	%st(0)
	faddp	%st, %st(1)
	fstpl	24(%esp)
	addl	$1, %ebx
	fnstsw %ax
	movl	%eax, %esi
	movl	__has_sse, %eax
	testl	%eax, %eax
	je	.L4
	cmpl	$2, %eax
	jne	.L5
	call	__test_sse
	testl	%eax, %eax
	je	.L5
.L4:
	stmxcsr 44(%esp)
	jmp	.L6
.L5:
	movl	$0, 44(%esp)
.L6:
	orl	44(%esp), %esi
	testl	$8, %esi
	je	.L8

With -O2, it becomes:

.L3:
	fld1
	xorl	%ebx, %ebx
.L12:
	fadd	%st(0), %st
	addl	$1, %ebx
	fnstsw %ax
	testl	%edx, %edx
	movl	%eax, %esi
	je	.L10
	cmpl	$2, %edx
	je	.L27
.L9:
	xorl	%eax, %eax
.L8:
	orl	%eax, %esi
	andl	$8, %esi
	je	.L12

So it switches from faddp followed by fstpl, which stores the sum back
to a 64-bit double in memory on every iteration, to a direct fadd of
%st(0) and %st that keeps the value in an x87 register.  I assume that
uses the internal 80-bit precision?  GCC also manages to move the
__has_sse checks further down in the function, but that does not
really affect the result.
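
One way to probe that assumption without reading assembly is to force
every intermediate result through a double-sized memory slot, which is
roughly what the fstpl in the -O loop does.  A minimal sketch, assuming
IEEE 754 doubles with gradual underflow and round-to-nearest;
halvings_until_zero is a hypothetical helper, and it deliberately
watches the stored value instead of the FE_UNDERFLOW flag so that only
the rounding effect is tested:

```c
/* Halve 1.0 until the stored double becomes zero and return the number
 * of halvings.  The volatile qualifier makes the compiler load x from
 * memory and store it back on every iteration, so the value is rounded
 * to double each time and cannot live in an x87 register with a wider
 * exponent range. */
int halvings_until_zero(void)
{
    volatile double x = 1.0;
    int i = 0;

    while (x != 0.0) {
        x = x / 2;      /* load, divide, store back as a 64-bit double */
        i++;
    }
    return i;
}
```

Under those assumptions this should return 1075 at any optimization
level: 2^-1074 is the smallest subnormal double, and the next halving
rounds to zero.  If a non-volatile variant of the same loop takes far
more iterations at -O2, the extra range is coming from the register.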

-Dimitry


--Apple-Mail=_40B5429D-BCD2-4684-8E3A-55F296B73BBE
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP using GPGMail

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.29

iEYEARECAAYFAlbl/5MACgkQsF6jCi4glqO95wCfaSScY8fm/V7XtAcMJ7Xz7Ctw
/OUAoISYUy/1dgZFhXFbT7wPyDRgSWZF
=prQV
-----END PGP SIGNATURE-----

--Apple-Mail=_40B5429D-BCD2-4684-8E3A-55F296B73BBE--