From owner-freebsd-toolchain@freebsd.org Mon Mar 14 00:02:31 2016
Subject: Re: clang gets numerical underflow wrong, please fix.
From: Dimitry Andric
Date: Mon, 14 Mar 2016 01:02:20 +0100
To: Steve Kargl
Cc: freebsd-toolchain@freebsd.org
In-Reply-To: <20160313201004.GA26343@troutmask.apl.washington.edu>

On 13 Mar 2016, at 21:10, Steve Kargl wrote:
> On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
...
>> So it's storing the intermediate result in a double, for some reason.
>> The fnstsw will then result in zero, since there was no underflow at
>> that point.
>>
>> I will submit a bug for this upstream, thanks for the report.

Submitted upstream as: https://llvm.org/bugs/show_bug.cgi?id=26931

> Thanks for the quick reply. But, it must be using an 80-bit
> extended double instead of a double for storage.
> This variation
>
> #include <fenv.h>
> #include <stdio.h>
>
> int
> main(void)
> {
>   int i;
> //  float x = 1.f;
>   double x = 1.;
>   i = 0;
>   feclearexcept(FE_ALL_EXCEPT);
>   do {
>      x /= 2;
>      i++;
>   } while(!fetestexcept(FE_UNDERFLOW));
>   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
>   printf("x = %e after %d iterations\n", x, i);
>
>   return 0;
> }
>
> yields
>
> % cc -O -o z b.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations
>
> It should be 1075 iterations.
>
> Note, there is a similar issue with OVERFLOW. The upshot is
> that clang on current is probably miscompiling libm.

With this example, I also get different results from gcc (4.8.5),
depending on the optimization level:

$ gcc -O underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 1075 iterations

$ gcc -O2 underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations

Similar for the overflow case:

$ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 1024 iterations

$ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 16384 iterations

Are we depending on some sort of subtle undefined behavior here?
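The four iteration counts reported here are consistent with plain exponent arithmetic, which the sketch below works through. The helper names and the hard-coded x87 parameters are assumptions made for illustration and do not come from the thread: the x87 register format is taken to have LDBL_MIN_EXP = -16381 and LDBL_MAX_EXP = 16384 in <float.h> terms, and the 53-bit significand assumes the FPU precision control is set to double precision (believed to be the FreeBSD/i386 default). With a full 64-bit significand the underflow count would be 16446 instead.

```c
#include <assert.h>
#include <float.h>

/* Assumed x87 extended-range parameters, in <float.h> convention.
 * These constants are illustrative and are not taken from the thread. */
#define X87_MIN_EXP  (-16381)
#define X87_MAX_EXP  16384
#define X87_PC53_MANT_DIG 53   /* assumes precision control = 53 bits */

/* Halvings of 1.0 until the result is both tiny and inexact, which is
 * when FE_UNDERFLOW is raised.  The smallest subnormal is
 * 2^(min_exp - mant_dig); one more halving rounds to zero. */
int halvings_until_underflow(int min_exp, int mant_dig)
{
    assert(min_exp < 0 && mant_dig > 0);
    return -(min_exp - mant_dig) + 1;
}

/* Doublings of 1.0 until the result exceeds the largest finite value.
 * In <float.h> convention 2^max_exp is the first power of two that
 * overflows. */
int doublings_until_overflow(int max_exp)
{
    assert(max_exp > 0);
    return max_exp;
}
```

Under these assumptions, halvings_until_underflow(DBL_MIN_EXP, DBL_MANT_DIG) gives 1075 and halvings_until_underflow(X87_MIN_EXP, X87_PC53_MANT_DIG) gives 16435, while doublings_until_overflow gives 1024 for DBL_MAX_EXP and 16384 for X87_MAX_EXP. That matches the -O and -O2 outputs respectively, which suggests the -O2 loop is testing the flags against the x87 register range rather than the double range.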
With -O, the 'main loop' becomes:

.L3:
        fld1
        fstpl   24(%esp)
        movl    $0, %ebx
.L8:
        fldl    24(%esp)
        fld     %st(0)
        faddp   %st, %st(1)
        fstpl   24(%esp)
        addl    $1, %ebx
        fnstsw  %ax
        movl    %eax, %esi
        movl    __has_sse, %eax
        testl   %eax, %eax
        je      .L4
        cmpl    $2, %eax
        jne     .L5
        call    __test_sse
        testl   %eax, %eax
        je      .L5
.L4:
        stmxcsr 44(%esp)
        jmp     .L6
.L5:
        movl    $0, 44(%esp)
.L6:
        orl     44(%esp), %esi
        testl   $8, %esi
        je      .L8

With -O2, it becomes:

.L3:
        fld1
        xorl    %ebx, %ebx
.L12:
        fadd    %st(0), %st
        addl    $1, %ebx
        fnstsw  %ax
        testl   %edx, %edx
        movl    %eax, %esi
        je      .L10
        cmpl    $2, %edx
        je      .L27
.L9:
        xorl    %eax, %eax
.L8:
        orl     %eax, %esi
        andl    $8, %esi
        je      .L12

So it switches from using faddp and fstpl to direct fadd of %st(0) and
%st. I assume that uses the internal 80 bit precision?

Gcc also manages to move the __has_sse stuff out to further down in the
function, but it does not really affect the result.

-Dimitry
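The difference between the two loops is whether each result is stored back to a 64-bit memory slot (fstpl) or kept in an x87 register. A minimal sketch of forcing the store from C, assuming IEEE 754 doubles with the default round-to-nearest mode, is to make the variable volatile. The function name is hypothetical and this is not a fix proposed in the thread. It counts halvings until x becomes zero. For this particular sequence that is the same iteration at which the first tiny, inexact result occurs, so it mirrors the FE_UNDERFLOW count without needing <fenv.h>.

```c
#include <assert.h>

/* Hypothetical helper, not from the thread: halve 1.0 until it is zero.
 * The volatile qualifier makes every iteration load and store a 64-bit
 * double in memory, so x cannot live in an 80-bit x87 register across
 * iterations. */
int halvings_until_zero(void)
{
    volatile double x = 1.0;
    int i = 0;

    while (x != 0.0) {
        x /= 2;
        i++;
        assert(i < 100000);   /* guard against a runaway loop */
    }
    return i;
}
```

With the store forced on every iteration, the count should be 1075 regardless of optimization level, since 2^-1074 is the last nonzero double and the next halving rounds to zero. The cost is a memory round trip per iteration, so this is a diagnostic aid rather than something to apply broadly.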