Date: Fri, 09 Apr 2021 21:58:45 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 254911] lib/msun/ctrig_test fails if compiled with AVX (-mavx) or any CPUSET enabling AVX
Message-ID: <bug-254911-227-9N5qNE5EtM@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-254911-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-254911-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254911

--- Comment #3 from Dimitry Andric <dim@FreeBSD.org> ---
Hmm, it seems that we have a case here that is similar to what is described
here:
https://stackoverflow.com/questions/63125919/how-to-avoid-floating-point-exceptions-in-unused-simd-lanes

The gist being that clang indeed uses the vdivps (Divide Packed
Single-Precision) instruction by default, so the two calculations
((beta * rho * s) / denom, t / denom) are emitted as:

        #DEBUG_VALUE: ctanhf:denom <- $xmm2
        .loc    1 77 35 is_stmt 1       # src/lib/msun/src/s_ctanhf.c:77:35
        vmulss  %xmm1, %xmm3, %xmm1
        .loc    1 77 41 is_stmt 0       # src/lib/msun/src/s_ctanhf.c:77:41
        vmulss  %xmm1, %xmm0, %xmm0
        .loc    1 77 46                 # src/lib/msun/src/s_ctanhf.c:77:46
        vinsertps       $16, -80(%rbp), %xmm0, %xmm0 # 16-byte Folded Reload
                                        # xmm0 = xmm0[0],mem[0],xmm0[2,3]
        vmovsldup       %xmm2, %xmm1    # xmm1 = xmm2[0,0,2,2]
        vdivps  %xmm1, %xmm0, %xmm0

Now the problem with vdivps is apparently that the unused 'lanes' of the SIMD
registers can still result in floating point exception bits being set, such as
FE_INVALID (in this case probably because the unused lanes have zero in them,
giving 0/0).

That stackoverflow article suggests using clang's
-ffp-exception-behavior=maytrap option (documented at
<https://releases.llvm.org/11.0.1/tools/clang/docs/UsersManual.html#cmdoption-ffp-exception-behavior>),
meaning "The compiler avoids transformations that may raise exceptions that
would not have been raised by the original code". It is supported from clang 10
onwards.

In practice, this indeed avoids using vdivps, and uses vdivss (Divide Scalar
Single-Precision) instead, and the assembly for line 77 then looks like:

        #DEBUG_VALUE: ctanhf:denom <- $xmm1
        .loc    1 77 35 is_stmt 1       # src/lib/msun/src/s_ctanhf.c:77:35
        vmulss  %xmm2, %xmm4, %xmm2
        .loc    1 77 41 is_stmt 0       # src/lib/msun/src/s_ctanhf.c:77:41
        vmulss  %xmm0, %xmm2, %xmm0
        .loc    1 77 46                 # src/lib/msun/src/s_ctanhf.c:77:46
        vdivss  %xmm1, %xmm0, %xmm2
        vmovss  -80(%rbp), %xmm0        # 4-byte Reload
                                        # xmm0 = mem[0],zero,zero,zero
        #DEBUG_VALUE: ctanhf:t <- $xmm0
        .loc    1 77 57                 # src/lib/msun/src/s_ctanhf.c:77:57
        vdivss  %xmm1, %xmm0, %xmm0

And indeed, in this case the FE_INVALID is gone, and the tests succeed.

I guess it may be good to use this -ffp-exception-behavior=maytrap flag for the
whole of lib/msun, as many of these functions rely on this behavior. It does
not seem to be required for gcc.