Date:      Fri, 09 Apr 2021 21:58:45 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 254911] lib/msun/ctrig_test fails if compiled with AVX (-mavx) or any CPUSET enabling AVX
Message-ID:  <bug-254911-227-9N5qNE5EtM@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-254911-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254911

--- Comment #3 from Dimitry Andric <dim@FreeBSD.org> ---
Hmm, it seems that we have a case here that is similar to the one described
here:

https://stackoverflow.com/questions/63125919/how-to-avoid-floating-point-exceptions-in-unused-simd-lanes

The gist being that clang by default uses the vdivps (Divide Packed
Single-Precision) instruction, so the two divisions, (beta * rho * s) / denom
and t / denom, are packed into one instruction and emitted as:

        #DEBUG_VALUE: ctanhf:denom <- $xmm2
        .loc    1 77 35 is_stmt 1               # src/lib/msun/src/s_ctanhf.c:77:35
        vmulss  %xmm1, %xmm3, %xmm1
        .loc    1 77 41 is_stmt 0               # src/lib/msun/src/s_ctanhf.c:77:41
        vmulss  %xmm1, %xmm0, %xmm0
        .loc    1 77 46                         # src/lib/msun/src/s_ctanhf.c:77:46
        vinsertps       $16, -80(%rbp), %xmm0, %xmm0 # 16-byte Folded Reload
                                        # xmm0 = xmm0[0],mem[0],xmm0[2,3]
        vmovsldup       %xmm2, %xmm1            # xmm1 = xmm2[0,0,2,2]
        vdivps  %xmm1, %xmm0, %xmm0

Now the problem with vdivps is apparently that the unused 'lanes' of the SIMD
registers can still result in floating point exception bits being set, such as
FE_INVALID (in this case probably because the unused lanes have zero in them,
giving 0/0).

That Stack Overflow question suggests using clang's
-ffp-exception-behavior=maytrap option (documented at
<https://releases.llvm.org/11.0.1/tools/clang/docs/UsersManual.html#cmdoption-ffp-exception-behavior>),
which the manual describes as: "The compiler avoids transformations that may
raise exceptions that would not have been raised by the original code". It is
supported from clang 10 onwards.

In practice, this indeed avoids using vdivps, and uses vdivss (Divide Scalar
Single-Precision) instead, and the assembly for line 77 then looks like:

        #DEBUG_VALUE: ctanhf:denom <- $xmm1
        .loc    1 77 35 is_stmt 1               # src/lib/msun/src/s_ctanhf.c:77:35
        vmulss  %xmm2, %xmm4, %xmm2
        .loc    1 77 41 is_stmt 0               # src/lib/msun/src/s_ctanhf.c:77:41
        vmulss  %xmm0, %xmm2, %xmm0
        .loc    1 77 46                         # src/lib/msun/src/s_ctanhf.c:77:46
        vdivss  %xmm1, %xmm0, %xmm2
        vmovss  -80(%rbp), %xmm0                # 4-byte Reload
                                        # xmm0 = mem[0],zero,zero,zero
        #DEBUG_VALUE: ctanhf:t <- $xmm0
        .loc    1 77 57                         # src/lib/msun/src/s_ctanhf.c:77:57
        vdivss  %xmm1, %xmm0, %xmm0

And indeed, in this case the FE_INVALID is gone, and the tests succeed.

I guess it may be good to use this -ffp-exception-behavior=maytrap flag for the
whole of lib/msun, as many of these functions rely on floating point exception
flags only being raised when the original code would raise them. It does not
seem to be required for gcc.

