Date:      Mon, 06 Nov 2023 12:41:25 +0000
From:      bugzilla-noreply@freebsd.org
To:        toolchain@FreeBSD.org
Subject:   [Bug 274927] Toolchain fails on the __sync_val_compare_and_swap function without -march=native (port biology/seqwish)
Message-ID:  <bug-274927-29464-r6KnWoa4IU@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-274927-29464@https.bugs.freebsd.org/bugzilla/>
References:  <bug-274927-29464@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274927

--- Comment #8 from Dimitry Andric <dim@FreeBSD.org> ---
These are all called via seqwish::DisjointSets::unite() (which is in
https://github.com/ekg/seqwish/blob/master/src/dset64-gccAtomic.hpp):

0000000000000000 <seqwish::DisjointSets::unite(unsigned long, unsigned long)>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
...
  43:   e8 00 00 00 00          call   48 <seqwish::DisjointSets::unite(unsigned long, unsigned long)+0x48>
                        44: R_X86_64_PLT32     __sync_val_compare_and_swap_16-0x4

The file has a comment about this:

 * The implementation in shasta/src/dset64.hpp uses std::atomic<__uint128_t>
 * for lock-free synchronization.
 * On older GCC versions, std::atomic<__uint128_t> is lock-free
 * if compilation is done with -mcx16, which enables the use of the
 * 16-byte (128 bit) compare-and-swap instruction, CMPXCHG16B.
 *
 * Unfortunately, on newer GCC versions, this is no longer true
 * because of gcc bug 80878:
 * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
 *
 * As a result, there was a significant performance loss in
 * versions of Shasta built with gcc 7,
 * which is used by default on Ubuntu 18.04, when using
 * machines with large number of virtual processors.
 *
 * It is unlikely that this gcc bug will ever be fixed,
 * and to avoid this performance loss this implementation
 * uses gcc primitive __sync_bool_compare_and_swap instead
 * for lock-free synchronization. When compilation
 * is done with -mcx16 and optimization turned on,
 * this primitive uses the CMPXCHG16B instruction
 * and results in optimal speed.
 *
 * The CMPXCHG16B instruction is available on most modern 64-bit x86 processors.
 * Some older processors that don't implement this instruction
 * will crash with an "Illegal instruction" error
 * upon attempting to run this code.
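The comment's warning about "Illegal instruction" crashes suggests checking for CMPXCHG16B at startup instead. A minimal sketch, assuming GCC or Clang on x86-64 (for the <cpuid.h> helpers __get_cpuid and bit_CMPXCHG16B); cpu_has_cmpxchg16b is a hypothetical name, not part of seqwish:

```cpp
#if defined(__x86_64__)
#include <cpuid.h>  // GCC/Clang header: __get_cpuid, bit_CMPXCHG16B
#endif

// Hypothetical helper: does the running CPU implement CMPXCHG16B?
// CPUID leaf 1 reports it in ECX bit 13 (bit_CMPXCHG16B).
// Returns false on targets other than x86-64, where the question
// does not apply.
bool cpu_has_cmpxchg16b() {
#if defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0) {
        return false;  // CPUID leaf 1 not available
    }
    return (ecx & bit_CMPXCHG16B) != 0;
#else
    return false;
#endif
}
```

A binary built with -mcx16 could call this early in main() and exit with a readable error, rather than dying with SIGILL on its first 16-byte compare-and-swap.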

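However, when the compiler cannot expand __sync_bool_compare_and_swap inline
(as here, without -mcx16), it emits a call into a compiler support library
such as libgcc or compiler-rt; the relocation above shows the 16-byte helper
it expects, __sync_val_compare_and_swap_16. I don't think we have this
function for 128-bit integers, though.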

As noted in the comment, the code should be compiled with -mcx16. Besides
giving optimal performance, this lets the compiler emit CMPXCHG16B inline
instead of calling the missing helper. Processors that do not support
CMPXCHG16B are quite ancient now.
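To avoid depending on a __sync_val_compare_and_swap_16 helper that the compiler runtime may not provide, a 16-byte CAS can be gated on the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 macro, which GCC and Clang predefine when they can expand a 16-byte __sync CAS inline (on x86-64: with -mcx16, or a -march that implies it). A minimal sketch, assuming GCC or Clang on a 64-bit target (for unsigned __int128); cas128 and its mutex fallback are hypothetical, not seqwish's code:

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

using u128 = unsigned __int128;  // GCC/Clang extension on 64-bit targets

// Hypothetical helper: atomically, if *p == expected, store desired
// and return true; otherwise leave *p unchanged and return false.
bool cas128(u128* p, u128 expected, u128 desired) {
    // CMPXCHG16B faults if its memory operand is not 16-byte aligned.
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
    // The compiler can expand this inline, so no external
    // __sync_val_compare_and_swap_16 symbol is referenced.
    return __sync_bool_compare_and_swap(p, expected, desired);
#else
    // Fallback: serialize through one global mutex. Not lock-free, and
    // only correct if every access to *p also takes this lock.
    static std::mutex m;
    std::lock_guard<std::mutex> lock(m);
    if (*p != expected) {
        return false;
    }
    *p = desired;
    return true;
#endif
}
```

Replacing the fallback branch with `#error "build with -mcx16"` would instead turn a missing flag into a compile-time error, which may suit code like seqwish that relies on the lock-free path for performance.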

-- 
You are receiving this mail because:
You are the assignee for the bug.