Date: Mon, 06 Nov 2023 12:41:25 +0000
From: bugzilla-noreply@freebsd.org
To: toolchain@FreeBSD.org
Subject: [Bug 274927] Toolchain fails on the __sync_val_compare_and_swap function without -march=native (port biology/seqwish)
Message-ID: <bug-274927-29464-r6KnWoa4IU@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-274927-29464@https.bugs.freebsd.org/bugzilla/>
References: <bug-274927-29464@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274927

--- Comment #8 from Dimitry Andric <dim@FreeBSD.org> ---
These are all called via seqwish::DisjointSets::unite() (which is in
https://github.com/ekg/seqwish/blob/master/src/dset64-gccAtomic.hpp):

0000000000000000 <seqwish::DisjointSets::unite(unsigned long, unsigned long)>:
       0: 55                      push   %rbp
       1: 48 89 e5                mov    %rsp,%rbp
...
      43: e8 00 00 00 00          call   48 <seqwish::DisjointSets::unite(unsigned long, unsigned long)+0x48>
                44: R_X86_64_PLT32      __sync_val_compare_and_swap_16-0x4

The file has a comment about this:

 * The implementation in shasta/src/dset64.hpp uses std::atomic<__uint128_t>
 * for lock-free synchronization.
 * On older GCC versions, std::atomic<__uint128_t> is lock-free
 * if compilation is done with -mcx16, which enables the use of the
 * 16-byte (128 bit) compare-and-swap instruction, CMPXCHG16B.
 *
 * Unfortunately, on newer GCC versions, this is no longer true
 * because of gcc bug 80878:
 * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
 *
 * As a result, there was a significant performance loss in
 * versions of Shasta built with gcc 7,
 * which is used by default on Ubuntu 18.04, when using
 * machines with large number of virtual processors.
 *
 * It is unlikely that this gcc bug will ever be fixed,
 * and to avoid this performance loss this implementation
 * uses gcc primitive __sync_bool_compare_and_swap instead
 * for lock-free synchronization. When compilation
 * is done with -mcx16 and optimization turned on,
 * this primitive uses the CMPXCHG16B instruction
 * and results in optimal speed.
 *
 * The CMPXCHG16B instruction is available on most modern 64-bit x86 processors.
 * Some older processors that don't implement this instruction
 * will crash with an "Illegal instruction" error
 * upon attempting to run this code.

However, __sync_bool_compare_and_swap is usually provided by a compiler
library such as libgcc or libcompiler-rt. I don't think we have this function
for 128-bit integers, though.

As noted in the comment, the code should be compiled with -mcx16 for optimal
performance. Processors which do not support CMPXCHG16B are quite ancient now.
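
To illustrate the -mcx16 point, here is a minimal C++ sketch. It is not seqwish code, and the names u128, kInlineCas16 and cas16 are hypothetical. It uses the builtin only when the compiler defines __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16, which GCC and Clang do when -mcx16 is in effect. Otherwise it takes a mutex instead of emitting an out-of-line __sync_*_compare_and_swap_16 call like the one in the disassembly above. The mutex fallback is an assumption made for illustration and is only correct if every access to the value goes through the same helper.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical alias; the seqwish header uses __uint128_t directly.
using u128 = __uint128_t;

#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
// Defined when the compiler can expand a 16-byte CAS inline (e.g. -mcx16).
constexpr bool kInlineCas16 = true;

inline bool cas16(u128* p, u128 expected, u128 desired) {
    // CMPXCHG16B requires a 16-byte aligned memory operand.
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
    return __sync_bool_compare_and_swap(p, expected, desired);
}
#else
// No inline 16-byte CAS: avoid a library call that may not exist in the
// compiler runtime and serialize through a mutex instead (illustrative only).
constexpr bool kInlineCas16 = false;

inline bool cas16(u128* p, u128 expected, u128 desired) {
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
    static std::mutex m;
    std::lock_guard<std::mutex> lock(m);
    if (*p != expected) {
        return false;  // value changed, nothing is written
    }
    *p = desired;
    return true;
}
#endif
```

Building with something like `c++ -O2 -mcx16` should select the first branch. Without -mcx16 the fallback is used, so the link no longer depends on a 128-bit helper being present in libgcc or libcompiler-rt, at the cost of the lock.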