Date: Sun, 21 Sep 2014 23:37:42 +0200
From: Jilles Tjoelker <jilles@stack.nl>
To: freebsd-threads@freebsd.org
Cc: adrian@freebsd.org
Subject: sem_post() performance
Message-ID: <20140921213742.GA46868@stack.nl>
It has been reported that POSIX semaphores are slow, in contexts such as Python. Note that POSIX semaphores are the only synchronization objects that support use by different processes in shared memory; this does not work for mutexes and condition variables because those types are pointers to the actual data structure.

In fact, sem_post() unconditionally performs an umtx system call.

To avoid both lost wakeups and possible writes to a destroyed semaphore, an uncontested sem_post() must check the _has_waiters flag atomically with incrementing _count.

The proper way to do this would be to take one bit from _count and use it for the _has_waiters flag; the definition of SEM_VALUE_MAX permits this. However, this would require a new set of umtx semaphore operations and would break the ABI of process-shared semaphores (things may break if an old and a new libc access the same semaphore over shared memory).

This diff only affects 32-bit aligned but 64-bit misaligned semaphores on 64-bit systems, and changes _count and _has_waiters atomically using a 64-bit atomic operation. It probably needs a may_alias attribute for correctness, but <sys/cdefs.h> does not have a wrapper for that.

Some x86 CPUs cope with misaligned atomic ops without destroying performance (especially if the access does not cross a cache line), so the alignment restriction could be relaxed to make the patch more practical. Many CPUs in the i386 architecture have a 64-bit atomic op (cmpxchg8b) which could be used here.

This appears to restore the performance of 10-stable's uncontested semaphores with the strange alignment to 9-stable levels (measured with a tight loop of sem_wait() and sem_post()). I have not tested any real workload.
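The fast path described above can be sketched in portable C. This is a minimal illustration, not the FreeBSD patch itself: the function name post_fast() and the packed-word layout are hypothetical, and C11 <stdatomic.h> stands in for FreeBSD's atomic_cmpset_rel_64(). The idea is that when the 32-bit _has_waiters word and the 32-bit _count word are adjacent and the pair is 64-bit aligned, an uncontested post can check the flag and bump the count in one 64-bit compare-and-swap, with no system call.

```c
/*
 * Hypothetical sketch of the lock-free sem_post() fast path.
 * Little-endian view: _has_waiters occupies the low 32 bits of the
 * 64-bit word, _count the high 32 bits, mirroring an adjacent
 * { uint32_t _has_waiters; uint32_t _count; } pair.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <errno.h>

#define SEM_VALUE_MAX 0x7fffffff

/*
 * Returns 0 on success, EOVERFLOW on overflow, or -1 when there are
 * waiters and the caller must fall back to the kernel wakeup path.
 */
static int
post_fast(_Atomic uint64_t *sem)
{
	uint64_t oldval, newval;
	uint32_t count;

	for (;;) {
		oldval = atomic_load_explicit(sem, memory_order_relaxed);
		if ((uint32_t)oldval != 0)	/* _has_waiters set */
			return (-1);
		count = (uint32_t)(oldval >> 32);
		if (count >= SEM_VALUE_MAX)
			return (EOVERFLOW);
		/* Expect _count == count and _has_waiters == 0. */
		newval = (uint64_t)(count + 1) << 32;
		/* Release ordering pairs with an acquire in sem_wait(). */
		if (atomic_compare_exchange_weak_explicit(sem, &oldval,
		    newval, memory_order_release, memory_order_relaxed))
			return (0);
	}
}
```

Because the expected old value has the _has_waiters half equal to zero, a waiter registering itself concurrently makes the CAS fail, so the wakeup cannot be lost.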
Index: lib/libc/gen/sem_new.c
===================================================================
--- lib/libc/gen/sem_new.c	(revision 269952)
+++ lib/libc/gen/sem_new.c	(working copy)
@@ -437,6 +437,32 @@ _sem_post(sem_t *sem)
 	if (sem_check_validity(sem) != 0)
 		return (-1);
 
+#ifdef __LP64__
+	if (((uintptr_t)&sem->_kern._has_waiters & 7) == 0) {
+		uint64_t oldval, newval;
+
+		while (!sem->_kern._has_waiters) {
+			count = sem->_kern._count;
+			if (count + 1 > SEM_VALUE_MAX)
+				return (EOVERFLOW);
+			/*
+			 * Expect _count == count and _has_waiters == 0.
+			 */
+#if BYTE_ORDER == LITTLE_ENDIAN
+			oldval = (uint64_t)count << 32;
+			newval = (uint64_t)(count + 1) << 32;
+#elif BYTE_ORDER == BIG_ENDIAN
+			oldval = (uint64_t)count;
+			newval = (uint64_t)(count + 1);
+#else
+#error Unknown byte order
+#endif
+			if (atomic_cmpset_rel_64(
+			    (uint64_t *)&sem->_kern._has_waiters,
+			    oldval, newval))
+				return (0);
+		}
+	}
+#endif
 	do {
 		count = sem->_kern._count;
 		if (count + 1 > SEM_VALUE_MAX)

-- 
Jilles Tjoelker
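For comparison, the "proper way" mentioned in the text, stealing one bit of _count for the has-waiters flag so a single 32-bit atomic suffices, could look roughly like this. The names USEM_HAS_WAITERS and post_flagged() are hypothetical here; this is the ABI-breaking variant the mail declines to implement, not the patch above.

```c
/*
 * Hypothetical sketch: the waiters flag lives in the top bit of the
 * count word, so count and flag always change together in one 32-bit
 * compare-and-swap. SEM_VALUE_MAX (0x7fffffff) still fits in the
 * remaining 31 bits, as the mail notes.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <errno.h>

#define USEM_HAS_WAITERS 0x80000000u	/* stolen top bit */
#define USEM_MAX_COUNT   0x7fffffffu	/* == SEM_VALUE_MAX */

/*
 * Returns 0 on success, EOVERFLOW on overflow, or -1 when a kernel
 * wakeup of waiters is required.
 */
static int
post_flagged(_Atomic uint32_t *word)
{
	uint32_t oldval;

	oldval = atomic_load_explicit(word, memory_order_relaxed);
	for (;;) {
		if ((oldval & ~USEM_HAS_WAITERS) >= USEM_MAX_COUNT)
			return (EOVERFLOW);
		/* Increment the count; the flag bit is preserved. */
		if (atomic_compare_exchange_weak_explicit(word, &oldval,
		    oldval + 1, memory_order_release, memory_order_relaxed))
			return ((oldval & USEM_HAS_WAITERS) ? -1 : 0);
	}
}
```

Unlike the 64-bit patch, this works at any alignment and on 32-bit systems, which is why the mail calls it the proper fix; the cost is a new set of umtx operations and an incompatible shared-memory layout.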