Date: Sun, 21 Sep 2014 23:37:42 +0200
From: Jilles Tjoelker
To: freebsd-threads@freebsd.org
Cc: adrian@freebsd.org
Subject: sem_post() performance
Message-ID: <20140921213742.GA46868@stack.nl>

It has been reported that POSIX semaphores are slow, in contexts such as
Python. Note that POSIX semaphores are the only synchronization objects
that support use by different processes in shared memory; this does not
work for mutexes and condition variables because those types are
pointers to the actual data structure.

In fact, sem_post() unconditionally performs a umtx system call.

To avoid both lost wakeups and possible writes to a destroyed semaphore,
an uncontested sem_post() must check the _has_waiters flag atomically
with incrementing _count.

The proper way to do this would be to take one bit from _count and use
it for the _has_waiters flag; the definition of SEM_VALUE_MAX permits
this. However, this would require a new set of umtx semaphore operations
and would break the ABI of process-shared semaphores (things may break
if an old and a new libc access the same semaphore over shared memory).

This diff only affects 32-bit aligned but 64-bit misaligned semaphores
on 64-bit systems, and changes _count and _has_waiters atomically using
a 64-bit atomic operation. It probably needs a may_alias attribute for
correctness, but there is no wrapper macro for that.

Some x86 CPUs may cope with misaligned atomic ops without destroying
performance (especially if they do not cross a cache line), so the
alignment restriction could be relaxed to make the patch more practical.
Many CPUs in the i386 architecture have a 64-bit atomic op (cmpxchg8b)
which could be used here.

This appears to restore performance of 10-stable uncontested semaphores
with the strange alignment to 9-stable levels (a tight loop with
sem_wait and sem_post). I have not tested in any real workload.

Index: lib/libc/gen/sem_new.c
===================================================================
--- lib/libc/gen/sem_new.c	(revision 269952)
+++ lib/libc/gen/sem_new.c	(working copy)
@@ -437,6 +437,32 @@ _sem_post(sem_t *sem)
 	if (sem_check_validity(sem) != 0)
 		return (-1);
 
+#ifdef __LP64__
+	if (((uintptr_t)&sem->_kern._count & 7) == 0) {
+		uint64_t oldval, newval;
+
+		while (!sem->_kern._has_waiters) {
+			count = sem->_kern._count;
+			if (count + 1 > SEM_VALUE_MAX)
+				return (EOVERFLOW);
+			/*
+			 * Expect _count == count and _has_waiters == 0.
+			 */
+#if BYTE_ORDER == LITTLE_ENDIAN
+			oldval = (uint64_t)count << 32;
+			newval = (uint64_t)(count + 1) << 32;
+#elif BYTE_ORDER == BIG_ENDIAN
+			oldval = (uint64_t)count;
+			newval = (uint64_t)(count + 1);
+#else
+#error Unknown byte order
+#endif
+			if (atomic_cmpset_rel_64((uint64_t *)&sem->_kern._count,
+			    oldval, newval))
+				return (0);
+		}
+	}
+#endif
 	do {
 		count = sem->_kern._count;
 		if (count + 1 > SEM_VALUE_MAX)

-- 
Jilles Tjoelker
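
For comparison, the single-word approach described earlier (one bit of
the count word used as the has-waiters flag) might look roughly like the
sketch below. Everything here is hypothetical: struct xsem, the XSEM_*
constants, xsem_post(), xsem_trywait() and the xsem_wake() stand-in are
invented for illustration and are not existing libc or umtx interfaces.
It uses C11 atomics rather than the FreeBSD atomic(9) functions, assumes
a 31-bit count is sufficient, and leaves the flag to be set and cleared
by whatever does the sleeping.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define	XSEM_HAS_WAITERS	0x80000000u	/* hypothetical: top bit is the flag */
#define	XSEM_COUNT_MASK		0x7fffffffu	/* low 31 bits hold the count */

struct xsem {
	_Atomic uint32_t word;		/* count and flag share one word */
};

unsigned xsem_wakeups;			/* how often the stand-in wake ran */

/* Stand-in for the system call that would wake a sleeping waiter. */
void
xsem_wake(struct xsem *s)
{
	(void)s;
	xsem_wakeups++;
}

int
xsem_post(struct xsem *s)
{
	uint32_t old, next;

	old = atomic_load_explicit(&s->word, memory_order_relaxed);
	do {
		if ((old & XSEM_COUNT_MASK) == XSEM_COUNT_MASK) {
			errno = EOVERFLOW;
			return (-1);
		}
		next = old + 1;		/* the flag bit is left as it was */
	} while (!atomic_compare_exchange_weak_explicit(&s->word, &old, next,
	    memory_order_release, memory_order_relaxed));
	/*
	 * The flag was read by the same atomic operation that bumped the
	 * count; without waiters the semaphore is not touched again.
	 */
	if (old & XSEM_HAS_WAITERS)
		xsem_wake(s);
	return (0);
}

int
xsem_trywait(struct xsem *s)
{
	uint32_t old;

	old = atomic_load_explicit(&s->word, memory_order_relaxed);
	do {
		if ((old & XSEM_COUNT_MASK) == 0) {
			errno = EAGAIN;
			return (-1);
		}
	} while (!atomic_compare_exchange_weak_explicit(&s->word, &old,
	    old - 1, memory_order_acquire, memory_order_relaxed));
	return (0);
}
```

The point of the layout is that the uncontested post is a single
compare-and-swap on an ordinary 32-bit word, so no alignment trick or
64-bit operation is needed; the cost is the new kernel operations and
the ABI break mentioned above.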