Date: Mon, 22 Sep 2014 15:53:13 -0400 From: John Baldwin <jhb@freebsd.org> To: freebsd-threads@freebsd.org Cc: adrian@freebsd.org Subject: Re: sem_post() performance Message-ID: <1531724.MPBlj40xOW@ralph.baldwin.cx> In-Reply-To: <20140921213742.GA46868@stack.nl> References: <20140921213742.GA46868@stack.nl>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sunday, September 21, 2014 11:37:42 PM Jilles Tjoelker wrote: > It has been reported that POSIX semaphores are slow, in contexts such as > Python. Note that POSIX semaphores are the only synchronization objects > that support use by different processes in shared memory; this does not > work for mutexes and condition variables because they are pointers to > the actual data structure. > > In fact, sem_post() unconditionally performs an umtx system call. *sigh* I was worried that that might be the case. > To avoid both lost wakeups and possible writes to a destroyed semaphore, > an uncontested sem_post() must check the _has_waiters flag atomically > with incrementing _count. > > The proper way to do this would be to take one bit from _count and use > it for the _has_waiters flag; the definition of SEM_VALUE_MAX permits > this. However, this would require a new set of umtx semaphore operations > and will break ABI of process-shared semaphores (things may break if an > old and a new libc access the same semaphore over shared memory). > > This diff only affects 32-bit aligned but 64-bit misaligned semaphores > on 64-bit systems, and changes _count and _has_waiters atomically using > a 64-bit atomic operation. It probably needs a may_alias attribute for > correctness, but <sys/cdefs.h> does not have a wrapper for that. It wasn't clear on first reading, but you are using aliasing to get around the need for new umtx calls by using a 64-bit atomic op to adjust two ints at the same time, yes? Note that since a failing semaphore op calls into the kernel for the "hard" case, you might in fact be able to change the ABI without breaking process-shared semaphores. That is, suppose you left 'has_waiters' as always true and reused the high bit of count for has_waiters. Would old binaries always trap into the kernel? (Not sure they will, especially the case where an old binary creates the semaphore, a new binary would have to force has_waiters to true in every sem op, but even that might not be enough.) > Some x86 CPUs may cope with misaligned atomic ops without destroying > performance (especially if they do not cross a cache line), so the > alignment restriction could be relaxed to make the patch more practical. > > Many CPUs in the i386 architecture have a 64-bit atomic op (cmpxchg8b) > which could be used here. > > This appears to restore performance of 10-stable uncontested semaphores > with the strange alignment to 9-stable levels (a tight loop with > sem_wait and sem_post). I have not tested in any real workload. > > Index: lib/libc/gen/sem_new.c > =================================================================== > --- lib/libc/gen/sem_new.c (revision 269952) > +++ lib/libc/gen/sem_new.c (working copy) > @@ -437,6 +437,32 @@ _sem_post(sem_t *sem) > if (sem_check_validity(sem) != 0) > return (-1); > > +#ifdef __LP64__ > + if (((uintptr_t)&sem->_kern._count & 7) == 0) { > + uint64_t oldval, newval; > + > + while (!sem->_kern._has_waiters) { > + count = sem->_kern._count; > + if (count + 1 > SEM_VALUE_MAX) > + return (EOVERFLOW); > + /* > + * Expect _count == count and _has_waiters == 0. > + */ > +#if BYTE_ORDER == LITTLE_ENDIAN > + oldval = (uint64_t)count << 32; > + newval = (uint64_t)(count + 1) << 32; > +#elif BYTE_ORDER == BIG_ENDIAN > + oldval = (uint64_t)count; > + newval = (uint64_t)(count + 1); > +#else > +#error Unknown byte order > +#endif > + if (atomic_cmpset_rel_64((uint64_t *)&sem->_kern._count, > + oldval, newval)) > + return (0); > + } > + } > +#endif I think this looks ok, but it's a shame we can't fix it more cleanly. -- John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1531724.MPBlj40xOW>