Date: Tue, 23 Sep 2014 17:49:28 -0400 (EDT)
From: Daniel Eischen <deischen@freebsd.org>
To: Jilles Tjoelker <jilles@stack.nl>
Cc: adrian@freebsd.org, freebsd-threads@freebsd.org
Subject: Re: sem_post() performance
Message-ID: <Pine.GSO.4.64.1409231742300.2865@sea.ntplx.net>
In-Reply-To: <20140923212000.GA78110@stack.nl>
References: <20140921213742.GA46868@stack.nl> <1531724.MPBlj40xOW@ralph.baldwin.cx> <20140923212000.GA78110@stack.nl>
On Tue, 23 Sep 2014, Jilles Tjoelker wrote:

> On Mon, Sep 22, 2014 at 03:53:13PM -0400, John Baldwin wrote:
>> On Sunday, September 21, 2014 11:37:42 PM Jilles Tjoelker wrote:
>>> It has been reported that POSIX semaphores are slow, in contexts such
>>> as Python. Note that POSIX semaphores are the only synchronization
>>> objects that support use by different processes in shared memory; this
>>> does not work for mutexes and condition variables because they are
>>> pointers to the actual data structure.
>>>
>>> In fact, sem_post() unconditionally performs an umtx system call.
>>
>> *sigh*  I was worried that that might be the case.
>>
>>> To avoid both lost wakeups and possible writes to a destroyed
>>> semaphore, an uncontested sem_post() must check the _has_waiters flag
>>> atomically with incrementing _count.
>>>
>>> The proper way to do this would be to take one bit from _count and use
>>> it for the _has_waiters flag; the definition of SEM_VALUE_MAX permits
>>> this. However, this would require a new set of umtx semaphore
>>> operations and will break ABI of process-shared semaphores (things may
>>> break if an old and a new libc access the same semaphore over shared
>>> memory).
>>>
>>> This diff only affects 32-bit aligned but 64-bit misaligned semaphores
>>> on 64-bit systems, and changes _count and _has_waiters atomically
>>> using a 64-bit atomic operation. It probably needs a may_alias
>>> attribute for correctness, but <sys/cdefs.h> does not have a wrapper
>>> for that.
>>
>> It wasn't clear on first reading, but you are using aliasing to get
>> around the need for new umtx calls by using a 64-bit atomic op to
>> adjust two ints at the same time, yes?  Note that since a failing
>> semaphore op calls into the kernel for the "hard" case, you might in
>> fact be able to change the ABI without breaking process-shared
>> semaphores.  That is, suppose you left 'has_waiters' as always true
>> and reused the high bit of count for has_waiters.
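[For readers following along: the 64-bit trick under discussion can be sketched as below. This is not the actual libc diff; the struct layout and function names are illustrative only, standing in for the two adjacent 32-bit fields of the real semaphore. The point is that one 64-bit compare-and-swap both increments _count and observes _has_waiters, so an uncontested post never races with a waiter appearing, and never touches memory after a woken waiter may have destroyed the semaphore.]

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for the two adjacent 32-bit semaphore fields. */
struct usem {
	uint32_t _count;
	uint32_t _has_waiters;
};

/*
 * Fast path of sem_post(): increment _count and check _has_waiters in a
 * single 64-bit CAS.  Returns true if the uncontested post succeeded
 * with no system call; false means there are (or may be) waiters and
 * the caller must fall back to the umtx wake system call.
 */
static bool
sem_post_fast(_Atomic uint64_t *both)
{
	uint64_t oldv, newv;
	struct usem s;

	oldv = atomic_load(both);
	for (;;) {
		memcpy(&s, &oldv, sizeof(s));
		if (s._has_waiters)
			return (false);	/* contested: kernel must wake someone */
		s._count++;
		memcpy(&newv, &s, sizeof(newv));
		if (atomic_compare_exchange_weak(both, &oldv, newv))
			return (true);	/* posted without a syscall */
		/* oldv was reloaded by the failed CAS; retry. */
	}
}
```

This is also where the may_alias concern comes from: the same memory is accessed both as two 32-bit fields and as one 64-bit object (the memcpy form above sidesteps that in C11, but the kernel-visible layout must still match).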
>> Would old binaries always trap into the kernel?  (Not sure they will,
>> especially the case where an old binary creates the semaphore; a new
>> binary would have to force has_waiters to true in every sem op, but
>> even that might not be enough.)
>
> I think that everything will break when binaries linked to old and new
> libcs use the same semaphore. If the new contested bit is set, the old
> sem_getvalue() will return garbage, the old sem_trywait() will fail even
> if the real count is greater than 0, the old sem_wait() and
> sem_timedwait() may spin if the real count is greater than 0 and the old
> sem_post() will fail with [EOVERFLOW].
>
> That the "hard" path always issues a system call does not help much,
> since the system calls do not write to _count (this is a throughput
> optimization, allowing a fast-path thread through while a slow-path
> thread is entering or leaving the kernel).

[ ... ]

> Consideration: just declare mixing process-shared semaphores with
> sufficiently different libcs unsupported, and change SEM_MAGIC to
> enforce that? (This does not prevent running old binaries, as long as
> they're dynamically linked to libc and you use a new libc.so.)

Yes and yes :-)

And we need to add such a magic or version number to our mutexes and CVs
when we convert their types from pointers to actual structs.

-- 
DE
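[Editor's illustration of the mixed-libc breakage described above, under the alternative scheme of stealing the high bit of _count for has_waiters. The macro and function names are hypothetical, not FreeBSD's; the sketch only shows why an old libc, which treats the whole word as the count, misreads a word written by a new libc.]

```c
#include <stdint.h>

/*
 * Hypothetical packed layout: the high bit of _count becomes the
 * has_waiters flag.  POSIX allows this because SEM_VALUE_MAX only needs
 * to be at least _POSIX_SEM_VALUE_MAX (32767), so 0x7fffffff suffices.
 */
#define USEM_HAS_WAITERS	0x80000000u
#define USEM_COUNT(v)		((uint32_t)((v) & ~USEM_HAS_WAITERS))

/* How a new libc would decode the word. */
static uint32_t
new_libc_getvalue(uint32_t word)
{
	return (USEM_COUNT(word));
}

static int
new_libc_has_waiters(uint32_t word)
{
	return ((word & USEM_HAS_WAITERS) != 0);
}

/*
 * An old libc knows nothing of the flag and reads the raw word as the
 * count.  With the flag set it sees an enormous bogus value, so its
 * sem_getvalue() returns garbage and its sem_post() fails with
 * [EOVERFLOW] because the apparent count exceeds SEM_VALUE_MAX.
 */
static uint32_t
old_libc_getvalue(uint32_t word)
{
	return (word);
}
```

This is why the thread lands on a SEM_MAGIC bump instead: rather than trying to make the packed format backward-compatible, refuse to operate on a semaphore initialized by a sufficiently different libc.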