From: John Baldwin
To: freebsd-threads@freebsd.org
Cc: adrian@freebsd.org
Subject: Re: sem_post() performance
Date: Mon, 13 Oct 2014 17:35:09 -0400
Message-ID: <17114533.JBQOYsdsdz@ralph.baldwin.cx>
In-Reply-To: <20140921213742.GA46868@stack.nl>
References: <20140921213742.GA46868@stack.nl>
List-Id: Threading on FreeBSD

On Sunday, September 21, 2014 11:37:42 PM Jilles Tjoelker wrote:
> It has been reported that POSIX semaphores are slow, in contexts such as
> Python.
> Note that POSIX semaphores are the only synchronization objects that
> support use by different processes in shared memory; this does not work
> for mutexes and condition variables because they are pointers to the
> actual data structure.
>
> In fact, sem_post() unconditionally performs an umtx system call.
>
> To avoid both lost wakeups and possible writes to a destroyed semaphore,
> an uncontested sem_post() must check the _has_waiters flag atomically
> with incrementing _count.
>
> The proper way to do this would be to take one bit from _count and use
> it for the _has_waiters flag; the definition of SEM_VALUE_MAX permits
> this. However, this would require a new set of umtx semaphore operations
> and would break the ABI of process-shared semaphores (things may break if
> an old and a new libc access the same semaphore over shared memory).

Have you thought more about pursuing this option?  I think there was a
general consensus earlier in the thread to just break the ABI (at least
adjust SEM_MAGIC to give some protection) and fix it.

> This diff only affects 32-bit aligned but 64-bit misaligned semaphores
> on 64-bit systems, and changes _count and _has_waiters atomically using
> a 64-bit atomic operation. It probably needs a may_alias attribute for
> correctness, but does not have a wrapper for that. It does have one bug:
>
> +		if (atomic_cmpset_rel_64((uint64_t *)&sem->_kern._count,
> +		    oldval, newval))

This needs to be '&_has_waiters'.  Right now it changes _count and
_flags, but not _has_waiters.

-- 
John Baldwin