From owner-freebsd-arch@FreeBSD.ORG  Mon Jan 14 19:37:07 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id B4962623;
 Mon, 14 Jan 2013 19:37:07 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
 [IPv6:2001:470:1f10:75::2])
 by mx1.freebsd.org (Postfix) with ESMTP id 8FB1534A;
 Mon, 14 Jan 2013 19:37:07 +0000 (UTC)
Received: from pakbsde14.localnet (unknown [38.105.238.108])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id A04EBB95B;
 Mon, 14 Jan 2013 14:37:06 -0500 (EST)
From: John Baldwin <jhb@freebsd.org>
To: David Chisnall <theraven@freebsd.org>
Subject: Re: Fast sigblock (AKA rtld speedup)
Date: Mon, 14 Jan 2013 13:58:32 -0500
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; )
References: <20130107182235.GA65279@kib.kiev.ua>
 <20130114174703.GB88220@stack.nl>
 <D6772A0E-FBA4-4168-B152-7E7694720A16@FreeBSD.org>
In-Reply-To: <D6772A0E-FBA4-4168-B152-7E7694720A16@FreeBSD.org>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201301141358.33216.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Mon, 14 Jan 2013 14:37:06 -0500 (EST)
Cc: toolchain@freebsd.org, Jilles Tjoelker <jilles@stack.nl>,
 freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 14 Jan 2013 19:37:07 -0000

On Monday, January 14, 2013 1:24:04 pm David Chisnall wrote:
> On 14 Jan 2013, at 17:47, Jilles Tjoelker wrote:
> 
> > The code which does that check is actually under contrib/gcc. Problem
> > is, they designed __gthread_active_p() to distinguish threaded and
> > unthreaded programming environments -- it must be known in advance and
> > cannot be changed later. The code for the unthreaded environment then
> > takes advantage of this by not even allocating memory for mutexes in
> > some cases.
> 
> It's worth taking a step back and asking why this code exists at all, and
> the main reason is that acquiring a mutex used to be really expensive.  It
> still is on some fruit-flavoured operating systems, but elsewhere it's a
> single atomic operation in the uncontended case, and in that case the cache
> line will already be exclusively owned by the calling core in single-threaded
> code.
> 
> I would much rather that we followed the example of Solaris and made the
> multithreaded case fast and the default than keep piling on hacks that allow
> code to shave off a few clock cycles in the single-threaded case.  In
> particular, the popularity of multicore systems means that it is increasingly
> rare for code to be both single threaded and performance critical, so this
> seems like misplaced optimisation.

We have single-threaded, performance-critical applications that run on
multicore systems (we just run several copies), and if we link in libthr, then
pthread_mutex operations (even on uncontended locks) show up as one of the top
consumers of CPU time when we profile our applications.

> I strongly suspect that making it possible to inline the uncontended lock
> case for a pthread mutex and eliminating all of the branches on __isthreaded
> would give us a net speedup in both single and multithreaded cases.

I'm less certain.  Note that you can't inline mutex ops until you expose
the mutexes themselves to userland (that is, make pthread_mutex_t
non-opaque).
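
To illustrate why the layout has to be visible, here is a minimal sketch of
an inlinable uncontended path using C11 atomics.  Everything in it is
hypothetical: struct fast_mutex and the fast_mutex_*() functions are
invented for this example and are not libthr's pthread_mutex_t or any
proposed ABI.  The slow path simply spins, whereas a real implementation
would sleep in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical non-opaque mutex.  Because callers can see the layout,
 * the compiler can inline the fast path at every call site.
 */
struct fast_mutex {
	atomic_int state;	/* 0 = unlocked, 1 = locked */
};

/* Try to take the lock with a single compare-and-swap. */
static inline bool
fast_mutex_trylock(struct fast_mutex *m)
{
	int expected = 0;

	return (atomic_compare_exchange_strong_explicit(&m->state,
	    &expected, 1, memory_order_acquire, memory_order_relaxed));
}

/*
 * Out-of-line contended path.  This sketch only spins; a real
 * implementation would block in the kernel instead.
 */
static void
fast_mutex_lock_slow(struct fast_mutex *m)
{
	while (!fast_mutex_trylock(m))
		;
}

/* Inlinable lock: one atomic operation when uncontended. */
static inline void
fast_mutex_lock(struct fast_mutex *m)
{
	if (!fast_mutex_trylock(m))
		fast_mutex_lock_slow(m);
}

/* Inlinable unlock: a single release store. */
static inline void
fast_mutex_unlock(struct fast_mutex *m)
{
	assert(atomic_load_explicit(&m->state, memory_order_relaxed) == 1);
	atomic_store_explicit(&m->state, 0, memory_order_release);
}
```

The catch is that once struct fast_mutex is compiled into applications, its
size and field meanings are frozen as ABI, which is the cost of giving up
opacity.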

-- 
John Baldwin