FreeBSD Mail Archives

Date:      Mon, 14 Jan 2013 13:58:32 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        David Chisnall <theraven@freebsd.org>
Cc:        toolchain@freebsd.org, Jilles Tjoelker <jilles@stack.nl>, freebsd-arch@freebsd.org
Subject:   Re: Fast sigblock (AKA rtld speedup)
Message-ID:  <201301141358.33216.jhb@freebsd.org>
In-Reply-To: <D6772A0E-FBA4-4168-B152-7E7694720A16@FreeBSD.org>
References:  <20130107182235.GA65279@kib.kiev.ua> <20130114174703.GB88220@stack.nl> <D6772A0E-FBA4-4168-B152-7E7694720A16@FreeBSD.org>


On Monday, January 14, 2013 1:24:04 pm David Chisnall wrote:
> On 14 Jan 2013, at 17:47, Jilles Tjoelker wrote:
> 
> > The code which does that check is actually under contrib/gcc. Problem
> > is, they designed __gthread_active_p() to distinguish threaded and
> > unthreaded programming environments -- it must be known in advance and
> > cannot be changed later. The code for the unthreaded environment then
> > takes advantage of this by not even allocating memory for mutexes in
> > some cases.
> 
> It's worth taking a step back and asking why this code exists at all, and
> the main reason is that acquiring a mutex used to be really expensive.  It
> still is on some fruit-flavoured operating systems, but elsewhere it's a
> single atomic operation in the uncontended case, and in that case the cache
> line will already be exclusively owned by the calling core in single-threaded
> code.
> 
> I would much rather that we followed the example of Solaris and made the
> multithreaded case fast and the default than keep piling on hacks that allow
> code to shave off a few clock cycles in the single-threaded case.  In
> particular, the popularity of multicore systems means that it is increasingly
> rare for code to be both single threaded and performance critical, so this
> seems like misplaced optimisation.

We have single-threaded, performance-critical applications that run on
multicore systems (we just run several copies), and if we link in libthr, then
pthread_mutex operations (even on uncontended locks) show up as one of the top
consumers of CPU time when we profile our applications.

> I strongly suspect that making it possible to inline the uncontended lock
> case for a pthread mutex and eliminating all of the branches on __isthreaded
> would give us a net speedup in both single and multithreaded cases.

I'm less certain.  Note that you can't inline mutex ops until you expose
the mutexes themselves to userland (that is, making pthread_mutex_t not
be opaque).
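
To illustrate the point about opacity, here is a minimal sketch of what an
inlinable uncontended fast path could look like once the lock word is visible
to callers.  Everything in it is hypothetical: struct fast_mutex and the
fast_mutex_*() functions are made-up names, not the libthr layout or API, and
the slow path simply spins where a real library would sleep in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical non-opaque lock; NOT the real pthread_mutex_t layout. */
struct fast_mutex {
	atomic_uint owner;	/* 0 = unlocked, otherwise the owner's id */
};

static inline void
fast_mutex_init(struct fast_mutex *m)
{
	atomic_init(&m->owner, 0);
}

/* Single compare-and-swap; succeeds only if the lock is currently free. */
static inline bool
fast_mutex_trylock(struct fast_mutex *m, unsigned self)
{
	unsigned expected = 0;

	assert(self != 0);	/* 0 is reserved for "unlocked" */
	return (atomic_compare_exchange_strong_explicit(&m->owner, &expected,
	    self, memory_order_acquire, memory_order_relaxed));
}

/* Contended path, kept out of line.  Assumption: spin instead of sleeping. */
static void
fast_mutex_lock_slow(struct fast_mutex *m, unsigned self)
{
	while (!fast_mutex_trylock(m, self))
		;
}

/* The inlinable part: one atomic operation when nobody else holds the lock. */
static inline void
fast_mutex_lock(struct fast_mutex *m, unsigned self)
{
	if (!fast_mutex_trylock(m, self))
		fast_mutex_lock_slow(m, self);
}

static inline void
fast_mutex_unlock(struct fast_mutex *m)
{
	atomic_store_explicit(&m->owner, 0, memory_order_release);
}
```

The inline functions can only be emitted at the call site because the caller
can see the owner field.  With an opaque type the compiler has nothing to
operate on, so every lock and unlock remains a function call into the library,
and the layout also becomes part of the ABI once it is exposed.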

-- 
John Baldwin

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201301141358.33216.jhb>
