From: John Baldwin
To: David Chisnall
Cc: toolchain@freebsd.org, Jilles Tjoelker, freebsd-arch@freebsd.org
Subject: Re: Fast sigblock (AKA rtld speedup)
Date: Mon, 14 Jan 2013 13:58:32 -0500

On Monday, January 14, 2013 1:24:04 pm David Chisnall wrote:
> On 14 Jan 2013, at 17:47, Jilles Tjoelker wrote:
>
> > The code which does that check is actually under contrib/gcc. Problem
> > is, they designed __gthread_active_p() to distinguish threaded and
> > unthreaded programming environments -- it must be known in advance and
> > cannot be changed later.
> > The code for the unthreaded environment then
> > takes advantage of this by not even allocating memory for mutexes in
> > some cases.
>
> It's worth taking a step back and asking why this code exists at all, and the main reason is that acquiring a mutex used to be really expensive. It still is on some fruit-flavoured operating systems, but elsewhere it's a single atomic operation in the uncontended case, and in that case the cache line will already be exclusively owned by the calling core in single-threaded code.
>
> I would much rather that we followed the example of Solaris and made the multithreaded case fast and the default than keep piling on hacks that allow code to shave off a few clock cycles in the single-threaded case. In particular, the popularity of multicore systems means that it is increasingly rare for code to be both single threaded and performance critical, so this seems like misplaced optimisation.

We have single-threaded performance-critical applications that run on
multicore systems (we just run several copies), and if we link in libthr,
then pthread_mutex operations (even on uncontested locks) show up as one
of the top consumers of CPU time when we profile our applications.

> I strongly suspect that making it possible to inline the uncontended lock case for a pthread mutex and eliminating all of the branches on __isthreaded would give us a net speedup in both single and multithreaded cases.

I'm less certain. Note that you can't inline mutex ops until you expose
the mutexes themselves to userland (that is, make pthread_mutex_t
non-opaque).

-- 
John Baldwin