From: Mateusz Guzik
To: Ian Lepore
Cc: freebsd-current@freebsd.org
Date: Sat, 16 Jul 2016 17:17:39 +0200
Subject: Re: [PATCH] microoptimize locking primitives by introducing randomized delay between atomic ops
Message-ID: <20160716151739.GA23095@dft-labs.eu>
In-Reply-To: <1468161121.72182.115.camel@freebsd.org>
References: <20160710111326.GA7853@dft-labs.eu> <1468161121.72182.115.camel@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

On Sun, Jul 10, 2016 at 08:32:01AM -0600, Ian Lepore wrote:
> On Sun, 2016-07-10 at 13:13 +0200, Mateusz Guzik wrote:
> > If the lock is contended, primitives like __mtx_lock_sleep will spin
> > checking if the owner is running or the lock was freed. The problem is
> > that once it is discovered that the lock is free, multiple CPUs are
> > likely to try to do the atomic op which will make it more costly for
> > everyone and throughput suffers.
> >
> > The standard thing to do is to have some sort of a randomized delay so
> > that this kind of behaviour is reduced.
> >
> > As such, below is a trivial hack which takes cpu_ticks() into account
> > and performs % 2048, which in my testing gives reasonably good results.
> >
> > Please note there is definitely way more room for improvement in
> > general.
> >
> > In terms of results, there was no statistically significant change in
> > -j 40 buildworld nor buildkernel.
> >
> > However, a 40-way find on a ports tree placed on tmpfs yielded the
> > following:
> >
> > x vanilla
> > + patched
> > [ministat ASCII histogram: x vanilla, + patched]
> >     N           Min           Max        Median           Avg        Stddev
> > x  20        12.431        15.952        14.897       14.7444    0.74241657
> > +  20         8.103        11.863        9.0135       9.44565     1.0059484
> > Difference at 95.0% confidence
> >         -5.29875 +/- 0.565836
> >         -35.9374% +/- 3.83764%
> >         (Student's t, pooled s = 0.884057)
> >
> > The patch:
> [...]
>
> What about platforms that don't have a useful implementation of
> cpu_ticks()?
>

Do we have such platforms, and do they have SMP?

> What about platforms that don't suffer the large expense for atomic ops
> that x86 apparently does?
>

The current state of locking primitives already seems to be x86-centric.
Postponing of atomic ops is implemented in some parts and this patch
only extends it (in a different form).

That said, if we have platforms where this kind of stuff is detrimental
to performance, machine-specific primitives should be introduced.

Meanwhile, courtesy of andrew@ I tested the patch on Cavium (48-way
arm64) and saw great improvement.

x vanilla
+ patched
[ministat ASCII histogram: x vanilla, + patched]
    N           Min           Max        Median           Avg        Stddev
x  10         17.25        17.849         17.48       17.4968    0.19581556
+  10          6.56         6.679         6.586        6.6011   0.038013009
Difference at 95.0% confidence
        -10.8957 +/- 0.132528
        -62.2725% +/- 0.757439%
        (Student's t, pooled s = 0.141047)

Note: find does open+close a lot. close results in exclusive vnode
locking if the fs does not have the MNTK_EXTENDED_SHARED flag set, which
is the case on tmpfs. On this machine it contributed to a major
slowdown. The flag was set locally. I'm not sure yet how safe the change
is in terms of general use. It is definitely fine enough for the
benchmark.

That said, I would like to commit this next week unless there are
objections.

-- 
Mateusz Guzik
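
For illustration only, the randomized delay discussed in this thread can
be sketched in userspace C11. This is not the patch itself, which is
elided above as "[...]". Every sketch_* name is hypothetical,
timespec_get() stands in for the kernel's cpu_ticks(), and placing the
delay right after the lock is seen free is an assumption based on the
description of the problem. Only the % 2048 modulus comes from the
thread.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Modulus quoted in the thread; every sketch_* name is hypothetical. */
#define SKETCH_DELAY_MOD 2048u

struct sketch_lock {
	atomic_uintptr_t owner;	/* 0 means free, otherwise an owner id */
};

/* Assumed userspace stand-in for the kernel's cpu_ticks(). */
static uint64_t
sketch_ticks(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return ((uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec);
}

/* Map a tick count to a pseudo-random number of pause iterations. */
static unsigned
sketch_delay_iters(uint64_t ticks)
{
	return ((unsigned)(ticks % SKETCH_DELAY_MOD));
}

/* Busy-wait; the volatile counter keeps the loop from being removed. */
static void
sketch_pause(unsigned iters)
{
	for (volatile unsigned i = 0; i < iters; i++)
		;
}

static void
sketch_lock_acquire(struct sketch_lock *l, uintptr_t self)
{
	uintptr_t expected;

	for (;;) {
		/* Try the atomic op; on an uncontended lock this succeeds at once. */
		expected = 0;
		if (atomic_compare_exchange_strong_explicit(&l->owner,
		    &expected, self, memory_order_acquire,
		    memory_order_relaxed))
			return;
		/* Contended: spin with plain loads until the lock looks free. */
		while (atomic_load_explicit(&l->owner,
		    memory_order_relaxed) != 0)
			;
		/* Spread the waiters out before they retry the atomic op. */
		sketch_pause(sketch_delay_iters(sketch_ticks()));
	}
}

static void
sketch_lock_release(struct sketch_lock *l)
{
	atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

With several threads contending on one sketch_lock, waiters that see the
lock become free at about the same moment pause for different lengths of
time, so fewer of them issue the compare-and-swap simultaneously.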