Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 10 Jul 2016 08:32:01 -0600
From:      Ian Lepore <ian@freebsd.org>
To:        Mateusz Guzik <mjguzik@gmail.com>, freebsd-current@freebsd.org
Subject:   Re: [PATCH] microoptimize locking primitives by introducing randomized delay between atomic ops
Message-ID:  <1468161121.72182.115.camel@freebsd.org>
In-Reply-To: <20160710111326.GA7853@dft-labs.eu>
References:  <20160710111326.GA7853@dft-labs.eu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 2016-07-10 at 13:13 +0200, Mateusz Guzik wrote:
> If the lock is contended, primitives like __mtx_lock_sleep will spin
> checking if the owner is running or the lock was freed. The problem
> is
> that once it is discovered that the lock is free, multiple CPUs are
> likely to try to do the atomic op which will make it more costly for
> everyone and throughput suffers.
> 
> The standard thing to do is to have some sort of a randomized delay
> so
> that this kind of behaviour is reduced.
> 
> As such, below is a trivial hack which takes cpu_ticks() into account
> and performs % 2048, which in my testing gives reasonbly good
> results.
> 
> Please note there is definitely way more room for improvement in
> general.
> 
> In terms of results, there was no statistically significant change in
> -j 40 buildworld nor buildkernel.
> 
> However, a 40-way find on a ports tree placed on tmpfs yielded the
> following:
> 
> x vanilla            
> + patched
> +--------------------------------------------------------------------
> --------------------+
> >     ++++                +                                         x
> >          x x x      |
> > +    ++++ +++    +  +  + ++       +       +     x               x 
> >  x  xxxxxxxx x x     x|
> >    |_____M____A__________|                                     
> >  |________AM______|     |
> +--------------------------------------------------------------------
> --------------------+
>     N           Min           Max        Median           Avg       
>  Stddev
> x  20        12.431        15.952        14.897       14.7444   
>  0.74241657
> +  20         8.103        11.863        9.0135       9.44565    
>  1.0059484
> Difference at 95.0% confidence
> 	-5.29875 +/- 0.565836
> 	-35.9374% +/- 3.83764%
> 	(Student's t, pooled s = 0.884057)
> 
> The patch:
[...]

What about platforms that don't have a useful implementation of
cpu_ticks()?

What about platforms that don't suffer the large expense for atomic ops
that x86 apparently does?

-- Ian




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1468161121.72182.115.camel>