Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 2 Nov 2012 09:02:18 +0000
From:      Attilio Rao <attilio@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>, Jeff Roberson <jeff@freebsd.org>, Florian Smeets <flo@freebsd.org>, Bruce Evans <bde@freebsd.org>, svn-src-projects@freebsd.org
Subject:   Re: svn commit: r238907 - projects/calloutng/sys/kern
Message-ID:  <CAJ-FndDpB2vQ7nBgYS4mNHb3_G4sXi3SfU7Y7kdVHMQZYKapig@mail.gmail.com>
In-Reply-To: <20121029155136.O943@besplex.bde.org>
References:  <201207301350.q6UDobCI099069@svn.freebsd.org> <CAJ-FndBj8tpC_BJXs_RH8sG2TBG8yA=Lxu3-GTVT9Ap_zOCuVQ@mail.gmail.com> <CAJ-FndDnO7wjnWPV0tTu%2BUGHjsxa3YDarMxmyei3ZmjLAFvRkQ@mail.gmail.com> <201207301732.33474.jhb@freebsd.org> <CAJ-FndD5EO12xsWOAe6u0EvX00q33wxO4OivnGjzj0=T2Oe8uA@mail.gmail.com> <CAJ-FndCRg0UCThFkatp=tw7rUWWCvhsApLE=iztLpxpGBC1F9w@mail.gmail.com> <CAJ-FndBqV2uD8Th9ePtxyJwhMAPzY3AXA5cQ7HszLp=%2BfSpHTA@mail.gmail.com> <CAJ-FndDPLmkpAJeGVN2wgbhdgHYezyUV-PPvH9e-CA7Go7HG3A@mail.gmail.com> <20121029155136.O943@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 10/29/12, Bruce Evans <brde@optusnet.com.au> wrote:
> On Mon, 29 Oct 2012, Attilio Rao wrote:
>
>> Now that sched_pin()/sched_unpin() are fixed I would like to introduce
>> this new patch, making critical_enter()/critical_exit() inline:
>> http://www.freebsd.org/~attilio/inline_critical.patch
>>
>> The concept is pretty simple: simple add/dec for critical_enter, exit
>> are inlined, the rest is in an "hard path". Debugging enables the hard
>> paths by default (really I think that only KTR may be due here, but I
>> thought that in case of INVARIANTS this was also wanted, so I added
>> the check also for that case).
>>
>> flo@ has stress-tested the patch already and he may be starting
>> serious benchmarks soon.
>> The effectiveness of this patch is to determine and it brings again
>> the question: better an hard function or the usage of compiler membars
>> to avoid issues? I think that this patch should be in only if we can
>> absolutely prove a measurable performance bump, otherwise I will just
>> add a note to critical_enter()/critical_exit() explaining why they
>> should not be inlined at all.
>
> Inlining of mtx_lock_spin() is bogus unless critical_enter() is inlined.
> Similarly for mtx_unlock_spin() and critical_exit().  It saves 1 function
> call. but critical_enter() does a function call anyway.  critical_exit*(
> also has a branch in branch in it that might cost more than the function
> call just for mispredicting it.
>
> My version goes the other way and uninlines mtx_lock_spin() and
> mtx_unlock_spin().  Then it inlines (open codes) critical_enter() and
> critical_exit() in them.  This saves a lot of text space and thus
> hopefully saves time too.  I couldn't find any cases where it either
> saved or lost a significant amount of time.  The main cause of time
> differences is probably mispredicted branches: with mtx*() inlined,
> the branches in it can be predicted separately, so they are more
> predictable.  However, when they are separate, there may be enough to
> thrash the branch predictor.  It is probably best to only inline mtx*()
> for a few heavily-used locks and only inline critical*() if it is either
> in one of these or is in a non-inline function.  Otherwise, inlining
> mainly gives code bloat that is only harmless if the code is rarely
> actually used.  But this is hard to configure.  <sys/mutex.h> is
> already a mess to support uninlining for debugging cases.

So, I thought more about this and I think that inlining
critical_exit() is not really going to bring any benefit here but
bloat.
This is because nested critical sections are rare rather not, which
means you will end up in doing the function call most of the time and
plus we have the possible pessimization introduced by the memory
clobber (__compiler_membar()) and as you note possible deficiency
caming from the branch misprediction*.

All that to say that I shrunk my patch to only inline
critical_enter(). Respect v1, this patch also does use the fast path
in all cases but KTR, does use a common accessor function for both
hard and soft versions to bump the counter.
In theory inlined critical_enter() should be a net win, but I need to
carefully evaluate the difference introduced by the
__compiler_membar(), a case very similar to sched_pin(). I will start
doing some investigations soon but they would require some good effort
for both the widespread nature of critical_enter() and the difference
in code gcc will certainly produce because of a less function call and
the memory clobber.

New patch here:
http://www.freebsd.org/~attilio/inline_critical2.patch

I feel very unsure style-wise about my add to sys/systm.h because it
is already very messy that I cannot give a logical sense to that
chunk.

Thanks,
Attilio

* In theory we could use __predict_false() or similar but I don't
think this will really help as x86 doesn't have explicit instructions
to control branch prediction

-- 
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndDpB2vQ7nBgYS4mNHb3_G4sXi3SfU7Y7kdVHMQZYKapig>