Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 7 Dec 2012 13:42:03 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Attilio Rao <attilio@FreeBSD.org>
Cc:        src-committers@FreeBSD.org, John Baldwin <jhb@FreeBSD.org>, Jeff Roberson <jeff@FreeBSD.org>, Florian Smeets <flo@FreeBSD.org>, Bruce Evans <bde@FreeBSD.org>, Bruce Evans <brde@optusnet.com.au>, svn-src-projects@FreeBSD.org
Subject:   Re: svn commit: r238907 - projects/calloutng/sys/kern
Message-ID:  <20121207125401.S1231@besplex.bde.org>
In-Reply-To: <CAJ-FndCnj9hKV9f%2BRmF-OCijtRpKUWv88AktvrsBYp5_hFUmVA@mail.gmail.com>
References:  <201207301350.q6UDobCI099069@svn.freebsd.org> <CAJ-FndBj8tpC_BJXs_RH8sG2TBG8yA=Lxu3-GTVT9Ap_zOCuVQ@mail.gmail.com> <CAJ-FndDnO7wjnWPV0tTu%2BUGHjsxa3YDarMxmyei3ZmjLAFvRkQ@mail.gmail.com> <201207301732.33474.jhb@freebsd.org> <CAJ-FndD5EO12xsWOAe6u0EvX00q33wxO4OivnGjzj0=T2Oe8uA@mail.gmail.com> <CAJ-FndCRg0UCThFkatp=tw7rUWWCvhsApLE=iztLpxpGBC1F9w@mail.gmail.com> <CAJ-FndBqV2uD8Th9ePtxyJwhMAPzY3AXA5cQ7HszLp=%2BfSpHTA@mail.gmail.com> <CAJ-FndDPLmkpAJeGVN2wgbhdgHYezyUV-PPvH9e-CA7Go7HG3A@mail.gmail.com> <20121029155136.O943@besplex.bde.org> <CAJ-FndDpB2vQ7nBgYS4mNHb3_G4sXi3SfU7Y7kdVHMQZYKapig@mail.gmail.com> <20121109034942.C5338@besplex.bde.org> <CAJ-FndCZKpHaRcRMBT5XTRh0nRDjK3f-aVT3rjRzKpMRLU%2BDzA@mail.gmail.com> <CAJ-FndCnj9hKV9f%2BRmF-OCijtRpKUWv88AktvrsBYp5_hFUmVA@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 5 Dec 2012, Attilio Rao wrote:

> On Sat, Nov 24, 2012 at 3:43 PM, Attilio Rao <attilio@freebsd.org> wrote:
>> On Thu, Nov 8, 2012 at 5:26 PM, Bruce Evans <brde@optusnet.com.au> wrote:
>>> On Fri, 2 Nov 2012, Attilio Rao wrote:
>>>
>>>> On 10/29/12, Bruce Evans <brde@optusnet.com.au> wrote:
>>>>>
>>>>> On Mon, 29 Oct 2012, Attilio Rao wrote:
>>>>>
>>>>>> Now that sched_pin()/sched_unpin() are fixed I would like to introduce
>>>>>> this new patch, making critical_enter()/critical_exit() inline:
>>>>>> http://www.freebsd.org/~attilio/inline_critical.patch
>>>>>
>>>>> ...
>>>>>
>>>>> My version goes the other way and uninlines mtx_lock_spin() and
>>>>> mtx_unlock_spin().  Then it inlines (open codes) critical_enter() and
>>>>> ...
>>>>
>>>>
>>>> So, I thought more about this and I think that inlining
>>>> critical_exit() is not really going to bring any benefit here but
>>>> bloat.
>>>> This is because nested critical sections are rare rather not, which
>>>
>>>
>>> Rather rare !not? :-)
>>>
>>>
>>>> means you will end up in doing the function call most of the time and
>>>> plus we have the possible pessimization introduced by the memory
>>>> clobber (__compiler_membar()) and as you note possible deficiency
>>>> caming from the branch misprediction*.
>>>
>>>
>>> This seems best.
>>>
>>> I see a point about the rareness of the branch in critical_exit().
>>> Not sure if it is the one you are making: since the nested case is
>>> rare, then the branch will normally be correctly predicted.  If the
>>> function remains uninlined, then the branch still has a good chance
>>> of being correctly predicted.  This depends on the nested case being
>>> so rare across all callers, that the non-nested case doesn't mess
>>> up the prediction by happening often.  The branch predictor only
>>> has to maintain history for 1 branch for this.  However, if the
>>> call is inlined and there are many callers, there might be too many
>>> to maintain history for them all.
>>
>> Yes, that's basically the same conclusion I came up with.
>>
>> It seems you are not opposed to this version of the patch.
>> I made some in-kernel benchmarks but they didn't really show a
>> performance improvements, bigger than 1-2%, which on SMP system is
>> basically accountable to thirdy-part effects.

1-2% is about the best that can be hoped for from a single change,
but is too hard to measure with confidence.

>> However we already employ the same code for sched_pin() and I then
>> think that we can just commit this patch as-is.
>
> I made up my mind on instead not committing this patch, as I cannot
> prove a real performance gain, as also Jeff agreed with privately.
> Instead, I would like to commit this small comment explaining why it
> is not inlined (see below). Let me know what you think.

Good.

Minor grammar and fixes:

> Index: sys/kern/kern_switch.c
> ===================================================================
> --- sys/kern/kern_switch.c      (revision 243911)
> +++ sys/kern/kern_switch.c      (working copy)
> @@ -176,6 +176,11 @@ retry:
> /*
>  * Kernel thread preemption implementation.  Critical sections mark
>  * regions of code in which preemptions are not allowed.
> + * It would seem a good idea to inline such function but, in order

s/would/might/

s/such/such a/  or better (?),  s/such function/crticical_enter()

I think the new part of the comment only applies to critical_enter().
critical_exit() is much larger, so it never seemed such a good idea
to inline it.

If this sentence begins a new paragraph, then it should be preceded by
an empty line.  Otherwise, it should not begin on a new line.  I think
the latter applies.  In fact, the comment should be separate.  The old
part of the comment applies to both critical_enter() and critical_exit(),
but it is bogusly attached to only the former.  After separating it from
the former, the new part of the comment can be better attached to the
former alone.

> + * to prevent instructions reordering by the compiler, a __compiler_membar()

s/instructions/instructions/

s/a __compiler.../__compiler.../

> + * must be used here (look at sched_pin() case).  The performance penalty

s/must/would have to/

s/look at sched_pin() case/the same as for sched_pin()/

Looking at sched_pin() doesn't show any comment about this.  I seem to
remember discussions of this.  Maybe the details are in a log message.

Technical points:
- is the function being extern really enough to force the equivalent of
   __compiler_membar()?  Inter-file optimization might break this.  It
   would be easy to add an explicit __compiler_membar() to the beginning
   of critical_enter(), but there are probably many other functions that
   would need the same treatment for inter-file optimization.
- compilers already do intra-file optimization giving automatic inlining
   of (static) functions that are only called once, unless this is
   disabled by -fno-inline-functions-called-once.  It should be disabled
   by default, since it also breaks debugging including stack traces,
   and profiling.  Perhaps it also breaks implicit membars.
- IIRC, inlining is not permitted to change function call semantics, so
   it may be a bug for the membar in sched_pin() to have any effect.
   Anyway, extern functions are not required to give stricter ordering
   than static [inline] ones.  Function calls are required to give
   sequence points (after their parameters have been evaluated), and
   sequence points are required to have all side effects of previous
   evaluations complete and no side effects of subsequent evaluations
   begun.  Is that any different from a membar?  I think membars are
   a little more magic, and it is hard to see how anything can give
   stricter ordering requirements on the compiler than a sequence
   point except by magic.

> + * imposed by the membar could, then, produce slower code than
> + * the function call itself, for most cases.
>  */

The punctuation given by all those commas seems to be correct, but oit
is painfully formal.

> void
> critical_enter(void)
>

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121207125401.S1231>