Date: Sun, 9 Dec 2012 04:55:50 +0000 From: Attilio Rao <attilio@freebsd.org> To: Bruce Evans <brde@optusnet.com.au> Cc: src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>, Jeff Roberson <jeff@freebsd.org>, Florian Smeets <flo@freebsd.org>, Bruce Evans <bde@freebsd.org>, svn-src-projects@freebsd.org Subject: Re: svn commit: r238907 - projects/calloutng/sys/kern Message-ID: <CAJ-FndAfmV7GTRzZfRTbOf0Fw2Bq6JJeCqHgybdm04GmN8p-mg@mail.gmail.com> In-Reply-To: <20121207125401.S1231@besplex.bde.org> References: <201207301350.q6UDobCI099069@svn.freebsd.org> <CAJ-FndBj8tpC_BJXs_RH8sG2TBG8yA=Lxu3-GTVT9Ap_zOCuVQ@mail.gmail.com> <CAJ-FndDnO7wjnWPV0tTu%2BUGHjsxa3YDarMxmyei3ZmjLAFvRkQ@mail.gmail.com> <201207301732.33474.jhb@freebsd.org> <CAJ-FndD5EO12xsWOAe6u0EvX00q33wxO4OivnGjzj0=T2Oe8uA@mail.gmail.com> <CAJ-FndCRg0UCThFkatp=tw7rUWWCvhsApLE=iztLpxpGBC1F9w@mail.gmail.com> <CAJ-FndBqV2uD8Th9ePtxyJwhMAPzY3AXA5cQ7HszLp=%2BfSpHTA@mail.gmail.com> <CAJ-FndDPLmkpAJeGVN2wgbhdgHYezyUV-PPvH9e-CA7Go7HG3A@mail.gmail.com> <20121029155136.O943@besplex.bde.org> <CAJ-FndDpB2vQ7nBgYS4mNHb3_G4sXi3SfU7Y7kdVHMQZYKapig@mail.gmail.com> <20121109034942.C5338@besplex.bde.org> <CAJ-FndCZKpHaRcRMBT5XTRh0nRDjK3f-aVT3rjRzKpMRLU%2BDzA@mail.gmail.com> <CAJ-FndCnj9hKV9f%2BRmF-OCijtRpKUWv88AktvrsBYp5_hFUmVA@mail.gmail.com> <20121207125401.S1231@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Dec 7, 2012 at 2:42 AM, Bruce Evans <brde@optusnet.com.au> wrote: > On Wed, 5 Dec 2012, Attilio Rao wrote: > >> On Sat, Nov 24, 2012 at 3:43 PM, Attilio Rao <attilio@freebsd.org> wrote: >>> >>> On Thu, Nov 8, 2012 at 5:26 PM, Bruce Evans <brde@optusnet.com.au> wrote: >>>> >>>> On Fri, 2 Nov 2012, Attilio Rao wrote: >>>> >>>>> On 10/29/12, Bruce Evans <brde@optusnet.com.au> wrote: >>>>>> >>>>>> >>>>>> On Mon, 29 Oct 2012, Attilio Rao wrote: >>>>>> >>>>>>> Now that sched_pin()/sched_unpin() are fixed I would like to >>>>>>> introduce >>>>>>> this new patch, making critical_enter()/critical_exit() inline: >>>>>>> http://www.freebsd.org/~attilio/inline_critical.patch >>>>>> >>>>>> >>>>>> ... >>>>>> >>>>>> My version goes the other way and uninlines mtx_lock_spin() and >>>>>> mtx_unlock_spin(). Then it inlines (open codes) critical_enter() and >>>>>> ... >>>>> >>>>> >>>>> >>>>> So, I thought more about this and I think that inlining >>>>> critical_exit() is not really going to bring any benefit here but >>>>> bloat. >>>>> This is because nested critical sections are rare rather not, which >>>> >>>> >>>> >>>> Rather rare !not? :-) >>>> >>>> >>>>> means you will end up in doing the function call most of the time and >>>>> plus we have the possible pessimization introduced by the memory >>>>> clobber (__compiler_membar()) and as you note possible deficiency >>>>> caming from the branch misprediction*. >>>> >>>> >>>> >>>> This seems best. >>>> >>>> I see a point about the rareness of the branch in critical_exit(). >>>> Not sure if it is the one you are making: since the nested case is >>>> rare, then the branch will normally be correctly predicted. If the >>>> function remains uninlined, then the branch still has a good chance >>>> of being correctly predicted. This depends on the nested case being >>>> so rare across all callers, that the non-nested case doesn't mess >>>> up the prediction by happening often. The branch predictor only >>>> has to maintain history for 1 branch for this. However, if the >>>> call is inlined and there are many callers, there might be too many >>>> to maintain history for them all. >>> >>> >>> Yes, that's basically the same conclusion I came up with. >>> >>> It seems you are not opposed to this version of the patch. >>> I made some in-kernel benchmarks but they didn't really show a >>> performance improvements, bigger than 1-2%, which on SMP system is >>> basically accountable to thirdy-part effects. > > > 1-2% is about the best that can be hoped for from a single change, > but is too hard to measure with confidence. > > >>> However we already employ the same code for sched_pin() and I then >>> think that we can just commit this patch as-is. >> >> >> I made up my mind on instead not committing this patch, as I cannot >> prove a real performance gain, as also Jeff agreed with privately. >> Instead, I would like to commit this small comment explaining why it >> is not inlined (see below). Let me know what you think. > > > Good. > > Minor grammar and fixes: > > >> Index: sys/kern/kern_switch.c >> =================================================================== >> --- sys/kern/kern_switch.c (revision 243911) >> +++ sys/kern/kern_switch.c (working copy) >> @@ -176,6 +176,11 @@ retry: >> /* >> * Kernel thread preemption implementation. Critical sections mark >> * regions of code in which preemptions are not allowed. >> + * It would seem a good idea to inline such function but, in order > > > s/would/might/ > > s/such/such a/ or better (?), s/such function/crticical_enter() > > I think the new part of the comment only applies to critical_enter(). > critical_exit() is much larger, so it never seemed such a good idea > to inline it. > > If this sentence begins a new paragraph, then it should be preceded by > an empty line. Otherwise, it should not begin on a new line. I think > the latter applies. In fact, the comment should be separate. The old > part of the comment applies to both critical_enter() and critical_exit(), > but it is bogusly attached to only the former. After separating it from > the former, the new part of the comment can be better attached to the > former alone. > > >> + * to prevent instructions reordering by the compiler, a >> __compiler_membar() > > > s/instructions/instructions/ > > s/a __compiler.../__compiler.../ > > >> + * must be used here (look at sched_pin() case). The performance penalty > > > s/must/would have to/ > > s/look at sched_pin() case/the same as for sched_pin()/ > > Looking at sched_pin() doesn't show any comment about this. I seem to > remember discussions of this. Maybe the details are in a log message. > > Technical points: > - is the function being extern really enough to force the equivalent of > __compiler_membar()? Inter-file optimization might break this. It > would be easy to add an explicit __compiler_membar() to the beginning > of critical_enter(), but there are probably many other functions that > would need the same treatment for inter-file optimization. > - compilers already do intra-file optimization giving automatic inlining > of (static) functions that are only called once, unless this is > disabled by -fno-inline-functions-called-once. It should be disabled > by default, since it also breaks debugging including stack traces, > and profiling. Perhaps it also breaks implicit membars. > - IIRC, inlining is not permitted to change function call semantics, so > it may be a bug for the membar in sched_pin() to have any effect. > Anyway, extern functions are not required to give stricter ordering > than static [inline] ones. Function calls are required to give > sequence points (after their parameters have been evaluated), and > sequence points are required to have all side effects of previous > evaluations complete and no side effects of subsequent evaluations > begun. Is that any different from a membar? I think membars are > a little more magic, and it is hard to see how anything can give > stricter ordering requirements on the compiler than a sequence > point except by magic. > > >> + * imposed by the membar could, then, produce slower code than >> + * the function call itself, for most cases. >> */ > > > The punctuation given by all those commas seems to be correct, but oit > is painfully formal. I've integrated your suggestions and committed as r244046, thanks. For your concern about sched_pin() please check notes reported in this same thread and commit log. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndAfmV7GTRzZfRTbOf0Fw2Bq6JJeCqHgybdm04GmN8p-mg>