From owner-svn-src-projects@FreeBSD.ORG Fri Dec 7 02:42:27 2012 Return-Path: Delivered-To: svn-src-projects@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 021B4514; Fri, 7 Dec 2012 02:42:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id 57D588FC12; Fri, 7 Dec 2012 02:42:22 +0000 (UTC) Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au [211.29.132.191]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id qB72gFvL016289; Fri, 7 Dec 2012 13:42:16 +1100 Received: from c122-106-175-26.carlnfd1.nsw.optusnet.com.au (c122-106-175-26.carlnfd1.nsw.optusnet.com.au [122.106.175.26]) by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id qB72g3Nj004606 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 7 Dec 2012 13:42:05 +1100 Date: Fri, 7 Dec 2012 13:42:03 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Attilio Rao Subject: Re: svn commit: r238907 - projects/calloutng/sys/kern In-Reply-To: Message-ID: <20121207125401.S1231@besplex.bde.org> References: <201207301350.q6UDobCI099069@svn.freebsd.org> <201207301732.33474.jhb@freebsd.org> <20121029155136.O943@besplex.bde.org> <20121109034942.C5338@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=CJhyxmXD c=1 sm=1 a=zSpkIMvUouMA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=Y4pc3zAY36oA:10 a=6I5d2MoRAAAA:8 a=iE42H3Ao5iMW_Y88yhAA:9 a=CjuIK1q_8ugA:10 a=SV7veod9ZcQA:10 a=bxQHXO5Py4tHmhUgaywp5w==:117 Cc: src-committers@FreeBSD.org, John Baldwin , Jeff Roberson , Florian Smeets , Bruce Evans , Bruce Evans , svn-src-projects@FreeBSD.org X-BeenThere: svn-src-projects@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "SVN commit messages for the src " projects" tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Dec 2012 02:42:27 -0000 On Wed, 5 Dec 2012, Attilio Rao wrote: > On Sat, Nov 24, 2012 at 3:43 PM, Attilio Rao wrote: >> On Thu, Nov 8, 2012 at 5:26 PM, Bruce Evans wrote: >>> On Fri, 2 Nov 2012, Attilio Rao wrote: >>> >>>> On 10/29/12, Bruce Evans wrote: >>>>> >>>>> On Mon, 29 Oct 2012, Attilio Rao wrote: >>>>> >>>>>> Now that sched_pin()/sched_unpin() are fixed I would like to introduce >>>>>> this new patch, making critical_enter()/critical_exit() inline: >>>>>> http://www.freebsd.org/~attilio/inline_critical.patch >>>>> >>>>> ... >>>>> >>>>> My version goes the other way and uninlines mtx_lock_spin() and >>>>> mtx_unlock_spin(). Then it inlines (open codes) critical_enter() and >>>>> ... >>>> >>>> >>>> So, I thought more about this and I think that inlining >>>> critical_exit() is not really going to bring any benefit here but >>>> bloat. >>>> This is because nested critical sections are rare rather not, which >>> >>> >>> Rather rare !not? :-) >>> >>> >>>> means you will end up in doing the function call most of the time and >>>> plus we have the possible pessimization introduced by the memory >>>> clobber (__compiler_membar()) and as you note possible deficiency >>>> caming from the branch misprediction*. >>> >>> >>> This seems best. >>> >>> I see a point about the rareness of the branch in critical_exit(). >>> Not sure if it is the one you are making: since the nested case is >>> rare, then the branch will normally be correctly predicted. If the >>> function remains uninlined, then the branch still has a good chance >>> of being correctly predicted. This depends on the nested case being >>> so rare across all callers, that the non-nested case doesn't mess >>> up the prediction by happening often. The branch predictor only >>> has to maintain history for 1 branch for this. However, if the >>> call is inlined and there are many callers, there might be too many >>> to maintain history for them all. >> >> Yes, that's basically the same conclusion I came up with. >> >> It seems you are not opposed to this version of the patch. >> I made some in-kernel benchmarks but they didn't really show a >> performance improvements, bigger than 1-2%, which on SMP system is >> basically accountable to thirdy-part effects. 1-2% is about the best that can be hoped for from a single change, but is too hard to measure with confidence. >> However we already employ the same code for sched_pin() and I then >> think that we can just commit this patch as-is. > > I made up my mind on instead not committing this patch, as I cannot > prove a real performance gain, as also Jeff agreed with privately. > Instead, I would like to commit this small comment explaining why it > is not inlined (see below). Let me know what you think. Good. Minor grammar and fixes: > Index: sys/kern/kern_switch.c > =================================================================== > --- sys/kern/kern_switch.c (revision 243911) > +++ sys/kern/kern_switch.c (working copy) > @@ -176,6 +176,11 @@ retry: > /* > * Kernel thread preemption implementation. Critical sections mark > * regions of code in which preemptions are not allowed. > + * It would seem a good idea to inline such function but, in order s/would/might/ s/such/such a/ or better (?), s/such function/crticical_enter() I think the new part of the comment only applies to critical_enter(). critical_exit() is much larger, so it never seemed such a good idea to inline it. If this sentence begins a new paragraph, then it should be preceded by an empty line. Otherwise, it should not begin on a new line. I think the latter applies. In fact, the comment should be separate. The old part of the comment applies to both critical_enter() and critical_exit(), but it is bogusly attached to only the former. After separating it from the former, the new part of the comment can be better attached to the former alone. > + * to prevent instructions reordering by the compiler, a __compiler_membar() s/instructions/instructions/ s/a __compiler.../__compiler.../ > + * must be used here (look at sched_pin() case). The performance penalty s/must/would have to/ s/look at sched_pin() case/the same as for sched_pin()/ Looking at sched_pin() doesn't show any comment about this. I seem to remember discussions of this. Maybe the details are in a log message. Technical points: - is the function being extern really enough to force the equivalent of __compiler_membar()? Inter-file optimization might break this. It would be easy to add an explicit __compiler_membar() to the beginning of critical_enter(), but there are probably many other functions that would need the same treatment for inter-file optimization. - compilers already do intra-file optimization giving automatic inlining of (static) functions that are only called once, unless this is disabled by -fno-inline-functions-called-once. It should be disabled by default, since it also breaks debugging including stack traces, and profiling. Perhaps it also breaks implicit membars. - IIRC, inlining is not permitted to change function call semantics, so it may be a bug for the membar in sched_pin() to have any effect. Anyway, extern functions are not required to give stricter ordering than static [inline] ones. Function calls are required to give sequence points (after their parameters have been evaluated), and sequence points are required to have all side effects of previous evaluations complete and no side effects of subsequent evaluations begun. Is that any different from a membar? I think membars are a little more magic, and it is hard to see how anything can give stricter ordering requirements on the compiler than a sequence point except by magic. > + * imposed by the membar could, then, produce slower code than > + * the function call itself, for most cases. > */ The punctuation given by all those commas seems to be correct, but oit is painfully formal. > void > critical_enter(void) > Bruce