From owner-svn-src-projects@FreeBSD.ORG Sun Dec 9 04:55:59 2012 Return-Path: Delivered-To: svn-src-projects@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6E2F378B; Sun, 9 Dec 2012 04:55:59 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-la0-f54.google.com (mail-la0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id DCF238FC08; Sun, 9 Dec 2012 04:55:57 +0000 (UTC) Received: by mail-la0-f54.google.com with SMTP id j13so1575740lah.13 for ; Sat, 08 Dec 2012 20:55:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=NXX4iFT26THTrr8YjnBAPrlLl2HfsyOuVxnSgjX5F5s=; b=v7I+qpp7990jWN1VYH7jbNbwxoJtoOVO5N4FNtvC014e6UMvw4aNudP7xmK4Yw5ho4 JjmrFJYsxsd1WFcq9jhGvODbIksduY+HfHRj4Mqr/AfJRdFQ7DZxbtkiV1uykgsXv/zY +pTSEuMGOCCN2A7DmIjZOFS8bbH8e00ORXeg2D31APPWnenrAE9Skm1lgfR+1ckicbR0 PfV+iAJuaCGNbVdgCvLNzTnvAROA1dlw4ckFnypapHxRGtRl/1NEOoZ/Oy8TQkERiTLF JJ4GFJOFUNuyD3yTrpfkjtwcKy2ZIEBz9kMGCR/Q3gTg0upllXfIz87n2Ud6pVkUgfX9 3eXg== MIME-Version: 1.0 Received: by 10.152.132.3 with SMTP id oq3mr9762813lab.18.1355028950828; Sat, 08 Dec 2012 20:55:50 -0800 (PST) Sender: asmrookie@gmail.com Received: by 10.112.84.193 with HTTP; Sat, 8 Dec 2012 20:55:50 -0800 (PST) In-Reply-To: <20121207125401.S1231@besplex.bde.org> References: <201207301350.q6UDobCI099069@svn.freebsd.org> <201207301732.33474.jhb@freebsd.org> <20121029155136.O943@besplex.bde.org> <20121109034942.C5338@besplex.bde.org> <20121207125401.S1231@besplex.bde.org> Date: Sun, 9 Dec 2012 04:55:50 +0000 X-Google-Sender-Auth: OBJL8GYYWn_adGhRVugN46Sp9hw Message-ID: Subject: Re: svn commit: r238907 - projects/calloutng/sys/kern From: Attilio Rao To: Bruce Evans Content-Type: text/plain; charset=UTF-8 Cc: src-committers@freebsd.org, John Baldwin , Jeff Roberson , Florian Smeets , Bruce Evans , svn-src-projects@freebsd.org X-BeenThere: svn-src-projects@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: attilio@FreeBSD.org List-Id: "SVN commit messages for the src " projects" tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Dec 2012 04:55:59 -0000 On Fri, Dec 7, 2012 at 2:42 AM, Bruce Evans wrote: > On Wed, 5 Dec 2012, Attilio Rao wrote: > >> On Sat, Nov 24, 2012 at 3:43 PM, Attilio Rao wrote: >>> >>> On Thu, Nov 8, 2012 at 5:26 PM, Bruce Evans wrote: >>>> >>>> On Fri, 2 Nov 2012, Attilio Rao wrote: >>>> >>>>> On 10/29/12, Bruce Evans wrote: >>>>>> >>>>>> >>>>>> On Mon, 29 Oct 2012, Attilio Rao wrote: >>>>>> >>>>>>> Now that sched_pin()/sched_unpin() are fixed I would like to >>>>>>> introduce >>>>>>> this new patch, making critical_enter()/critical_exit() inline: >>>>>>> http://www.freebsd.org/~attilio/inline_critical.patch >>>>>> >>>>>> >>>>>> ... >>>>>> >>>>>> My version goes the other way and uninlines mtx_lock_spin() and >>>>>> mtx_unlock_spin(). Then it inlines (open codes) critical_enter() and >>>>>> ... >>>>> >>>>> >>>>> >>>>> So, I thought more about this and I think that inlining >>>>> critical_exit() is not really going to bring any benefit here but >>>>> bloat. >>>>> This is because nested critical sections are rare rather not, which >>>> >>>> >>>> >>>> Rather rare !not? :-) >>>> >>>> >>>>> means you will end up in doing the function call most of the time and >>>>> plus we have the possible pessimization introduced by the memory >>>>> clobber (__compiler_membar()) and as you note possible deficiency >>>>> caming from the branch misprediction*. >>>> >>>> >>>> >>>> This seems best. >>>> >>>> I see a point about the rareness of the branch in critical_exit(). >>>> Not sure if it is the one you are making: since the nested case is >>>> rare, then the branch will normally be correctly predicted. If the >>>> function remains uninlined, then the branch still has a good chance >>>> of being correctly predicted. This depends on the nested case being >>>> so rare across all callers, that the non-nested case doesn't mess >>>> up the prediction by happening often. The branch predictor only >>>> has to maintain history for 1 branch for this. However, if the >>>> call is inlined and there are many callers, there might be too many >>>> to maintain history for them all. >>> >>> >>> Yes, that's basically the same conclusion I came up with. >>> >>> It seems you are not opposed to this version of the patch. >>> I made some in-kernel benchmarks but they didn't really show a >>> performance improvements, bigger than 1-2%, which on SMP system is >>> basically accountable to thirdy-part effects. > > > 1-2% is about the best that can be hoped for from a single change, > but is too hard to measure with confidence. > > >>> However we already employ the same code for sched_pin() and I then >>> think that we can just commit this patch as-is. >> >> >> I made up my mind on instead not committing this patch, as I cannot >> prove a real performance gain, as also Jeff agreed with privately. >> Instead, I would like to commit this small comment explaining why it >> is not inlined (see below). Let me know what you think. > > > Good. > > Minor grammar and fixes: > > >> Index: sys/kern/kern_switch.c >> =================================================================== >> --- sys/kern/kern_switch.c (revision 243911) >> +++ sys/kern/kern_switch.c (working copy) >> @@ -176,6 +176,11 @@ retry: >> /* >> * Kernel thread preemption implementation. Critical sections mark >> * regions of code in which preemptions are not allowed. >> + * It would seem a good idea to inline such function but, in order > > > s/would/might/ > > s/such/such a/ or better (?), s/such function/crticical_enter() > > I think the new part of the comment only applies to critical_enter(). > critical_exit() is much larger, so it never seemed such a good idea > to inline it. > > If this sentence begins a new paragraph, then it should be preceded by > an empty line. Otherwise, it should not begin on a new line. I think > the latter applies. In fact, the comment should be separate. The old > part of the comment applies to both critical_enter() and critical_exit(), > but it is bogusly attached to only the former. After separating it from > the former, the new part of the comment can be better attached to the > former alone. > > >> + * to prevent instructions reordering by the compiler, a >> __compiler_membar() > > > s/instructions/instructions/ > > s/a __compiler.../__compiler.../ > > >> + * must be used here (look at sched_pin() case). The performance penalty > > > s/must/would have to/ > > s/look at sched_pin() case/the same as for sched_pin()/ > > Looking at sched_pin() doesn't show any comment about this. I seem to > remember discussions of this. Maybe the details are in a log message. > > Technical points: > - is the function being extern really enough to force the equivalent of > __compiler_membar()? Inter-file optimization might break this. It > would be easy to add an explicit __compiler_membar() to the beginning > of critical_enter(), but there are probably many other functions that > would need the same treatment for inter-file optimization. > - compilers already do intra-file optimization giving automatic inlining > of (static) functions that are only called once, unless this is > disabled by -fno-inline-functions-called-once. It should be disabled > by default, since it also breaks debugging including stack traces, > and profiling. Perhaps it also breaks implicit membars. > - IIRC, inlining is not permitted to change function call semantics, so > it may be a bug for the membar in sched_pin() to have any effect. > Anyway, extern functions are not required to give stricter ordering > than static [inline] ones. Function calls are required to give > sequence points (after their parameters have been evaluated), and > sequence points are required to have all side effects of previous > evaluations complete and no side effects of subsequent evaluations > begun. Is that any different from a membar? I think membars are > a little more magic, and it is hard to see how anything can give > stricter ordering requirements on the compiler than a sequence > point except by magic. > > >> + * imposed by the membar could, then, produce slower code than >> + * the function call itself, for most cases. >> */ > > > The punctuation given by all those commas seems to be correct, but oit > is painfully formal. I've integrated your suggestions and committed as r244046, thanks. For your concern about sched_pin() please check notes reported in this same thread and commit log. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein