Date: Fri, 2 Nov 2012 09:02:18 +0000 From: Attilio Rao <attilio@freebsd.org> To: Bruce Evans <brde@optusnet.com.au> Cc: src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>, Jeff Roberson <jeff@freebsd.org>, Florian Smeets <flo@freebsd.org>, Bruce Evans <bde@freebsd.org>, svn-src-projects@freebsd.org Subject: Re: svn commit: r238907 - projects/calloutng/sys/kern Message-ID: <CAJ-FndDpB2vQ7nBgYS4mNHb3_G4sXi3SfU7Y7kdVHMQZYKapig@mail.gmail.com> In-Reply-To: <20121029155136.O943@besplex.bde.org> References: <201207301350.q6UDobCI099069@svn.freebsd.org> <CAJ-FndBj8tpC_BJXs_RH8sG2TBG8yA=Lxu3-GTVT9Ap_zOCuVQ@mail.gmail.com> <CAJ-FndDnO7wjnWPV0tTu%2BUGHjsxa3YDarMxmyei3ZmjLAFvRkQ@mail.gmail.com> <201207301732.33474.jhb@freebsd.org> <CAJ-FndD5EO12xsWOAe6u0EvX00q33wxO4OivnGjzj0=T2Oe8uA@mail.gmail.com> <CAJ-FndCRg0UCThFkatp=tw7rUWWCvhsApLE=iztLpxpGBC1F9w@mail.gmail.com> <CAJ-FndBqV2uD8Th9ePtxyJwhMAPzY3AXA5cQ7HszLp=%2BfSpHTA@mail.gmail.com> <CAJ-FndDPLmkpAJeGVN2wgbhdgHYezyUV-PPvH9e-CA7Go7HG3A@mail.gmail.com> <20121029155136.O943@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 10/29/12, Bruce Evans <brde@optusnet.com.au> wrote: > On Mon, 29 Oct 2012, Attilio Rao wrote: > >> Now that sched_pin()/sched_unpin() are fixed I would like to introduce >> this new patch, making critical_enter()/critical_exit() inline: >> http://www.freebsd.org/~attilio/inline_critical.patch >> >> The concept is pretty simple: simple add/dec for critical_enter, exit >> are inlined, the rest is in an "hard path". Debugging enables the hard >> paths by default (really I think that only KTR may be due here, but I >> thought that in case of INVARIANTS this was also wanted, so I added >> the check also for that case). >> >> flo@ has stress-tested the patch already and he may be starting >> serious benchmarks soon. >> The effectiveness of this patch is to determine and it brings again >> the question: better an hard function or the usage of compiler membars >> to avoid issues? I think that this patch should be in only if we can >> absolutely prove a measurable performance bump, otherwise I will just >> add a note to critical_enter()/critical_exit() explaining why they >> should not be inlined at all. > > Inlining of mtx_lock_spin() is bogus unless critical_enter() is inlined. > Similarly for mtx_unlock_spin() and critical_exit(). It saves 1 function > call. but critical_enter() does a function call anyway. critical_exit*( > also has a branch in branch in it that might cost more than the function > call just for mispredicting it. > > My version goes the other way and uninlines mtx_lock_spin() and > mtx_unlock_spin(). Then it inlines (open codes) critical_enter() and > critical_exit() in them. This saves a lot of text space and thus > hopefully saves time too. I couldn't find any cases where it either > saved or lost a significant amount of time. The main cause of time > differences is probably mispredicted branches: with mtx*() inlined, > the branches in it can be predicted separately, so they are more > predictable. However, when they are separate, there may be enough to > thrash the branch predictor. It is probably best to only inline mtx*() > for a few heavily-used locks and only inline critical*() if it is either > in one of these or is in a non-inline function. Otherwise, inlining > mainly gives code bloat that is only harmless if the code is rarely > actually used. But this is hard to configure. <sys/mutex.h> is > already a mess to support uninlining for debugging cases. So, I thought more about this and I think that inlining critical_exit() is not really going to bring any benefit here but bloat. This is because nested critical sections are rare rather not, which means you will end up in doing the function call most of the time and plus we have the possible pessimization introduced by the memory clobber (__compiler_membar()) and as you note possible deficiency caming from the branch misprediction*. All that to say that I shrunk my patch to only inline critical_enter(). Respect v1, this patch also does use the fast path in all cases but KTR, does use a common accessor function for both hard and soft versions to bump the counter. In theory inlined critical_enter() should be a net win, but I need to carefully evaluate the difference introduced by the __compiler_membar(), a case very similar to sched_pin(). I will start doing some investigations soon but they would require some good effort for both the widespread nature of critical_enter() and the difference in code gcc will certainly produce because of a less function call and the memory clobber. New patch here: http://www.freebsd.org/~attilio/inline_critical2.patch I feel very unsure style-wise about my add to sys/systm.h because it is already very messy that I cannot give a logical sense to that chunk. Thanks, Attilio * In theory we could use __predict_false() or similar but I don't think this will really help as x86 doesn't have explicit instructions to control branch prediction -- Peace can only be achieved by understanding - A. Einstein
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndDpB2vQ7nBgYS4mNHb3_G4sXi3SfU7Y7kdVHMQZYKapig>