Date: Mon, 29 Oct 2012 13:06:20 +0000 From: Attilio Rao <attilio@freebsd.org> To: Bruce Evans <brde@optusnet.com.au> Cc: src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>, Jeff Roberson <jeff@freebsd.org>, Florian Smeets <flo@freebsd.org>, Bruce Evans <bde@freebsd.org>, svn-src-projects@freebsd.org Subject: Re: svn commit: r238907 - projects/calloutng/sys/kern Message-ID: <CAJ-FndAyPVB8VS%2BzNZTUfVhSp9hSOZOjamBwxhhikq3gSMfs3g@mail.gmail.com> In-Reply-To: <20121029155136.O943@besplex.bde.org> References: <201207301350.q6UDobCI099069@svn.freebsd.org> <CAJ-FndBj8tpC_BJXs_RH8sG2TBG8yA=Lxu3-GTVT9Ap_zOCuVQ@mail.gmail.com> <CAJ-FndDnO7wjnWPV0tTu%2BUGHjsxa3YDarMxmyei3ZmjLAFvRkQ@mail.gmail.com> <201207301732.33474.jhb@freebsd.org> <CAJ-FndD5EO12xsWOAe6u0EvX00q33wxO4OivnGjzj0=T2Oe8uA@mail.gmail.com> <CAJ-FndCRg0UCThFkatp=tw7rUWWCvhsApLE=iztLpxpGBC1F9w@mail.gmail.com> <CAJ-FndBqV2uD8Th9ePtxyJwhMAPzY3AXA5cQ7HszLp=%2BfSpHTA@mail.gmail.com> <CAJ-FndDPLmkpAJeGVN2wgbhdgHYezyUV-PPvH9e-CA7Go7HG3A@mail.gmail.com> <20121029155136.O943@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 10/29/12, Bruce Evans <brde@optusnet.com.au> wrote: > On Mon, 29 Oct 2012, Attilio Rao wrote: > >> Now that sched_pin()/sched_unpin() are fixed I would like to introduce >> this new patch, making critical_enter()/critical_exit() inline: >> http://www.freebsd.org/~attilio/inline_critical.patch >> >> The concept is pretty simple: simple add/dec for critical_enter, exit >> are inlined, the rest is in an "hard path". Debugging enables the hard >> paths by default (really I think that only KTR may be due here, but I >> thought that in case of INVARIANTS this was also wanted, so I added >> the check also for that case). >> >> flo@ has stress-tested the patch already and he may be starting >> serious benchmarks soon. >> The effectiveness of this patch is to determine and it brings again >> the question: better an hard function or the usage of compiler membars >> to avoid issues? I think that this patch should be in only if we can >> absolutely prove a measurable performance bump, otherwise I will just >> add a note to critical_enter()/critical_exit() explaining why they >> should not be inlined at all. > > Inlining of mtx_lock_spin() is bogus unless critical_enter() is inlined. > Similarly for mtx_unlock_spin() and critical_exit(). It saves 1 function > call. but critical_enter() does a function call anyway. critical_exit*( > also has a branch in branch in it that might cost more than the function > call just for mispredicting it. Correct, that is a further argument for having inlined critical_enter(), even if the actual calling cames from spinlock_enter(), not critical_enter(), which must be MD (that's on FreeBSD, not sure what happens in your OS). > My version goes the other way and uninlines mtx_lock_spin() and > mtx_unlock_spin(). Then it inlines (open codes) critical_enter() and > critical_exit() in them. This saves a lot of text space and thus > hopefully saves time too. I couldn't find any cases where it either > saved or lost a significant amount of time. The main cause of time > differences is probably mispredicted branches: with mtx*() inlined, > the branches in it can be predicted separately, so they are more > predictable. However, when they are separate, there may be enough to > thrash the branch predictor. It is probably best to only inline mtx*() > for a few heavily-used locks and only inline critical*() if it is either > in one of these or is in a non-inline function. Otherwise, inlining > mainly gives code bloat that is only harmless if the code is rarely > actually used. But this is hard to configure. <sys/mutex.h> is > already a mess to support uninlining for debugging cases. > > OTOH, I couldn't get uninlining of mtx_lock() and mtx_unlock() to work. > These are a little smaller than the spinlock versions, mainly since they > don't inline critical*(). The loss from unlining them cannot be > compensated for by uninlining critical*() in them, and I found just 1 > workload where uninlining them was slower: udp packets per second bandwidth > tests. I don't think that uninling mtx_lock()/unlock() (btw, on which hw are you testing them if you are still able to catch performance penalties by branch misprediction?!) is a good idea, likely what would be a better one is to both inline critical_enter() and spinlock_enter(). Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndAyPVB8VS%2BzNZTUfVhSp9hSOZOjamBwxhhikq3gSMfs3g>