Date:      Tue, 30 Oct 2012 02:34:23 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Attilio Rao <attilio@FreeBSD.org>
Cc:        src-committers@FreeBSD.org, John Baldwin <jhb@FreeBSD.org>, Jeff Roberson <jeff@FreeBSD.org>, Florian Smeets <flo@FreeBSD.org>, Bruce Evans <bde@FreeBSD.org>, Bruce Evans <brde@optusnet.com.au>, svn-src-projects@FreeBSD.org
Subject:   Re: svn commit: r238907 - projects/calloutng/sys/kern
Message-ID:  <20121030014250.D5191@besplex.bde.org>
In-Reply-To: <CAJ-FndAyPVB8VS+zNZTUfVhSp9hSOZOjamBwxhhikq3gSMfs3g@mail.gmail.com>
References:  <201207301350.q6UDobCI099069@svn.freebsd.org> <CAJ-FndBj8tpC_BJXs_RH8sG2TBG8yA=Lxu3-GTVT9Ap_zOCuVQ@mail.gmail.com> <CAJ-FndDnO7wjnWPV0tTu+UGHjsxa3YDarMxmyei3ZmjLAFvRkQ@mail.gmail.com> <201207301732.33474.jhb@freebsd.org> <CAJ-FndD5EO12xsWOAe6u0EvX00q33wxO4OivnGjzj0=T2Oe8uA@mail.gmail.com> <CAJ-FndCRg0UCThFkatp=tw7rUWWCvhsApLE=iztLpxpGBC1F9w@mail.gmail.com> <CAJ-FndBqV2uD8Th9ePtxyJwhMAPzY3AXA5cQ7HszLp=+fSpHTA@mail.gmail.com> <CAJ-FndDPLmkpAJeGVN2wgbhdgHYezyUV-PPvH9e-CA7Go7HG3A@mail.gmail.com> <20121029155136.O943@besplex.bde.org> <CAJ-FndAyPVB8VS+zNZTUfVhSp9hSOZOjamBwxhhikq3gSMfs3g@mail.gmail.com>

On Mon, 29 Oct 2012, Attilio Rao wrote:

> On 10/29/12, Bruce Evans <brde@optusnet.com.au> wrote:
>> On Mon, 29 Oct 2012, Attilio Rao wrote:
>>
>>> Now that sched_pin()/sched_unpin() are fixed I would like to introduce
>>> this new patch, making critical_enter()/critical_exit() inline:
>>> http://www.freebsd.org/~attilio/inline_critical.patch
>>>
>>> The concept is pretty simple: the simple add/dec for critical_enter and
>>> critical_exit are inlined, and the rest is in a "hard path".  Debugging
>>> enables the hard paths by default (really I think that only KTR may
>>> require this, but I thought that in the case of INVARIANTS this was
>>> also wanted, so I added the check for that case too).
>> ...
>> Inlining of mtx_lock_spin() is bogus unless critical_enter() is inlined.
>> Similarly for mtx_unlock_spin() and critical_exit().  It saves 1 function
>> call, but critical_enter() does a function call anyway.  critical_exit()
>> also has a branch within a branch in it that might cost more than the
>> function call just for mispredicting it.
>
> Correct, that is a further argument for having inlined
> critical_enter(),

And for inlining neither, or the opposite one, like I do.

> even if the actual call comes from
> spinlock_enter(), not critical_enter(), which must be MD (that's on
> FreeBSD, not sure what happens in your OS).

I forgot that I don't have the slow functions spinlock_enter() and
spinlock_exit() in mtx_[un]lock_spin().  (My mutexes don't block
interrupts, as required for fast interrupt handling that is actually
fast (really low-latency).  My spinlocks just use critical*(), and
critical*() doesn't block fast interrupt handling.)

The spinlock_enter() calls mean that inlining mutex calls is even more
bogus.  Instead of just 1 function call which does little more than
increment or decrement a counter, there is a call to spinlock_enter()
with a nested call to critical_enter() inside it.  spinlock_enter() is
MD and might need to do lots of slow hardware things.  It does the
following on i386:

% void
% spinlock_enter(void)
% {
% 	struct thread *td;
% 	register_t flags;
% 
% 	td = curthread;
% 	if (td->td_md.md_spinlock_count == 0) {
% 		flags = intr_disable();

This is a CPU control instruction and thus tends to be slow.  It was very
slow on Pentium4.  It might involve some serialization although it is
not a full serialization instruction.
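(For reference, intr_disable() on i386 is essentially the following,
paraphrased from memory of cpufunc.h, where disable_intr() is just the
cli instruction:

    static __inline register_t
    intr_disable(void)
    {
        register_t eflags;

        eflags = read_eflags();
        disable_intr();         /* cli */
        return (eflags);
    }

so the cost is in cli and in the flags read, not in the wrapper.)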

% 		td->td_md.md_spinlock_count = 1;
% 		flags &= ~PSL_T;

The previous line is from my version.  It fixes spurious trace traps when
the flags are popped in critical_exit().  Similar fixes are needed for
the pushfl/popfl sequences in swtch.s.  The spurious trace traps were
and might still be more harmful than they should be since they exercise
deadlock bugs in syscons and/or printf.  Simply trace through a large
amount of code in ddb, going through here a few times to set up spurious
trace traps for several td's.   It may also be necessary to have syscons
and/or printf doing non-ddb i/o.  Eventually the trace traps bite and
demonstrate the deadlock.

% 		td->td_md.md_saved_flags = flags;
% 	} else
% 		td->td_md.md_spinlock_count++;
% 	critical_enter();
% }

Everything else uses simple non-control instructions so it is quite fast.
However, if this is not serialized, then it can run in parallel with
mtx_lock_spin() and vice versa since there are no inter-dependencies.
It is unclear whether the parallelism is helped or harmed by not
inlining mtx_lock_spin().
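For completeness, the matching exit path looks approximately like this
(paraphrased from the i386 code, not copied exactly; the intr_restore()
of md_saved_flags at the end is where a saved PSL_T would be reloaded
if it were not masked off above):

    void
    spinlock_exit(void)
    {
        struct thread *td;
        register_t flags;

        td = curthread;
        critical_exit();
        flags = td->td_md.md_saved_flags;
        td->td_md.md_spinlock_count--;
        if (td->td_md.md_spinlock_count == 0)
            intr_restore(flags);
    }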

>> My version goes the other way and uninlines mtx_lock_spin() and
>> mtx_unlock_spin().  Then it inlines (open codes) critical_enter() and
>> critical_exit() in them.  This saves a lot of text space and thus
>> hopefully saves time too.  I couldn't find any cases where it either
>> ...
>>
>> OTOH, I couldn't get uninlining of mtx_lock() and mtx_unlock() to work.
>> ..
>
> I don't think that uninlining mtx_lock()/unlock() (btw, on which hw are
> you testing them if you are still able to catch performance penalties
> from branch misprediction?!) is a good idea; likely a better one would
> be to inline both critical_enter() and spinlock_enter().

Er, it is a good idea, as explained above.  Whether it is better in
practice is very MD.  The inlined mtx fast paths (the "non-calls") are
already quite large, and adding critical*() and spinlock*() to them
would make them larger.  Above a certain MD size, inlining is just
slower because it busts caches.  spinlock*() is especially hard to
inline because it does MD magic that on some arches might be even
larger than the i386 version.
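As a sketch of the other direction (hypothetical code, not my actual
patch: spin locks use only critical*() as described above,
_mtx_lock_spin_hard() is an invented name for the uninlined contested
path, and recursion, KTR, LOCK_PROFILING, etc. are all omitted):

    /* Uninlined acquire with critical_enter() open-coded into it. */
    void
    mtx_lock_spin(struct mtx *m)
    {
        struct thread *td;

        td = curthread;
        td->td_critnest++;      /* open-coded critical_enter() */
        if (!atomic_cmpset_acq_ptr(&m->mtx_lock, MTX_UNOWNED,
            (uintptr_t)td))
            _mtx_lock_spin_hard(m);     /* contested: spin here */
    }

This is 1 call from the caller's point of view, with no nested calls on
the uncontested path, and the text for all of it exists once instead of
at every call site.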

Bruce


