Date: Wed, 2 Jan 2002 17:58:32 -0800 (PST) From: Matthew Dillon <dillon@apollo.backplane.com> To: John Baldwin <jhb@FreeBSD.ORG> Cc: Peter Jeremy <peter.jeremy@alcatel.com.au>, Michal Mertl <mime@traveller.cz>, Bruce Evans <bde@zeta.org.au>, Mike Smith <msmith@FreeBSD.ORG>, Bernd Walter <ticso@cicely8.cicely.de>, arch@FreeBSD.ORG Subject: Re: When to use atomic_ functions? (was: 64 bit counters) Message-ID: <200201030158.g031wWD61364@apollo.backplane.com> References: <XFMail.020102173003.jhb@FreeBSD.org>
:> section is dropped. If an interrupt occurs on the cpu while the current
:> thread is in a critical section, it should be deferred just like we
:> did with _cpl in our spl*() code.
:
:Have you even looked at the current implementation? :)
Yes, I have, John. Perhaps you didn't read the part of my email
that indicated most of the critical sections we have in the system
are top-level sections. Optimizing the nesting is all well and fine
but it isn't going to improve performance.
Let me put this in perspective, to show you how twisted the code
has become.
* Because preemption can cause a thread to resume on a different cpu,
the per-cpu area pointer is not consistent and per-cpu variables
must be accessed atomically with the pointer.
* This means that obtaining 'curthread', for example, becomes relatively
non-cacheable in terms of code generation. Every time you get it you
have to re-obtain it. So a great deal of our code does this in order
to work around the lack of optimization:
struct thread *td = curthread;
... use td multiple times ...
* critical_enter() is a procedure, not a macro. So for something that
should be as simple as '++curthread->td_critnest' which could be
two instructions we instead have a 36 byte procedure which does
pushes, moves, pops, comparisons, and a nasty cli instruction.
Now, John, compare that to the spl*() code in -stable. Something like
splvm() is a 32 byte procedure doing a couple of mov's from globals
(three on the same cache line... with a little effort all could be
on the same cache line), a couple of or's, and that is it. No cli
instruction. No conditionals. It almost doesn't have to be a procedure
but the mask is just complex enough that it's a good idea.
Not so for critical_enter(). If properly done that could be an inline
that increments a counter in curthread. Two instructions and about
10x faster.
Critical exit is somewhat more complex but still very easy.
:This requires the bottom-half code to not use any locks at all which might be
:possible. However it means if I have an idle CPU, it can't pick up the
:interrupt handler for the interrupt that came in until the CPU that gets its
:interrupt leaves the handler. If you want to prevent that, you need to allow
Yes, but so what? The cpu holding the critical section isn't going to
be in that section for more than a few clock cycles, most likely.
Just as with our spl*() mechanism, the chance of an interrupt occurring
while in the critical section is extremely low. That's why it doesn't
matter if it is slightly heavier-weight when the case does occur. We
gain far more clock cycles in performance than we lose in the occasional
interrupt that hits us in a critical section.
:access to the scheduler when the interrupt comes in, which means that when top
:half code holds the sched_lock it needs to disable interrupts.
:
:>:critical enter/exit calls that do disable interrupts if needed and enable
:>:them
:>:again when needed. Right now the current critical_enter/exit code assumes
:>:that
:>
:> I think it is overkill too. Keep it simple and it can be used without
:> having to have a manual reference by your desk to figure out what it
:> actually does.
:
:Err, it's not near that complicated. This only changes the calls to
:critical_enter/exit in one location. I was saying that the separate nesting
:count was a bit too complicated.
I have to disagree. When you start adding up all the special casing
in all the supposedly uncomplicated procedures you wind up with something
that is unnecessarily complex and definitely overcomplicated.
:> Yes, definitely too complicated. But I disagree with the 'as an
:> optimization later on'. The perceived cost of these functions colors
:> everyone's implementation of code around them, and some pretty nasty
:> code (due to people trying to avoid perceived-to-be expensive calls)
:> can be the result. Something like this needs to be implemented right
:> off the bat so people can depend on it.
:
:No. People need to write algorithms and not assume that implementation
:specifics about certain functions. There is a manpage (though it needs a bit
:of updating) for the critical section stuff and it still accurately documents
:what the API is guaranteed to provide: protection from preemption.
I don't understand what you are saying here. The algorithm I described
is platform independent (just as our spl mechanism was). What more can
one ask? It's great: an API that everyone can depend on performing
well, exactly as advertised, across a multitude of platforms.
:> I'm willing to do this, if you don't have your fingers in this particular
:> section of code.
:
:Actually, I have some changes in this code already including halfway
:implementing this.
You have half-way implemented just about everything I've ever tried to
do work on in -current. Is there anything you haven't touched?
You have half the core locked up and untouchable with all the work you
have in progress.
-Matt
