Date:      Wed, 02 Jan 2002 18:33:13 -0800 (PST)
From:      John Baldwin <jhb@FreeBSD.org>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        arch@FreeBSD.ORG, Bernd Walter <ticso@cicely8.cicely.de>, Mike Smith <msmith@FreeBSD.ORG>, Bruce Evans <bde@zeta.org.au>, Michal Mertl <mime@traveller.cz>, Peter Jeremy <peter.jeremy@alcatel.com.au>
Subject:   Re: When to use atomic_ functions? (was: 64 bit counters)
Message-ID:  <XFMail.020102183313.jhb@FreeBSD.org>
In-Reply-To: <200201030158.g031wWD61364@apollo.backplane.com>


On 03-Jan-02 Matthew Dillon wrote:
>:>     section is dropped.  If an interrupt occurs on the cpu while the
>:>     current
>:>     thread is in a critical section, it should be deferred just like we
>:>     did with _cpl in our spl*() code.  
>:
>:Have you even looked at the current implementation? :)
> 
>     Yes, I have John.  Perhaps you didn't read the part of my email
>     that indicated most of the critical sections we have in the system
>     are top-level sections.  Optimizing the nesting is all well and fine
>     but it isn't going to improve performance.
> 
>     Let me put this in perspective, to show you how twisted the code
>     has become.
> 
>     * Because preemption can cause a thread to resume on a different cpu,
>       the per-cpu area pointer is not consistent and per-cpu variables
>       must be accessed atomically with the pointer.
> 
>     * This means that obtaining 'curthread', for example, becomes relatively
>       non-cacheable in terms of code generation.  Every time you get it you
>       have to re-obtain it.  So a great deal of our code does this in order
>       to work around the lack of optimization:

No, this is incorrect.  curthread and curpcb are a bit special here.  They are
valid even across migration.

>       struct thread *td = curthread;
> 
>       ... use td multiple times ...

This is done because PCPU_GET() is slow.  Perhaps that is what you are trying
to say: that our PCPU_GET() macros are slow?  On x86 they are one instruction
in most cases, and on other archs they are 2 or 3 at most.

>     * critical_enter() is a procedure, not a macro.  So for something that
>       should be as simple as '++curthread->td_critnest' which could be
>       two instructions we instead have a 36 byte procedure which does
>       pushes, moves, pops, comparisons, and a nasty cli instruction.

It can be made a macro trivially if we want to do it later.  However, during
this earlier work, when one may want to put extra stuff in it such as
assertions, it is OK for it to be a function for the time being.
Micro-optimizing one function is all well and good, but in -current there are
really bigger fish to fry.

>:This requires the bottom-half code to not use any locks at all which might be
>:possible.  However it means if I have an idle CPU, it can't pick up the
>:interrupt handler for the interrupt that came in until the CPU that gets its
>:interrupt leaves the handler.  If you want to prevent that, you need to allow
> 
>     Yes, but so what?  The cpu holding the critical section isn't going to
>     be in that section for more than a few clock cycles most likely.

Fair enough.  I suppose we could use a per-thread bitmask of pending
interrupts.  Hmm, except that that assumes a dense region of vectors, which
alpha doesn't have.  Note that on the alpha we did spl's via the x86
equivalent of cli/sti for this reason: on the alpha, vector numbers are rather
sparse and won't fit into a bitmask.  You can't use an embedded list in
struct ithd in case multiple CPU's get the same interrupt in a critical
section.  Also, now we have the problem that we can't use the spin lock to
protect the ICU on SMP systems, meaning that we will corrupt the ICU masks
when disabling interrupts (which we _have_ to do if interrupts are
level-triggered, as PCI interrupts are on alpha).

I'm afraid the issues here are a bit more complex than they first seem. 
However, the idea of deferring interrupts is a nice one if we can do it.  It is
actually very similar to the way preemption will/would defer switches to
real-time threads until the outermost critical_exit().

>:>:critical enter/exit calls that do disable interrupts if needed and enable
>:>:them
>:>:again when needed.  Right now the current critical_enter/exit code assumes
>:>:that
>:> 
>:>     I think it is overkill too.  Keep it simple and it can be used without
>:>     having to have a manual reference by your desk to figure out what it 
>:>     actually does.
>:
>:Err, it's not near that complicated.  This only changes the calls to
>:critical_enter/exit in one location.  I was saying that the separate nesting
>:count was a bit too complicated.
> 
>     I have to disagree.  When you start adding up all the special casing
>     in all the supposedly uncomplicated procedures you wind up with something
>     that is unnecessarily complex and definitely overcomplicated.

Err, I was going to use separate procedures precisely to avoid having to
complicate critical_enter/exit: critical_enter/exit_spinlock or some such,
which would only be used in kern_mutex.c and sys/mutex.h, in one place in
each file.

>:>     Yes, definitely too complicated.  But I disagree with the 'as an
>:>     optimization later on'.  The perceived cost of these functions colors
>:>     everyone's implementation of code around them, and some pretty nasty
>:>     code (due to people trying to avoid perceived-to-be expensive calls)
>:>     can be the result.  Something like this needs to be implemented right
>:>     off the bat so people can depend on it.
>:
>:No.  People need to write algorithms and not assume implementation
>:specifics about certain functions.  There is a manpage (though it needs a bit
>:of updating) for the critical section stuff and it still accurately documents
>:what the API is guaranteed to provide: protection from preemption.
> 
>     I don't understand what you are saying here.  The algorithm I described
>     is platform independent (just as our spl mechanism was).  What more can
>     one ask?  It's great.. an API that everyone can depend on performing
>     well, exactly as advertised, across a multitude of platforms.

No, I think your algorithm is very i386 specific to be honest.  The spl
mechanisms differed widely between i386 and alpha.  It may be that this code really
needs to be MD as the spl code was instead of MI to optimize it, but we don't
need it right now.  Your argument is that people will not use critical_enter
because of its implementation details, and I'm saying that people should use
the API because of the service it presents and should not be worried about how
it is implemented.

>:>     I'm willing to do this, if you don't have your fingers in this
>:>     particular
>:>     section of code.
>:
>:Actually, I have some changes in this code already including halfway
>:implementing this.
> 
>     You have half-way implemented just about everything I've ever tried to
>     do work on in -current.   Is there anything you haven't touched?
>     You have half the core locked up and untouchable with all the work you
>     have in progress.

No, I do not have half the code locked up by any means.  I do have a change I
need to test related to the critical_enter/exit stuff that I want to commit in
a day or so though.  As far as changing critical_enter/exit to defer, I'm not
going to work on that right now except to think about it.  Specifically, not
only must critical_enter/exit change, but the various MD code that calls
ithread_schedule() needs to defer those calls.  Also, there is the problem of
the icu lock, and the problem of figuring out a way to store the per-thread
state of triggered interrupts for each arch.  If there is a MI way, that would be
best.  Well, the embedded list of ithreads might work actually.  But that
doesn't fix icu_lock.

>                                                       -Matt

-- 

John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




