Date: Wed, 2 Jan 2002 17:58:32 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: John Baldwin <jhb@FreeBSD.ORG>
Cc: Peter Jeremy <peter.jeremy@alcatel.com.au>, Michal Mertl <mime@traveller.cz>,
    Bruce Evans <bde@zeta.org.au>, Mike Smith <msmith@FreeBSD.ORG>,
    Bernd Walter <ticso@cicely8.cicely.de>, arch@FreeBSD.ORG
Subject: Re: When to use atomic_ functions? (was: 64 bit counters)
Message-ID: <200201030158.g031wWD61364@apollo.backplane.com>
References: <XFMail.020102173003.jhb@FreeBSD.org>
:> section is dropped.  If an interrupt occurs on the cpu while the current
:> thread is in a critical section, it should be deferred just like we
:> did with _cpl in our spl*() code.
:
:Have you even looked at the current implementation? :)

Yes, I have, John.  Perhaps you didn't read the part of my email that
indicated most of the critical sections we have in the system are
top-level sections.  Optimizing the nesting is all well and fine, but it
isn't going to improve performance.

Let me put this in perspective, to show you how twisted the code has
become.

* Because preemption can cause a thread to resume on a different cpu,
  the per-cpu area pointer is not consistent and per-cpu variables must
  be accessed atomically with the pointer.

* This means that obtaining 'curthread', for example, becomes relatively
  non-cacheable in terms of code generation.  Every time you need it you
  have to re-obtain it, so a great deal of our code does this in order
  to work around the lack of optimization:

	struct thread *td = curthread;
	... use td multiple times ...

* critical_enter() is a procedure, not a macro.  So for something that
  should be as simple as '++curthread->td_critnest', which could be two
  instructions, we instead have a 36 byte procedure that does pushes,
  moves, pops, comparisons, and a nasty cli instruction.

Now, John, compare that to the spl*() code in -stable.  Something like
splvm() is a 32 byte procedure doing a couple of mov's from globals
(three on the same cache line... with a little effort all could be on
the same cache line), a couple of or's, and that is it.  No cli
instruction.  No conditionals.  It almost doesn't have to be a
procedure, but the mask is just complex enough that it's a good idea.
Not so for critical_enter().  Properly done, it could be an inline that
increments a counter in curthread: two instructions and about 10x
faster.  Critical exit is somewhat more complex but still very easy.
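Roughly what that would look like (a sketch only: td_critnest is the
nesting count already in struct thread, but the deferred-interrupt names
below are placeholders for illustration, not existing kernel
interfaces):

	/*
	 * Sketch.  crit_enter_sketch() is the two-instruction inline;
	 * crit_exit_sketch() shows the spl-style idea of replaying any
	 * interrupt that was deferred while we were in the section.
	 * td_intr_pending and run_deferred_interrupts() are made-up
	 * names standing in for that bookkeeping.
	 */
	static __inline void
	crit_enter_sketch(void)
	{
		curthread->td_critnest++;	/* no call, no cli */
	}

	static __inline void
	crit_exit_sketch(void)
	{
		struct thread *td = curthread;

		if (--td->td_critnest == 0 && td->td_intr_pending)
			run_deferred_interrupts();	/* rare, heavier path */
	}

The common path stays a couple of instructions; the heavier work only
runs in the rare case where an interrupt actually arrived while we were
inside the section.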
:This requires the bottom-half code to not use any locks at all which might be
:possible.  However it means if I have an idle CPU, it can't pick up the
:interrupt handler for the interrupt that came in until the CPU that gets its
:interrupt leaves the handler.  If you want to prevent that, you need to allow

Yes, but so what?  The cpu holding the critical section isn't going to
be in that section for more than a few clock cycles, most likely.  Just
as with our spl*() mechanism, the chance of an interrupt occurring while
in the critical section is extremely low.  That's why it doesn't matter
if it is slightly heavier-weight when the case does occur.  We gain far
more clock cycles in performance than we lose in the occasional
interrupt that hits us in a critical section.

:access to the scheduler when the interrupt comes in, which means that when top
:half code holds the sched_lock it needs to disable interrupts.
:
:>:critical enter/exit calls that do disable interrupts if needed and enable
:>:them again when needed.  Right now the current critical_enter/exit code
:>:assumes that
:>
:> I think it is overkill too.  Keep it simple and it can be used without
:> having to have a manual reference by your desk to figure out what it
:> actually does.
:
:Err, it's not near that complicated.  This only changes the calls to
:critical_enter/exit in one location.  I was saying that the separate
:nesting count was a bit too complicated.

I have to disagree.  When you start adding up all the special casing in
all the supposedly uncomplicated procedures, you wind up with something
that is unnecessarily complex and definitely overcomplicated.

:> Yes, definitely too complicated.  But I disagree with the 'as an
:> optimization later on'.  The perceived cost of these functions colors
:> everyone's implementation of code around them, and some pretty nasty
:> code (due to people trying to avoid perceived-to-be expensive calls)
:> can be the result.  Something like this needs to be implemented right
:> off the bat so people can depend on it.
:
:No.  People need to write algorithms and not assume implementation
:specifics about certain functions.  There is a manpage (though it needs
:a bit of updating) for the critical section stuff and it still
:accurately documents what the API is guaranteed to provide: protection
:from preemption.

I don't understand what you are saying here.  The algorithm I described
is platform independent (just as our spl mechanism was).  What more can
one ask?  It's great.. an API that everyone can depend on performing
well, exactly as advertised, across a multitude of platforms.

:> I'm willing to do this, if you don't have your fingers in this
:> particular section of code.
:
:Actually, I have some changes in this code already including halfway
:implementing this.

You have half-way implemented just about everything I've ever tried to
work on in -current.  Is there anything you haven't touched?  You have
half the core locked up and untouchable with all the work you have in
progress.

					-Matt