From owner-freebsd-hackers Wed Apr 3 23:41:13 2002
From: Terry Lambert
Date: Wed, 03 Apr 2002 23:40:28 -0800
To: John Regehr
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Linuxthreads on Linux vs FreeBSD performance question

John Regehr wrote:
> I'm writing a paper for the FREENIX track at USENIX about a tool for
> measuring scheduling behavior under different operating systems, and
> I've run across a performance anomaly that I'd like to explain.
>
> It's this: for threads created with Linuxthreads, FreeBSD has
> considerably slower context switches than Linux.  There is a set of
> histograms here: [ ... ]
>
> All numbers are taken on a PIII 850, with FreeBSD 4.5-RELEASE and
> Linux 2.4.17.  All context switches are between threads in the same
> address space.
>
> Anyway, I was speculating that the higher cost is either due to (1) a
> failure, in FreeBSD, to avoid page table operations when switching
> between threads in the same address space, or (2) some other kind of
> semantic mismatch between Linuxthreads and rfork.  Is one of these
> guesses right?

No.

Ingo Molnar did a lot of work in the Linux 2.4 scheduler to ensure
thread group affinity.  By doing this, he guarantees that the scheduler
does not pick a process (thread) to run at random, but instead treats
threads in a thread group (a process -- threads sharing things like
address space mappings) preferentially.

On top of this, he did some work on CPU affinity, which ensures that a
given thread will not arbitrarily migrate from one CPU to another
(busting the L1 cache of the CPU where the thread was running).

In FreeBSD, this only happens statistically, rather than explicitly.
Explicit affinity can be bad, particularly if the work is done in the
scheduler itself, rather than being approached cleverly and obliquely.
Schedulers are picky things, and easy to perturb, and the more complex
they are, the more "worst case" saddle points you end up having.

The main problem with the Linux approach is that wiring either type of
affinity into the scheduler itself is actually a fairly bad thing to
do, overall, since it can result in starvation deadlock for other
processes.  This means that Linux will get the best numbers when run
under "benchmark conditions", but will suffer under real world loads,
which are mixed.

The correct approach for CPU affinity is to run with per-CPU scheduler
queues.  This also eliminates the locking and migration issues that you
would normally see when schedulable entities migrate from one CPU to
another, and the starvation possibility that comes with complicating
your scheduler with preferential scheduling.
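To make that concrete, here is a toy userland model of per-CPU run
queues.  It is a sketch only -- not the FreeBSD implementation, and
every name in it (cpu_runq, runq_add, runq_take) is invented for the
illustration.  The point is that the common case takes only the local
queue's lock, and affinity falls out of which queue a task sits in,
rather than out of extra logic in the scheduler:

/*
 * Toy model of per-CPU scheduler queues (sketch only; all names
 * invented, not taken from the FreeBSD source).
 */
#include <pthread.h>
#include <stddef.h>

struct task {
	struct task	*next;
};

struct cpu_runq {
	pthread_mutex_t	lock;		/* per-queue lock, not global */
	struct task	*head;
};

#define	NCPU	2

static struct cpu_runq runq[NCPU] = {
	{ PTHREAD_MUTEX_INITIALIZER, NULL },
	{ PTHREAD_MUTEX_INITIALIZER, NULL }
};

/*
 * Enqueue on the CPU the task last ran on; the task tends to stay
 * where its cache state is warm.
 */
static void
runq_add(int cpu, struct task *t)
{
	pthread_mutex_lock(&runq[cpu].lock);
	t->next = runq[cpu].head;
	runq[cpu].head = t;
	pthread_mutex_unlock(&runq[cpu].lock);
}

/*
 * Each CPU dequeues from its own queue; it only touches another
 * CPU's queue (and lock) when its own queue is empty and it has to
 * steal, so there is no global run queue lock to contend on.
 */
static struct task *
runq_take(int cpu)
{
	struct task *t;
	int i, victim;

	for (i = 0; i < NCPU; i++) {
		victim = (cpu + i) % NCPU;
		pthread_mutex_lock(&runq[victim].lock);
		t = runq[victim].head;
		if (t != NULL)
			runq[victim].head = t->next;
		pthread_mutex_unlock(&runq[victim].lock);
		if (t != NULL)
			return (t);
	}
	return (NULL);
}

(The LIFO push is just to keep the sketch short; a real scheduler
would keep FIFO order per priority.)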
The correct approach to thread group affinity is quantum allocation:
once you give a quantum to a process, the process owns that quantum,
and whatever work needs to be done will be done in the context of that
quantum.  There are several ways to achieve this, but topologically,
they all boil down to asynchronous system calls (e.g. an "async call
gate", "scheduler activations", etc.).  Other approaches are possible,
but they introduce starvation issues.

My idea of "the ideal multithreading SMP scaling benchmark" is a
benchmark that is run on a system that already has a 50% load from
processes other than the benchmark, so that what you are measuring is
how the algorithms will perform in reality, rather than how they will
perform in the benchmark.

Because you are attempting a comparative benchmark, I suspect that you
are running a significantly quiescent system (just the benchmark
itself, and the code being benchmarked, running).  I expect that you
have not stopped "cron" (which runs once a second), nor the other
"system processes" which end up in the scheduling queue.  This makes
the Linux scheduler's thread group affinity look like a pure win: it
avoids the additional context switch overhead normally associated with
an address space mapping change, whereas the FreeBSD approach does not.

Also, because of your test setup, you don't have any process which can
find itself starved for CPU by the preferential treatment of threads
in the same thread group, when it comes time for the scheduler to pick
which work it wants to do.  As a result, you are not seeing the
starvation of other processes which would normally occur as a result
of this algorithm.

Minimally, your benchmark should include one unrelated process, with
periodic counting work to do, that will run a statistically
significant number of times during your benchmark, in order to check
that the scheduler is not starving other processes (the count should
be factored into the overall benchmark as a raw value with a specified
weighting); a rough sketch of such a probe is appended after the
signature.

Realize that the 4.5 release of FreeBSD did not concentrate on SMP
scalability of processes sharing the same address space ("rfork
optimization at the expense of other work"), and that the 5.0 version
of FreeBSD will address both the CPU affinity and the thread group
affinity issues, without damaging the scheduler to the point of
starving competing programs.

Right now, you are comparing apples and oranges.  I expect that under
real load, the relative performance of FreeBSD vs. Linux will depend
on the number of threads in a thread group (threaded process).  I
expect that under real load, you will see that FreeBSD performs well
at 6 threads for 2 CPUs (that's the expected break-even point of
random replacement vs. ordered replacement with starvation); beyond
that, FreeBSD performance will degrade more slowly than Linux as you
add threads (processes with shared VM), but more quickly than Linux as
you add CPUs.

I also expect that FreeBSD 5.x, when released, will blow the doors off
of most commercial offerings, even though I would have done a lot of
things differently (e.g. interrupt threads).  I look at these issues
as "room for future improvement" that other OSes don't have.

PS: -hackers is not the correct list for this question; the list you
probably wanted was either -smp or -arch.

--
Terry
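Appended: a rough sketch of the counting probe described above.  It
assumes a one-second sample interval; the interval, the weighting, and
the reporting format are all arbitrary choices the benchmark would
have to make for itself.  If preferential thread group scheduling is
starving processes outside the group, the per-second counts drop:

/*
 * Starvation probe: an unrelated process that does periodic counting
 * work while the benchmark runs.  Sketch only.
 */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t tick;

static void
onalrm(int sig)
{
	(void)sig;
	tick = 1;
}

int
main(void)
{
	unsigned long count = 0;

	signal(SIGALRM, onalrm);
	alarm(1);
	for (;;) {
		count++;			/* the counting work */
		if (tick) {
			printf("%lu\n", count);	/* one sample/second */
			fflush(stdout);
			count = 0;
			tick = 0;
			alarm(1);
		}
	}
	/* NOTREACHED */
}

Run it alongside the benchmark on each system and fold the counts into
the reported score; a scheduler that starves it shows up as a low or
erratic count series.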