From owner-freebsd-hackers Wed Apr 3 23:41:13 2002
From: Terry Lambert
Date: Wed, 03 Apr 2002 23:40:28 -0800
To: John Regehr
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Linuxthreads on Linux vs FreeBSD performance question

John Regehr wrote:
> I'm writing a paper for the FREENIX track at USENIX about a tool for
> measuring scheduling behavior under different operating systems, and
> I've run across a performance anomaly that I'd like to explain.
>
> It's this: for threads created with Linuxthreads, FreeBSD has
> considerably slower context switches than Linux.  There is a set of
> histograms here: [ ... ]
>
> All numbers are taken on a PIII 850, with FreeBSD 4.5-RELEASE and
> Linux 2.4.17.  All context switches are between threads in the same
> address space.
>
> Anyway, I was speculating that the higher cost is either due to (1) a
> failure, in FreeBSD, to avoid page table operations when switching
> between threads in the same address space, or (2) some other kind of
> semantic mismatch between Linuxthreads and rfork.  Is one of these
> guesses right?

No.

Ingo Molnar did a lot of work in the Linux 2.4 scheduler to ensure
thread group affinity.  By doing this, he guarantees that the scheduler
does not pick a process (thread) to run at random, but instead treats
threads in a thread group (a process -- threads sharing things like
address space mappings) preferentially.

On top of this, he did some work on CPU affinity, which ensures that a
given thread will not arbitrarily migrate from one CPU to another
(busting the L1 cache of the CPU where the thread was running).

In FreeBSD, this only happens statistically, rather than explicitly.
Explicit affinity can be bad, particularly if the work is done in the
scheduler itself, rather than being approached cleverly and obliquely.
Schedulers are picky things, and easy to perturb, and the more complex
they are, the more "worst case" saddle points you end up having.

The main problem with the Linux approach is that wiring either type of
affinity into the scheduler itself is actually a fairly bad thing to
do, overall, since it can result in starvation deadlock for other
processes.  This means that Linux will get the best numbers when run
under "benchmark conditions", but will suffer under real world loads,
which are mixed.

The correct approach for CPU affinity is to run with per-CPU scheduler
queues.  This also eliminates the locking and migration issues that you
would normally see when schedulable entities migrate from one CPU to
another, and the starvation possibility that comes with complicating
your scheduler with preferential scheduling.
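To make that concrete, here is a toy userland model of per-CPU run
queues.  It is a sketch only -- not the FreeBSD implementation, and
every name in it (cpu_runq, runq_add, runq_take) is invented for the
illustration.  The point is that the common case takes only the local
queue's lock, and affinity falls out of which queue a task sits in,
rather than out of extra logic in the scheduler:

/*
 * Toy model of per-CPU scheduler queues (sketch only; all names
 * invented, not taken from the FreeBSD source).
 */
#include <pthread.h>
#include <stddef.h>

struct task {
	struct task	*next;
};

struct cpu_runq {
	pthread_mutex_t	lock;		/* per-queue lock, not global */
	struct task	*head;
};

#define	NCPU	2

static struct cpu_runq runq[NCPU] = {
	{ PTHREAD_MUTEX_INITIALIZER, NULL },
	{ PTHREAD_MUTEX_INITIALIZER, NULL }
};

/*
 * Enqueue on the CPU the task last ran on; the task tends to stay
 * where its cache state is warm.
 */
static void
runq_add(int cpu, struct task *t)
{
	pthread_mutex_lock(&runq[cpu].lock);
	t->next = runq[cpu].head;
	runq[cpu].head = t;
	pthread_mutex_unlock(&runq[cpu].lock);
}

/*
 * Each CPU dequeues from its own queue; it only touches another
 * CPU's queue (and lock) when its own queue is empty and it has to
 * steal, so there is no global run queue lock to contend on.
 */
static struct task *
runq_take(int cpu)
{
	struct task *t;
	int i, victim;

	for (i = 0; i < NCPU; i++) {
		victim = (cpu + i) % NCPU;
		pthread_mutex_lock(&runq[victim].lock);
		t = runq[victim].head;
		if (t != NULL)
			runq[victim].head = t->next;
		pthread_mutex_unlock(&runq[victim].lock);
		if (t != NULL)
			return (t);
	}
	return (NULL);
}

(The LIFO push is just to keep the sketch short; a real scheduler
would keep FIFO order per priority.)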
The correct approach to thread group affinity is quantum allocation:
once you give a quantum to a process, the process owns that quantum,
and whatever work needs to be done will be done in the context of that
quantum.  There are several ways to achieve this, but topologically,
they all boil down to asynchronous system calls (e.g. an "async call
gate", "scheduler activations", etc.).  Other approaches are possible,
but they introduce starvation issues.

My idea of "the ideal multithreading SMP scaling benchmark" is a
benchmark that is run on a system that already has a 50% load from
processes other than the benchmark, so that what you are measuring is
how the algorithms will perform in reality, rather than how they will
perform in the benchmark.

Because you are attempting a comparative benchmark, I suspect that you
are running a significantly quiescent system (just the benchmark
itself, and the code being benchmarked, running).  I expect that you
have not stopped "cron" (which runs once a second), nor the other
"system processes" which end up in the scheduling queue.  This makes
the Linux scheduler's thread group affinity look like a pure win: it
avoids the additional context switch overhead normally associated with
an address space mapping change, whereas the FreeBSD approach does not.

Also, because of your test setup, you don't have any process which can
find itself starved for CPU by the preferential treatment of threads
in the same thread group, when it comes time for the scheduler to pick
which work it wants to do.  As a result, you are not seeing the
starvation of other processes which would normally occur as a result
of this algorithm.

Minimally, your benchmark should include one unrelated process, with
periodic counting work to do, that will run a statistically
significant number of times during your benchmark, in order to check
that the scheduler is not starving other processes (the count should
be factored into the overall benchmark as a raw value with a specified
weighting); a rough sketch of such a probe is appended after the
signature.

Realize that the 4.5 release of FreeBSD did not concentrate on SMP
scalability of processes sharing the same address space ("rfork
optimization at the expense of other work"), and that the 5.0 version
of FreeBSD will address both the CPU affinity and the thread group
affinity issues, without damaging the scheduler to the point of
starving competing programs.

Right now, you are comparing apples and oranges.  I expect that under
real load, the relative performance of FreeBSD vs. Linux will depend
on the number of threads in a thread group (threaded process).  I
expect that under real load, you will see that FreeBSD performs well
at 6 threads for 2 CPUs (that's the expected break-even point of
random replacement vs. ordered replacement with starvation); beyond
that, FreeBSD performance will degrade more slowly than Linux as you
add threads (processes with shared VM), but more quickly than Linux as
you add CPUs.

I also expect that FreeBSD 5.x, when released, will blow the doors off
of most commercial offerings, even though I would have done a lot of
things differently (e.g. interrupt threads).  I look at these issues
as "room for future improvement" that other OSes don't have.

PS: -hackers is not the correct list for this question; the list you
probably wanted was either -smp or -arch.

--
Terry
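Appended: a rough sketch of the counting probe described above.  It
assumes a one-second sample interval; the interval, the weighting, and
the reporting format are all arbitrary choices the benchmark would
have to make for itself.  If preferential thread group scheduling is
starving processes outside the group, the per-second counts drop:

/*
 * Starvation probe: an unrelated process that does periodic counting
 * work while the benchmark runs.  Sketch only.
 */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t tick;

static void
onalrm(int sig)
{
	(void)sig;
	tick = 1;
}

int
main(void)
{
	unsigned long count = 0;

	signal(SIGALRM, onalrm);
	alarm(1);
	for (;;) {
		count++;			/* the counting work */
		if (tick) {
			printf("%lu\n", count);	/* one sample/second */
			fflush(stdout);
			count = 0;
			tick = 0;
			alarm(1);
		}
	}
	/* NOTREACHED */
}

Run it alongside the benchmark on each system and fold the counts into
the reported score; a scheduler that starves it shows up as a low or
erratic count series.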