FreeBSD Mail Archives

Date:      Wed, 22 May 2002 12:02:42 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Doug Rabson <dfr@nlsystems.com>
Cc:        Alfred Perlstein <bright@mu.org>, "Dorr H. Clark" <dclark@applmath.scu.edu>, freebsd-smp@FreeBSD.ORG
Subject:   Re: hyperthreading: myth or legend? (was Re: hyperthreading? (was Re:  question))
Message-ID:  <3CEBEB52.423171B3@mindspring.com>
References:  <20020514222840.GB1585@elvis.mu.org> <Pine.GHP.4.21.0205220940410.28331-100000@hpux38.dc.engr.scu.edu> <20020522172759.GV54960@elvis.mu.org> <200205221917.37801.dfr@nlsystems.com>

Doug Rabson wrote:
> > A benchmarking utility reports equivalent performance to a 4 way
> > machine.
> 
> One thing we don't do which we could use to squeeze extra performance is to
> adjust the allocation of cpus to procs. When one hyperthread is idle on a cpu
> while the other one is running, the running hyperthread is faster since it
> can use more functional units. When we schedule a new thread, we should
> prefer cpus which are totally idle (i.e. both hyperthreads are idle) and only
> schedule two hyperthreads on a single cpu when there is no totally idle cpu
> left.

CPU affinity is the number one way to deal with this.

CPU negaffinity, if it exists to promote scalability, has to
make an inverse preference for logical CPUs.
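
As a rough illustration of the placement policy quoted above (prefer a
physical CPU whose hyperthreads are all idle, and double up only when
none is left), here is a minimal sketch in Python.  It is not FreeBSD
scheduler code: the function name pick_logical_cpu, the busy list, and
the assumption that sibling logical CPUs are numbered consecutively are
all hypothetical.

```python
from typing import Optional, Sequence


def pick_logical_cpu(busy: Sequence[bool], threads_per_core: int = 2) -> Optional[int]:
    """Pick a logical CPU for a new thread, or None if every one is busy.

    busy[i] is True when logical CPU i is running a thread.  Assumption
    (hypothetical layout): logical CPUs are numbered so that each group of
    threads_per_core consecutive indices shares one physical CPU.
    """
    if threads_per_core < 1 or len(busy) % threads_per_core != 0:
        raise ValueError("busy must cover whole physical CPUs")

    fallback = None  # first idle hyperthread on a partly busy physical CPU
    for core_start in range(0, len(busy), threads_per_core):
        siblings = range(core_start, core_start + threads_per_core)
        idle = [cpu for cpu in siblings if not busy[cpu]]
        if len(idle) == threads_per_core:
            # Totally idle physical CPU: the new thread gets all functional units.
            return idle[0]
        if idle and fallback is None:
            fallback = idle[0]
    # No totally idle physical CPU left, so share one (or report none free).
    return fallback
```

With two physical CPUs and logical CPU 0 busy, the sketch picks logical
CPU 2 (the idle physical CPU) rather than logical CPU 1 (the sibling of
the busy hyperthread).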

The Intel documentation specifically states that the speed
improvement for a second logical CPU is at best "40%".  This
is because only idle resources are available for the second
logical CPU (in fact, the Intel Hyperthreading home page states
that some resources will not be available to software if
Hyperthreading is enabled in the BIOS).

For SMP, the speed up for an additional CPU is generally held
to be 80%.  In fact, this is not the case for FreeBSD, because
it lacks the scheduler code, and uses data locks rather than
code locks, so its scaling tends to drop off exponentially,
resulting in the classic "4 CPU point of diminishing returns"
that Sequent overcame in the early 1980s.  So while you appear
to get similar performance from SMT (Hyperthreading) as you do
from SMP, this is really an illusion.
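
To put rough numbers on the two figures above, here is a small
back-of-the-envelope sketch in Python.  The model is an assumption made
only for illustration: each additional physical CPU adds a constant 80%
(the rule of thumb, which as noted does not hold for FreeBSD), and
Hyperthreading multiplies the result by at most 1.40.  The function
names, and the reading of the benchmark as a 2 CPU Hyperthreading box
compared against a 4 way machine, are hypothetical.

```python
def smp_throughput(physical_cpus: int, per_cpu_gain: float = 0.80) -> float:
    """Throughput relative to one CPU, assuming a constant gain per extra CPU."""
    if physical_cpus < 1:
        raise ValueError("need at least one physical CPU")
    return 1.0 + (physical_cpus - 1) * per_cpu_gain


def smt_throughput(physical_cpus: int, smt_gain: float = 0.40,
                   per_cpu_gain: float = 0.80) -> float:
    """Best case with a second logical CPU on every physical CPU.

    Assumption: the SMT gain applies as a simple multiplier on top of the
    SMP figure, which is the most optimistic way to combine the two.
    """
    return smp_throughput(physical_cpus, per_cpu_gain) * (1.0 + smt_gain)


if __name__ == "__main__":
    print("4 physical CPUs:         %.2f" % smp_throughput(4))
    print("2 physical CPUs with HT: %.2f" % smt_throughput(2))
```

Under these assumptions the 2 CPU Hyperthreading case comes out at
roughly 2.5 times a single CPU against roughly 3.4 for four physical
CPUs, so a benchmark that reports the two as equivalent is not loading
the machine in a way that exposes the difference.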

Simple test programs are also going to fail to show the fall
off properly; specifically, testing with a small load on a
relatively quiescent system (e.g. as they do on Linux) is
going to give a false improvement number because of cache
locality which would not be true under real world load.  In
the real world, you have to have affinity between threads in
a thread group to keep an unrelated process from "context
busting" you.

Oh yeah, I forgot this in the previous post: to avoid resource contention,
you are expected by Intel to halt idle threads.  FreeBSD's SMP idle
loop doesn't halt, because it hasn't done the IPI dispatch work in
the scheduler to deal with more than one thing becoming runnable at
a time when one or more CPUs (or "virtual CPUs") are halted.  Not
halting means that there is unnecessary contention between virtual
CPUs for the real CPU resources.  This would drop the scaling
addition "best case" from "40%" to something somewhat lower for
an SMT processor, as opposed to a physically separate SMP processor.
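
The dispatch problem described above can be sketched as a toy model in
Python: when several threads become runnable at once and some CPUs are
halted, each halted CPU that is to pick up work needs its own wakeup
(an IPI on real hardware).  This is not FreeBSD code; the function
dispatch_wakeups and its arguments are hypothetical names used only for
illustration.

```python
from typing import Dict, Iterable, List, Tuple


def dispatch_wakeups(runnable: List[str],
                     halted_cpus: Iterable[int]) -> Tuple[Dict[int, str], List[str]]:
    """Decide which halted CPUs to wake when threads become runnable.

    Returns (assignments, still_queued): assignments maps each CPU that
    would be sent a wakeup to the thread it should pick up, and
    still_queued lists the threads left on the run queue because no
    halted CPU remained for them.
    """
    targets = sorted(halted_cpus)
    # zip() stops at the shorter sequence, so we never wake more CPUs than
    # there are runnable threads, nor hand out more threads than halted CPUs.
    assignments = dict(zip(targets, runnable))
    still_queued = list(runnable[len(assignments):])
    return assignments, still_queued
```

Waking only one CPU when three threads become runnable would leave the
other halted CPUs asleep with work pending, which is the case the
scheduler has to handle before the idle loop can safely halt.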

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3CEBEB52.423171B3>
