Date:      Fri, 31 Jan 2003 15:12:58 -0500
From:      Don Bowman <don@sandvine.com>
To:        'Matthew Dillon' <dillon@apollo.backplane.com>, Peter Wemm <peter@wemm.org>
Cc:        Bosko Milekic <bmilekic@unixdaemons.com>, "Daniel C. Sobral" <dcs@tcoip.com.br>, Trish Lynch <trish@bsdunix.net>, freebsd-current@FreeBSD.ORG
Subject:   RE: Hyperthreading and machdep.cpu_idle_hlt 
Message-ID:  <FE045D4D9F7AED4CBFF1B3B813C8533701023610@mail.sandvine.com>

> From: Matthew Dillon [mailto:dillon@apollo.backplane.com]
> 
> :The cache and most of the execution hardware is shared.  The execution
> :units can run something like 4 instructions per clock.  If the "idle"
> :logical core is in a spinloop, then it is generating instructions for
> :execution, so you are dividing the execution resources between one
> :context that is doing real work, and the other context that is burning
> :off the "excess" resources.  Overall, it is a huge loss.  It is
> :absolutely essential that logical cpus be halted when they are not
> :doing useful work.
> 
>     Ah, that makes sense.  Are the two logical cpus shared 50-50?

Hyperthreading is Intel's name for simultaneous multi-threading
(Hyper-Threading is an Intel trademark; SMT is the general term).
The two logical CPUs behave like a co-operative scheduler: whenever
one of them stalls, the other wakes up on the same tick.
The most common cause of a stall is an access to memory. I.e. when
the first 'CPU' does a load-word, the memory hierarchy tries to
get that from L1->L2->L3->memory, with increasing latency. The
other 'CPU' starts executing on the same cycle that the memory
latency begins, and only stops when it too stalls.

Thus the worst thing you could have would be a nop-loop with
no stalls, which would squeeze the other to death.
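
A rough sketch of the polite alternative to a bare nop-loop, for
user-space busy-waits rather than the kernel idle loop this thread is
about (which wants HLT). It assumes an x86 compiler that provides
_mm_pause() via <immintrin.h>; spin_wait() and cpu_relax() are names
made up for this example, not FreeBSD APIs.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
/* PAUSE hints that this is a spin-wait, so the sibling logical CPU
 * is not starved of execution resources. */
static inline void cpu_relax(void) { _mm_pause(); }
#else
/* Assumption: no equivalent hint on other targets, so do nothing. */
static inline void cpu_relax(void) { }
#endif

/* Poll *flag up to max_spins times; return true once it is nonzero. */
bool spin_wait(atomic_int *flag, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return true;
        cpu_relax();
    }
    return false;   /* gave up; the caller should block or yield */
}
```

The bounded spin count is a design choice for the sketch: a waiter
that gives up and sleeps stops competing with its sibling entirely.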

This kind of multi-threading is common in the network-processor
world (e.g. AMCC), since those applications are governed by memory
latency.

As the clock rate of memory has gone up, the overall latency to
the first word has stayed relatively constant, so even though
DDR 266 memory may have a much higher throughput, it takes
just as long for that first access.
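
To make the latency/throughput distinction concrete, here is a minimal
sketch of the two access patterns. Both function names are invented
for illustration, and no timings are claimed; you would need to wrap
these in your own timer over arrays much larger than the caches to see
the difference.

```c
#include <stddef.h>

/* Latency-bound: each load's address comes from the previous load,
 * so the loads cannot overlap and every cache miss is paid in full. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t i = start;
    while (steps-- > 0)
        i = next[i];
    return i;
}

/* Throughput-bound: the addresses are known in advance, so the memory
 * system can stream the data and overlap the misses. */
unsigned long sum_stream(const unsigned long *a, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

With next[] holding a random cyclic permutation, chase() is the
pattern that leaves a logical CPU stalled most of the time, which is
where the sibling gets its chance to run.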

Intel also has a speculative hardware prefetch which tries to guess
which memory will be needed next, and brings that in early. There is
also an explicit prefetch instruction in the SSE set if you know
better than the processor. This is good for e.g. prefetching both
halves of a tree before you do the compare.
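
A minimal sketch of that tree trick, assuming GCC or Clang, whose
__builtin_prefetch() emits the prefetch instruction on x86. The struct
layout and the names prefetch_read() and tree_find() are made up for
the example.

```c
#include <stddef.h>

struct node {
    int key;
    struct node *left, *right;
};

static inline void prefetch_read(const void *p)
{
#if defined(__GNUC__) || defined(__clang__)
    /* Read access, high temporal locality. A prefetch of a NULL or
     * invalid address does not fault. */
    __builtin_prefetch(p, 0, 3);
#else
    (void)p;    /* assumption: no prefetch hint on other compilers */
#endif
}

/* Binary-search-tree lookup that starts fetching both children
 * before comparing, so the chosen one is already on its way. */
const struct node *tree_find(const struct node *n, int key)
{
    while (n != NULL) {
        prefetch_read(n->left);
        prefetch_read(n->right);
        if (key == n->key)
            return n;
        n = (key < n->key) ? n->left : n->right;
    }
    return NULL;
}
```

Whether this wins depends on the tree being larger than the cache;
on a small, hot tree the extra prefetches are just wasted bandwidth.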

In practice I've found Intel's numbers to be true: SMT gives
you a ~20% boost, implying that the split is nowhere close
to 50-50 in normal use.

--don
