Date:      Fri, 31 Jan 2003 11:48:17 -0800
From:      Peter Wemm <peter@wemm.org>
To:        Bosko Milekic <bmilekic@unixdaemons.com>
Cc:        Matthew Dillon <dillon@apollo.backplane.com>, "Daniel C. Sobral" <dcs@tcoip.com.br>, Trish Lynch <trish@bsdunix.net>, freebsd-current@FreeBSD.ORG
Subject:   Re: Hyperthreading and machdep.cpu_idle_hlt 
Message-ID:  <20030131194817.335B72A89E@canning.wemm.org>
In-Reply-To: <20030131141700.A7526@unixdaemons.com> 

Bosko Milekic wrote:
> 
> 
> On Fri, Jan 31, 2003 at 11:08:38AM -0800, Matthew Dillon wrote:
> > 
> > :AFAIK, full hyperthreading support, as it is, has been merged to 
> > :-stable. It consists of a patch to recognize the virtual CPUs, so they 
> > :will be dealt with like any SMP system, as long as HTT is enabled on the 
> > :BIOS.
> > :
> > :-- 
> > :Daniel C. Sobral                   (8-DCS)
> > :Gerencia de Operacoes
> > 
> >     Yah.  Shoot, well this Sony VAIO desktop has a P4 with HTT set in
> >     it, but it doesn't have an APIC, the BIOS is clueless, and there
> >     is no mptable, so I guess I am S.O.L. in regards to using
> >     hyperthreading on this box.
> > 
> > 					-Matt
> > 					Matthew Dillon 
> > 					<dillon@backplane.com>
> 
>   Why do you think that hlt-ing the CPU(s) when idle would actually
>   improve performance in this case?  My only suspicion is that perhaps
>   this reduces scheduling on the auxiliary 'logical' (fake) CPUs,
>   thereby indirectly reducing cache ping-ponging and abuse.  I would
>   imagine that both units sharing the same execution engine in the
>   HTT-enabled model would be effectively 'hlt'-ed when one of the two
>   threads executes an 'hlt' until the next timer tick.

The cache and most of the execution hardware are shared.  The execution
units can run something like 4 instructions per clock.  If the "idle"
logical core is in a spinloop, then it is generating instructions for
execution, so you are dividing the execution resources between one context
that is doing real work, and the other context that is burning off the
"excess" resources.  Overall, it is a huge loss.  It is absolutely essential
that logical cpus be halted when they are not doing useful work.
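
To put rough numbers on that, here is a toy model of the argument above.
It is a sketch only: the issue width of 4 comes from the "something like 4
instructions per clock" figure, and the even split of issue slots between
the two logical cpus is an assumption made for illustration, not a
description of how the P4 actually arbitrates.

```python
def cycles_to_finish(work_instructions, issue_width=4, sibling_spinning=False):
    """Count clocks needed to retire work_instructions on one logical cpu.

    Toy model: if the sibling logical cpu sits in a spinloop, it is assumed
    to consume half of the shared issue slots every clock. If the sibling
    is halted, the working context gets all of them.
    """
    slots_per_clock = issue_width // 2 if sibling_spinning else issue_width
    remaining = work_instructions
    cycles = 0
    while remaining > 0:
        remaining -= slots_per_clock
        cycles += 1
    return cycles


halted_cycles = cycles_to_finish(1000, sibling_spinning=False)
spinning_cycles = cycles_to_finish(1000, sibling_spinning=True)
print(halted_cycles, spinning_cycles)
```

Under these assumptions the same work takes twice as many clocks when the
idle sibling spins instead of halting.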

Why bother with HTT on a single physical cpu system?  The problem with the
x86 instruction set is that it is Really Hard(TM) to extract enough
parallel work from the instruction stream to keep all the pipelines running
at full speed all the time.   I remember when the P4 first came out, there
was a lot of ridicule since the decoder simply didn't have enough bandwidth
to generate enough micro-ops to keep the pipelines busy in already ideal
situations.  Intel's grand plan is to add lots more pipelines, more logical
cpus etc.  The P4 division figures this will scale much more effectively
than trying to make the compilers better and dedicating many more
resources to squeezing more parallelism out of a single x86 instruction
stream.

Under ideal circumstances, HTT would be a win.  ie: there would be lots of
processes with 5-10 threads so that we could dispatch a process to a
physical cpu and try and arrange for its threads to run on the logical
cores (think 4 or 8 or even 16 logical contexts down the road).  That way
you get a single page table tree to cache in the TLB rather than trying to
split the TLB 2 (or 4 or 8 or ...) ways.  The threads would be more
likely to have locality of reference and so make better use of the physical
L2 cache etc.  This happens to work well in theory on the highly threaded
Windows world... Note "in theory".  Windows needs HTT-aware tuning and
algorithms to make this work better; right now it seems to be not that smart
about scheduling etc.  Although, that is a lot better than the situation
that we're in right now. :-(
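
The "dispatch a process to a physical cpu, run its threads on the logical
cores" idea above could be sketched roughly as follows.  This is an
illustration only: place_threads and its arguments are hypothetical names,
and it is not how the FreeBSD (or any other) scheduler is implemented.

```python
from collections import defaultdict


def place_threads(threads, physical_cpus, logical_per_physical):
    """Map each (pid, tid) to a (physical cpu, logical context) pair.

    All threads of one process land on the same physical cpu, so they
    share one page table tree and one L2 cache. Processes are spread
    over the physical cpus round-robin.
    """
    by_pid = defaultdict(list)
    for pid, tid in threads:
        by_pid[pid].append(tid)

    placement = {}
    for index, pid in enumerate(sorted(by_pid)):
        physical = index % physical_cpus
        for slot, tid in enumerate(by_pid[pid]):
            # Wrap around if a process has more threads than logical contexts.
            placement[(pid, tid)] = (physical, slot % logical_per_physical)
    return placement


threads = [(10, 1), (10, 2), (20, 1), (20, 2)]
placement = place_threads(threads, physical_cpus=2, logical_per_physical=2)
print(placement)
```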

Personally, I doubt that HTT will buy much on FreeBSD, apart from being
buzzword compliant.  I'd actually like a compile option or boot tunable so
that it can be turned on or off (and, when off, the machine is treated like
a regular Xeon SMP system).  Single-physical-cpu systems already have this
compile option; it is called 'options SMP' :-).  But if you have 2x P4
Xeons, it would be nice to be able to use them as a normal 2 way system
rather than 4 logical ways.
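
A minimal sketch of what such a switch would decide at boot, assuming the
kernel has a list of (physical id, logical id) pairs for the cpus it
found.  The function name and the htt_enabled flag are hypothetical; no
such tunable is being described as existing here.

```python
def cpus_to_start(topology, htt_enabled):
    """Pick which logical cpus to bring up.

    topology is a list of (physical_id, logical_id) pairs. With HTT
    enabled every logical cpu is started. With it disabled only the first
    logical cpu seen on each physical package is started, so a 2x Xeon
    box behaves like a normal 2 way system.
    """
    if htt_enabled:
        return list(topology)

    seen_packages = set()
    started = []
    for physical_id, logical_id in topology:
        if physical_id not in seen_packages:
            seen_packages.add(physical_id)
            started.append((physical_id, logical_id))
    return started


dual_xeon = [(0, 0), (0, 1), (1, 0), (1, 1)]
with_htt = cpus_to_start(dual_xeon, htt_enabled=True)
without_htt = cpus_to_start(dual_xeon, htt_enabled=False)
print(len(with_htt), len(without_htt))
```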

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

