Date: Fri, 31 Jan 2003 15:12:58 -0500
From: Don Bowman <don@sandvine.com>
To: 'Matthew Dillon' <dillon@apollo.backplane.com>, Peter Wemm <peter@wemm.org>
Cc: Bosko Milekic <bmilekic@unixdaemons.com>, "Daniel C. Sobral" <dcs@tcoip.com.br>, Trish Lynch <trish@bsdunix.net>, freebsd-current@FreeBSD.ORG
Subject: RE: Hyperthreading and machdep.cpu_idle_hlt
Message-ID: <FE045D4D9F7AED4CBFF1B3B813C8533701023610@mail.sandvine.com>
> From: Matthew Dillon [mailto:dillon@apollo.backplane.com]
>
> :The cache and most of the execution hardware is shared. The execution
> :units can run something like 4 instructions per clock. If the "idle"
> :logical core is in a spinloop, then it is generating instructions for
> :execution, so you are dividing the execution resources between one context
> :that is doing real work, and the other context that is burning off the
> :"excess" resources. Overall, it is a huge loss. It is absolutely essential
> :that logical cpus be halted when they are not doing useful work.
>
> Ah, that makes sense. Are the two logical cpus shared 50-50?

Hyperthreading is also called simultaneous multithreading (Hyper-Threading is an Intel trademark; SMT is the general term). The two logical CPUs behave like a co-operative scheduler: whenever one stalls, the other picks up on the same tick. The most common cause of a stall is a memory access. I.e., when the first 'CPU' does a load-word, the memory controller tries to get it from L1->L2->L3->memory, with increasing latency. The other 'CPU' starts executing on the same cycle that the memory latency begins, and only stops when it too stalls. Thus the worst thing you could have would be a nop-loop with no stalls, which would squeeze the other to death.

This kind of multithreading is common in the network-processor world (e.g. AMCC, etc.) since those applications are governed by memory latency. As the clock rate of memory has gone up, the latency to the first word has stayed relatively constant, so even though DDR 266 memory has much higher throughput, that first access takes just as long.

Intel also has a speculative prefetch which tries to guess which memory will be needed next and brings it in. There is an explicit prefetch in the SSE instruction set (the PREFETCHh instructions) for when you know better than the processor. This is good for, e.g., prefetching both halves of a tree before you do the compare.
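To make the tree case concrete, here is a minimal sketch of a binary-search-tree lookup that hints both children into cache before doing the compare. The `struct node` layout and the `tree_find` name are made up for illustration, and `__builtin_prefetch` is a GCC/Clang builtin rather than anything specific to FreeBSD; on x86 with SSE it would normally be emitted as one of the PREFETCHh instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration only. */
struct node {
    int key;
    struct node *left;
    struct node *right;
};

/*
 * Look up `key` in a binary search tree. At each node, both children
 * are hinted into cache before the compare, so the load for the next
 * level can overlap with the compare instead of stalling after it.
 * __builtin_prefetch is only a hint: it does not fault on NULL.
 */
const struct node *
tree_find(const struct node *n, int key)
{
    while (n != NULL) {
        /* args: address, 0 = prefetch for read, 3 = high temporal locality */
        __builtin_prefetch(n->left, 0, 3);
        __builtin_prefetch(n->right, 0, 3);

        if (key == n->key)
            return n;
        n = (key < n->key) ? n->left : n->right;
    }
    return NULL;
}
```

Whether this wins anything depends on the tree being larger than the cache and on the nodes being scattered in memory; for a small or hot tree the extra prefetches are just wasted issue slots, so it is worth measuring before keeping it.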
In practice I've found Intel's numbers to be true: SMT gives you a ~20% boost, implying that there is nowhere close to a 50-50 split in normal use.

--don