From: Don Bowman
To: 'Matthew Dillon', Peter Wemm
Cc: Bosko Milekic, "Daniel C. Sobral", Trish Lynch,
    freebsd-current@FreeBSD.ORG
Subject: RE: Hyperthreading and machdep.cpu_idle_hlt
Date: Fri, 31 Jan 2003 15:12:58 -0500

> From: Matthew Dillon [mailto:dillon@apollo.backplane.com]
>
> :The cache and most of the execution hardware is shared. The execution
> :units can run something like 4 instructions per clock. If the "idle"
> :logical core is in a spinloop, then it is generating instructions for
> :execution, so you are dividing the execution resources between one context
> :that is doing real work, and the other context that is burning off the
> :"excess" resources. Overall, it is a huge loss. It is absolutely essential
> :that logical cpus be halted when they are not doing useful work.
>
> Ah, that makes sense. Are the two logical cpus shared 50-50?

Hyperthreading is Intel's name for simultaneous multithreading
(Hyper-Threading is an Intel trademark; SMT is the general term).
The two logical CPUs behave like a cooperative scheduler: whenever
one stalls, the other starts running on the same tick.
The most common cause of a stall is a memory access. I.e., when the
first 'CPU' does a load-word, the request goes through
L1 -> L2 -> L3 -> memory, with increasing latency at each level. The
other 'CPU' starts executing on the same cycle that the memory latency
begins, and only stops when it too stalls. Thus the worst thing you
could have is a nop loop with no stalls, which would squeeze the other
logical CPU to death.

This kind of hardware multithreading is common in the network-processor
world (e.g. AMCC), since those applications are governed by memory
latency. As memory clock rates have gone up, the latency to the first
word has stayed relatively constant, so even though DDR 266 memory has
much higher throughput, that first access takes just as long.

Intel also has a speculative prefetch, which tries to guess which memory
will be needed next and bring it in. There is an explicit prefetch
in the SSE/SSE2 instruction set for when you know better than the
processor. This is good for, e.g., prefetching both halves of a tree
before you do the compare.

In practice I've found Intel's numbers to be true: SMT gives you
a ~20% boost, implying that the split is nowhere close to 50-50 in
normal use.

--don
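
The "prefetch both halves of a tree before you do the compare" idea can
be sketched roughly as below. This is only an illustration under stated
assumptions: the node layout and the name tree_find are hypothetical
(they are not from this thread), and __builtin_prefetch is the GCC/Clang
builtin, which the compiler may lower to an SSE prefetch instruction on
x86 or drop entirely on targets without one.

```c
#include <stddef.h>

/* Hypothetical node layout, for illustration only. */
struct node {
    int key;
    struct node *left;
    struct node *right;
};

/*
 * Binary-tree lookup that issues a prefetch hint for both children
 * before comparing keys, so the child we descend into may already be
 * on its way into cache by the time we dereference it.
 */
static const struct node *tree_find(const struct node *n, int key)
{
    while (n != NULL) {
        /*
         * GCC/Clang builtin: args are (address, rw, locality).
         * rw = 0 means read, locality = 3 means keep in all cache levels.
         * It is only a hint and does not fault on a bad address,
         * so passing a NULL child pointer is harmless.
         */
        __builtin_prefetch(n->left, 0, 3);
        __builtin_prefetch(n->right, 0, 3);

        if (key == n->key)
            return n;
        n = (key < n->key) ? n->left : n->right;
    }
    return NULL;
}
```

Whether this helps depends on the tree being larger than the cache and
on there being enough work between the prefetch and the next
dereference; on a small or cache-resident tree it is likely to be noise,
so it is worth measuring before keeping it.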