From: Garrett Wollman
Date: Mon, 25 Aug 2003 13:35:09 -0400 (EDT)
To: "Daniel C. Sobral"
cc: current@freebsd.org
Subject: Re: HTT on current
Message-Id: <200308251735.h7PHZ9bd094222@khavrinen.lcs.mit.edu>
In-Reply-To: <3F4A43EA.9090500@tcoip.com.br>

Daniel C. Sobral said:

> There are two problems with HTT. First, L1/L2 cache issues. Second, the
> virtual CPUs are not independent, and there are many cases where
> instructions in one virtual CPU stall the other.
> So take, for example,
> the case of a userland application on CPU0 stalling the kernel on CPU1.

I don't think that this is quite stated right.  The problem is that
the P4 is not very wide to begin with, and it's very hard to optimize
well for that 23-stage pipeline.[1]  So if you have a thread with lots
of latent ILP (either because you did a good job optimizing it for a
four-way superscalar, or because you did a bad job scheduling it and
are depending on the processor to make up for the naive optimization),
it is bound to run more slowly when some of the functional units it
could have used are taken by another thread of execution.

But some sorts of applications can benefit, if the application can be
decomposed into threads that exercise different FUs (for example, one
thread that is memory intensive and one thread that is compute
intensive).  The challenge then is to make sure that they always get
scheduled on the same physical processor at the same time.

The key to getting good performance on an SMT architecture with an
arbitrary instruction mix is more functional units.  The never-built
Alpha EV8, which was to be an eight-way superscalar with four-way SMT
and a wide memory bus, would have made optimum performance much easier
to achieve.

-GAWollman

[1] That's why the Athlon gets more instructions per cycle: it has a
much shallower pipeline and more functional units, so it can execute
naively-optimized, ILP-heavy code much faster without stalling.
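
To make the functional-unit argument above concrete, here is a toy
model in Python.  It is only a sketch under loud assumptions: the unit
counts, the per-thread demand figures, and the names solo_ipc and
smt_ipc are all made up for illustration, and the proportional-sharing
rule is a simplification, not how the P4 (or any real core) arbitrates
between threads.  It ignores pipeline depth and caches entirely.

```python
from fractions import Fraction

def solo_ipc(demand, units):
    """IPC of one thread that has the whole core to itself.

    demand: ops of each kind the thread could issue per cycle (its ILP).
    units:  functional units of each kind the core provides.
    """
    return sum(min(Fraction(n), Fraction(units.get(kind, 0)))
               for kind, n in demand.items())

def smt_ipc(demand_a, demand_b, units):
    """Per-thread IPC when two threads share one core's functional units.

    ASSUMPTION: when combined demand for a unit kind exceeds supply,
    the units are split in proportion to demand.
    """
    ipc_a = Fraction(0)
    ipc_b = Fraction(0)
    for kind in set(demand_a) | set(demand_b):
        a = Fraction(demand_a.get(kind, 0))
        b = Fraction(demand_b.get(kind, 0))
        total = a + b
        if total == 0:
            continue
        supply = Fraction(units.get(kind, 0))
        scale = min(Fraction(1), supply / total)  # 1 means no contention
        ipc_a += a * scale
        ipc_b += b * scale
    return ipc_a, ipc_b

# Hypothetical cores: a narrow one and one with twice the units.
narrow = {"alu": 2, "mem": 1}
wide = {"alu": 4, "mem": 2}

# Hypothetical threads.
ilp_heavy = {"alu": 2, "mem": 1}  # can use every unit of the narrow core
compute = {"alu": 2}              # compute intensive
memory = {"mem": 1}               # memory intensive

if __name__ == "__main__":
    print("ILP-heavy alone, narrow:", solo_ipc(ilp_heavy, narrow))
    print("two ILP-heavy, narrow:  ", smt_ipc(ilp_heavy, ilp_heavy, narrow))
    print("compute + memory, narrow:", smt_ipc(compute, memory, narrow))
    print("two ILP-heavy, wide:    ", smt_ipc(ilp_heavy, ilp_heavy, wide))
```

In this model two ILP-heavy threads on the narrow core each drop to
half their solo rate, the compute/memory pair runs without interfering
at all, and doubling the functional units removes the contention for
the ILP-heavy pair.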