From owner-freebsd-current@FreeBSD.ORG  Mon Aug 25 10:35:14 2003
Date: Mon, 25 Aug 2003 13:35:09 -0400 (EDT)
From: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Message-Id: <200308251735.h7PHZ9bd094222@khavrinen.lcs.mit.edu>
To: "Daniel C. Sobral" <dcs@tcoip.com.br>
In-Reply-To: <3F4A43EA.9090500@tcoip.com.br>
References: <JCEIKJMCANNPGKFKGLKLOENEDJAA.mikej@trigger.net>
	<3F4A1CE2.6080806@freebsd.org>
	<20030825164907.GA17503@dragon.nuxi.com>
	<3F4A43EA.9090500@tcoip.com.br>
cc: current@freebsd.org
Subject: Re: HTT on current

<<On Mon, 25 Aug 2003 14:14:18 -0300, "Daniel C. Sobral" <dcs@tcoip.com.br> said:

> There are two problems with HTT. First, L1/L2 cache issues. Second, the 
> virtual CPUs are not independent, and there are many cases where 
> instructions in one virtual CPU stall the other. So take, for example, 
> the case of a userland application on CPU0 stalling the kernel on CPU1.

I don't think that this is quite stated right.  The problem is that
the P4 is not very wide to begin with, and it's very hard to optimize
well for that 20-stage pipeline.[1]  So if you have a thread with lots
of latent ILP (either because you did a good job optimizing it for a
four-way superscalar, or because you did a bad job scheduling it and
are depending on the processor to make up for the naive optimization),
it is bound to run more slowly when some of the functional units it
could have used are taken by another thread of execution.  But some
sorts of applications can benefit, if the application can be
decomposed into threads that exercise different FUs (for example, one
thread that is memory intensive and one thread that is compute
intensive).  The challenge then is to make sure that the two threads
always get scheduled on the same physical processor at the same time.

The key to getting good performance on an SMT architecture with an
arbitrary instruction mix is more functional units.  The never-built
Alpha EV8, which was to be an eight-way superscalar with four-way SMT
and a wide memory bus, would have made optimum performance much easier
to achieve.
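
As a rough illustration of the argument above, here is a toy model of
two SMT threads sharing one core's functional units.  Everything in it
is an assumption made for illustration: the unit counts are invented
(they are not the P4's or the EV8's real configuration), the names
smt_throughput, narrow, and wide are hypothetical, and contended units
are simply split in proportion to demand, which no real scheduler is
claimed to do.

```python
def smt_throughput(units, threads):
    """Ops issued per cycle by each thread sharing one core.

    units:   {fu_type: units available per cycle} (hypothetical counts)
    threads: list of {fu_type: ops the thread could issue per cycle}
    Assumption: an oversubscribed unit type is split in proportion
    to each thread's demand for it.
    """
    result = []
    for thread in threads:
        issued = 0.0
        for fu, want in thread.items():
            total = sum(other.get(fu, 0) for other in threads)
            avail = units.get(fu, 0)
            if total <= avail:
                issued += want                  # no contention
            else:
                issued += avail * want / total  # proportional share
        result.append(issued)
    return result


narrow = {"alu": 2, "mem": 1}   # invented narrow core
wide = {"alu": 4, "mem": 2}     # invented wider core

ilp_heavy = {"alu": 2}          # could keep both ALUs busy by itself
mem_bound = {"mem": 1}          # mostly exercises the memory unit

alone = smt_throughput(narrow, [ilp_heavy])
clash = smt_throughput(narrow, [ilp_heavy, ilp_heavy])
mixed = smt_throughput(narrow, [ilp_heavy, mem_bound])
roomy = smt_throughput(wide, [ilp_heavy, ilp_heavy])
```

In this model, two ILP-heavy threads on the narrow core each issue
half of what one would issue alone, the compute/memory pair does not
interfere at all, and on the wider core the two ILP-heavy threads both
run at full rate.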

-GAWollman

[1] That's why the Athlon gets more instructions per cycle: it has a
much shallower pipeline and more functional units, so it can execute
naively-optimized, ILP-heavy code much faster without stalling.