From owner-freebsd-current@FreeBSD.ORG  Wed Jan 17 12:00:46 2007
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 5D73B16A5B3;
	Wed, 17 Jan 2007 12:00:46 +0000 (UTC) (envelope-from bde@zeta.org.au)
Received: from mailout2.pacific.net.au (mailout2-3.pacific.net.au [61.8.2.226])
	by mx1.freebsd.org (Postfix) with ESMTP id EDACB13C44B;
	Wed, 17 Jan 2007 12:00:45 +0000 (UTC) (envelope-from bde@zeta.org.au)
Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au
	[61.8.2.162])
	by mailout2.pacific.net.au (Postfix) with ESMTP id 5DC1D6E17F;
	Wed, 17 Jan 2007 23:00:42 +1100 (EST)
Received: from besplex.bde.org (katana.zip.com.au [61.8.7.246])
	by mailproxy1.pacific.net.au (Postfix) with ESMTP id 580638C02;
	Wed, 17 Jan 2007 23:00:43 +1100 (EST)
Date: Wed, 17 Jan 2007 23:00:42 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@besplex.bde.org
To: Ivan Voras <ivoras@fer.hr>
In-Reply-To: <20070117134022.V18339@besplex.bde.org>
Message-ID: <20070117224812.Q23194@besplex.bde.org>
References: <3bbf2fe10607250813w8ff9e34pc505bf290e71758@mail.gmail.com>
	<3bbf2fe10607251004wf94e238xb5ea7a31c973817f@mail.gmail.com>
	<3bbf2fe10607261127p3f01a6c3w80027754f7d4e594@mail.gmail.com>
	<3bbf2fe10607281004o6727e976h19ee7e054876f914@mail.gmail.com>
	<3bbf2fe10701160851r79b04464m2cbdbb7f644b22b6@mail.gmail.com>
	<20070116154258.568e1aaf@pleiades.nextvenue.com>
	<b1fa29170701161355lc021b90o35fa5f9acb5749d@mail.gmail.com>
	<eoji7s$cit$2@sea.gmane.org>
	<b1fa29170701161425n7bcfe1e5m1b8c671caf3758db@mail.gmail.com>
	<eojlnb$qje$1@sea.gmane.org>
	<b1fa29170701161534n1f6c3803tbb8ca60996d200d9@mail.gmail.com>
	<eojok9$449$1@sea.gmane.org> <20070117134022.V18339@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Mailman-Approved-At: Wed, 17 Jan 2007 12:36:37 +0000
Cc: freebsd-current@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Jan 2007 12:00:46 -0000

On Wed, 17 Jan 2007, I wrote:

> ...
> P4 (nosedive's Xeon): movdqa 17% faster than movsl, but all other cached
>   moves slower using MMX or SSE[1-2]; movnt with block prefetch 60% faster
>   than movsl with no prefetch, but < 5% faster with no prefetch for both.
> AXP: (my 5 year old system with a newer CPU): movq through MMX is 60%
>   faster than movsl for cached moves, but movdqa through XMM is only 4%
>   faster.  movnt with block prefetch is 155% faster than movsl with no
>   prefetch, and 73% faster with no prefetch for both.
> A64 in 32-bit mode: in between P4 and AXP (closer to AXP).  movsl doesn't
>   lose by so much, and prefetchnta actually works so block prefetch is
>   not needed and there is a better chance of prefetching helping more
>   than benchmarks.

And MMX/XMM registers are not needed to get movnt on machines with SSE2,
since movnti is part of SSE2 and stores from a general register.  This
reduces the advantages of using MMX/XMM registers on P4s and A64s in
32-bit mode to the non-nt parts of the above (the fully cached case),
which I think are less important than the nt parts.
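
As a rough illustration of the point about movnti, here is a minimal
sketch of a word-at-a-time non-temporal copy in C.  The function name
copy_words_nt is hypothetical, and the plain memcpy fallback for builds
without SSE2 is my assumption, not part of any existing copy routine.
With SSE2 enabled, compilers normally turn _mm_stream_si32() into
movnti, so no MMX/XMM register is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* SSE2 intrinsics; _mm_stream_si32() maps to movnti */
#endif

/*
 * Hypothetical helper: copy nwords 32-bit words using non-temporal
 * stores from general registers when SSE2 is available.
 */
static void
copy_words_nt(int32_t *dst, const int32_t *src, size_t nwords)
{
	/* movnti has no alignment requirement beyond natural alignment. */
	assert(((uintptr_t)dst & 3) == 0);
#if defined(__SSE2__)
	size_t i;

	for (i = 0; i < nwords; i++)
		_mm_stream_si32((int *)&dst[i], (int)src[i]);
	/* Make the weakly-ordered stores globally visible before returning. */
	_mm_sfence();
#else
	/* Assumed fallback: ordinary cached stores. */
	memcpy(dst, src, nwords * sizeof(*dst));
#endif
}
```

This only shows that the nt store itself needs no FPU/SIMD state; it
says nothing about prefetching, which the timings above depend on.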

Another complication with movnt is that its semantics are very
machine-dependent.  On AXP, movnt to a target that happens to be in the
L1 cache goes at L1 cache speed, so it is probably good to use movnt
blindly there.  However, movnti doesn't exist on AXP, so you can't just
replace movl with movnti and must use XMM registers with all their
complications.  On P4 and A64, movnt to a cached target goes at main
memory speed, so you only want to use it intentionally to avoid
thrashing the caches.
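
The machine dependence above could be expressed as a small policy
function.  This is only a sketch: the enum cpu_class, the function
want_nt_store and the NT_MIN_BYTES cutoff are all hypothetical, and
using copy length as a proxy for "the target is unlikely to be cached"
is my assumption, not something measured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU classes, named after the machines discussed above. */
enum cpu_class { CPU_AXP, CPU_P4, CPU_A64 };

/* Assumed cutoff, for illustration only; not derived from benchmarks. */
#define	NT_MIN_BYTES	((size_t)256 * 1024)

/*
 * Decide whether a copy of len bytes should use non-temporal stores.
 */
static bool
want_nt_store(enum cpu_class cpu, size_t len)
{
	switch (cpu) {
	case CPU_AXP:
		/* movnt to an L1-cached target runs at L1 speed: use blindly. */
		return (true);
	case CPU_P4:
	case CPU_A64:
		/*
		 * movnt to a cached target runs at main memory speed, so
		 * only use it for copies large enough to thrash the caches.
		 */
		return (len >= NT_MIN_BYTES);
	}
	assert(!"unknown cpu class");
	return (false);
}
```

On AXP a true result would still have to be implemented with XMM (or
MMX) registers, since movnti is not available there.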

Bruce