From owner-freebsd-arch@FreeBSD.ORG  Wed Jan 17 04:50:45 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@FreeBSD.org
Delivered-To: freebsd-arch@FreeBSD.org
Date: Wed, 17 Jan 2007 15:50:41 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@besplex.bde.org
To: Ivan Voras <ivoras@fer.hr>
In-Reply-To: <eojok9$449$1@sea.gmane.org>
Message-ID: <20070117134022.V18339@besplex.bde.org>
References: <3bbf2fe10607250813w8ff9e34pc505bf290e71758@mail.gmail.com>
	<3bbf2fe10607251004wf94e238xb5ea7a31c973817f@mail.gmail.com>
	<3bbf2fe10607261127p3f01a6c3w80027754f7d4e594@mail.gmail.com>
	<3bbf2fe10607281004o6727e976h19ee7e054876f914@mail.gmail.com>
	<3bbf2fe10701160851r79b04464m2cbdbb7f644b22b6@mail.gmail.com>
	<20070116154258.568e1aaf@pleiades.nextvenue.com>
	<b1fa29170701161355lc021b90o35fa5f9acb5749d@mail.gmail.com>
	<eoji7s$cit$2@sea.gmane.org>
	<b1fa29170701161425n7bcfe1e5m1b8c671caf3758db@mail.gmail.com>
	<eojlnb$qje$1@sea.gmane.org>
	<b1fa29170701161534n1f6c3803tbb8ca60996d200d9@mail.gmail.com>
	<eojok9$449$1@sea.gmane.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-current@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs

On Wed, 17 Jan 2007, Ivan Voras wrote:

> Kip Macy wrote:
>>> Maybe even someone finds a way to get optimized versions of memcpy in
>>> the kernel :)
>
>> It makes a huge difference in a proprietary file serving appliance
>> that I know of.
>
> Beneficial difference?

Heheh.

>> However, past measurements on FreeBSD have supposedly
>> indicated that it isn't that big win as a result of increased context
>> switch time.

No, they indicated that the win is not very large (sometimes negative),
and is very machine dependent.  E.g., it is a small pessimization on all
64-bit i386's running in 64-bit mode -- that's just about all the i386's
you would want to buy now.  On other CPU classes:

P2 (my old Celeron): +- epsilon difference
P3 (freefall): +- epsilon difference
P4 (nosedive's Xeon): movdqa 17% faster than movsl, but all other cached
    moves slower using MMX or SSE[1-2]; movnt with block prefetch 60% faster
    than movsl with no prefetch, but < 5% faster with no prefetch for both.
AXP (my 5-year-old system with a newer CPU): movq through MMX is 60%
    faster than movsl for cached moves, but movdqa through XMM is only 4%
    faster.  movnt with block prefetch is 155% faster than movsl with no
    prefetch, and 73% faster with no prefetch for both.
A64 in 32-bit mode: in between P4 and AXP (closer to AXP).  movsl doesn't
    lose by so much, and prefetchnta actually works, so block prefetch is
    not needed and there is a better chance of prefetching helping more
    than just benchmarks.
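
For anyone who wants to repeat this sort of comparison, here is a minimal
userland sketch of a timing harness.  It is not the code that produced the
numbers above.  All names in it (copy_fn, copy_words, copy_libc, bench_copy,
percent_faster) are made up for illustration, and the plain word loop is
only a portable stand-in for movsl; an optimizing compiler may well turn it
into something else.

```c
#define _POSIX_C_SOURCE 199309L	/* for clock_gettime() */

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical common signature for the copy routines being compared. */
typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/*
 * Copy 32 bits at a time, then finish the tail with memcpy().
 * Assumes dst and src are 4-byte aligned and do not overlap.
 */
static void
copy_words(void *dst, const void *src, size_t len)
{
	uint32_t *d = dst;
	const uint32_t *s = src;
	size_t i, n = len / sizeof(uint32_t);

	for (i = 0; i < n; i++)
		d[i] = s[i];
	memcpy((char *)dst + n * sizeof(uint32_t),
	    (const char *)src + n * sizeof(uint32_t),
	    len - n * sizeof(uint32_t));
}

/* The libc copy, wrapped to match copy_fn. */
static void
copy_libc(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Return the elapsed seconds for iters copies of len bytes. */
static double
bench_copy(copy_fn fn, void *dst, const void *src, size_t len, int iters)
{
	struct timespec t0, t1;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		fn(dst, src, len);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return ((t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
}

/*
 * "X% faster" in the sense used above: a copy that takes t_new seconds
 * where the baseline takes t_base seconds.  100% faster means the time
 * is halved.
 */
static double
percent_faster(double t_base, double t_new)
{
	return ((t_base / t_new - 1.0) * 100.0);
}
```

Call bench_copy() once per routine with the same buffers and pass the two
times to percent_faster().  Use a buffer that fits in L1 or L2 to look at
the cached case and one much larger than the caches for the uncached case,
and expect the results to be as machine dependent as the list above.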

Bruce