From owner-freebsd-current@FreeBSD.ORG Wed Jan 17 12:00:46 2007
Date: Wed, 17 Jan 2007 23:00:42 +1100 (EST)
From: Bruce Evans
To: Ivan Voras
Cc: freebsd-current@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs
Message-ID: <20070117224812.Q23194@besplex.bde.org>
In-Reply-To: <20070117134022.V18339@besplex.bde.org>

On Wed, 17 Jan 2007, I wrote:

> ...
> P4 (nosedive's Xeon): movdqa 17% faster than movsl, but all other cached
> moves slower using MMX or SSE[1-2]; movnt with block prefetch 60% faster
> than movsl with no prefetch, but < 5% faster with no prefetch for both.
> AXP: (my 5 year old system with a newer CPU): movq through MMX is 60%
> faster than movsl for cached moves, but movdqa through XMM is only 4%
> faster. movnt with block prefetch is 155% faster than movsl with no
> prefetch, and 73% faster with no prefetch for both.
> A64 in 32-bit mode: in between P4 and AXP (closer to AXP). movsl doesn't
> lose by so much, and prefetchnta actually works so block prefetch is
> not needed and there is a better chance of prefetching helping more
> than benchmarks.

And MMX/XMM registers are not needed to get movnt on machines with SSE2,
since movnti is part of SSE2. This reduces the advantages of using
MMX/XMM registers on P4s and A64s in 32-bit mode to the non-nt parts of
the above (the fully cached case), which I think are less important than
the nt parts.

Another complication with movnt is that its semantics are very
machine-dependent. On AXP, movnt to a target that happens to be in the
L1 cache goes at L1 cache speed, so it is probably good to use movnt
blindly there (except that movnti doesn't exist on AXP, which lacks
SSE2, so you can't just replace movl with movnti and must use XMM
registers with all their complications). On P4 and A64, however, movnt
to a cached target goes at main memory speed, so you only want to use it
intentionally, to avoid thrashing the caches.

Bruce
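A minimal C sketch of the two points above: movnti needs only SSE2 and a general-purpose register (no MMX/XMM state), and movnt is worth using only deliberately, for copies large enough to thrash the caches. This is an illustration under stated assumptions, not code from this thread or from FreeBSD: copy_maybe_nt() and NT_COPY_THRESHOLD are hypothetical names, the 256 KB cutoff is an arbitrary placeholder rather than a measured value, and the non-temporal path uses the compiler intrinsics _mm_stream_si32 (movnti) and _mm_sfence from <emmintrin.h>, falling back to plain memcpy() when SSE2 isn't enabled at compile time. It does no block prefetching.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32 (movnti), _mm_sfence */
#endif

/*
 * Hypothetical cutoff (an assumption, not a tuned value): only copies at
 * least this large use non-temporal stores; smaller ones stay cached.
 */
#define NT_COPY_THRESHOLD (256 * 1024)

/*
 * Hypothetical helper: copy len bytes from src to dst.  Large copies use
 * movnti (via _mm_stream_si32), which stores from a general-purpose
 * register, so no MMX/XMM registers are touched.  Everything else, and
 * every build without SSE2, uses ordinary cached stores via memcpy().
 */
void copy_maybe_nt(void *dst, const void *src, size_t len)
{
#if defined(__SSE2__)
    if (len >= NT_COPY_THRESHOLD) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        /* Bytes needed to bring d up to 4-byte alignment (0..3). */
        size_t head = (4 - ((uintptr_t)d & 3)) & 3;
        size_t i;

        memcpy(d, s, head);             /* cached stores for the head */
        for (i = head; i + 4 <= len; i += 4) {
            int v;
            memcpy(&v, s + i, sizeof v);           /* src may be unaligned */
            _mm_stream_si32((int *)(d + i), v);    /* movnti: non-temporal store */
        }
        memcpy(d + i, s + i, len - i);  /* cached stores for the 0-3 byte tail */
        _mm_sfence();                   /* order the weakly-ordered nt stores */
        return;
    }
#endif
    memcpy(dst, src, len);
}
```

The #if test is compile-time only: a binary built with SSE2 enabled would still need a runtime CPU check (for example via CPUID) before running this path on a CPU without SSE2, such as AXP. A real version would also choose the threshold per CPU, given how differently P4/A64 and AXP treat movnt to cached targets.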