From owner-freebsd-arch@FreeBSD.ORG Wed May 31 23:25:22 2006
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E7E4516A992;
	Wed, 31 May 2006 23:25:22 +0000 (UTC)
	(envelope-from bde@zeta.org.au)
Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 31FD943D70;
	Wed, 31 May 2006 23:25:22 +0000 (GMT)
	(envelope-from bde@zeta.org.au)
Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.0.86])
	by mailout1.pacific.net.au (Postfix) with ESMTP id DA491427E25;
	Thu, 1 Jun 2006 09:25:20 +1000 (EST)
Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246])
	by mailproxy1.pacific.net.au (8.13.4/8.13.4/Debian-3sarge1) with ESMTP
	id k4VNPHvJ001167; Thu, 1 Jun 2006 09:25:18 +1000
Date: Thu, 1 Jun 2006 09:25:17 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@delplex.bde.org
To: rookie@gufi.org
In-Reply-To: <3bbf2fe10605311329h7adc1722j9088253515e0265b@mail.gmail.com>
Message-ID: <20060601084052.D32549@delplex.bde.org>
References: <3bbf2fe10605311156p7e629283r34d22b368877582d@mail.gmail.com>
	<447DFA0C.20207@FreeBSD.org>
	<3bbf2fe10605311329h7adc1722j9088253515e0265b@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-hackers@freebsd.org, Suleiman Souhlal, freebsd-arch@freebsd.org
Subject: Re: [patch] Adding optimized kernel copying support - Part III
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Wed, 31 May 2006 23:25:29 -0000

On Wed, 31 May 2006, Attilio Rao wrote:

> 2006/5/31, Suleiman Souhlal:
>> Nice work.  Any chance you could also port it to amd64? :-)
>
> Not in the near future, I think. :P

It is not useful for amd64.  An amd64 has enough instruction bandwidth
to saturate the L1 cache using 64-bit accesses, although not using
32-bit accesses.  An amd64 has 64-bit integer registers which can be
accessed without the huge setup overheads and code complications of the
MMX/XMM registers.  It already uses 64-bit registers or 64-bit movs for
copying and zeroing, of course.  Perhaps it should use prefetches and
nontemporal writes more than it already does, but on amd64 these don't
require SSE2 instructions, as nontemporal writes do on 32-bit CPUs.

>> Does that mean it won't work with SMP and PREEMPTION?
>
> Yes, it will work (even if I think it needs more testing), but it
> might give lower performance under SMP|PREEMPTION due to too much
> traffic on memory/cache.  For this I was planning to use non-temporal
> instructions (obviously, benchmarks would be very appreciated).

Er, isn't its main point to fix some !SMP assumptions made in the old
copying-through-the-FPU code?  (The old code is messy due to its
avoidance of global changes.  It wants to preserve the FPU state on the
stack, but this doesn't quite work, so it does extra things (still
mostly locally) that only work in the !SMP && (!SMPng even with UP)
case.  Patching this approach to work in the SMP || SMPng cases would
make it messier.)

The new code shouldn't behave much differently under SMP.  It just
might be a smaller optimization, because the greater memory pressure
under SMP causes more cache misses for everything, and there are no
benefits from copying through MMX/XMM unless nontemporal writes are
used.  All (?) CPUs with MMX or SSE* can saturate main memory using
32-bit instructions.  On 32-bit CPUs, the benefits of using MMX/XMM
come from being able to saturate the L1 cache on some CPUs (mainly
Athlons, and not P[2-4]), and from being able to use nontemporal writes
on some CPUs (at least the AthlonXP via SSE extensions, and all CPUs
with SSE2).

Bruce
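[Editorial illustration, not part of the original message.]  The point
about amd64 copying with plain 64-bit integer registers can be sketched
in C.  This is a hypothetical helper, not FreeBSD's actual bcopy (which
is written in assembly); it only shows the word-at-a-time pattern that
needs no MMX/XMM save/restore overhead:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative sketch: copy using native 64-bit integer loads/stores,
 * as an amd64 kernel can do without touching MMX/XMM state.
 * Assumes non-overlapping, 8-byte-aligned buffers.
 */
static void
copy64(void *dst, const void *src, size_t len)
{
	uint64_t *d = dst;
	const uint64_t *s = src;

	/* Move 8 bytes per iteration through ordinary integer registers. */
	while (len >= sizeof(uint64_t)) {
		*d++ = *s++;
		len -= sizeof(uint64_t);
	}
	/* Byte-copy any remaining tail. */
	memcpy(d, s, len);
}
```

On amd64 a compiler turns the loop body into plain 64-bit mov
instructions, which is enough to saturate the L1 cache as described
above.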
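[Editorial illustration, not part of the original message.]  The
"nontemporal writes via SSE2" mentioned above refers to stores such as
movntdq that bypass the caches, so a large copy does not evict the
working set.  A hedged user-space sketch using the compiler intrinsics
(a hypothetical helper, not the kernel code under discussion; assumes
16-byte-aligned, non-overlapping buffers, with a plain-copy fallback
for CPUs without SSE2):

```c
#include <stddef.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Illustrative nontemporal copy; dst/src must be 16-byte aligned. */
static void
copy_nt(void *dst, const void *src, size_t len)
{
#ifdef __SSE2__
	__m128i *d = dst;
	const __m128i *s = src;

	while (len >= sizeof(__m128i)) {
		/* movntdq: 16-byte store that bypasses the caches. */
		_mm_stream_si128(d++, _mm_load_si128(s++));
		len -= sizeof(__m128i);
	}
	/* Order the NT stores before any subsequent ordinary writes. */
	_mm_sfence();
	memcpy(d, s, len);		/* byte-copy the tail */
#else
	/* No SSE2 (e.g. plain MMX-era CPUs): ordinary cached copy. */
	memcpy(dst, src, len);
#endif
}
```

The sfence at the end matters: nontemporal stores are weakly ordered,
so a fence is needed before the copied data is handed to other code.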