From: Bruce Evans
Date: Sun, 21 Jan 2007 15:03:48 +1100 (EST)
To: David Malone
Cc: Attilio Rao, freebsd-current@FreeBSD.org, Ivan Voras, freebsd-arch@FreeBSD.org
Subject: Re: Optimized copy&move (was: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs)
Message-ID: <20070121140716.W4007@besplex.bde.org>
In-Reply-To: <20070120215103.GA93101@walton.maths.tcd.ie>

On Sat, 20 Jan 2007, David Malone wrote:

> On Thu, Jan 18, 2007 at 11:16:19AM +1100, Bruce Evans wrote:
>> - the FPU routines are faster on Athlons (XP and 64 at least), but these
>>   didn't exist until 2001.  The introduction of these CPUs may have
>>   been the trigger for turning off the FPU routines in -current in 2001.
>>   Until then problems were limited to Pentium-1's since the dynamic
>>   configuration prevented the routines being used on all other machines.
>
> I think a very quirky K6-2 machine that I had let us reproduce the
> problem fairly dependably and may have been part of the reason it
> was finally turned off.

I just looked again at your old (2001) mail about this.  The userland
benchmark was flawed.  It tried 3 methods sequentially without warming
up the caches, so all methods unintentionally measured I-cache misses
(including branch target cache misses), and the first method (userland
bzero) warmed up the D-cache for the other 2.

The kernel runtime configuration also fails to either warm or cool the
caches initially.  It assumes P1 cache sizes and depends on a 1MB buffer
being much larger than the caches.  Maybe this was not enough for the
K6-2.  It is certainly not enough for the Athlon64, but I think it would
mostly cause false negatives, so I don't understand why it gave a false
positive for the K6-2.

After I fixed the userland benchmark, userland bzero did much better, and
your benchmark agreed with mine that the FPU methods for bzero are just
pessimizations on the A64 and AXP.  However, the behaviour for bcopy is
quite different on the A64 and AXP: even the old FPU methods are small
optimizations in some cases (on the A64, about 25% faster in the
fully-L2-cached case; little difference for other large copies).

Bruce
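A minimal sketch in C of the benchmarking fix described above. It is not the 2001 benchmark or the kernel's runtime configuration code, and it does not reproduce the FPU routines. The names zero_fn, zero_memset, zero_loop, time_zero and now_sec are hypothetical. The naive byte loop only stands in for "some other method". Each method gets its own untimed warm-up call before it is measured. This way no method pays for cold I-cache and branch target cache misses, and none inherits a D-cache that was warmed by the method that ran before it.

```c
#define _POSIX_C_SOURCE 200112L   /* for clock_gettime() */
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by every method under test. */
typedef void (*zero_fn)(void *buf, size_t len);

/* Method 1: the C library's memset, standing in for userland bzero. */
void zero_memset(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Method 2: a naive byte loop, a stand-in for an alternative method.
 * The volatile stores stop the compiler from turning it back into memset. */
void zero_loop(void *buf, size_t len)
{
    volatile unsigned char *p = buf;
    for (size_t i = 0; i < len; i++)
        p[i] = 0;
}

/* Monotonic wall-clock time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Time one method over `reps` runs (reps >= 1) and return the best run.
 * The untimed warm-up call loads this method's code into the I-cache and
 * trains the branch predictors. It also touches the buffer, so a buffer
 * that fits in cache is resident for every method alike. Returns -1.0
 * if reps < 1.
 */
double time_zero(zero_fn fn, void *buf, size_t len, int reps)
{
    double best = -1.0;

    fn(buf, len);                 /* warm-up, not timed */
    for (int r = 0; r < reps; r++) {
        double t0 = now_sec();
        fn(buf, len);
        double dt = now_sec() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}
```

To separate the cases discussed above, call time_zero() once with a buffer small enough to stay in L2 and once with a buffer much larger than any cache. For the large buffer, each pass evicts its own earlier data, so the warm-up mainly helps the code rather than the data. The cache sizes you pick are assumptions about the target CPU. A fixed 1MB buffer is not necessarily "much larger" than the caches on later machines.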