Date:      Sun, 21 Jan 2007 15:03:48 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        David Malone <dwmalone@maths.tcd.ie>
Cc:        Attilio Rao <attilio@FreeBSD.org>, freebsd-current@FreeBSD.org, Ivan Voras <ivoras@fer.hr>, freebsd-arch@FreeBSD.org
Subject:   Re: Optimized copy&move (was: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs)
Message-ID:  <20070121140716.W4007@besplex.bde.org>
In-Reply-To: <20070120215103.GA93101@walton.maths.tcd.ie>
References:  <eoji7s$cit$2@sea.gmane.org> <b1fa29170701161425n7bcfe1e5m1b8c671caf3758db@mail.gmail.com> <eojlnb$qje$1@sea.gmane.org> <b1fa29170701161534n1f6c3803tbb8ca60996d200d9@mail.gmail.com> <eojok9$449$1@sea.gmane.org> <20070117134022.V18339@besplex.bde.org> <20070117224812.Q23194@besplex.bde.org> <45AE7BF8.10703@fer.hr> <3bbf2fe10701171315g696bca4fi3bf676b62c06f4d@mail.gmail.com> <20070118094808.F11834@delplex.bde.org> <20070120215103.GA93101@walton.maths.tcd.ie>

On Sat, 20 Jan 2007, David Malone wrote:

> On Thu, Jan 18, 2007 at 11:16:19AM +1100, Bruce Evans wrote:
>> - the FPU routines are faster on Athlons (XP and 64 at least), but these
>>   didn't exist until 2001.  The introduction of these CPUs may have
>>   been the trigger for turning off the FPU routines in -current in 2001.
>>   Until then problems were limited to Pentium-1's since the dynamic
>>   configuration prevented the routines being used on all other machines.
>
> I think a very quirky K6-2 machine that I had let us reproduce the
> problem fairly dependably and may have been part of the reason it
> was finally turned off.

I just looked again at your old (2001) mail about this.  The userland
benchmark was flawed.  It tried 3 methods sequentially without warming
up caches, so all methods did unintended testing of I-cache misses
(including branch target cache misses) and the first method (userland
bzero) warmed up the D-cache for the other 2.  The kernel runtime
configuration also fails to either warm or cool the caches initially.
It assumes Pentium-1 cache sizes and depends on a 1MB buffer being much
larger than caches.  Maybe this was not enough for the K6-2.  It is
certainly not enough for Athlon64, but I think it would mostly cause
false negatives, so I don't understand why it gave a false positive for
the K6-2.
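A minimal sketch of a benchmark harness that avoids the ordering bias
described above: every method runs a few times before any timing starts,
and the order is shuffled each round so that no method always runs
first on cold caches.  It is written in Python only to illustrate the
methodology; `bench`, `zero_copy` and `zero_alloc` are hypothetical
stand-ins, not the methods from the 2001 benchmark, and Python adds
interpreter overhead that a C harness would not.

```python
import random
import time

SIZE = 1 << 20        # 1MB, the buffer size the kernel check relies on
ZEROS = bytes(SIZE)   # preallocated source of zero bytes


def zero_copy(buf):
    # Hypothetical method 1: copy from a preallocated zero buffer.
    buf[:] = memoryview(ZEROS)[: len(buf)]


def zero_alloc(buf):
    # Hypothetical method 2: build a fresh zeroed object, then copy it in.
    buf[:] = bytes(len(buf))


def bench(methods, buf, rounds=20, warmup=3):
    """Return {name: best seconds per call} for each zero-fill method."""
    names = list(methods)
    # Warm-up: run every method before timing anything, so all methods
    # start timing from similarly warm I-cache, D-cache and branch
    # predictor state instead of the first one paying for cold caches.
    for _ in range(warmup):
        for name in names:
            methods[name](buf)
    best = {name: float("inf") for name in names}
    for _ in range(rounds):
        # Shuffle each round so that no method is systematically first.
        random.shuffle(names)
        for name in names:
            t0 = time.perf_counter()
            methods[name](buf)
            best[name] = min(best[name], time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    buf = bytearray(b"\xff") * SIZE
    results = bench({"copy": zero_copy, "alloc": zero_alloc}, buf)
    for name, secs in sorted(results.items()):
        print(f"{name:6s} {secs * 1e6:9.1f} us")
```

Taking the minimum over rounds filters out interrupts and other noise;
a mean would blur the cache effects being compared.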

After fixing the userland benchmark, userland bzero did much better,
and your benchmark agreed with mine that FPU methods for bzero are
just pessimizations on A64 and AXP.  However, the behaviour for bcopy
is quite different on A64 and AXP: even the old FPU methods are small
optimizations in some cases (on A64, about 25% in the fully-L2-cached
case; little difference for other large copies).
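A similar sketch for copies, measuring throughput separately for a
working set that should fit in L2 and one far larger than any cache,
since the bcopy results above differ between those cases.  The sizes
are assumptions (256 KiB is assumed to be L2-resident on the CPUs
discussed, 32 MiB is assumed to exceed every cache level),
`copy_throughput` is a hypothetical helper, and the Python slice copy
only stands in for bcopy; it does not exercise the kernel's FPU routines.

```python
import time


def copy_throughput(size, repeats=10):
    """Return the best observed copy rate, in bytes/second, for size bytes."""
    src = bytearray(b"\xa5") * size
    dst = bytearray(size)
    # Untimed first copy: touches both buffers so the timed runs do not
    # include first-touch page faults.
    dst[:] = src
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst[:] = src
        best = min(best, time.perf_counter() - t0)
    if dst != src:
        raise RuntimeError("copy mismatch")
    return size / best if best > 0 else float("inf")


if __name__ == "__main__":
    # Assumed sizes: 256 KiB should fit in L2 on the CPUs discussed;
    # 32 MiB should exceed every cache level.
    for label, size in (("L2-sized", 256 << 10), ("large", 32 << 20)):
        rate = copy_throughput(size)
        print(f"{label:8s} {size >> 10:6d} KiB {rate / 1e9:7.2f} GB/s")
```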

Bruce


