Date: Sun, 24 Dec 1995 21:06:13 +1100
From: Bruce Evans <bde@zeta.org.au>
To: imb@scgt.oz.au, tege@matematik.su.se
Cc: freebsd-hackers@freebsd.org
Subject: Re: Pentium bcopy
Message-ID: <199512241006.VAA25049@godzilla.zeta.org.au>
> > The reason that this is so much faster is that it uses the dual-ported
> > cache in a near-optimal way.
> Does this approach demonstrate any significant penalties with less
> sophisticated cache architectures, for example 386DX or non-pipelined ?
>The approach has a significant penalty on a 386 (3x slower).
>I suspect it might be a tad bit slower on a 486 with a write-through L1
>cache. But the approach should help on 486 systems with write-back cache.
>I don't have any 486 systems, so I cannot tell for sure. Here is a simple
>test program that you can use for timing tests:
On my 486DX2/66 with an unknown writing strategy, copy() is about 20%
faster than memcpy() (*) but can be improved another 20% by changing the
cache line allocation strategy slightly: replace the load of 28(%edi) by
a load of 12(%edi) and add a load of 28(%edi) in the middle of the loop.
The pairing stuff and the nops made little difference. Cache-line
alignment of the source and target also made little difference.
(*) When memcpy() is run a second time, it is as fast as the fastest
version of copy()!
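The allocation change described above can be sketched in C. This is only
an illustration under stated assumptions: the original copy() is assembly
that is not reproduced in this message, so the name copy1_sketch, the
32-byte unrolling and the volatile reads of the destination are all
hypothetical stand-ins. It also assumes a 486-style cache with 16-byte
lines that are allocated on reads, so that touching offset 12 and then
offset 28 brings in one destination line just before each half of the
block is written.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the "copy1" idea: copy nwords 32-bit words,
 * 8 words (32 bytes) per iteration.  Before each 16-byte half is
 * written, one word of that half is read from the destination so the
 * cache line is (presumably) allocated before the stores hit it.
 */
void copy1_sketch(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t i = 0;

    for (; i + 8 <= nwords; i += 8) {
        /* Touch byte offset 12 of the destination block (first line). */
        (void)*(volatile uint32_t *)&dst[i + 3];
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];

        /* Touch byte offset 28 in the middle of the loop (second line). */
        (void)*(volatile uint32_t *)&dst[i + 7];
        dst[i + 4] = src[i + 4];
        dst[i + 5] = src[i + 5];
        dst[i + 6] = src[i + 6];
        dst[i + 7] = src[i + 7];
    }

    /* Copy any remaining words one at a time. */
    for (; i < nwords; i++)
        dst[i] = src[i];
}
```

The volatile casts are there only to stop the compiler from discarding
the otherwise unused loads. Whether the touches help at all depends on
the cache's write policy, as the numbers for the two machines suggest.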
On my 486DX/33 with a "write buffer" (which is faster than "write back"
on the same machine), the fancy copies all run at much the same speed,
while memcpy() runs at a speed that is independent of the cache state and
is 30% faster than the fancy copies.
>unsigned long
>cputime ()
>{
>  struct rusage rus;
>  getrusage (0, &rus);
>  return rus.ru_utime.tv_sec * 1000 + rus.ru_utime.tv_usec / 1000;
                                                            ^^^^^^
>}
Not accurate enough. Use weights of 1000000 and 1 instead of 1000
and 1/1000, or double precision.
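A minimal sketch of the microsecond version, using weights of 1000000
and 1. The names timeval_to_usec and cputime_usec are hypothetical, and
the unsigned long long return type is an assumption made here so that the
count does not wrap after roughly 4295 seconds, as it would in a 32-bit
unsigned long.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Convert a struct timeval to whole microseconds (weights 1000000 and 1). */
unsigned long long timeval_to_usec(const struct timeval *tv)
{
    return (unsigned long long)tv->tv_sec * 1000000ULL
         + (unsigned long long)tv->tv_usec;
}

/* User CPU time of the calling process in microseconds, or 0 on error. */
unsigned long long cputime_usec(void)
{
    struct rusage rus;

    if (getrusage(RUSAGE_SELF, &rus) != 0)
        return 0;
    return timeval_to_usec(&rus.ru_utime);
}
```

Calling cputime_usec() before and after the copy loop and subtracting
gives the elapsed user time without the integer division that throws
away everything below a millisecond.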
Actual results:
function  486DX2/66  486DX/33
--------  ---------  --------
memcpy     11353454   9242061
copy        9389321  12595028
copy1       6841713  12888324
copy2       7055773  12823391
memcpy      6952372   9219855
copy1() is copy() with the above changes. copy2() is copy1() with
half as much unrolling and only one word copied at a time.
Bruce
