From owner-freebsd-hackers  Sat Dec 23 18:25:03 1995
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id SAA12644
          for hackers-outgoing; Sat, 23 Dec 1995 18:25:03 -0800 (PST)
Received: from insanus.matematik.su.se (insanus.matematik.su.se [130.237.198.12])
          by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id SAA12639
          for <freebsd-hackers@freebsd.org>; Sat, 23 Dec 1995 18:25:00 -0800 (PST)
Received: from localhost (prudens.matematik.su.se [130.237.198.5]) by insanus.matematik.su.se (8.7.1/8.6.9) with ESMTP id DAA26871; Sun, 24 Dec 1995 03:24:44 +0100 (MET)
Message-Id: <199512240224.DAA26871@insanus.matematik.su.se>
X-Address: Department of Mathematics, Stockholm University 
	      S-106 91  Stockholm
	      SWEDEN
X-Phone: int+46 8 162000
X-Fax:   int+46 8 6126717
X-Url:   http://www.matematik.su.se
To: michael butler <imb@scgt.oz.au>
cc: tege@matematik.su.se (Torbjorn Granlund), freebsd-hackers@freebsd.org
Subject: Re: Pentium bcopy 
In-reply-to: Your message of "Sun, 24 Dec 1995 12:57:48 +1100."
             <199512240157.MAA09624@asstdc.scgt.oz.au> 
Date: Sun, 24 Dec 1995 03:24:42 +0100
From: Torbjorn Granlund <tege@matematik.su.se>
Sender: owner-hackers@freebsd.org
Precedence: bulk

  > The reason that this is so much faster is that it uses the dual-ported
  > cache in a near-optimal way.

  Does this approach demonstrate any significant penalties with less
  sophisticated cache architectures, for example the 386DX or non-pipelined
  designs?

The approach has a significant penalty on a 386 (3x slower).

I suspect it might be a tad slower on a 486 with a write-through L1
cache, but it should help on 486 systems with a write-back cache.
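To make the cache argument concrete, here is a minimal C sketch of one way to get this kind of behavior. It is an illustration under stated assumptions, not the actual assembly routine from this thread. It assumes the trick is to read each destination cache line before storing into it, so that a data cache which does not allocate lines on write misses (as on the Pentium) pulls the line in and the following stores hit the cache. On a 386, which has no on-chip cache, those extra reads are pure overhead. The function name `copy_touch_dest`, the 32-byte line size, and a line-aligned `dst` are hypothetical assumptions for the sketch.

```c
#include <stddef.h>

/* Hypothetical: 32-byte cache lines, expressed as a count of ints.  */
#define LINE_INTS (32 / sizeof (int))

/* Hypothetical sketch: copy n ints from src to dst.  Before writing each
   (assumed line-aligned) chunk of the destination, read one word of it
   so that a cache without write-allocate loads the line first.  The
   volatile access keeps the compiler from dropping that read.  */
void
copy_touch_dest (int *dst, const int *src, size_t n)
{
  size_t i, j;

  for (i = 0; i < n; i += LINE_INTS)
    {
      size_t end = i + LINE_INTS < n ? i + LINE_INTS : n;

      (void) *(volatile int *) &dst[i];	/* touch the destination line */

      /* Copy two words per step, loosely mirroring paired loads/stores.  */
      for (j = i; j + 1 < end; j += 2)
	{
	  int a = src[j], b = src[j + 1];
	  dst[j] = a;
	  dst[j + 1] = b;
	}
      if (j < end)		/* odd word left at the very end */
	dst[j] = src[j];
    }
}
```

Whether the extra read pays off depends entirely on the cache policy, which is why the same idea can win on one processor and lose on another.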

I don't have any 486 systems, so I cannot tell for sure.  Here is a simple
test program that you can use for timing tests; link it with the copy
routine you want to measure:

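#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Assumed prototype for the copy routine under test, which is linked in
   from elsewhere.  It takes a destination, a source and a count of ints
   (not bytes), matching how it is called below.  */
extern void copy (int *dst, const int *src, int n);

/* Return the user CPU time used so far, in milliseconds.  */
long
cputime (void)
{
  struct rusage rus;

  getrusage (RUSAGE_SELF, &rus);
  return rus.ru_utime.tv_sec * 1000 + rus.ru_utime.tv_usec / 1000;
}

#ifndef SIZE
#define SIZE 1000		/* number of ints per copy */
#endif

int
main (void)
{
  int s[SIZE], d[SIZE];
  int i;
  long t0;

  /* Time 100000 calls of the routine under test...  */
  t0 = cputime ();
  for (i = 0; i < 100000; i++)
    copy (d, s, SIZE);
  printf ("copy %ld\n", cputime () - t0);

  /* ...and the same amount of copying with the C library's memcpy.  */
  t0 = cputime ();
  for (i = 0; i < 100000; i++)
    memcpy (d, s, SIZE * sizeof (int));
  printf ("memcpy %ld\n", cputime () - t0);

  exit (0);
}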
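The test program calls a `copy` routine that it does not define. To check that the harness builds and runs before the routine you actually want to measure is available, a plain C stand-in such as the following could be compiled in a separate file and linked with it. This is only a hypothetical baseline, not the Pentium-optimized routine; its signature (destination, source, count of ints) is inferred from the way the test program calls `copy` next to `memcpy (d, s, SIZE * sizeof (int))`.

```c
/* Hypothetical baseline for the timing harness: copy n ints from src
   to dst one word at a time.  Not the optimized routine discussed in
   this thread.  */
void
copy (int *dst, const int *src, int n)
{
  int i;

  for (i = 0; i < n; i++)
    dst[i] = src[i];
}
```

Timing this stand-in against memcpy gives a rough idea of what a naive word loop costs on a given machine, which is a useful reference point when comparing a tuned routine.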