From owner-freebsd-hackers Sat Dec 23 18:25:03 1995
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id SAA12644
          for hackers-outgoing; Sat, 23 Dec 1995 18:25:03 -0800 (PST)
Received: from insanus.matematik.su.se (insanus.matematik.su.se [130.237.198.12])
          by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id SAA12639
          for ; Sat, 23 Dec 1995 18:25:00 -0800 (PST)
Received: from localhost (prudens.matematik.su.se [130.237.198.5])
          by insanus.matematik.su.se (8.7.1/8.6.9) with ESMTP id DAA26871;
          Sun, 24 Dec 1995 03:24:44 +0100 (MET)
Message-Id: <199512240224.DAA26871@insanus.matematik.su.se>
X-Address: Department of Mathematics, Stockholm University
           S-106 91 Stockholm
           SWEDEN
X-Phone: int+46 8 162000
X-Fax: int+46 8 6126717
X-Url: http://www.matematik.su.se
To: michael butler
cc: tege@matematik.su.se (Torbjorn Granlund), freebsd-hackers@freebsd.org
Subject: Re: Pentium bcopy
In-reply-to: Your message of "Sun, 24 Dec 1995 12:57:48 +1100."
             <199512240157.MAA09624@asstdc.scgt.oz.au>
Date: Sun, 24 Dec 1995 03:24:42 +0100
From: Torbjorn Granlund
Sender: owner-hackers@freebsd.org
Precedence: bulk

  > The reason that this is so much faster is that it uses the dual-ported
  > cache in a near-optimal way.

  Does this approach demonstrate any significant penalties with less
  sophisticated cache architectures, for example 386DX or non-pipelined?

The approach has a significant penalty on a 386 (3x slower).  I suspect it
might be a tad slower on a 486 with a write-through L1 cache.  But the
approach should help on 486 systems with a write-back cache.  I don't have
any 486 systems, so I cannot tell for sure.

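Here is a simple test program that you can use for timing tests:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

/* The routine under test, from earlier in this thread; link it in
   separately.  This prototype is an assumption inferred from the
   call in main below.  */
void copy (int *, int *, int);

/* Return the user CPU time consumed so far, in milliseconds.  */
unsigned long
cputime (void)
{
  struct rusage rus;

  getrusage (RUSAGE_SELF, &rus);
  return rus.ru_utime.tv_sec * 1000 + rus.ru_utime.tv_usec / 1000;
}

#ifndef SIZE
#define SIZE 1000
#endif

int
main (void)
{
  int s[SIZE], d[SIZE];
  int i;
  unsigned long t0;

  /* Time 100000 calls of the routine under test (SIZE words each).  */
  t0 = cputime ();
  for (i = 0; i < 100000; i++)
    copy (d, s, SIZE);
  printf ("copy %lu\n", cputime () - t0);

  /* Time the library memcpy on the same number of bytes.  */
  t0 = cputime ();
  for (i = 0; i < 100000; i++)
    memcpy (d, s, SIZE * sizeof (int));
  printf ("memcpy %lu\n", cputime () - t0);

  exit (0);
}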
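
The test program calls copy, which is the routine discussed earlier in this thread and is not part of this message.  To try the harness without it, a plain C stand-in along the following lines might do.  This is only a sketch: the argument order (destination, source, count in words) and the types are assumptions inferred from the call copy (d, s, SIZE), and the loop is not the Pentium-tuned code.

```c
/* Hypothetical stand-in for the thread's copy routine.  It copies n
   ints from src to dst, one word at a time.  The signature is an
   assumption inferred from the call copy (d, s, SIZE) in the test
   program; the real routine may differ.  */
void
copy (int *dst, int *src, int n)
{
  int i;

  for (i = 0; i < n; i++)
    dst[i] = src[i];
}
```

Compiling this file together with the test program should let it link and print both timings.  An optimizing compiler may turn the loop into a call to memcpy, so the "copy" figure obtained this way would say little about the tuned routine.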