Date: Mon, 5 Mar 2001 16:13:17 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
To: Chris Dillon <cdillon@wolves.k12.mo.us>
Cc: "E.B. Dreger" <eddy+public+spam@noc.everquick.net>, <freebsd-hackers@FreeBSD.ORG>
Subject: Re: Machines are getting too damn fast
Message-ID: <200103060013.f260DHY46910@earth.backplane.com>
References: <Pine.BSF.4.32.0103051729350.84853-100000@mail.wolves.k12.mo.us>

:IIRC, Intel is using a very different caching method on the P4 from
:what we are used to on just about every other x86 processor we've
:seen.  Well, I can't remember if the data cache has changed much, but
:the instruction cache has.  I doubt the difference in instruction
:cache behaviour would make a difference here though.  Hmm.
:
:I wonder if it makes any difference that I'm using -march=pentium
:-mcpu=pentium for my CFLAGS?  Actually, the kernel I tested on might
:even be using -march/-mcpu=pentiumpro, since I only recently changed
:it to =pentium to allow me to do buildworlds for another Pentium-class
:machine.  I did wonder the same thing a while back and did the same
:test with and without the optimizations, and with pentiumpro opts the
:big block size transfer rate went _down_ a little bit, which was odd.
:I didn't compare with L2-cache-friendly blocks, though.
:
:-- Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net

    I modified my original C program again, this time to simply read
    the data from memory given a block size in kilobytes as an argument.
    I had to throw in a little __asm to do it right, but here are my
    results.  It shows about 3.2 GBytes/sec from the L2 (well, insofar as
    my 3-instruction loop goes), and about 1.4 GBytes/sec from main memory.

    NOTE: cc x.c -O2 -o x

    ./x 4
    3124.96 MBytes/sec (read)
    ./x 8
    3242.45 MBytes/sec (read)
    ./x 16
    3060.93 MBytes/sec (read)
    ./x 32
    3359.97 MBytes/sec (read)
    ./x 64
    3362.06 MBytes/sec (read)
    ./x 128
    3365.53 MBytes/sec (read)
    ./x 240
    3307.86 MBytes/sec (read)
    ./x 256
    3232.33 MBytes/sec (read)
    ./x 512
    1396.45 MBytes/sec (read)
    ./x 1024
    1397.90 MBytes/sec (read)

    In contrast I get 1052.50 MBytes/sec on the Dell 2400 from the L2,
    and 444 MBytes/sec from main memory.

					-Matt

/*
 * NOTE: cc x.c -O2 -o x
 */

#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>	/* bzero() */
#include <unistd.h>

int deltausecs(struct timeval *tv1, struct timeval *tv2);

int
main(int ac, char **av)
{
    int i;
    int bytes;
    double dtime;
    struct timeval tv1;
    struct timeval tv2;
    char *buf;

    if (ac == 1) {
	fprintf(stderr, "%s numKB\n", av[0]);
	exit(1);
    }
    bytes = strtol(av[1], NULL, 0) * 1024;
    if (bytes < 4 * 1024 || bytes > 256 * 1024 * 1024) {
	fprintf(stderr, "Oh please.  Try a reasonable value\n");
	exit(1);
    }
    buf = malloc(bytes);
    if (buf == NULL) {
	perror("malloc");
	exit(1);
    }
    bzero(buf, bytes);		/* touch every page before timing */
    gettimeofday(&tv1, NULL);
    /* read about 10^9 bytes in total, one block-sized pass at a time */
    for (i = 0; i < 1000000000; i += bytes) {
	register int j;

	/* i386 only: one 32-bit load per iteration, last word first */
	for (j = bytes - 4; j >= 0; j -= 4)
	    __asm __volatile("movl (%0,%1),%%eax" :
		"=r" (buf), "=r" (j) :
		"0" (buf), "1" (j) : "ax" );
    }
    gettimeofday(&tv2, NULL);

    /* bytes per microsecond is numerically MBytes/sec */
    dtime = (double)deltausecs(&tv1, &tv2);
    printf("%6.2f MBytes/sec (read)\n", (double)1000000000 / dtime);
    return(0);
}

int
deltausecs(struct timeval *tv1, struct timeval *tv2)
{
    int usec;

    usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
    usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
    return(usec);
}

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
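
The inner loop in the program above relies on i386 inline asm, so it will
not build as-is on other targets.  A rough portable sketch of the same two
ideas (a read-only pass over the block, and the bytes-per-microsecond
arithmetic behind the printf) might look like the following.  This is not
part of the original message: read_words() and mbytes_per_sec() are
hypothetical names, and a volatile pointer is swapped in for the asm on the
assumption that it is enough to stop the compiler from deleting the loads.
The generated loop will likely differ from the 3-instruction asm loop, so
the numbers it produces are not directly comparable to the ones posted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read every 32-bit word of buf once, from the last word down to the
 * first, mirroring the order of the inner loop in the original program.
 * The volatile qualifier (an assumption of this sketch, standing in for
 * the inline asm) forces each load to be performed.
 * Returns the number of words read.
 */
size_t
read_words(const void *buf, size_t bytes)
{
    const volatile uint32_t *p = buf;
    size_t n = bytes / sizeof(uint32_t);
    size_t i;

    /* the block must be a whole number of 32-bit words */
    assert(bytes % sizeof(uint32_t) == 0);
    for (i = n; i > 0; i--)
	(void)p[i - 1];		/* volatile read, value discarded */
    return (n);
}

/*
 * Bytes divided by elapsed microseconds is numerically MBytes/sec
 * (with 1 MByte taken as 10^6 bytes), which is why the original printf
 * can divide 1000000000 by the microsecond delta directly.
 */
double
mbytes_per_sec(double bytes, double usecs)
{
    return (bytes / usecs);
}
```

A caller would time repeated read_words() passes over a malloc'd block with
gettimeofday(), as the original main() does, and hand the byte count and
the elapsed microseconds to mbytes_per_sec().  For example, 1000000000
bytes read in 1000000 usec comes out as 1000 MBytes/sec.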