Date: Tue, 7 May 1996 06:06:09 -0700 (PDT)
From: asami@cs.berkeley.edu (Satoshi Asami)
To: bde@zeta.org.au
Cc: current@FreeBSD.org, nisha@cs.berkeley.edu, marc@bowtie.nl, ken@area238.residence.gatech.edu, wollman@lcs.mit.edu, wscott@ichips.intel.com, pattrsn@cs.berkeley.edu, culler@cs.berkeley.edu
Subject: Re: more on fast bcopy
Message-ID: <199605071306.GAA04277@silvia.HIP.Berkeley.EDU>
In-Reply-To: <199605061207.WAA04793@godzilla.zeta.org.au> (message from Bruce Evans on Mon, 6 May 1996 22:07:13 +1000)
Wayne:

 * Pentium Pro 200/256
 * 128 Meg memory 2-way interleave
 * B-step Orion chipset
 *
 * The interesting result is that 'libc' is MUCH faster than any
 * of the other results.
 *
 * We implemented a fast string copy mode for 'rep movs' that kicks
 * in at about 128 elements.

Yes, indeed, this is very interesting.  I guess whatever I'm doing
with all this is going to be moot once we all move to P6's. ;)

By the way, I assume the external clock of the 200MHz P6 is 66MHz, is
this correct?  The memory copy speed of this machine seems to be
slower than the Triton-based P5-133 that we have (see below).  Do you
know where the "B-step" Orion stands on the maturity curve, in terms
of memory access speed?

Bruce:

 * Why not? :-)  It should be possible to use the fpu after saving and
 * restoring the FP registers reentrantly.
                              ^^^^^^^^^^^
Yeah, we were running into problems with this.  Can you tell us how to
do it? ;)

 * >We've got 67MB/s on the 133MHz Pentium + Triton here.  Wow.
 *
 * Same here.  An FP method seems to be the fastest way of bzeroing
 * uncached memory too.  I get about 150MB/sec for an FP based bzero and
 * about 85MB/sec (max) for all reasonable integer register based versions.

I see.  By the way, we tried unrolling the loops even more, and
actually got up to 80MB/s for FP and 60MB/s for integer registers
(this is for bcopy).

I put the results from our machines, as well as others, on

   http://stampede.cs.berkeley.edu/~asami/Td/bcopy.html

please take a look.  If you would like to contribute, please grab

   ftp://stampede.cs.berkeley.edu/pub/bcopy/bcopy-960507.tar.gz

and follow the instructions.

Here is a brief summary:

Name       | CPU     | Chipset | bcopy speed (MB/s)
           |         |         | libc  unrolled-int  unrolled-FP
-----------+---------+---------+--------------------------------
Wayne's    | P6-200  | Orion-B |  47       36-          36-
Garrett's  | P6-150  | Orion-? |  26       27=          27=
luke       | P5-133  | Triton  |  40       60           80
obiwan     | P5-100  | SiS     |  23       29           45
stampede   | P5-90   | Neptune |  22       23=          44
Marc's     | P5-90   | Pluto   |  20       20=          32
Kenneth's  | 486-100 | SiS     |  10       10=           8-

"=" means it's not much faster than libc, "-" means it's slower than
libc.

It's pretty clear that the FP trick only helps for Pentiums.

Satoshi
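
For anyone who wants a feel for what the "unrolled-int" column measures, here is a minimal sketch in portable C. It is not the code from bcopy-960507.tar.gz, and all names (copy_words_unrolled, mb_per_sec, time_copy) are hypothetical. It assumes non-overlapping buffers of 32-bit words, counts 1 MB as 2^20 bytes, and uses clock() for timing, which may be too coarse for small buffers. The FP variant relies on 64-bit FPU loads and stores and is not attempted here. A modern compiler may also turn this loop back into a library copy, so treat it as an illustration of the loop structure only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: copy nwords 32-bit words, unrolled four at a time.
 * Assumes dst and src do not overlap (the real bcopy handles overlap). */
void copy_words_unrolled(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t i = 0;

    /* Main loop: four loads followed by four stores per iteration. */
    for (; i + 4 <= nwords; i += 4) {
        uint32_t a = src[i];
        uint32_t b = src[i + 1];
        uint32_t c = src[i + 2];
        uint32_t d = src[i + 3];
        dst[i]     = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }

    /* Tail: the remaining zero to three words. */
    for (; i < nwords; i++)
        dst[i] = src[i];
}

/* Convert bytes copied and elapsed seconds to MB/s.
 * Assumption: 1 MB = 2^20 bytes. Returns 0.0 if no time was measured. */
double mb_per_sec(size_t bytes, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}

/* Run the copy reps times and report the throughput in MB/s. */
double time_copy(uint32_t *dst, const uint32_t *src, size_t nwords, int reps)
{
    assert(nwords > 0 && reps > 0);

    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        copy_words_unrolled(dst, src, nwords);
    clock_t end = clock();

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return mb_per_sec(nwords * sizeof(uint32_t) * (size_t)reps, seconds);
}
```

To compare against libc, one could time memcpy over the same buffers with the same repetition count, using buffers much larger than the cache so that the figure reflects memory speed rather than cache speed.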