Date: Tue, 7 May 1996 06:06:09 -0700 (PDT)
From: asami@cs.berkeley.edu (Satoshi Asami)
To: bde@zeta.org.au
Cc: current@FreeBSD.org, nisha@cs.berkeley.edu, marc@bowtie.nl, ken@area238.residence.gatech.edu, wollman@lcs.mit.edu, wscott@ichips.intel.com, pattrsn@cs.berkeley.edu, culler@cs.berkeley.edu
Subject: Re: more on fast bcopy
Message-ID: <199605071306.GAA04277@silvia.HIP.Berkeley.EDU>
In-Reply-To: <199605061207.WAA04793@godzilla.zeta.org.au> (message from Bruce Evans on Mon, 6 May 1996 22:07:13 +1000)
Wayne:
* Pentium Pro 200/256
* 128 Meg memory 2-way interleave
* B-step Orion chipset
*
* The interesting result is that 'libc' is MUCH faster than any
* of the other results.
*
* We implemented a fast string copy mode for 'rep movs' that kicks
* in at about 128 elements.
Yes, indeed, this is very interesting. I guess whatever I'm doing
with all this is going to be moot once we all move to P6's. ;)
By the way, I assume the external clock of the 200MHz P6 is 66MHz, is
this correct? The memory copy speed of this machine seems to be
slower than the Triton-based P5-133 that we have (see below). Do you
know where the "B-step" Orion stands on the maturity curve, in terms
of memory access speed?
Bruce:
* Why not? :-) It should be possible to use the fpu after saving and
* restoring the FP registers reentrantly.
                             ^^^^^^^^^^^
Yeah, we were running into problems with this. Can you tell us how to
do it? ;)
* >We've got 67MB/s on the 133MHz Pentium + Triton here. Wow.
*
* Same here. An FP method seems to be the fastest way of bzeroing
* uncached memory too. I get about 150MB/sec for an FP based bzero and
* about 85MB/sec (max) for all reasonable integer register based versions.
I see. By the way, we tried unrolling the loops even more, and
actually got up to 80MB/s for FP and 60MB/s for integer registers
(this is for bcopy).
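For anyone following along, here is roughly what "unrolling" means for the integer case. This is only a minimal sketch in C, not the code from the tarball: the name copy_words_unrolled, the unroll factor of 8, and the assumption of word-aligned, non-overlapping buffers are all mine, for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the bcopy tarball): copy nwords 32-bit
 * words from src to dst.  Assumes the buffers do not overlap.
 */
void copy_words_unrolled(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    /* Unrolled body: 8 words (32 bytes) per loop iteration.  All loads
     * are issued before the stores, so there is less loop overhead per
     * byte copied than in a one-word-at-a-time loop. */
    while (nwords >= 8) {
        uint32_t a = src[0], b = src[1], c = src[2], d = src[3];
        uint32_t e = src[4], f = src[5], g = src[6], h = src[7];

        dst[0] = a; dst[1] = b; dst[2] = c; dst[3] = d;
        dst[4] = e; dst[5] = f; dst[6] = g; dst[7] = h;

        src += 8;
        dst += 8;
        nwords -= 8;
    }

    /* Tail: copy whatever is left (0 to 7 words) one word at a time. */
    while (nwords > 0) {
        *dst++ = *src++;
        nwords--;
    }
}
```

The FP version has the same loop shape but moves data through the FP registers in 64-bit pieces, which is where the question of saving and restoring FP state comes in.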
I put the results from our machines, as well as others, at
http://stampede.cs.berkeley.edu/~asami/Td/bcopy.html
Please take a look. If you would like to contribute, please grab
ftp://stampede.cs.berkeley.edu/pub/bcopy/bcopy-960507.tar.gz
and follow the instructions.
Here is a brief summary:
Name       | CPU     | Chipset | bcopy speed (MB/s)
           |         |         | libc  unrolled-int  unrolled-FP
-----------+---------+---------+--------------------------------
Wayne's    | P6-200  | Orion-B |  47      36-           36-
Garrett's  | P6-150  | Orion-? |  26      27=           27=
luke       | P5-133  | Triton  |  40      60            80
obiwan     | P5-100  | SiS     |  23      29            45
stampede   | P5-90   | Neptune |  22      23=           44
Marc's     | P5-90   | Pluto   |  20      20=           32
Kenneth's  | 486-100 | SiS     |  10      10=            8-
"=" means it's not much faster than libc, "-" means it's slower than
libc. It's pretty clear that the FP trick only helps for Pentiums.
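To put numbers on that, the speedup of each variant over libc can be computed straight from the table. The sketch below is only an illustration: the struct and function names are made up here, and the figures are the rounded MB/s values from the summary above.

```c
#include <stddef.h>
#include <stdio.h>

/* One row of the summary table; all speeds are in MB/s. */
struct bcopy_result {
    const char *name;
    int libc;
    int unrolled_int;
    int unrolled_fp;
};

/* Values copied from the summary table in this message. */
const struct bcopy_result results[] = {
    { "Wayne's",   47, 36, 36 },
    { "Garrett's", 26, 27, 27 },
    { "luke",      40, 60, 80 },
    { "obiwan",    23, 29, 45 },
    { "stampede",  22, 23, 44 },
    { "Marc's",    20, 20, 32 },
    { "Kenneth's", 10, 10,  8 },
};

/* Ratio of a variant's speed to libc's; above 1.0 means faster than libc. */
double speedup(int variant_mbs, int libc_mbs)
{
    return (double)variant_mbs / (double)libc_mbs;
}

/* Print the int and FP speedups over libc for every machine. */
void print_speedups(void)
{
    size_t n = sizeof results / sizeof results[0];

    for (size_t i = 0; i < n; i++) {
        printf("%-10s int %.2fx  FP %.2fx\n", results[i].name,
               speedup(results[i].unrolled_int, results[i].libc),
               speedup(results[i].unrolled_fp, results[i].libc));
    }
}
```

For example, the unrolled-FP copy is twice as fast as libc on luke (80 vs. 40) and on stampede (44 vs. 22), while on Wayne's P6 both unrolled versions come out below 1.0.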
Satoshi
