From owner-freebsd-current Tue May 7 06:09:06 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id GAA05529 for current-outgoing; Tue, 7 May 1996 06:09:06 -0700 (PDT)
Received: from silvia.HIP.Berkeley.EDU (silvia.HIP.Berkeley.EDU [136.152.64.181]) by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id GAA05523 for ; Tue, 7 May 1996 06:09:02 -0700 (PDT)
Received: (from asami@localhost) by silvia.HIP.Berkeley.EDU (8.7.5/8.6.9) id GAA04277; Tue, 7 May 1996 06:06:09 -0700 (PDT)
Date: Tue, 7 May 1996 06:06:09 -0700 (PDT)
Message-Id: <199605071306.GAA04277@silvia.HIP.Berkeley.EDU>
To: bde@zeta.org.au
CC: current@FreeBSD.org, nisha@cs.berkeley.edu, marc@bowtie.nl, ken@area238.residence.gatech.edu, wollman@lcs.mit.edu, wscott@ichips.intel.com, pattrsn@cs.berkeley.edu, culler@cs.berkeley.edu
In-reply-to: <199605061207.WAA04793@godzilla.zeta.org.au> (message from Bruce Evans on Mon, 6 May 1996 22:07:13 +1000)
Subject: Re: more on fast bcopy
From: asami@cs.berkeley.edu (Satoshi Asami)
Sender: owner-current@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

Wayne:
 * Pentium Pro 200/256
 * 128 Meg memory 2-way interleave
 * B-step Orion chipset
 *
 * The interesting result is that 'libc' is MUCH faster than any
 * of the other results.
 *
 * We implemented a fast string copy mode for 'rep movs' that kicks
 * in at about 128 elements.

Yes, indeed, this is very interesting.  I guess whatever I'm doing
with all this is going to be moot once we all move to P6's. ;)

By the way, I assume the external clock of the 200MHz P6 is 66MHz, is
this correct?  The memory copy speed of this machine seems to be
slower than the Triton-based P5-133 that we have (see below).  Do you
know where the "B-step" Orion stands on the maturity curve, in terms
of memory access speed?

Bruce:
 * Why not? :-)  It should be possible to use the fpu after saving and
 * restoring the FP registers reentrantly.
                              ^^^^^^^^^^^
Yeah, we were running into problems with this.
Can you tell us how to do it? ;)

 * >We've got 67MB/s on the 133MHz Pentium + Triton here.  Wow.
 *
 * Same here.  An FP method seems to be the fastest way of bzeroing
 * uncached memory too.  I get about 150MB/sec for an FP based bzero and
 * about 85MB/sec (max) for all reasonable integer register based versions.

I see.  By the way, we tried unrolling the loops even more, and
actually got up to 80MB/s for FP and 60MB/s for integer registers
(this is for bcopy).

I put the results on our machines as well as others on

	http://stampede.cs.berkeley.edu/~asami/Td/bcopy.html

please take a look.  If you would want to contribute, please grab

	ftp://stampede.cs.berkeley.edu/pub/bcopy/bcopy-960507.tar.gz

and follow the instructions.

Here is a brief summary:

Name       | CPU     | Chipset |       bcopy speed (MB/s)
           |         |         | libc  unrolled-int  unrolled-FP
-----------+---------+---------+--------------------------------
Wayne's    | P6-200  | Orion-B |  47       36-           36-
Garrett's  | P6-150  | Orion-? |  26       27=           27=
luke       | P5-133  | Triton  |  40       60            80
obiwan     | P5-100  | SiS     |  23       29            45
stampede   | P5-90   | Neptune |  22       23=           44
Marc's     | P5-90   | Pluto   |  20       20=           32
Kenneth's  | 486-100 | SiS     |  10       10=            8-

"=" means it's not much faster than libc, "-" means it's slower than
libc.  It's pretty clear that the FP trick only helps for Pentiums.

Satoshi
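
For readers who want to see the general shape of an unrolled copy loop and
how an MB/s figure can be derived, here is a minimal C sketch.  It is NOT
the code from bcopy-960507.tar.gz, and the names copy_unrolled64 and
mb_per_sec are hypothetical.  It assumes 64-bit integer moves as a portable
stand-in for wide register moves, it assumes non-overlapping buffers, and it
assumes "MB" means 2^20 bytes, which the message does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: copy n bytes, moving four 64-bit words per loop
 * iteration (32 bytes), then finish the tail one byte at a time.
 * The source and destination regions must not overlap.
 */
void copy_unrolled64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 32) {
        uint64_t w0, w1, w2, w3;

        /* Fixed-size memcpy calls express unaligned 64-bit loads/stores
         * without undefined behavior; compilers typically turn each one
         * into a single move instruction. */
        memcpy(&w0, s,      8);
        memcpy(&w1, s + 8,  8);
        memcpy(&w2, s + 16, 8);
        memcpy(&w3, s + 24, 8);
        memcpy(d,      &w0, 8);
        memcpy(d + 8,  &w1, 8);
        memcpy(d + 16, &w2, 8);
        memcpy(d + 24, &w3, 8);

        d += 32;
        s += 32;
        n -= 32;
    }

    /* Remaining 0..31 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}

/* Convert a byte count and an elapsed time into MB/s (assumed: 1 MB = 2^20 bytes). */
double mb_per_sec(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}
```

To compare against libc in the spirit of the table above, one would time
many repetitions of copy_unrolled64() and of memcpy() over a buffer much
larger than the cache, then pass the total bytes copied and the elapsed
seconds to mb_per_sec().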