Date: Tue, 07 Nov 1995 02:56:06 -0800
From: "Amancio Hasty Jr." <hasty@rah.star-gate.com>
To: freebsd-hackers@freebsd.org
Message-ID: <199511071056.CAA02766@rah.star-gate.com>
Return-Path: hasty@rah.star-gate.com
Received: from rah.star-gate.com (rah.star-gate.com [204.188.121.18]) by
rah.star-gate.com (8.6.12/8.6.9) with SMTP id CAA02714 for
<freebsd-hackers@freebsd.org>; Tue, 7 Nov 1995 02:52:12 -0800
Message-Id: <199511071052.CAA02714@rah.star-gate.com>
Date: Tue, 07 Nov 95 02:52:13 -0800
Sender: hasty
From: "Amancio Hasty, Jr." <hasty@rah.star-gate.com>
X-Mailer: Mozilla 1.1N (X11; I; FreeBSD 2.1-STABLE i386)
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Subject: Re: A question about fast copying with a Pentium processor
X-URL: news:47lm63$6j0@ixnews3.ix.netcom.com
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Should we start using floating point? 8)
That's a joke; however, I do think some of you may find this interesting...
Cheers,
Amancio
mschmit@ix.netcom.com (Mike Schmit) wrote:
>In <1995Nov5.235249.8471@nmt.edu> borchers@nmt.edu (Brian Borchers) writes:
>>
>>I've got a question about coding for speed on the Pentium that has me
>>somewhat baffled. Consider the problem of copying a large number of
>>double precision numbers from one array to another. Here's C code
>>for the operation:
>>
>>    for (i = 0; i < SIZE; i++)
>>    {
>>        b[i] = a[i];
>>    }
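To reproduce a rate like the 30 MB/s figure reported below, the loop can be wrapped in a small timing harness. This is a minimal sketch, not the poster's actual setup: the function names are hypothetical, ISO C clock() (which measures processor time) stands in for whatever timer was used, and 1 MB is assumed to mean 2^20 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* The loop from the post, wrapped in a function (name is hypothetical). */
void copy_doubles(double *b, const double *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = a[i];
}

/* Converts a byte count and elapsed time to MB/s.
   Assumption: 1 MB = 2^20 bytes; the post does not say which unit it uses. */
double megabytes_per_second(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}

/* Copies n doubles `passes` times and returns the observed rate in MB/s.
   Returns -1.0 if allocation fails or clock() is too coarse to register
   any elapsed time. clock() measures processor time, not wall-clock time. */
double measure_copy_rate(size_t n, int passes)
{
    assert(passes > 0);
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++)
        a[i] = (double)i;

    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        copy_doubles(b, a, n);
    clock_t end = clock();

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    double rate = -1.0;
    if (seconds > 0.0)
        rate = megabytes_per_second(n * sizeof(double) * (size_t)passes, seconds);

    free(a);
    free(b);
    return rate;
}
```

For example, measure_copy_rate(512 * 1024, 20) copies a 4 MB array (512K doubles) twenty times. An optimizing compiler is free to merge or drop repeated identical passes, so treat the number with care unless the result is actually used.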
>>
>>
>>Using the Gnu C Compiler version 2.6.3 (I know, I should move up to the
>>latest version, but that has nothing to do with my question) we get
>>the following code for this loop:
>>
>>L20:
>>        movl (%ebx),%eax
>>        movl 4(%ebx),%edx
>>        movl %eax,(%ecx)
>>        movl %edx,4(%ecx)
>>        addl $8,%ecx
>>        addl $8,%ebx
>>        cmpl %edi,%ecx
>>        jle L20
>>
>>When I run the code on fairly large arrays, I find that my system can copy
>>about 30 Megabytes per second on arrays of four megabytes or so.
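At the C level, the gcc loop above moves each 8-byte double as two independent 4-byte words. The sketch below spells out that access pattern; the function name is hypothetical, and memcpy is used for the type punning so the code stays well-defined. A modern optimizer may fuse the two halves back into a single wider move, so this illustrates the pattern rather than guaranteeing the emitted instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies n doubles the way the gcc-generated loop does: each 8-byte
   element is moved as two separate 4-byte words (low half, then high). */
void copy_as_two_words(double *dst, const double *src, size_t n)
{
    assert(sizeof(double) == 2 * sizeof(uint32_t));
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    for (size_t i = 0; i < n; i++, s += sizeof(double), d += sizeof(double)) {
        uint32_t lo, hi;
        memcpy(&lo, s, sizeof lo);          /* like movl (%ebx),%eax  */
        memcpy(&hi, s + 4, sizeof hi);      /* like movl 4(%ebx),%edx */
        memcpy(d, &lo, sizeof lo);          /* like movl %eax,(%ecx)  */
        memcpy(d + 4, &hi, sizeof hi);      /* like movl %edx,4(%ecx) */
    }
}
```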
>>
>>I then rewrite the loop as follows:
>>
>>L20:
>>        fldl (%ebx)
>>        fstpl (%ecx)
>>        addl $8,%ecx
>>        addl $8,%ebx
>>        cmpl %edi,%ecx
>>        jle L20
>>
>>The resulting program copies data at about 60 Megabytes per second.
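For comparison, the fldl/fstpl loop reads and writes each element as a single 8-byte quantity. A rough C analogue, assuming a 64-bit integer temporary is an acceptable stand-in for the x87 register (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies n doubles with one 8-byte load and one 8-byte store per element,
   matching the access width of the fldl/fstpl loop. The value passes
   through a 64-bit integer temporary, so the raw bits are copied. */
void copy_as_one_qword(double *dst, const double *src, size_t n)
{
    assert(sizeof(double) == sizeof(uint64_t));
    for (size_t i = 0; i < n; i++) {
        uint64_t q;
        memcpy(&q, &src[i], sizeof q);  /* one 8-byte read  (cf. fldl)  */
        memcpy(&dst[i], &q, sizeof q);  /* one 8-byte write (cf. fstpl) */
    }
}
```

One difference worth noting: the x87 path converts each value to 80-bit extended precision and back, which is exact for ordinary doubles but can quiet a signaling NaN, while the integer temporary copies raw bits. On a 32-bit x86 target the compiler may also split the 64-bit move into two 32-bit moves, which would undo the point of the exercise.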
>>
>>Thinking about it, I came to the conclusion that both versions of the
>>code should probably be most limited by memory bandwidth. However, I
>>expect that both codes should be using exactly the same memory
>>bandwidth.
>>
>>Looking at "Optimizations for Intel's 32-Bit Processors", Version 2.0,
>>I see that on page 25, an approach like that used by gcc is suggested
>>as being twice as fast as the other approach, while in practice, it
>>seems to be twice as slow.
>>
>>Questions:
>>
>> - Why is the first version of the code not as fast as the second?
>>
>> - Why isn't the second version faster than the first (as indicated
>>   by "Optimizations for Intel's 32-Bit Processors")?
>
> (Did you mean first version?)
>
>>
>> - What's going on here?
>>
>
>I'm not sure why the Intel book says what it does. But the reason you
>are getting a faster copy is that the FP load and store instructions are
>reading and writing memory 8 bytes at a time (and presumably these have
>been properly aligned). The other integer code is just copying 4 bytes
>at a time.
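Mike's explanation comes down to counting accesses: copying the same number of bytes with 4-byte moves takes twice as many loads and stores as with 8-byte moves. A small sketch of that arithmetic, plus a check for the 8-byte alignment he presumes (both helper names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts memory accesses (loads plus stores) needed to copy `bytes` bytes
   when each access moves `width` bytes: one load and one store per chunk. */
size_t accesses_for_copy(size_t bytes, size_t width)
{
    assert(width > 0 && bytes % width == 0);
    return 2 * (bytes / width);
}

/* Returns nonzero if p lies on an 8-byte boundary, the alignment the
   explanation presumes for the 8-byte FP loads and stores. */
int is_aligned8(const void *p)
{
    return ((uintptr_t)p % 8) == 0;
}
```

For the 4 MB arrays in the original post, accesses_for_copy(4 * 1024 * 1024, 4) returns 2097152 and accesses_for_copy(4 * 1024 * 1024, 8) returns 1048576. Halving the access count is consistent with the observed 2x speedup, though the count alone does not prove it is the whole story.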
>
>Mike Schmit
>
>-------------------------------------------------------------------
>mschmit@ix.netcom.com          author:
>408-244-6826                   Pentium Processor Programming Tools
>800-765-8086                   ISBN: 0-12-627230-1
>-------------------------------------------------------------------
>
news:47lm63$6j0@ixnews3.ix.netcom.com
--
Amancio Hasty
Hasty Software Consulting Services
Tel: 415-495-3046
Fax: 415-495-3046
Cellular: 415-309-8434
e-mail: hasty@star-gate.com Powered by FreeBSD
------- End of Forwarded Message
