Date: Sat, 23 Dec 1995 17:56:22 -0800
From: "Amancio Hasty Jr." <hasty@rah.star-gate.com>
To: Torbjorn Granlund <tege@matematik.su.se>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Pentium bcopy
Message-ID: <199512240156.RAA03248@rah.star-gate.com>
In-Reply-To: Your message of "Sun, 24 Dec 1995 02:15:58 +0100." <199512240116.CAA26645@insanus.matematik.su.se>

This looks cool if it works 8)
And I sure as hell am going to use the routine 8)
A few months ago, I posted an alternative to increase the performance
of bcopy and it got ignored...
bcopy can also be useful for X.
A long, long time ago someone did a hardware profile of the system and
found that we spend a lot of time copying things around during
"normal" system usage.
Thanks!!!
Amancio
>>> Torbjorn Granlund said:
> I sent you patches to improve the support.s bcopy a few months ago. I have
> not heard anything back (sic). Maybe I should just give up, and use some
> other operating system, where bug reports and contributions from external
> people are considered? Well, I won't give up just yet! ;-)
>
> Now, that is a diplomatic way of starting a message...
>
> This time I want to help improve the bcopy/memcpy/memmove functions for
> the Pentium (and 486). Here is a skeleton bcopy/memcpy that runs about 5
> times faster than your current implementation on a Pentium. This bcopy
> handles up to about 350 MB/s on a Pentium 133, compared to the current 70
> MB/s.
>
> The reason that this is so much faster is that it uses the dual-ported cache
> in a near-optimal way. Your code seems to rely on rep+movsl, which is much
> slower.
>
> Well, I haven't bothered to integrate this into your infrastructure since
> that might be a waste of my time, if you just keep ignoring my messages. If
> you are interested in this optimization, I volunteer to do the rest of the
> work.
>
> Note that bzero can be sped up in the same way. I have a feeling that
> bcopy/bzero are used now and then by the VM system...
>
> /* Pentium bcopy */
> .text
> .align 4
> .globl _copy
> _copy: pushl %edi
> pushl %esi
>
> movl 12(%esp),%edi /* destination pointer */
> movl 16(%esp),%esi /* source pointer */
> movl 20(%esp),%ecx /* size (in 32-bit words) */
>
> shrl $3,%ecx /* count for unrolled loop */
> jz Lend /* if zero, skip unrolled loop */
>
> movl (%edi),%eax /* Fetch destination cache line */
>
> .align 2,0x90 /* supply 0x90 for broken assemblers */
> Loop: movl 28(%edi),%eax /* allocate cache line for destination */
> nop /* we want these two insn to pair! */
>
> movl (%esi),%eax /* read words pairwise */
> movl 4(%esi),%edx
> movl %eax,(%edi) /* store words pairwise */
> movl %edx,4(%edi)
>
> movl 8(%esi),%eax
> movl 12(%esi),%edx
> movl %eax,8(%edi)
> movl %edx,12(%edi)
>
> movl 16(%esi),%eax
> movl 20(%esi),%edx
> movl %eax,16(%edi)
> movl %edx,20(%edi)
>
> movl 24(%esi),%eax
> movl 28(%esi),%edx
> movl %eax,24(%edi)
> movl %edx,28(%edi)
>
> addl $32,%esi /* update source pointer */
> addl $32,%edi /* update destination pointer */
> decl %ecx /* decr loop count */
> jnz Loop
>
> /* Copy last 0-7 words */
> Lend: movl 20(%esp),%ecx
> andl $7,%ecx
> cld
> rep
> movsl
>
> popl %esi
> popl %edi
> ret
>
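
For anyone who doesn't read i386 assembler, the quoted loop can be sketched
in C roughly as follows. This is only an illustration under assumptions: the
name copy_words_unrolled and its signature are made up here, the count is
taken in 32-bit words as the quoted comment says, and the regions are assumed
not to overlap. A C compiler is free to reorder the loads and stores, so this
shows the structure of the loop, not the instruction pairing the assembler
version depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of the quoted Pentium copy loop.
   nwords is a count of 32-bit words; src and dst must not overlap. */
void copy_words_unrolled(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t blocks = nwords >> 3;  /* 8 words (32 bytes) per iteration */
    size_t tail = nwords & 7;     /* leftover 0-7 words */

    assert(nwords == 0 || (dst != NULL && src != NULL));

    while (blocks-- > 0) {
        uint32_t a, b;

        /* Read the last word of the destination block first, as the
           assembler does with 28(%edi), so that the destination line
           is in the cache before the stores below. */
        (void)*(volatile const uint32_t *)&dst[7];

        /* Read words pairwise, then store them pairwise. */
        a = src[0]; b = src[1]; dst[0] = a; dst[1] = b;
        a = src[2]; b = src[3]; dst[2] = a; dst[3] = b;
        a = src[4]; b = src[5]; dst[4] = a; dst[5] = b;
        a = src[6]; b = src[7]; dst[6] = a; dst[7] = b;

        src += 8;
        dst += 8;
    }

    /* Copy the last 0-7 words one at a time (the rep movsl tail). */
    while (tail-- > 0)
        *dst++ = *src++;
}
```

The volatile read is only there to keep the compiler from dropping the
destination touch; whether touching the destination helps at all depends on
the CPU's cache behaviour, so treat it as specific to the Pentium case
discussed in the quoted message.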
