Date:      Sat, 23 Dec 1995 17:56:22 -0800
From:      "Amancio Hasty Jr." <hasty@rah.star-gate.com>
To:        Torbjorn Granlund <tege@matematik.su.se>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Pentium bcopy 
Message-ID:  <199512240156.RAA03248@rah.star-gate.com>
In-Reply-To: Your message of "Sun, 24 Dec 1995 02:15:58 +0100." <199512240116.CAA26645@insanus.matematik.su.se> 

This looks cool if it works 8)
And I sure as hell am going to use the routine 8)

A few months ago, I posted an alternative to increase the performance
of bcopy and it got ignored...


bcopy can also be useful for X.

A long, long time ago someone did a hardware profile of the system and
found that we spend a lot of time copying things around during
"normal" system usage.


	Thanks!!!
	Amancio

>>> Torbjorn Granlund said:
 > I sent you patches to improve the support.s bcopy a few months ago.  I have
 > not heard anything back (sic).  Maybe I should just give up, and use some
 > other operating system, where bug reports and contributions from external
 > people are considered?  Well, I won't give up just yet!  ;-)
 > 
 > Now, that is a diplomatic way of starting a message...
 > 
 > This time I want to help improve the bcopy/memcpy/memmove functions for
 > the Pentium (and 486).  Here is a skeleton bcopy/memcpy that runs about 5
 > times faster than your current implementation on a Pentium.  This bcopy
 > handles up to about 350 MB/s on a Pentium 133, compared to the current 70
 > MB/s.
 > 
 > The reason that this is so much faster is that it uses the dual-ported cache
 > in a near-optimal way.  Your code seems to rely on rep+movsl, which is much
 > slower.
 > 
 > Well, I haven't bothered to integrate this into your infrastructure since
 > that might be a waste of my time, if you just keep ignoring my messages.  If
 > you are interested in this optimization, I volunteer to do the rest of the
 > work.
 > 
 > Note that bzero can be sped up in the same way.  I have a feeling that
 > bcopy/bzero are used now and then by the VM system...
 > 
 > /* Pentium bcopy */
 > 	.text
 > 	.align 4
 > 	.globl	_copy
 > _copy:	pushl	%edi
 > 	pushl	%esi
 > 
 > 	movl	12(%esp),%edi	/* destination pointer */
 > 	movl	16(%esp),%esi	/* source pointer */
 > 	movl	20(%esp),%ecx	/* size (in 32-bit words) */
 > 
 > 	shrl	$3,%ecx		/* count for unrolled loop */
 > 	jz	Lend		/* if zero, skip unrolled loop */
 > 
 > 	movl	(%edi),%eax	/* Fetch destination cache line */
 > 
 > 	.align	2,0x90		/* supply 0x90 for broken assemblers */
 > Loop:	movl	28(%edi),%eax	/* allocate cache line for destination */
 > 	nop			/* we want these two insn to pair! */
 > 
 > 	movl	(%esi),%eax	/* read words pairwise */
 > 	movl	4(%esi),%edx
 > 	movl	%eax,(%edi)	/* store words pairwise */
 > 	movl	%edx,4(%edi)
 > 
 > 	movl	8(%esi),%eax
 > 	movl	12(%esi),%edx
 > 	movl	%eax,8(%edi)
 > 	movl	%edx,12(%edi)
 > 
 > 	movl	16(%esi),%eax
 > 	movl	20(%esi),%edx
 > 	movl	%eax,16(%edi)
 > 	movl	%edx,20(%edi)
 > 
 > 	movl	24(%esi),%eax
 > 	movl	28(%esi),%edx
 > 	movl	%eax,24(%edi)
 > 	movl	%edx,28(%edi)
 > 
 > 	addl	$32,%esi	/* update source pointer */
 > 	addl	$32,%edi	/* update destination pointer */
 > 	decl	%ecx		/* decr loop count */
 > 	jnz	Loop
 > 
 > /* Copy last 0-7 words */
 > Lend:	movl	20(%esp),%ecx
 > 	andl	$7,%ecx
 > 	cld
 > 	rep
 > 	movsl
 > 
 > 	popl	%esi
 > 	popl	%edi
 > 	ret
 > 
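
For anyone who reads C more easily than assembler, here is a rough sketch
of the loop structure of the quoted routine: eight 32-bit words per
iteration, read and stored pairwise, then a tail of 0-7 words. The name
copy_words and its signature are made up for illustration and are not
part of the posted code. The sketch leaves out the read of the
destination that the assembler uses to allocate the cache line, and a C
compiler is free to schedule the loads and stores differently, so this
shows the shape of the algorithm only and says nothing about its speed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical C rendering of the quoted _copy routine.
 * As in the assembler, nwords counts 32-bit words, not bytes.
 * The destination cache-line touch (movl 28(%edi),%eax) is not reproduced.
 */
void copy_words(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t blocks = nwords >> 3;  /* count for unrolled loop (shrl $3) */
    size_t tail = nwords & 7;     /* last 0-7 words (andl $7) */

    while (blocks-- != 0) {
        uint32_t a, d;

        /* read words pairwise, then store them pairwise */
        a = src[0]; d = src[1]; dst[0] = a; dst[1] = d;
        a = src[2]; d = src[3]; dst[2] = a; dst[3] = d;
        a = src[4]; d = src[5]; dst[4] = a; dst[5] = d;
        a = src[6]; d = src[7]; dst[6] = a; dst[7] = d;

        src += 8;  /* 32 bytes, matching addl $32,%esi */
        dst += 8;  /* 32 bytes, matching addl $32,%edi */
    }

    /* the assembler uses rep movsl here; a plain loop does the same job */
    while (tail-- != 0)
        *dst++ = *src++;
}
```

Like the original, this assumes the two buffers do not overlap in a way
that matters, so it is a memcpy-style sketch and not a full bcopy or
memmove.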



