Date: Sun, 24 Dec 1995 02:15:58 +0100
From: Torbjorn Granlund <tege@matematik.su.se>
To: freebsd-hackers@freebsd.org
Subject: Pentium bcopy
Message-ID: <199512240116.CAA26645@insanus.matematik.su.se>
I sent you patches to improve the support.s bcopy a few months ago. I have not heard anything back (sic). Maybe I should just give up, and use some other operating system, where bug reports and contributions from external people are considered? Well, I won't give up just yet! ;-)

Now, that is a diplomatic way of starting a message...

This time I want to help improve the bcopy/memcpy/memmove functions for the Pentium (and 486).

Here is a skeleton bcopy/memcpy that runs about 5 times faster than your current implementation on a Pentium. This bcopy handles up to about 350 MB/s on a Pentium 133, compared to the current 70 MB/s.

The reason that this is so much faster is that it uses the dual-ported cache in a near-optimal way. Your code seems to rely on rep+movsl, which is much slower.

Well, I haven't bothered to integrate this into your infrastructure since that might be a waste of my time, if you just keep ignoring my messages. If you are interested in this optimization, I volunteer to do the rest of the work.

Note that bzero can be sped up in the same way.

I have a feeling that bcopy/bzero are used now and then by the VM system...

/* Pentium bcopy */
	.text
	.align 4
	.globl _copy
_copy:
	pushl	%edi
	pushl	%esi
	movl	12(%esp),%edi		/* destination pointer */
	movl	16(%esp),%esi		/* source pointer */
	movl	20(%esp),%ecx		/* size (in 32-bit words) */

	shrl	$3,%ecx			/* count for unrolled loop */
	jz	Lend			/* if zero, skip unrolled loop */

	movl	(%edi),%eax		/* Fetch destination cache line */

	.align	2,0x90			/* supply 0x90 for broken assemblers */
Loop:	movl	28(%edi),%eax		/* allocate cache line for destination */
	nop				/* we want these two insn to pair! */

	movl	(%esi),%eax		/* read words pairwise */
	movl	4(%esi),%edx
	movl	%eax,(%edi)		/* store words pairwise */
	movl	%edx,4(%edi)

	movl	8(%esi),%eax
	movl	12(%esi),%edx
	movl	%eax,8(%edi)
	movl	%edx,12(%edi)

	movl	16(%esi),%eax
	movl	20(%esi),%edx
	movl	%eax,16(%edi)
	movl	%edx,20(%edi)

	movl	24(%esi),%eax
	movl	28(%esi),%edx
	movl	%eax,24(%edi)
	movl	%edx,28(%edi)

	addl	$32,%esi		/* update source pointer */
	addl	$32,%edi		/* update destination pointer */

	decl	%ecx			/* decr loop count */
	jnz	Loop

/* Copy last 0-7 words */
Lend:	movl	20(%esp),%ecx
	andl	$7,%ecx
	cld
	rep
	movsl

	popl	%esi
	popl	%edi
	ret
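
For readers who want to experiment with the structure of the routine without a 32-bit assembler, here is a rough C model of it. This is only a sketch: the names copy_words and zero_words are hypothetical, the code says nothing about speed (a compiler is free to schedule it differently), and zero_words is just one guess at what "sped up in the same way" could look like for bzero. As in the assembly, the size is a count of 32-bit words and the buffers are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative C model of the assembly skeleton; not the original code.
   copy_words and zero_words are hypothetical names. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t blocks = nwords >> 3;  /* like shrl $3,%ecx: 8 words per pass */
    size_t tail = nwords & 7;     /* like andl $7,%ecx: leftover 0-7 words */
    uint32_t a, b;

    assert(nwords == 0 || (dst != NULL && src != NULL));

    while (blocks-- > 0) {
        /* Read the last word of the destination block before storing to it,
           mirroring movl 28(%edi),%eax. The volatile cast only stops the
           compiler from deleting the otherwise unused load. */
        (void)*(volatile uint32_t *)(dst + 7);

        /* Load two words, then store two words, as the assembly does. */
        a = src[0]; b = src[1]; dst[0] = a; dst[1] = b;
        a = src[2]; b = src[3]; dst[2] = a; dst[3] = b;
        a = src[4]; b = src[5]; dst[4] = a; dst[5] = b;
        a = src[6]; b = src[7]; dst[6] = a; dst[7] = b;

        src += 8;
        dst += 8;
    }

    /* Tail: the assembly uses rep movsl here; a plain loop does the same job. */
    while (tail-- > 0)
        *dst++ = *src++;
}

/* Assumed bzero analogue: same block/tail split, stores of zero only. */
void zero_words(uint32_t *dst, size_t nwords)
{
    size_t blocks = nwords >> 3;
    size_t tail = nwords & 7;

    assert(nwords == 0 || dst != NULL);

    while (blocks-- > 0) {
        (void)*(volatile uint32_t *)(dst + 7);  /* touch destination first */
        dst[0] = 0; dst[1] = 0; dst[2] = 0; dst[3] = 0;
        dst[4] = 0; dst[5] = 0; dst[6] = 0; dst[7] = 0;
        dst += 8;
    }

    while (tail-- > 0)
        *dst++ = 0;
}
```

Calling copy_words(dst, src, 19) would run two unrolled passes (16 words) and then copy the remaining 3 words in the tail loop, which is the same split the assembly makes with shrl $3 and andl $7.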