Date: Sat, 23 Dec 1995 17:56:22 -0800
From: "Amancio Hasty Jr." <hasty@rah.star-gate.com>
To: Torbjorn Granlund <tege@matematik.su.se>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Pentium bcopy
Message-ID: <199512240156.RAA03248@rah.star-gate.com>
In-Reply-To: Your message of "Sun, 24 Dec 1995 02:15:58 +0100." <199512240116.CAA26645@insanus.matematik.su.se>

This looks cool if it works 8) And I sure as hell am going to use the
routine 8)

A few months ago, I posted an alternative to increase the performance of
bcopy and it got ignored...

bcopy can also be useful for X. A long, long time ago someone did a
hardware profile of the system and found that we spend a lot of time
copying things around during "normal" system usage.

Thanks!!!

	Amancio

>>> Torbjorn Granlund said:
> I sent you patches to improve the support.s bcopy a few months ago.  I have
> not heard anything back (sic).  Maybe I should just give up, and use some
> other operating system, where bug reports and contributions from external
> people are considered?  Well, I won't give up just yet! ;-)
>
> Now, that is a diplomatic way of starting a message...
>
> This time I want to help improve the bcopy/memcpy/memmove functions for
> the Pentium (and 486).  Here is a skeleton bcopy/memcpy that runs about 5
> times faster than your current implementation on a Pentium.  This bcopy
> handles up to about 350 MB/s on a Pentium 133, compared to the current 70
> MB/s.
>
> The reason that this is so much faster is that it uses the dual-ported cache
> in a near-optimal way.  Your code seems to rely on rep+movsl, which is much
> slower.
>
> Well, I haven't bothered to integrate this into your infrastructure since
> that might be a waste of my time, if you just keep ignoring my messages.  If
> you are interested in this optimization, I volunteer to do the rest of the
> work.
>
> Note that bzero can be sped up in the same way.  I have a feeling that
> bcopy/bzero are used now and then by the VM system...
>
> /* Pentium bcopy */
>         .text
>         .align 4
>         .globl _copy
> _copy:  pushl   %edi
>         pushl   %esi
>
>         movl    12(%esp),%edi   /* destination pointer */
>         movl    16(%esp),%esi   /* source pointer */
>         movl    20(%esp),%ecx   /* size (in 32-bit words) */
>
>         shrl    $3,%ecx         /* count for unrolled loop */
>         jz      Lend            /* if zero, skip unrolled loop */
>
>         movl    (%edi),%eax     /* Fetch destination cache line */
>
>         .align  2,0x90          /* supply 0x90 for broken assemblers */
> Loop:   movl    28(%edi),%eax   /* allocate cache line for destination */
>         nop                     /* we want these two insn to pair! */
>
>         movl    (%esi),%eax     /* read words pairwise */
>         movl    4(%esi),%edx
>         movl    %eax,(%edi)     /* store words pairwise */
>         movl    %edx,4(%edi)
>
>         movl    8(%esi),%eax
>         movl    12(%esi),%edx
>         movl    %eax,8(%edi)
>         movl    %edx,12(%edi)
>
>         movl    16(%esi),%eax
>         movl    20(%esi),%edx
>         movl    %eax,16(%edi)
>         movl    %edx,20(%edi)
>
>         movl    24(%esi),%eax
>         movl    28(%esi),%edx
>         movl    %eax,24(%edi)
>         movl    %edx,28(%edi)
>
>         addl    $32,%esi        /* update source pointer */
>         addl    $32,%edi        /* update destination pointer */
>         decl    %ecx            /* decr loop count */
>         jnz     Loop
>
> /* Copy last 0-7 words */
> Lend:   movl    20(%esp),%ecx
>         andl    $7,%ecx
>         cld
>         rep
>         movsl
>
>         popl    %esi
>         popl    %edi
>         ret
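
For readers who do not read i386 assembly, the control flow of the quoted routine can be sketched in C. This is only an illustration of its structure (full 8-word blocks, a read of the destination line before storing into it, pairwise loads and stores, then a 0-7 word tail); it is not the author's code and will not reproduce the instruction pairing or the throughput figures quoted above, since a C compiler is free to schedule the loads and stores differently. The name `copy_words_unrolled` is hypothetical, and like the assembly it takes a count of 32-bit words and assumes the two regions do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical C rendering of the quoted routine's structure.
 * Copies `words` 32-bit words from src to dst; the regions must not overlap.
 */
void copy_words_unrolled(uint32_t *dst, const uint32_t *src, size_t words)
{
    size_t blocks = words >> 3;  /* full 8-word (32-byte) blocks */
    size_t tail = words & 7;     /* remaining 0-7 words */

    assert(words == 0 || (dst != NULL && src != NULL));

    while (blocks-- > 0) {
        uint32_t a, b;

        /* Read the last word of the destination block before storing,
         * mirroring `movl 28(%edi),%eax` in the assembly. The value is
         * discarded; the volatile access keeps the read from being
         * optimized away. */
        (void)*(const volatile uint32_t *)&dst[7];

        /* Load two words, then store two words, four times over. */
        a = src[0]; b = src[1]; dst[0] = a; dst[1] = b;
        a = src[2]; b = src[3]; dst[2] = a; dst[3] = b;
        a = src[4]; b = src[5]; dst[4] = a; dst[5] = b;
        a = src[6]; b = src[7]; dst[6] = a; dst[7] = b;

        src += 8;
        dst += 8;
    }

    /* Copy the last 0-7 words one at a time (the assembly uses rep movsl). */
    while (tail-- > 0)
        *dst++ = *src++;
}
```

A call such as `copy_words_unrolled(dst, src, 19)` would run two unrolled blocks and then copy three tail words.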