Date: Fri, 5 Apr 1996 03:16:38 -0800 (PST)
From: asami@cs.berkeley.edu (Satoshi Asami)
To: davidg@root.com
Cc: current@freebsd.org, nisha@cs.berkeley.edu, tege@matematik.su.se, hasty@rah.star-gate.com, dyson@freebsd.org
Subject: Re: fast memory copy for large data sizes
Message-ID: <199604051116.DAA24816@silvia.HIP.Berkeley.EDU>
In-Reply-To: <199604051021.CAA00222@Root.COM> (message from David Greenman on Fri, 05 Apr 1996 02:21:48 -0800)
> I have that mail, tried what was in there, but it wasn't as fast as FP
> copies. Maybe I screwed up something, I'll try again tomorrow.
It wasn't much trouble so I tried it again. Here's what I got on the
133MHz Pentium:
 size (bytes)   libc             ours
           32   N/A              30.517578 MB/s
           64   61.035156 MB/s   30.517578 MB/s
          128   40.690104 MB/s   40.690104 MB/s
          256   40.690104 MB/s   34.877232 MB/s
          512   40.690104 MB/s   34.877232 MB/s
         1024   40.690104 MB/s   33.674569 MB/s
         2048   39.859694 MB/s   34.265351 MB/s
         4096   39.859694 MB/s   34.265351 MB/s
         8192   39.657360 MB/s   34.115721 MB/s
        16384   39.556962 MB/s   34.115721 MB/s
        32768   39.506953 MB/s   34.153005 MB/s
        65536   39.531942 MB/s   34.227820 MB/s
       131072   39.345294 MB/s   34.125034 MB/s
       262144   39.227993 MB/s   34.227820 MB/s
       524288   38.735668 MB/s   34.218451 MB/s
      1048576   38.224839 MB/s   34.263003 MB/s
      2097152   37.799323 MB/s   34.270635 MB/s
      4194304   37.700283 MB/s   34.283265 MB/s
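A note on reading the small-size rows: several of the figures are consistent with elapsed times of a whole number of microseconds and 1 MB taken as 2^20 bytes (for example, 32 bytes in 1 us gives 30.517578 MB/s, and 128 bytes in 3 us gives 40.690104 MB/s). That is an inference from the numbers, not something the message states, so the helper below is only a sketch under those two assumptions; `mb_per_sec` is a hypothetical name, not part of the benchmark.

```c
#include <assert.h>

/* Hypothetical helper: throughput in MB/s.
 * Assumptions (not stated in the message): 1 MB = 2^20 bytes and the
 * elapsed time is measured in whole microseconds. */
double mb_per_sec(unsigned long bytes, unsigned long usec)
{
    assert(usec > 0);                       /* avoid division by zero */
    double bytes_per_sec = (double)bytes * 1e6 / (double)usec;
    return bytes_per_sec / 1048576.0;       /* bytes/s -> MB/s */
}
```

If the assumptions hold, the entries for the smallest sizes are dominated by timer granularity, and the rows for large sizes are the more meaningful comparison.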
Hmm. I can't even get it to be faster than libc now. I think I've
seen 40MB/s for large copies before, but I don't remember exactly what
I did.
Satoshi
P.S. Here's the "unrolled" routine, pretty much stolen from Torbjorn's
mail to -hackers:
	.align 2
	.globl _unrolled
	.type _unrolled,@function
_unrolled:
	pushl %ebp
	movl %esp,%ebp
	pushl %edi
	pushl %esi
	movl 8(%ebp),%esi	/* first argument: source */
	movl 12(%ebp),%edi	/* second argument: destination */
	movl 16(%ebp),%ecx	/* count is in bytes */
	shrl $5,%ecx		/* number of 32-byte blocks */
	jz L54
	movl (%edi),%eax	/* fetch destination cache line */
	.align 2,0x90
L55:	movl 28(%edi),%eax	/* fetch destination cache line */
	orl %eax,%eax		/* to make things go in pairs */
	movl (%esi),%eax	/* load pairwise */
	movl 4(%esi),%edx
	movl %eax,(%edi)	/* and store pairwise */
	movl %edx,4(%edi)
	movl 8(%esi),%eax
	movl 12(%esi),%edx
	movl %eax,8(%edi)
	movl %edx,12(%edi)
	movl 16(%esi),%eax
	movl 20(%esi),%edx
	movl %eax,16(%edi)
	movl %edx,20(%edi)
	movl 24(%esi),%eax
	movl 28(%esi),%edx
	movl %eax,24(%edi)
	movl %edx,28(%edi)
	addl $32,%esi		/* update source pointer */
	addl $32,%edi		/* update destination pointer */
	decl %ecx		/* decr loop count */
	jnz L55
L54:
	movl 16(%ebp),%ecx
	andl $31,%ecx		/* bytes left over after the 32-byte blocks */
	movl %ecx,%edx
	shrl $2,%ecx		/* first copy as much as we can in words */
	cld
	rep
	movsl
	movl %edx,%ecx
	andl $3,%ecx		/* and then up to 3 bytes */
	rep
	movsb
	leal -8(%ebp),%esp
	popl %esi
	popl %edi
	leave
	ret
Lfe6:
	.size _unrolled,Lfe6-_unrolled
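
For readers who would rather follow the control flow in C, here is a rough rendering of the same structure: touch the destination, copy 32-byte blocks as pairs of 32-bit words, then finish with whole words and up to 3 trailing bytes. It is only a sketch. The name `unrolled_c` is made up for illustration, the argument order (source, destination, length) is read off the stack offsets 8, 12 and 16 in the assembly, and a compiler is free to schedule the loads and stores differently, so it says nothing about the timings above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of _unrolled (illustrative name). */
void unrolled_c(const void *src, void *dst, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    size_t blocks = len >> 5;   /* number of 32-byte blocks */
    size_t tail = len & 31;     /* bytes left after the blocks */

    assert(s != NULL && d != NULL);

    if (blocks > 0)
        (void)*(volatile unsigned char *)d;         /* touch first destination line */

    while (blocks-- > 0) {
        (void)*(volatile unsigned char *)(d + 28);  /* touch destination before storing */
        for (size_t off = 0; off < 32; off += 8) {
            uint32_t a, b;
            memcpy(&a, s + off, 4);                 /* load pairwise */
            memcpy(&b, s + off + 4, 4);
            memcpy(d + off, &a, 4);                 /* and store pairwise */
            memcpy(d + off + 4, &b, 4);
        }
        s += 32;
        d += 32;
    }

    size_t words = tail >> 2;                       /* whole words, as rep movsl does */
    memcpy(d, s, words * 4);
    s += words * 4;
    d += words * 4;
    for (size_t i = 0; i < (tail & 3); i++)         /* up to 3 bytes, as rep movsb does */
        d[i] = s[i];
}
```

The volatile reads stand in for the `movl (%edi),%eax` and `movl 28(%edi),%eax` loads, whose only purpose in the assembly is to pull the destination line into the cache before the stores.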
