From owner-freebsd-hackers Thu Feb 18 17:47:58 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from alcanet.com.au (border.alcanet.com.au [203.62.196.10])
	by hub.freebsd.org (Postfix) with ESMTP id 903A011889
	for ; Thu, 18 Feb 1999 17:47:50 -0800 (PST)
	(envelope-from peter.jeremy@auss2.alcatel.com.au)
Received: by border.alcanet.com.au id <40325>; Fri, 19 Feb 1999 12:37:11 +1100
Date: Fri, 19 Feb 1999 12:47:41 +1100
From: Peter Jeremy
Subject: Re: vm_page_zero_fill
To: hackers@FreeBSD.ORG
Message-Id: <99Feb19.123711est.40325@border.alcanet.com.au>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Alfred Perlstein wrote:
>After playing with "gcc -O -S bcmp.c" on several platforms, i386,
>sparc32, alpha. It seems to me that the function ought to be
>replaced with this:
[deleted]

The code given is portable, but not optimal for any of these
architectures, especially the Alpha. The original Alpha chips don't
have character instructions, so character handling is quite poor (and
gcc 2.7.x doesn't include support for the new character instructions).

Optimal code for the Alpha would read aligned 8-byte chunks from
memory, then appropriately re-align and compare them. (There's some
discussion about this, though not actual code, in the early Alpha
white papers.) A similar strategy probably holds for the SPARC (but
with 4-byte loads, except on UltraSPARCs). Something similar could be
done on the ix86, but I'm not certain about the advantages.

This _is_ one area where carefully hand-crafted code is worth the
effort (especially on the RISC architectures).

>it uses the "rep cmpsl" opcode, i have heard that using "movs/lods/cmps"
>was no longer optimal after the 486 line, but i'm unsure.

Sort of true. In theory, an explicit loop is faster than "rep cmps".
In practice, limited CPU<->RAM bandwidth tends to make this less of an
issue unless both strings are in L1 cache.

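For illustration only, a minimal portable sketch of the compare-a-word-at-a-time
idea described above. It is not the code under discussion and not tuned for any
of these CPUs: the name bcmp_words is hypothetical, and memcpy() is used for the
8-byte loads so that they are legal at any alignment. A hand-crafted Alpha
version would instead issue aligned loads and re-align the data with shifts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * bcmp_words (hypothetical name): returns 0 if the first len bytes of
 * b1 and b2 are identical, nonzero otherwise, like bcmp().
 */
int
bcmp_words(const void *b1, const void *b2, size_t len)
{
	const unsigned char *p1 = b1;
	const unsigned char *p2 = b2;
	uint64_t w1, w2;

	/* Main loop: compare 8 bytes per iteration. */
	while (len >= sizeof(uint64_t)) {
		/* memcpy keeps the loads valid for unaligned pointers. */
		memcpy(&w1, p1, sizeof(w1));
		memcpy(&w2, p2, sizeof(w2));
		if (w1 != w2)
			return (1);
		p1 += sizeof(w1);
		p2 += sizeof(w2);
		len -= sizeof(w1);
	}

	/* Tail: compare the remaining 0..7 bytes one at a time. */
	while (len > 0) {
		if (*p1 != *p2)
			return (1);
		p1++;
		p2++;
		len--;
	}
	return (0);
}
```

Whether this beats a plain byte loop or "rep cmps" on a given machine would need
to be measured; the sketch only shows the shape of the loop.
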
Peter