From: Anders Gavare
Date: Mon, 30 Dec 2002 09:03:44 +0100 (MET)
To: Peter Jeremy
Cc: freebsd-alpha@FreeBSD.ORG
Subject: Re: faster strlen() using longs (?)
In-Reply-To: <20021229202719.GC17648@gsmx07.alcatel.com.au>

On Mon, 30 Dec 2002, Peter Jeremy wrote:

> On 2002-Dec-29 00:25:49 +0100, Anders Gavare wrote:
> >I'm using FreeBSD 4.5 on an Alpha, and I noticed that strlen() isn't
> >implemented using words, but using chars.
...
> >I've experimented with several different variations of using longs, and
> >this is the fastest one I've come up with.
...
> > It is 2.8 times faster than the default strlen() in libc.
>
> On what sort of CPU? With what length strings? What compiler
> options? Have you tried it on a range of different Alpha CPUs?

This is an AlphaPC 164SX, 533 MHz. It is the only Alpha I've got.

Here are some very non-scientific results from using 16 MB strings and
9-byte strings, all tests done with non-aligned strings. my_strlenX, where
X is a digit, are variations of the same idea (load a word, see if it
contains a zero byte, continue otherwise).
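For readers who have not seen the "load a word, see if it contains a zero, continue otherwise" idea before, here is a minimal sketch of it in C. It is not any of the my_strlenX variants from this thread; the name word_strlen and the particular zero-byte test (the well-known subtract-and-mask trick) are assumptions chosen for illustration. It also assumes that reading a whole aligned word that contains the terminator is harmless, which holds in practice because an aligned word never crosses a page boundary, but which the C standard does not strictly guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a word-at-a-time strlen; not the code from the post. */
size_t word_strlen(const char *s)
{
    const char *p = s;
    unsigned long w;
    /* 0x01 and 0x80 repeated in every byte of an unsigned long. */
    const unsigned long ones = ~0UL / 0xff;
    const unsigned long highs = ones * 0x80;

    assert(s != NULL);

    /* Non-aligned start: step byte by byte up to a word boundary. */
    while (((uintptr_t)p & (sizeof(w) - 1)) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Load one aligned word per iteration; stop at the first word that
     * contains a zero byte. The expression below is nonzero exactly when
     * at least one byte of w is zero. */
    for (;;) {
        memcpy(&w, p, sizeof(w));   /* compiles to a plain word load */
        if (((w - ones) & ~w & highs) != 0)
            break;
        p += sizeof(w);
    }

    /* Locate the zero byte inside that last word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Calling word_strlen("abc") should give 3, the same as strlen(). The byte-wise prologue is what makes non-aligned strings work, and it is also why short strings gain less: for a 9-byte string most of the work is done one byte at a time.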
for a in gcc gcc31; do for b in '-O0' '-O' '-O3'; do $a strlen_test.c -c \
$b; $a strlen_main.c strlen_test.o -o strlen_test $b; ./strlen_test "$a \
$b"; done;done|sort -n

for long strings:

4.885080 user (184.28% better than libc) my_strlen3 (res=16777213), 'gcc -O'
5.200423 user (167.04% better than libc) my_strlen5 (res=16777213), 'gcc -O'
5.280694 user (162.27% better than libc) my_strlen5 (res=16777213), 'gcc31 -O'
5.284247 user (162.81% better than libc) my_strlen4 (res=16777213), 'gcc -O'
5.285044 user (162.77% better than libc) my_strlen6 (res=16777213), 'gcc -O'
5.483810 user (152.56% better than libc) my_strlen6 (res=16777213), 'gcc31 -O'
5.491186 user (152.22% better than libc) my_strlen4 (res=16777213), 'gcc31 -O'
5.856223 user (136.50% better than libc) my_strlen1 (res=16777213), 'gcc31 -O'
5.864422 user (136.47% better than libc) my_strlen4 (res=16777213), 'gcc -O3'
5.871919 user (136.29% better than libc) my_strlen3 (res=16777213), 'gcc31 -O3'
5.872501 user (136.15% better than libc) my_strlen5 (res=16777213), 'gcc -O3'
5.875515 user (136.03% better than libc) my_strlen6 (res=16777213), 'gcc -O3'
5.876855 user (135.97% better than libc) my_strlen3 (res=16777213), 'gcc -O3'
5.877666 user (136.06% better than libc) my_strlen6 (res=16777213), 'gcc31 -O3'
5.877785 user (136.05% better than libc) my_strlen4 (res=16777213), 'gcc31 -O3'
5.893618 user (135.42% better than libc) my_strlen5 (res=16777213), 'gcc31 -O3'
6.074682 user (128.40% better than libc) my_strlen2 (res=16777213), 'gcc31 -O3'
6.079617 user (128.42% better than libc) my_strlen1 (res=16777213), 'gcc -O'
6.081747 user (127.73% better than libc) my_strlen3 (res=16777213), 'gcc31 -O'
6.087773 user (127.80% better than libc) my_strlen1 (res=16777213), 'gcc -O3'
6.274350 user (120.74% better than libc) my_strlen0 (res=16777213), 'gcc31 -O'
6.277510 user (121.02% better than libc) my_strlen1 (res=16777213), 'gcc31 -O3'
6.279278 user (120.85% better than libc) my_strlen2 (res=16777213), 'gcc -O3'
6.281228 user (120.89% better than libc) my_strlen0 (res=16777213), 'gcc31 -O3'
6.468686 user (114.38% better than libc) my_strlen0 (res=16777213), 'gcc -O3'
6.483345 user (114.20% better than libc) my_strlen0 (res=16777213), 'gcc -O'
7.286602 user (90.59% better than libc) my_strlen2 (res=16777213), 'gcc -O'
7.517879 user (84.23% better than libc) my_strlen2 (res=16777213), 'gcc31 -O'
11.166152 user (24.48% better than libc) my_strlen5 (res=16777213), 'gcc -O0'
11.212218 user (23.96% better than libc) my_strlen6 (res=16777213), 'gcc -O0'
11.266838 user (23.36% better than libc) my_strlen4 (res=16777213), 'gcc -O0'
11.325385 user (25.47% better than libc) my_strlen6 (res=16777213), 'gcc31 -O0'
11.338611 user (25.33% better than libc) my_strlen4 (res=16777213), 'gcc31 -O0'
11.340291 user (22.56% better than libc) my_strlen3 (res=16777213), 'gcc -O0'
11.353625 user (25.16% better than libc) my_strlen5 (res=16777213), 'gcc31 -O0'
11.406141 user (24.59% better than libc) my_strlen3 (res=16777213), 'gcc31 -O0'
13.526092 user (2.76% better than libc) my_strlen1 (res=16777213), 'gcc -O0'
13.849852 user strlen (res=16777213), 'gcc31 -O'
13.867739 user strlen (res=16777213), 'gcc -O3'
13.874739 user strlen (res=16777213), 'gcc31 -O3'
13.887303 user strlen (res=16777213), 'gcc -O'
13.899129 user strlen (res=16777213), 'gcc -O0'
14.210420 user strlen (res=16777213), 'gcc31 -O0'
14.305201 user (-0.66% better than libc) my_strlen1 (res=16777213), 'gcc31 -O0'
16.202919 user (-14.22% better than libc) my_strlen0 (res=16777213), 'gcc -O0'
16.803712 user (-15.43% better than libc) my_strlen0 (res=16777213), 'gcc31 -O0'
26.708938 user (-46.80% better than libc) my_strlen2 (res=16777213), 'gcc31 -O0'
27.142429 user (-48.79% better than libc) my_strlen2 (res=16777213), 'gcc -O0'

(gcc is 2.95.3, gcc31 is 3.1.1 20020617)

It is interesting to note that gcc -O gives the fastest result. Maybe I
should rerun the tests without optimizing the main program, only the
strlen functions.
But the difference is marginal.

And for short strings:

2.025466 user (41.97% faster than libc) my_strlen2 (res=9), 'gcc -O3'
2.027854 user (41.81% faster than libc) my_strlen1 (res=9), 'gcc -O3'
2.161020 user (32.15% faster than libc) my_strlen0 (res=9), 'gcc31 -O'
2.174822 user (32.41% faster than libc) my_strlen1 (res=9), 'gcc -O'
2.253191 user (26.77% faster than libc) my_strlen0 (res=9), 'gcc31 -O3'
2.278019 user (26.41% faster than libc) my_strlen5 (res=9), 'gcc -O'
2.285023 user (26.02% faster than libc) my_strlen2 (res=9), 'gcc -O'
2.310429 user (23.61% faster than libc) my_strlen2 (res=9), 'gcc31 -O'
2.341886 user (21.97% faster than libc) my_strlen2 (res=9), 'gcc31 -O3'
2.342074 user (21.96% faster than libc) my_strlen1 (res=9), 'gcc31 -O3'
2.354400 user (22.14% faster than libc) my_strlen0 (res=9), 'gcc -O3'
2.367278 user (21.47% faster than libc) my_strlen6 (res=9), 'gcc -O3'
2.367363 user (20.63% faster than libc) my_strlen3 (res=9), 'gcc31 -O'
2.367841 user (21.45% faster than libc) my_strlen3 (res=9), 'gcc -O3'
2.368394 user (21.42% faster than libc) my_strlen5 (res=9), 'gcc -O3'
2.368742 user (20.56% faster than libc) my_strlen6 (res=9), 'gcc31 -O'
2.370190 user (21.33% faster than libc) my_strlen4 (res=9), 'gcc -O3'
2.424396 user (17.79% faster than libc) my_strlen1 (res=9), 'gcc31 -O'
2.424801 user (17.77% faster than libc) my_strlen4 (res=9), 'gcc31 -O'
2.424825 user (17.77% faster than libc) my_strlen5 (res=9), 'gcc31 -O'
2.434469 user (18.29% faster than libc) my_strlen3 (res=9), 'gcc -O'
2.457539 user (17.18% faster than libc) my_strlen0 (res=9), 'gcc -O'
2.483910 user (14.99% faster than libc) my_strlen6 (res=9), 'gcc31 -O3'
2.484677 user (14.96% faster than libc) my_strlen4 (res=9), 'gcc31 -O3'
2.485388 user (14.93% faster than libc) my_strlen5 (res=9), 'gcc31 -O3'
2.514320 user (14.53% faster than libc) my_strlen6 (res=9), 'gcc -O'
2.514775 user (14.51% faster than libc) my_strlen4 (res=9), 'gcc -O'
2.599132 user (9.90% faster than libc) my_strlen3 (res=9), 'gcc31 -O3'
2.855807 user strlen (res=9), 'gcc31 -O'
2.856369 user strlen (res=9), 'gcc31 -O3'
2.875634 user strlen (res=9), 'gcc -O3'
2.879661 user strlen (res=9), 'gcc -O'
3.483060 user strlen (res=9), 'gcc31 -O0'
3.510804 user strlen (res=9), 'gcc -O0'
5.136020 user (-31.64% faster than libc) my_strlen1 (res=9), 'gcc -O0'
5.703832 user (-38.45% faster than libc) my_strlen0 (res=9), 'gcc -O0'
5.762493 user (-39.56% faster than libc) my_strlen0 (res=9), 'gcc31 -O0'
5.852239 user (-40.01% faster than libc) my_strlen3 (res=9), 'gcc -O0'
5.877360 user (-40.27% faster than libc) my_strlen4 (res=9), 'gcc -O0'
5.914327 user (-40.64% faster than libc) my_strlen5 (res=9), 'gcc -O0'
5.949831 user (-40.99% faster than libc) my_strlen6 (res=9), 'gcc -O0'
6.708716 user (-48.08% faster than libc) my_strlen1 (res=9), 'gcc31 -O0'
6.822677 user (-48.54% faster than libc) my_strlen2 (res=9), 'gcc -O0'
6.822860 user (-48.95% faster than libc) my_strlen6 (res=9), 'gcc31 -O0'
6.998892 user (-50.23% faster than libc) my_strlen4 (res=9), 'gcc31 -O0'
7.016791 user (-50.36% faster than libc) my_strlen3 (res=9), 'gcc31 -O0'
7.023093 user (-50.41% faster than libc) my_strlen5 (res=9), 'gcc31 -O0'
7.136907 user (-51.20% faster than libc) my_strlen2 (res=9), 'gcc31 -O0'

For short strings the gain over libc's strlen() is much smaller: roughly
10% to 42% with -O or -O3, and every my_strlenX is slower than libc when
compiled with -O0.

All my_strlenX functions were placed in a separate file (strlen_test.c),
so the speed gain is not due to inlining strlen itself.

my_strlen3 is the one I posted yesterday, I think.

I don't know if this is useful or not. Maybe it should be rewritten in
assembly language if it is to be placed in libc.

Anders
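A note on reading the percentage column: the figures in the tables are consistent with comparing each my_strlenX time against the libc strlen time for the same compiler and flags, as (t_libc / t_candidate - 1) * 100. The small C sketch below reproduces that arithmetic. The function name pct_better is hypothetical, and the formula is inferred from the numbers in the tables, not taken from the test program itself, which is not shown in this post.

```c
#include <assert.h>

/* Hypothetical helper: percentage by which a candidate's user time beats
 * the libc baseline measured with the same compiler and flags.
 * Positive means faster than libc, negative means slower. */
double pct_better(double t_libc, double t_candidate)
{
    assert(t_candidate > 0.0);
    return (t_libc / t_candidate - 1.0) * 100.0;
}
```

For example, pct_better(13.887303, 4.885080) gives about 184.28, matching the my_strlen3 'gcc -O' row for long strings, and pct_better(13.899129, 27.142429) gives about -48.79, matching the my_strlen2 'gcc -O0' row. A value of 184.28% better corresponds to roughly 2.8 times faster, which is the figure quoted earlier in the thread.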