Date:      Mon, 30 Dec 2002 09:03:44 +0100 (MET)
From:      Anders Gavare <g@dd.chalmers.se>
To:        Peter Jeremy <peter.jeremy@alcatel.com.au>
Cc:        freebsd-alpha@FreeBSD.ORG
Subject:   Re: faster strlen() using longs (?)
Message-ID:  <Pine.GSO.4.44.0212300847350.17153-100000@kili.dd.chalmers.se>
In-Reply-To: <20021229202719.GC17648@gsmx07.alcatel.com.au>

On Mon, 30 Dec 2002, Peter Jeremy wrote:

> On 2002-Dec-29 00:25:49 +0100, Anders Gavare <g@dd.chalmers.se> wrote:
> >I'm using FreeBSD 4.5 on an Alpha, and I noticed that strlen() isn't
> >implemented using words, but using chars.
...
> >I've experimented with several different variations of using longs, and
> >this is the fastest one I've come up with. ...
> >  It is 2.8 times faster than the default strlen() in libc.
>
> On what sort of CPU?  With what length strings?  What compiler
> options?  Have you tried it on a range of different Alpha CPUs?

This is an AlphaPC 164SX at 533 MHz, the only Alpha I've got. Here are
some very unscientific results for 16 MB strings and 9-byte strings; all
tests used non-aligned strings. The my_strlenX functions, where X is a
digit, are variations of the same idea (load a word, check whether it
contains a zero byte, continue otherwise).
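
As a rough illustration of that idea, here is a minimal C sketch. It is
not one of the my_strlenX variants (their source is not included in this
message); the name word_strlen and the particular zero-byte test are
assumptions made for the example. Note that it reads whole aligned words
and so may read a few bytes past the terminator, which works on typical
hardware but is outside what ISO C guarantees.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a sketch of the word-at-a-time idea only. */
size_t word_strlen(const char *s)
{
    const char *p = s;
    const unsigned long ones  = ~0UL / 0xff;  /* 0x0101...01 */
    const unsigned long highs = ones << 7;    /* 0x8080...80 */
    unsigned long w;

    /* Step bytewise until p sits on a word boundary. */
    while (((uintptr_t)p % sizeof w) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Load a word; stop as soon as it contains a zero byte. */
    for (;;) {
        memcpy(&w, p, sizeof w);              /* aligned word load */
        if (((w - ones) & ~w & highs) != 0)
            break;
        p += sizeof w;
    }

    /* Find the zero byte inside the final word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The bytewise prologue is what lets non-aligned strings, like the ones in
these tests, use aligned word loads for the bulk of the scan.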

for a in gcc gcc31; do for b in '-O0' '-O' '-O3'; do $a strlen_test.c -c \
$b; $a strlen_main.c strlen_test.o -o strlen_test $b; ./strlen_test "$a \
$b"; done;done|sort -n

For long strings (16 MB):

4.885080 user (184.28% better than libc)  my_strlen3 (res=16777213), 'gcc -O'
5.200423 user (167.04% better than libc)  my_strlen5 (res=16777213), 'gcc -O'
5.280694 user (162.27% better than libc)  my_strlen5 (res=16777213), 'gcc31 -O'
5.284247 user (162.81% better than libc)  my_strlen4 (res=16777213), 'gcc -O'
5.285044 user (162.77% better than libc)  my_strlen6 (res=16777213), 'gcc -O'
5.483810 user (152.56% better than libc)  my_strlen6 (res=16777213), 'gcc31 -O'
5.491186 user (152.22% better than libc)  my_strlen4 (res=16777213), 'gcc31 -O'
5.856223 user (136.50% better than libc)  my_strlen1 (res=16777213), 'gcc31 -O'
5.864422 user (136.47% better than libc)  my_strlen4 (res=16777213), 'gcc -O3'
5.871919 user (136.29% better than libc)  my_strlen3 (res=16777213), 'gcc31 -O3'
5.872501 user (136.15% better than libc)  my_strlen5 (res=16777213), 'gcc -O3'
5.875515 user (136.03% better than libc)  my_strlen6 (res=16777213), 'gcc -O3'
5.876855 user (135.97% better than libc)  my_strlen3 (res=16777213), 'gcc -O3'
5.877666 user (136.06% better than libc)  my_strlen6 (res=16777213), 'gcc31 -O3'
5.877785 user (136.05% better than libc)  my_strlen4 (res=16777213), 'gcc31 -O3'
5.893618 user (135.42% better than libc)  my_strlen5 (res=16777213), 'gcc31 -O3'
6.074682 user (128.40% better than libc)  my_strlen2 (res=16777213), 'gcc31 -O3'
6.079617 user (128.42% better than libc)  my_strlen1 (res=16777213), 'gcc -O'
6.081747 user (127.73% better than libc)  my_strlen3 (res=16777213), 'gcc31 -O'
6.087773 user (127.80% better than libc)  my_strlen1 (res=16777213), 'gcc -O3'
6.274350 user (120.74% better than libc)  my_strlen0 (res=16777213), 'gcc31 -O'
6.277510 user (121.02% better than libc)  my_strlen1 (res=16777213), 'gcc31 -O3'
6.279278 user (120.85% better than libc)  my_strlen2 (res=16777213), 'gcc -O3'
6.281228 user (120.89% better than libc)  my_strlen0 (res=16777213), 'gcc31 -O3'
6.468686 user (114.38% better than libc)  my_strlen0 (res=16777213), 'gcc -O3'
6.483345 user (114.20% better than libc)  my_strlen0 (res=16777213), 'gcc -O'
7.286602 user (90.59% better than libc)  my_strlen2 (res=16777213), 'gcc -O'
7.517879 user (84.23% better than libc)  my_strlen2 (res=16777213), 'gcc31 -O'
11.166152 user (24.48% better than libc)  my_strlen5 (res=16777213), 'gcc -O0'
11.212218 user (23.96% better than libc)  my_strlen6 (res=16777213), 'gcc -O0'
11.266838 user (23.36% better than libc)  my_strlen4 (res=16777213), 'gcc -O0'
11.325385 user (25.47% better than libc)  my_strlen6 (res=16777213), 'gcc31 -O0'
11.338611 user (25.33% better than libc)  my_strlen4 (res=16777213), 'gcc31 -O0'
11.340291 user (22.56% better than libc)  my_strlen3 (res=16777213), 'gcc -O0'
11.353625 user (25.16% better than libc)  my_strlen5 (res=16777213), 'gcc31 -O0'
11.406141 user (24.59% better than libc)  my_strlen3 (res=16777213), 'gcc31 -O0'
13.526092 user (2.76% better than libc)  my_strlen1 (res=16777213), 'gcc -O0'
13.849852 user  strlen (res=16777213), 'gcc31 -O'
13.867739 user  strlen (res=16777213), 'gcc -O3'
13.874739 user  strlen (res=16777213), 'gcc31 -O3'
13.887303 user  strlen (res=16777213), 'gcc -O'
13.899129 user  strlen (res=16777213), 'gcc -O0'
14.210420 user  strlen (res=16777213), 'gcc31 -O0'
14.305201 user (-0.66% better than libc)  my_strlen1 (res=16777213), 'gcc31 -O0'
16.202919 user (-14.22% better than libc)  my_strlen0 (res=16777213), 'gcc -O0'
16.803712 user (-15.43% better than libc)  my_strlen0 (res=16777213), 'gcc31 -O0'
26.708938 user (-46.80% better than libc)  my_strlen2 (res=16777213), 'gcc31 -O0'
27.142429 user (-48.79% better than libc)  my_strlen2 (res=16777213), 'gcc -O0'

(gcc is 2.95.3, gcc31 is 3.1.1 20020617)
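
The percentage column is not defined in the output itself. Judging from
the numbers, it appears to be the libc time for the same compiler and
flags divided by the function's time, minus one. The helper below is a
hypothetical reconstruction of that arithmetic, not code taken from
strlen_main.c.

```c
/* Hypothetical helper: formula inferred from the printed results. */
double percent_vs_libc(double libc_seconds, double my_seconds)
{
    return (libc_seconds / my_seconds - 1.0) * 100.0;
}
```

For example, with 'gcc -O' the libc time is 13.887303 and the my_strlen3
time is 4.885080, which gives the listed 184.28%, or about 2.84 times as
fast.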

Interestingly, the fastest result comes from gcc -O, not -O3.
Maybe I should rerun the tests without optimizing the main program, only
the strlen functions. But the difference is marginal.

And for short strings (9 bytes):

2.025466 user (41.97% faster than libc)  my_strlen2 (res=9), 'gcc -O3'
2.027854 user (41.81% faster than libc)  my_strlen1 (res=9), 'gcc -O3'
2.161020 user (32.15% faster than libc)  my_strlen0 (res=9), 'gcc31 -O'
2.174822 user (32.41% faster than libc)  my_strlen1 (res=9), 'gcc -O'
2.253191 user (26.77% faster than libc)  my_strlen0 (res=9), 'gcc31 -O3'
2.278019 user (26.41% faster than libc)  my_strlen5 (res=9), 'gcc -O'
2.285023 user (26.02% faster than libc)  my_strlen2 (res=9), 'gcc -O'
2.310429 user (23.61% faster than libc)  my_strlen2 (res=9), 'gcc31 -O'
2.341886 user (21.97% faster than libc)  my_strlen2 (res=9), 'gcc31 -O3'
2.342074 user (21.96% faster than libc)  my_strlen1 (res=9), 'gcc31 -O3'
2.354400 user (22.14% faster than libc)  my_strlen0 (res=9), 'gcc -O3'
2.367278 user (21.47% faster than libc)  my_strlen6 (res=9), 'gcc -O3'
2.367363 user (20.63% faster than libc)  my_strlen3 (res=9), 'gcc31 -O'
2.367841 user (21.45% faster than libc)  my_strlen3 (res=9), 'gcc -O3'
2.368394 user (21.42% faster than libc)  my_strlen5 (res=9), 'gcc -O3'
2.368742 user (20.56% faster than libc)  my_strlen6 (res=9), 'gcc31 -O'
2.370190 user (21.33% faster than libc)  my_strlen4 (res=9), 'gcc -O3'
2.424396 user (17.79% faster than libc)  my_strlen1 (res=9), 'gcc31 -O'
2.424801 user (17.77% faster than libc)  my_strlen4 (res=9), 'gcc31 -O'
2.424825 user (17.77% faster than libc)  my_strlen5 (res=9), 'gcc31 -O'
2.434469 user (18.29% faster than libc)  my_strlen3 (res=9), 'gcc -O'
2.457539 user (17.18% faster than libc)  my_strlen0 (res=9), 'gcc -O'
2.483910 user (14.99% faster than libc)  my_strlen6 (res=9), 'gcc31 -O3'
2.484677 user (14.96% faster than libc)  my_strlen4 (res=9), 'gcc31 -O3'
2.485388 user (14.93% faster than libc)  my_strlen5 (res=9), 'gcc31 -O3'
2.514320 user (14.53% faster than libc)  my_strlen6 (res=9), 'gcc -O'
2.514775 user (14.51% faster than libc)  my_strlen4 (res=9), 'gcc -O'
2.599132 user (9.90% faster than libc)  my_strlen3 (res=9), 'gcc31 -O3'
2.855807 user  strlen (res=9), 'gcc31 -O'
2.856369 user  strlen (res=9), 'gcc31 -O3'
2.875634 user  strlen (res=9), 'gcc -O3'
2.879661 user  strlen (res=9), 'gcc -O'
3.483060 user  strlen (res=9), 'gcc31 -O0'
3.510804 user  strlen (res=9), 'gcc -O0'
5.136020 user (-31.64% faster than libc)  my_strlen1 (res=9), 'gcc -O0'
5.703832 user (-38.45% faster than libc)  my_strlen0 (res=9), 'gcc -O0'
5.762493 user (-39.56% faster than libc)  my_strlen0 (res=9), 'gcc31 -O0'
5.852239 user (-40.01% faster than libc)  my_strlen3 (res=9), 'gcc -O0'
5.877360 user (-40.27% faster than libc)  my_strlen4 (res=9), 'gcc -O0'
5.914327 user (-40.64% faster than libc)  my_strlen5 (res=9), 'gcc -O0'
5.949831 user (-40.99% faster than libc)  my_strlen6 (res=9), 'gcc -O0'
6.708716 user (-48.08% faster than libc)  my_strlen1 (res=9), 'gcc31 -O0'
6.822677 user (-48.54% faster than libc)  my_strlen2 (res=9), 'gcc -O0'
6.822860 user (-48.95% faster than libc)  my_strlen6 (res=9), 'gcc31 -O0'
6.998892 user (-50.23% faster than libc)  my_strlen4 (res=9), 'gcc31 -O0'
7.016791 user (-50.36% faster than libc)  my_strlen3 (res=9), 'gcc31 -O0'
7.023093 user (-50.41% faster than libc)  my_strlen5 (res=9), 'gcc31 -O0'
7.136907 user (-51.20% faster than libc)  my_strlen2 (res=9), 'gcc31 -O0'


For short strings the gain over libc's strlen() is much smaller (about
42% at best with optimization enabled). All my_strlenX functions were
placed in a separate file (strlen_test.c), so the speed gain is not due
to inlining strlen itself.
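
For anyone wanting to repeat the measurement, a timing loop along these
lines would report user CPU time comparable to the "user" figures above.
This is only a sketch: strlen_main.c is not shown in this message, and
the names user_seconds and time_calls are assumptions. Calling through a
function pointer also keeps the compiler from inlining the function
under test.

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

/* User CPU time consumed by this process so far, in seconds. */
static double user_seconds(void)
{
    struct rusage ru;

    getrusage(RUSAGE_SELF, &ru);
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
}

/* Hypothetical harness: call fn(s) reps times, return user time spent. */
double time_calls(size_t (*fn)(const char *), const char *s, long reps)
{
    volatile size_t sink = 0;   /* keeps the calls from being dropped */
    double start = user_seconds();
    long i;

    for (i = 0; i < reps; i++)
        sink = fn(s);
    (void)sink;
    return user_seconds() - start;
}
```

A call such as time_calls(strlen, str, reps) would give the libc
baseline to compare each variant against.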

my_strlen3 is the one I posted yesterday, I think. I don't know whether
this is useful or not.

Maybe it should be rewritten in assembly language if it is to be placed
in libc.


Anders





