Date:      Thu, 12 Jul 2007 21:32:31 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        "Sean C. Farley" <scf@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: Assembly string functions in i386 libc
Message-ID:  <20070712211245.M8625@besplex.bde.org>
In-Reply-To: <20070712191616.A4682@delplex.bde.org>
References:  <20070711134721.D2385@thor.farley.org> <20070712191616.A4682@delplex.bde.org>

On Thu, 12 Jul 2007, Bruce Evans wrote:

> On Wed, 11 Jul 2007, Sean C. Farley wrote:
>
>> While looking at increasing the speed of strlen(), I noticed that on
>> i386 platforms (PIII, P4 and Athlon XP) the performance is abysmal in
>> libc compared to the version I was writing.  After more testing, I found
>> it was only the assembly version that is really slow.  The C version is
>> fairly quick.  Is there a need to continue to use the assembly versions
>> of string functions on i386?  Does it mainly help slower systems such as
>> those with i386 or i486 CPU's?
>
> I think you are mistaken about the asm version being slow.  In my tests
> ...

Partly.

>> I have the results from my P4 (Id = 0xf24 Stepping = 4) system and the
>> test program here[1].  strlen.tar.bz2 is the archive of it for anyone's
>> testing.  In the strlen/results subdirectory, there are the results for
>> strings of increasing lengths.
>
> Sorry, I didn't look at this.  I just wrote a quick re-test and ran it

Now I've looked at it.  I think it is not testing strlen() at all, except
for the libc case, because __pure lets the compiler collapse the repeated
calls to strlen() into a single call.
(The existence of __pure is also a bug.  __pure was the FreeBSD spelling
of the __const__ attribute in gcc-1.  It was removed when special support
for gcc-1 was dropped, and should not have been recycled.)  __pure is a
syntax error in the old version of FreeBSD that I tested on.  I first
tried __pure2, which is the FreeBSD spelling of the __const__ attribute
in gcc-2.  I think it is weaker than the __pure__ attribute in gcc-3.
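The effect described above can be sketched in C.  This is not code from
the test program: byte_strlen and sum_lengths are hypothetical names, and
the noinline attribute assumes a GCC-compatible compiler.  If the timed
function is declared pure (or const), the compiler may evaluate it once
and reuse the result, so the loop no longer measures the function.  One
way to make every iteration perform a real call is to go through a
volatile function pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the strlen() variant being timed. */
__attribute__((noinline))
static size_t byte_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/*
 * Call the function `iterations` times and return the sum of the results.
 * The pointer is volatile, so it is reloaded on every iteration and the
 * compiler cannot merge the calls or hoist one out of the loop.
 */
static size_t sum_lengths(const char *s, unsigned long iterations)
{
    size_t (*volatile fn)(const char *) = byte_strlen;
    size_t total = 0;
    unsigned long i;

    for (i = 0; i < iterations; i++)
        total += fn(s);
    return total;
}
```

Returning the sum also gives the caller something to consume, so the
calls are not dead code.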

After removing __pure* and adding -static -g to CFLAGS, with gcc-3.3.3:

On an old Celeron (400MHz) (all P2's probably behave like this):

%%%
libcstrlen:	time spent executing strlen(string) = 64:	7.786868
basestrlen:	time spent executing strlen(string) = 64:	3.816736
strlen:		time spent executing strlen(string) = 64:	3.364313
strlen2:	time spent executing strlen(string) = 64:	2.662973
%%%

rep scasb is apparently very slow on P2's.
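For readers unfamiliar with the idiom, here is a minimal sketch of a
strlen() built on the string-scan instruction.  It is not the libc
source; it assumes GCC-style inline asm, uses the repne form of the
prefix, and falls back to a plain byte loop on non-x86 targets.  The name
scasb_strlen is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static size_t scasb_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    {
        size_t count = (size_t)-1;  /* effectively "no limit" */

        /*
         * repne scasb compares AL with the byte at [(E/R)DI], advances
         * DI and decrements CX, repeating until a match or CX == 0.
         * The ABI guarantees the direction flag is clear on entry.
         */
        __asm__ volatile ("repne scasb"
                          : "+D" (p), "+c" (count)
                          : "a" (0)
                          : "cc", "memory");
        /* p now points one byte past the terminating NUL. */
        return (size_t)(p - s) - 1;
    }
#else
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
#endif
}
```

All the work is in one instruction, so its speed depends entirely on how
the CPU implements the repeated scan.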

On an A64 in i386 mode:

%%%
libcstrlen:	time spent executing strlen(string) = 64:	0.709657
basestrlen:	time spent executing strlen(string) = 64:	0.691397
strlen:		time spent executing strlen(string) = 64:	0.527339
strlen2:	time spent executing strlen(string) = 64:	0.441090
%%%

Now rep scasb is only slightly slower than the simple C loop (since all
small loops take 2 cycles on AXP and A64...).  strlen and strlen2 are
marginally faster since their loops do more.
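The sources of basestrlen, strlen and strlen2 are not shown in this
message, so the following is only an illustration of what "loops do
more" can mean, under that assumption.  simple_strlen is a plain byte
loop; unrolled_strlen checks four bytes per iteration and so pays the
loop overhead a quarter as often.  Both names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One byte tested per loop iteration. */
static size_t simple_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/*
 * Four bytes tested per loop iteration.  Bytes are checked in order and
 * the function returns at the first NUL, so it never reads past the end
 * of the string.
 */
static size_t unrolled_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    for (;;) {
        if (p[0] == '\0')
            return (size_t)(p - s);
        if (p[1] == '\0')
            return (size_t)(p - s) + 1;
        if (p[2] == '\0')
            return (size_t)(p - s) + 2;
        if (p[3] == '\0')
            return (size_t)(p - s) + 3;
        p += 4;
    }
}
```

The extra setup and larger code only pay off once the string is long
enough, which fits a simple loop winning for very short strings.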

basestrlen is fastest for lengths <= 5 on the Celeron.

basestrlen is fastest for lengths <= 9 on the A64.

Bruce


