Date: Thu, 12 Jul 2007 21:32:31 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Bruce Evans <brde@optusnet.com.au>
Cc: "Sean C. Farley" <scf@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: Assembly string functions in i386 libc
Message-ID: <20070712211245.M8625@besplex.bde.org>
In-Reply-To: <20070712191616.A4682@delplex.bde.org>
References: <20070711134721.D2385@thor.farley.org> <20070712191616.A4682@delplex.bde.org>

On Thu, 12 Jul 2007, Bruce Evans wrote:

> On Wed, 11 Jul 2007, Sean C. Farley wrote:
>
>> While looking at increasing the speed of strlen(), I noticed that on
>> i386 platforms (PIII, P4 and Athlon XP) the performance is abysmal in
>> libc compared to the version I was writing.  After more testing, I found
>> it was only the assembly version that is really slow.  The C version is
>> fairly quick.  Is there a need to continue to use the assembly versions
>> of string functions on i386?  Does it mainly help slower systems such as
>> those with i386 or i486 CPU's?
>
> I think you are mistaken about the asm version being slow.  In my tests
> ...

Partly.

>> I have the results from my P4 (Id = 0xf24 Stepping = 4) system and the
>> test program here[1].  strlen.tar.bz2 is the archive of it for anyone's
>> testing.  In the strlen/results subdirectory, there are the results for
>> strings of increasing lengths.
>
> Sorry, I didn't look at this.  I just wrote a quick re-test and ran it

Now I've looked at it.  I think it is not testing strlen() at all, except
for the libc case, because __pure prevents more than 1 call to strlen().
(The existence of __pure is also a bug.  __pure was the FreeBSD spelling
of the __const__ attribute in gcc-1.  It was removed when special support
for gcc-1 was dropped, and should not have been recycled.)

__pure is a syntax error in the old version of FreeBSD that I tested on.
I first tried __pure2, which is the FreeBSD spelling of the __const__
attribute in gcc-2.  I think it is weaker than the __pure__ attribute in
gcc-3.

After removing __pure* and adding -static -g to CFLAGS, with gcc-3.3.3:

On an old Celeron (400MHz) (all P2's probably behave like this):

%%%
libcstrlen: time spent executing strlen(string) = 64: 7.786868
basestrlen: time spent executing strlen(string) = 64: 3.816736
strlen: time spent executing strlen(string) = 64: 3.364313
strlen2: time spent executing strlen(string) = 64: 2.662973
%%%

rep scasb is apparently very slow on P2's.
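
The __pure problem described above can be illustrated with a minimal
sketch.  It is not taken from the test program in the thread: the names
candidate_strlen and run_benchmark are hypothetical, and calling through a
volatile function pointer is just one assumed way to make sure every
iteration of a timing loop performs a real call.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical candidate under test (not from the thread's archive).
 * If its declaration carried an attribute such as gcc's __pure__, the
 * compiler would be allowed to evaluate candidate_strlen(s) once and
 * reuse the result, so a timing loop would measure a single call.
 */
size_t
candidate_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return ((size_t)(p - s));
}

/*
 * Call the candidate `iterations' times through a volatile function
 * pointer.  The pointer is re-read on every iteration and the compiler
 * cannot assume anything about its target, so no call is hoisted out
 * of the loop.  The sum of the results is returned so that the caller
 * can check that every call really happened.
 */
size_t
run_benchmark(const char *s, unsigned long iterations)
{
    size_t (*volatile fn)(const char *) = candidate_strlen;
    size_t total = 0;
    unsigned long i;

    assert(s != NULL);
    for (i = 0; i < iterations; i++)
        total += fn(s);
    return (total);
}
```

For a 64-byte string, run_benchmark(string, n) should return 64 * n; the
timing would be taken around that call.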

On an A64 in i386 mode:

%%%
libcstrlen: time spent executing strlen(string) = 64: 0.709657
basestrlen: time spent executing strlen(string) = 64: 0.691397
strlen: time spent executing strlen(string) = 64: 0.527339
strlen2: time spent executing strlen(string) = 64: 0.441090
%%%

Now rep scasb is only slightly slower than the simple C loop (since all
small loops take 2 cycles on AXP and A64...).  strlen and strlen2 are
marginally faster since their loops do more.

basestrlen is fastest for lengths <= 5 on the Celeron.
basestrlen is fastest for lengths <= 9 on the A64.

Bruce
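
For readers who want to see the two approaches being compared, here is a
rough sketch.  It assumes that the libc assembly version is built around
the repne scasb string instruction (as the remarks about rep scasb
suggest) and that the "simple C loop" is a plain byte-at-a-time scan.  The
function names are hypothetical, and the inline asm uses gcc syntax; it is
not the actual libc source.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-at-a-time scan: one load, one compare and one increment per byte. */
size_t
loop_strlen(const char *s)
{
    const char *p;

    for (p = s; *p != '\0'; p++)
        continue;
    return ((size_t)(p - s));
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/*
 * Scan with repne scasb: compare %al (0) against successive bytes at
 * the address in %edi/%rdi until a match is found or the count in
 * %ecx/%rcx runs out.  The count starts at its maximum value, so the
 * scan always stops at the terminating NUL.
 */
size_t
scasb_strlen(const char *s)
{
    const char *p = s;
    size_t count = (size_t)-1;

    __asm__ volatile(
        "cld\n\t"
        "repne scasb"
        : "+D" (p), "+c" (count)
        : "a" (0)
        : "cc", "memory");
    /* p now points one byte past the terminating NUL. */
    return ((size_t)(p - s) - 1);
}
#else
/* Fallback for other compilers and CPUs: reuse the C loop. */
size_t
scasb_strlen(const char *s)
{
    return (loop_strlen(s));
}
#endif

/* Both versions must agree before any timing of them means anything. */
void
check_agreement(const char *s)
{
    assert(loop_strlen(s) == scasb_strlen(s));
}
```

Which of the two is faster depends on the CPU, as the Celeron and A64
numbers above show.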