Date: Thu, 12 Jul 2007 16:02:47 -0500 (CDT)
From: "Sean C. Farley" <scf@FreeBSD.org>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-arch@FreeBSD.org
Subject: Re: Assembly string functions in i386 libc
Message-ID: <20070712142024.Q8789@thor.farley.org>
In-Reply-To: <20070712211245.M8625@besplex.bde.org>
References: <20070711134721.D2385@thor.farley.org> <20070712191616.A4682@delplex.bde.org> <20070712211245.M8625@besplex.bde.org>

On Thu, 12 Jul 2007, Bruce Evans wrote:

> On Thu, 12 Jul 2007, Bruce Evans wrote:
>
>> On Wed, 11 Jul 2007, Sean C. Farley wrote:
>>
>>> While looking at increasing the speed of strlen(), I noticed that on
>>> i386 platforms (PIII, P4 and Athlon XP) the performance is abysmal
>>> in libc compared to the version I was writing. After more testing,
>>> I found it was only the assembly version that is really slow. The C
>>> version is fairly quick. Is there a need to continue to use the
>>> assembly versions of string functions on i386? Does it mainly help
>>> slower systems such as those with i386 or i486 CPU's?
>>
>> I think you are mistaken about the asm version being slow. In my
>> tests ...
>
> Partly.
>
>>> I have the results from my P4 (Id = 0xf24 Stepping = 4) system and
>>> the test program here[1]. strlen.tar.bz2 is the archive of it for
>>> anyone's testing. In the strlen/results subdirectory, there are the
>>> results for strings of increasing lengths.
>>
>> Sorry, I didn't look at this. I just wrote a quick re-test and ran
>> it
>
> Now I've looked at it. I think it is not testing strlen() at all,
> except for the libc case, because __pure prevents more than 1 call to
> strlen(). (The existence of __pure is also a bug. __pure was the
> FreeBSD spelling of the __const__ attribute in gcc-1. It was removed
> when special support for gcc-1 was dropped, and should not have been
> recycled.) __pure is a syntax error in the old version of FreeBSD
> that I tested on. I first tried __pure2, which is the FreeBSD
> spelling of the __const__ attribute in gcc-2. I think it is weaker
> than the __pure__ attribute in gcc-3.

From what I could find, strlen() should not have the __const__ (__pure2)
attribute since it is being passed a pointer, but __pure__ (__pure)
should work. Are you saying that __pure used to mean __const__ in gcc-1
but now it means __pure__ for gcc-2.96 and above? The redefinition of
__pure is what you are saying is a bug. Yes?
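
To make the __pure point concrete, here is a minimal sketch of why a
benchmark loop around a function marked with gcc's __pure__ attribute may
end up timing only one call, and one way to force every call to happen.
The names my_strlen, bench_naive and bench_forced are made up for
illustration; this is not the code from strlen.tar.bz2, and whether the
compiler really folds the calls depends on its version and optimization
flags.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for one of the strlen variants under test.
 * __pure__ promises that the result depends only on the arguments and
 * on memory the function reads, and that there are no side effects.
 */
size_t my_strlen(const char *s) __attribute__((__pure__));

size_t
my_strlen(const char *s)
{
	const char *p = s;

	assert(s != NULL);
	while (*p != '\0')
		p++;
	return ((size_t)(p - s));
}

/*
 * The string never changes inside the loop, so an optimizing compiler
 * is allowed to call my_strlen() once and reuse the result.  A timing
 * loop written this way may measure almost nothing.
 */
size_t
bench_naive(const char *s, unsigned long iters)
{
	size_t total = 0;
	unsigned long i;

	for (i = 0; i < iters; i++)
		total += my_strlen(s);
	return (total);
}

/*
 * Calling through a volatile function pointer hides the target (and
 * its attribute) from the optimizer: the pointer has to be reloaded on
 * each iteration, so each call has to be made.
 */
size_t
bench_forced(const char *s, unsigned long iters)
{
	size_t (*volatile fn)(const char *) = my_strlen;
	size_t total = 0;
	unsigned long i;

	for (i = 0; i < iters; i++)
		total += fn(s);
	return (total);
}
```

Both loops return the same total; only the second is a reasonable basis
for timing when the attribute is present. Dropping the attribute from the
prototype, as was done for the numbers in this thread, is the other
obvious fix.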
> After removing __pure* and adding -static -g to CFLAGS, with
> gcc-3.3.3:
>
> On a old Celeron (400MHz) (all P2's probably behave like this):
>
> %%%
> libcstrlen: time spent executing strlen(string) = 64: 7.786868
> basestrlen: time spent executing strlen(string) = 64: 3.816736
> strlen: time spent executing strlen(string) = 64: 3.364313
> strlen2: time spent executing strlen(string) = 64: 2.662973
> %%%
>
> rep scasb is apparently very slow on P2's.
>
> On an A64 in i386 mode:
>
> %%%
> libcstrlen: time spent executing strlen(string) = 64: 0.709657
> basestrlen: time spent executing strlen(string) = 64: 0.691397
> strlen: time spent executing strlen(string) = 64: 0.527339
> strlen2: time spent executing strlen(string) = 64: 0.441090
> %%%
>
> Now rep scasb is only slightly slower than the simple C loop (since
> all small loops take 2 cycles on AXP and A64...). strlen and strlen2
> are marginally faster since their loops do more.
>
> basestrlen is fastest for lengths <= 5 on the Celeron.
>
> basestrlen is fastest for lengths <= 9 on the A64.

I removed __pure from main.c and added -static -g.
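
As an aside on the quoted remark that strlen and strlen2 are faster
because "their loops do more": the sketch below contrasts a simple
one-byte-per-iteration loop with a loop unrolled to test four bytes per
iteration. The names byte_strlen and unrolled_strlen are hypothetical,
and this is only an illustration of the general idea. It is not the
basestrlen, strlen or strlen2 code from the test archive, which may use a
different technique.

```c
#include <assert.h>
#include <stddef.h>

/* Simple loop: one load, one compare and one branch per byte. */
size_t
byte_strlen(const char *s)
{
	const char *p = s;

	assert(s != NULL);
	while (*p != '\0')
		p++;
	return ((size_t)(p - s));
}

/*
 * Unrolled loop: four bytes are tested per trip around the loop, so
 * the pointer update and the backward branch are paid once per four
 * bytes.  Each byte is read only after the previous one was found to
 * be non-zero, so the loop never reads past the terminator.
 */
size_t
unrolled_strlen(const char *s)
{
	const char *p = s;

	assert(s != NULL);
	for (;;) {
		if (p[0] == '\0')
			return ((size_t)(p - s));
		if (p[1] == '\0')
			return ((size_t)(p - s) + 1);
		if (p[2] == '\0')
			return ((size_t)(p - s) + 2);
		if (p[3] == '\0')
			return ((size_t)(p - s) + 3);
		p += 4;
	}
}
```

The extra setup in a loop like the second one is a plausible reason for
a simple loop winning on very short strings, in line with the quoted
observation that basestrlen is fastest for small lengths.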

Athlon XP 2100 (1.72 GHz):
libcstrlen: time spent executing strlen(string) = 64: 0.994755
asmstrlen: time spent executing strlen(string) = 64: 0.989012
basestrlen: time spent executing strlen(string) = 64: 0.879722
strlen: time spent executing strlen(string) = 64: 0.626727
strlen2: time spent executing strlen(string) = 64: 0.587162

P4 1.6 GHz:
libcstrlen: time spent executing strlen(string) = 64: 2.412558
asmstrlen: time spent executing strlen(string) = 64: 2.413904
basestrlen: time spent executing strlen(string) = 64: 1.049927
strlen: time spent executing strlen(string) = 64: 0.543575
strlen2: time spent executing strlen(string) = 64: 0.547015

PIII 450MHz:
libcstrlen: time spent executing strlen(string) = 64: 6.976066
asmstrlen: time spent executing strlen(string) = 64: 6.974106
basestrlen: time spent executing strlen(string) = 64: 3.464854
strlen: time spent executing strlen(string) = 64: 2.541872
strlen2: time spent executing strlen(string) = 64: 2.339469

The Athlon XP did much better with the assembly version than either
Intel CPU for me. On all three CPUs, for string lengths from 1 to 256,
the C versions always beat the assembly version, although the assembly
version came fairly close to basestrlen for lengths of 9 to 32 bytes.

Even if this does not show that the assembly version should be replaced,
I find this performance testing interesting. I learned something new.

Sean
-- 
scf@FreeBSD.org