Date: Thu, 12 Jul 2007 16:02:47 -0500 (CDT)
From: "Sean C. Farley"
To: Bruce Evans
Cc: freebsd-arch@FreeBSD.org
Subject: Re: Assembly string functions in i386 libc
Message-ID: <20070712142024.Q8789@thor.farley.org>
In-Reply-To: <20070712211245.M8625@besplex.bde.org>
References: <20070711134721.D2385@thor.farley.org>
 <20070712191616.A4682@delplex.bde.org>
 <20070712211245.M8625@besplex.bde.org>

On Thu, 12 Jul 2007, Bruce Evans wrote:

> On Thu, 12 Jul 2007, Bruce Evans wrote:
>
>> On Wed, 11 Jul 2007, Sean C. Farley wrote:
>>
>>> While looking at increasing the speed of strlen(), I noticed that on
>>> i386 platforms (PIII, P4 and Athlon XP) the performance is abysmal
>>> in libc compared to the version I was writing.
>>> After more testing, I found it was only the assembly version that
>>> is really slow.  The C version is fairly quick.  Is there a need to
>>> continue to use the assembly versions of string functions on i386?
>>> Does it mainly help slower systems such as those with i386 or i486
>>> CPU's?
>>
>> I think you are mistaken about the asm version being slow.  In my
>> tests ...
>
> Partly.
>
>>> I have the results from my P4 (Id = 0xf24 Stepping = 4) system and
>>> the test program here[1].  strlen.tar.bz2 is the archive of it for
>>> anyone's testing.  In the strlen/results subdirectory, there are the
>>> results for strings of increasing lengths.
>>
>> Sorry, I didn't look at this.  I just wrote a quick re-test and ran
>> it
>
> Now I've looked at it.  I think it is not testing strlen() at all,
> except for the libc case, because __pure prevents more than 1 call to
> strlen().  (The existence of __pure is also a bug.  __pure was the
> FreeBSD spelling of the __const__ attribute in gcc-1.  It was removed
> when special support for gcc-1 was dropped, and should not have been
> recycled.)  __pure is a syntax error in the old version of FreeBSD
> that I tested on.  I first tried __pure2, which is the FreeBSD
> spelling of the __const__ attribute in gcc-2.  I think it is weaker
> than the __pure__ attribute in gcc-3.

From what I could find, strlen() should not have the __const__ (__pure2)
attribute since it reads memory through the pointer it is passed, but
__pure__ (__pure) should work.  Are you saying that __pure used to mean
__const__ in gcc-1 but now means __pure__ for gcc-2.96 and above?  Is
that redefinition of __pure what you are calling a bug?
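
As a minimal sketch of the concern above (this is not code from
strlen.tar.bz2; the names counted_strlen, run_benchmark and calls are
made up for illustration): if the function under test is declared with
the pure attribute, the compiler is allowed to merge repeated calls that
have the same argument, so a timing loop may end up measuring a single
call.  One way to keep every call, assuming the loop is free to call
through a pointer, is to read the function pointer from a volatile
object on each iteration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call counter, used only to show that no call was dropped. */
unsigned long calls;

/*
 * Byte-at-a-time strlen, a stand-in for the functions under test.
 * Deliberately NOT declared pure: it updates `calls`.
 */
size_t
counted_strlen(const char *s)
{
    const char *p = s;

    calls++;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/*
 * Call fn(s) `iterations` times and return the sum of the results.
 * The pointer is re-read from a volatile object on every iteration, so
 * the compiler cannot assume which function is called and cannot merge
 * the calls.
 */
size_t
run_benchmark(size_t (*fn)(const char *), const char *s,
    unsigned long iterations)
{
    size_t (*volatile vfn)(const char *) = fn;
    size_t total = 0;
    unsigned long i;

    assert(fn != NULL);
    for (i = 0; i < iterations; i++)
        total += vfn(s);
    return total;
}
```

For example, run_benchmark(counted_strlen, "hello", 1000) returns 5000
and adds 1000 to calls, which shows that each iteration really made a
call.  Using the summed result afterwards also keeps the loop from being
discarded as dead code.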

> After removing __pure* and adding -static -g to CFLAGS, with
> gcc-3.3.3:
>
> On a old Celeron (400MHz) (all P2's probably behave like this):
>
> %%%
> libcstrlen: time spent executing strlen(string) = 64: 7.786868
> basestrlen: time spent executing strlen(string) = 64: 3.816736
> strlen: time spent executing strlen(string) = 64: 3.364313
> strlen2: time spent executing strlen(string) = 64: 2.662973
> %%%
>
> rep scasb is apparently very slow on P2's.
>
> On an A64 in i386 mode:
>
> %%%
> libcstrlen: time spent executing strlen(string) = 64: 0.709657
> basestrlen: time spent executing strlen(string) = 64: 0.691397
> strlen: time spent executing strlen(string) = 64: 0.527339
> strlen2: time spent executing strlen(string) = 64: 0.441090
> %%%
>
> Now rep scasb is only slightly slower than the simple C loop (since
> all small loops take 2 cycles on AXP and A64...).  strlen and strlen2
> are marginally faster since their loops do more.
>
> basestrlen is fastest for lengths <= 5 on the Celeron.
>
> basestrlen is fastest for lengths <= 9 on the A64.

I removed __pure from main.c and added -static -g.

Athlon XP 2100 (1.72 GHz):
libcstrlen: time spent executing strlen(string) = 64: 0.994755
asmstrlen: time spent executing strlen(string) = 64: 0.989012
basestrlen: time spent executing strlen(string) = 64: 0.879722
strlen: time spent executing strlen(string) = 64: 0.626727
strlen2: time spent executing strlen(string) = 64: 0.587162

P4 1.6 GHz:
libcstrlen: time spent executing strlen(string) = 64: 2.412558
asmstrlen: time spent executing strlen(string) = 64: 2.413904
basestrlen: time spent executing strlen(string) = 64: 1.049927
strlen: time spent executing strlen(string) = 64: 0.543575
strlen2: time spent executing strlen(string) = 64: 0.547015

PIII 450MHz:
libcstrlen: time spent executing strlen(string) = 64: 6.976066
asmstrlen: time spent executing strlen(string) = 64: 6.974106
basestrlen: time spent executing strlen(string) = 64: 3.464854
strlen: time spent executing strlen(string) = 64: 2.541872
strlen2: time spent executing strlen(string) = 64: 2.339469

The Athlon XP did much better with the assembly version than either
Intel CPU for me.  On all three CPUs, for string lengths from 1 to 256,
the C versions always beat the assembly version, although the assembly
version came somewhat close to basestrlen for lengths of 9 to 32 bytes.

Even if this does not show that the assembly version should be replaced,
I find this performance testing interesting.  I learned something new.

Sean
--
scf@FreeBSD.org