From: Bruce Evans
Date: Thu, 12 Jul 2007 21:32:31 +1000 (EST)
To: Bruce Evans
Cc: "Sean C. Farley", freebsd-arch@freebsd.org
Subject: Re: Assembly string functions in i386 libc
Message-ID: <20070712211245.M8625@besplex.bde.org>
In-Reply-To: <20070712191616.A4682@delplex.bde.org>
References: <20070711134721.D2385@thor.farley.org> <20070712191616.A4682@delplex.bde.org>

On Thu, 12 Jul 2007, Bruce Evans wrote:

> On Wed, 11 Jul 2007, Sean C. Farley wrote:
>
>> While looking at increasing the speed of strlen(), I noticed that on
>> i386 platforms (PIII, P4 and Athlon XP) the performance is abysmal in
>> libc compared to the version I was writing. After more testing, I found
>> it was only the assembly version that is really slow. The C version is
>> fairly quick.
>> Is there a need to continue to use the assembly versions
>> of string functions on i386? Does it mainly help slower systems such as
>> those with i386 or i486 CPU's?
>
> I think you are mistaken about the asm version being slow. In my tests
> ...

Partly.

>> I have the results from my P4 (Id = 0xf24 Stepping = 4) system and the
>> test program here[1]. strlen.tar.bz2 is the archive of it for anyone's
>> testing. In the strlen/results subdirectory, there are the results for
>> strings of increasing lengths.
>
> Sorry, I didn't look at this. I just wrote a quick re-test and ran it

Now I've looked at it. I think it is not testing strlen() at all, except
for the libc case, because __pure lets the compiler make only 1 call to
strlen() and reuse the result. (The existence of __pure is also a bug.
__pure was the FreeBSD spelling of the __const__ attribute in gcc-1. It
was removed when special support for gcc-1 was dropped, and should not
have been recycled.)

__pure is a syntax error in the old version of FreeBSD that I tested on.
I first tried __pure2, which is the FreeBSD spelling of the __const__
attribute in gcc-2. I think it is weaker than the __pure__ attribute in
gcc-3.

After removing __pure* and adding -static -g to CFLAGS, with gcc-3.3.3:

On an old Celeron (400MHz) (all P2's probably behave like this):
%%%
libcstrlen: time spent executing strlen(string) = 64: 7.786868
basestrlen: time spent executing strlen(string) = 64: 3.816736
strlen: time spent executing strlen(string) = 64: 3.364313
strlen2: time spent executing strlen(string) = 64: 2.662973
%%%

rep scasb (which the asm libc strlen() uses) is apparently very slow on
P2's.

On an A64 in i386 mode:
%%%
libcstrlen: time spent executing strlen(string) = 64: 0.709657
basestrlen: time spent executing strlen(string) = 64: 0.691397
strlen: time spent executing strlen(string) = 64: 0.527339
strlen2: time spent executing strlen(string) = 64: 0.441090
%%%

Now rep scasb is only slightly slower than the simple C loop (since all
small loops take 2 cycles on AXP and A64...).
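On the __pure problem described above: a minimal sketch (not the test program's actual code) of a timing loop that cannot be collapsed into a single call. simple_strlen() and bench_strlen() are hypothetical names; simple_strlen() is assumed to resemble the simple C loop that basestrlen presumably is, and clock() is assumed to be an adequate timer. If the function being timed were declared __pure__ or __const__, gcc could call it once and reuse the result because the argument never changes; reloading the function pointer and the string pointer from volatile locals forces a real call on every iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the simple C loop: one byte per iteration. */
size_t simple_strlen(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		p++;
	return (size_t)(p - s);
}

/*
 * Time `iterations` calls of fn(str).  With a __pure__ or __const__
 * declaration and a loop-invariant argument, gcc may hoist the call out
 * of the loop.  Reading `call` and `arg` through volatile objects on
 * each iteration prevents that, and summing the results keeps the calls
 * from being removed as dead code.
 */
size_t bench_strlen(size_t (*fn)(const char *), const char *str,
    long iterations, double *seconds)
{
	size_t (*volatile call)(const char *) = fn;
	const char *volatile arg = str;
	size_t total = 0;
	clock_t start;

	assert(fn != NULL && str != NULL && seconds != NULL);
	start = clock();
	for (long i = 0; i < iterations; i++)
		total += call(arg);
	*seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
	return total;
}
```

For example, bench_strlen(simple_strlen, s, n, &secs) returns n times the length of s and stores the elapsed processor time in secs, so a result that is suspiciously close to zero seconds points at a hoisted call rather than a fast strlen().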
strlen and strlen2 are marginally faster since their loops do more work
per iteration. basestrlen is fastest for lengths <= 5 on the Celeron and
for lengths <= 9 on the A64.

Bruce
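"Their loops do more" presumably means that strlen and strlen2 examine several bytes per loop iteration instead of one. Their source is not reproduced in this message, so the following is only a sketch of one well-known way to do that: word-at-a-time scanning with the (x - 0x01...01) & ~x & 0x80...80 zero-byte test, assuming uintptr_t is the natural word size. word_strlen() and has_zero_byte() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 0x0101...01 and 0x8080...80 for the width of uintptr_t. */
#define ONES	((uintptr_t)-1 / 0xff)
#define HIGHS	(ONES * 0x80)

/* Nonzero iff at least one byte of x is zero. */
static int has_zero_byte(uintptr_t x)
{
	return ((x - ONES) & ~x & HIGHS) != 0;
}

size_t word_strlen(const char *s)
{
	const char *p = s;
	uintptr_t w;

	assert(s != NULL);

	/* Check single bytes until p is aligned to a word boundary. */
	while ((uintptr_t)p % sizeof(uintptr_t) != 0) {
		if (*p == '\0')
			return (size_t)(p - s);
		p++;
	}

	/*
	 * Test a whole aligned word per iteration.  As in typical libc
	 * implementations, the last word may extend past the NUL; an
	 * aligned word never crosses a page boundary, but strictly this
	 * over-read is outside ISO C.  memcpy avoids an aliasing violation.
	 */
	for (;;) {
		memcpy(&w, p, sizeof w);
		if (has_zero_byte(w))
			break;
		p += sizeof w;
	}

	/* Locate the terminating NUL inside the last word. */
	while (*p != '\0')
		p++;
	return (size_t)(p - s);
}
```

The single-byte alignment and tail loops add fixed overhead, which may help explain why a plain byte loop like basestrlen wins for the short lengths reported above.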