From owner-freebsd-arch@FreeBSD.ORG Wed Jul 11 23:46:30 2007
Date: Wed, 11 Jul 2007 18:46:11 -0500 (CDT)
From: "Sean C. Farley"
To: Peter Jeremy
Cc: freebsd-arch@FreeBSD.org
Subject: Re: Assembly string functions in i386 libc
Message-ID: <20070711171829.Y2385@thor.farley.org>
In-Reply-To: <20070711221338.GC20178@turion.vk2pj.dyndns.org>
References: <20070711134721.D2385@thor.farley.org> <20070711221338.GC20178@turion.vk2pj.dyndns.org>

On Thu, 12 Jul 2007, Peter Jeremy wrote:

> On 2007-Jul-11 15:24:01 -0500, "Sean C. Farley" wrote:
>> libc compared to the version I was writing. After more testing, I
>> found it was only the assembly version that is really slow. The C
>> version is fairly quick. Is there a need to continue to use the
>> assembly versions of string functions on i386?
>> Does it mainly help slower systems such as those with i386 or i486
>> CPUs?
>
> The performance of string instructions has varied wildly across
> various x86 implementations. Definitely, for short strings, the
> overhead in initialising the various registers outweighs any actual
> difference in loop performance. For any recent CPU, the location of
> the string in the memory hierarchy far outweighs implementation
> issues. bde@ has done various testing in the past and posted results.
>
> Some comments:
> - comparing the strlen() in a shared libc with a statically linked one
>   is unfair - especially on the i386.

I had been testing with strlen.S linked into the test program, but the
results were the same (at least for me) as linking against libc.

> - Your results don't include non-aligned inputs

I ran the tests again, this time starting at the second byte of each
string. The results are in a results-non-aligned directory. Each string
given to the program was one byte longer than before so that the
aligned and non-aligned results line up.

> - Your results don't include non-power-of-2 lengths

I have tested strings of various lengths. The Makefile in the main
directory shows other values I have tried. I can post more results,
including runs with the assembly file compiled directly into the
program.

>> I would appreciate it if anyone could see if strlen and strlen2
>> perform any better on an amd64. Although the current C version of
>> strlen() in 7-CURRENT is faster than mine for smaller values, they
>> perform better for larger strings.
>
> I've tested on:
> FreeBSD 6.2-STABLE #28: Fri Jun 22 11:44:13 EST 2007
>     root@turion.vk2pj.dyndns.org:/usr/obj/usr/src/sys/turion
> CPU: AMD Turion(tm) 64 Mobile ML-40 (2194.52-MHz K8-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x20f42  Stepping = 2
>   Features=0x78bfbff
>   Features2=0x1
>   AMD Features=0xe2500800
>   AMD Features2=0x1
>
> There is no asm strlen so libcstrlen and basestrlen should be
> identical (and disassembling [x]strlen() shows that the code _is_
> identical) but there are significant differences for short strings and
> measurable differences for all lengths except 32 bytes. This
> indicates that your program is not able to accurately compare strlen()
> performance.

I am not sure I understand. The 32-byte test results show a measurable
difference in your output and mine. I just switched the program from
gettimeofday() to getrusage(). This should give more accurate results
for 32 bytes and the 4- and 8-byte tests below.

> I've tried statically linking all the test programs and this removes
> the libcstrlen/basestrlen differences. The very poor results for 4
> and 8 byte strings are unexpected but (as expected), your unrolled
> strlen() implementations behave better for longer strings.
>
> The attached results all reflect your code with '-static' added to
> every gcc/link step.

I redid my tests with everything compiled statically. Also,
getrusage() was used instead of gettimeofday().

Sean
-- 
scf@FreeBSD.org
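
For anyone following the thread without the test program at hand, the
measurement approach discussed above (timing repeated strlen() calls
with getrusage() rather than gettimeofday(), on an aligned string and
on a buffer offset by one byte) might look roughly like the sketch
below. This is not the program from the thread. The helper names
user_cpu_usec, make_string and time_strlen are hypothetical, and the
sketch assumes a POSIX system where malloc() returns word-aligned
memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: user CPU time used so far by this process, in usec. */
static long long
user_cpu_usec(void)
{
	struct rusage ru;
	int rc = getrusage(RUSAGE_SELF, &ru);

	assert(rc == 0);	/* fails only on invalid arguments */
	(void)rc;
	return ((long long)ru.ru_utime.tv_sec * 1000000LL +
	    (long long)ru.ru_utime.tv_usec);
}

/* Hypothetical helper: heap string of `len` 'x' characters; caller frees. */
static char *
make_string(size_t len)
{
	char *s = malloc(len + 1);

	if (s == NULL)
		return (NULL);
	memset(s, 'x', len);
	s[len] = '\0';
	return (s);
}

/*
 * Hypothetical helper: call fn(s) `iters` times and return the user CPU
 * time consumed, in microseconds.  The volatile function pointer and the
 * volatile accumulator keep the compiler from hoisting or dropping the
 * calls.
 */
static long long
time_strlen(size_t (*fn)(const char *), const char *s, unsigned long iters)
{
	size_t (*volatile call)(const char *) = fn;
	volatile size_t sink = 0;
	long long start;
	unsigned long i;

	start = user_cpu_usec();
	for (i = 0; i < iters; i++)
		sink += call(s);
	return (user_cpu_usec() - start);
}
```

To compare like with like, the aligned run would time
make_string(len) directly, while the non-aligned run would time
make_string(len + 1) starting at its second byte, so both measure a
string of length len. Building everything with -static, as suggested
in the thread, keeps shared-library call overhead out of the numbers.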