From owner-freebsd-arch@FreeBSD.ORG  Wed Jul 11 23:46:30 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@FreeBSD.org
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 53A0216A400
	for <freebsd-arch@FreeBSD.org>; Wed, 11 Jul 2007 23:46:30 +0000 (UTC)
	(envelope-from scf@FreeBSD.org)
Received: from mail.farley.org (farley.org [67.64.95.201])
	by mx1.freebsd.org (Postfix) with ESMTP id 0149513C484
	for <freebsd-arch@FreeBSD.org>; Wed, 11 Jul 2007 23:46:29 +0000 (UTC)
	(envelope-from scf@FreeBSD.org)
Received: from thor.farley.org (thor.farley.org [192.168.1.5])
	by mail.farley.org (8.14.1/8.14.1) with ESMTP id l6BNmJqD010761;
	Wed, 11 Jul 2007 18:48:19 -0500 (CDT) (envelope-from scf@FreeBSD.org)
Date: Wed, 11 Jul 2007 18:46:11 -0500 (CDT)
From: "Sean C. Farley" <scf@FreeBSD.org>
To: Peter Jeremy <peterjeremy@optushome.com.au>
In-Reply-To: <20070711221338.GC20178@turion.vk2pj.dyndns.org>
Message-ID: <20070711171829.Y2385@thor.farley.org>
References: <20070711134721.D2385@thor.farley.org>
	<20070711221338.GC20178@turion.vk2pj.dyndns.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.1
X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on mail.farley.org
Cc: freebsd-arch@FreeBSD.org
Subject: Re: Assembly string functions in i386 libc
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 Jul 2007 23:46:30 -0000

On Thu, 12 Jul 2007, Peter Jeremy wrote:

> On 2007-Jul-11 15:24:01 -0500, "Sean C. Farley" <scf@freebsd.org> wrote:
>> libc compared to the version I was writing.  After more testing, I
>> found it was only the assembly version that is really slow.  The C
>> version is fairly quick.  Is there a need to continue to use the
>> assembly versions of string functions on i386?  Does it mainly help
>> slower systems such as those with i386 or i486 CPU's?
>
> The performance of string instructions has varied wildly across
> various x86 implementations.  Definitely, for short strings, the
> overhead in initialising the various registers outweighs any actual
> difference in loop performance.  For any recent CPU, the location of
> the string in the memory hierarchy far outweighs implementation
> issues.  bde@ has done various testing in the past and posted results.
>
> Some comments:
> - comparing the strlen() in a shared libc with a statically linked one
>   is unfair - especially on the i386.

I had been testing with strlen.S linked into the test program, but the
results were the same (at least for me) as linking against libc.

> - Your results don't include non-aligned inputs

I ran the tests again, this time starting one byte into each string.
The results are in the results-non-aligned directory.  The string given
to the program was always one byte longer than before so that the
aligned and non-aligned results cover the same lengths.
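
For illustration, here is a minimal sketch of one way to produce a
string at a known misalignment.  It is not the code from my test
program (which simply skips the first byte of a longer argument); the
helper name make_string() and the 16-byte alignment boundary are
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a string of `len` characters that starts
 * `offset` bytes past a 16-byte boundary.  The raw allocation is stored
 * in *base so that the caller can free() it.
 */
static char *
make_string(size_t len, size_t offset, char **base)
{
	uintptr_t aligned;
	char *buf, *s;

	assert(offset < 16);

	/* Over-allocate: up to 15 bytes of padding, the offset, the NUL. */
	buf = malloc(len + offset + 1 + 15);
	if (buf == NULL)
		return (NULL);

	/* Round up to the next 16-byte boundary, then step past it. */
	aligned = ((uintptr_t)buf + 15) & ~(uintptr_t)15;
	s = (char *)aligned + offset;

	memset(s, 'x', len);
	s[len] = '\0';
	*base = buf;
	return (s);
}
```

Calling make_string(32, 0, &base) and make_string(32, 1, &base) gives an
aligned and a non-aligned string of the same length, so the two sets of
timings can be compared length for length.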

> - Your results don't include non-power-of-2 lengths

I have tested strings of various lengths.  The Makefile in the main
directory shows the other values I have tried.  I can post more
results, including runs with the assembly file compiled directly into
the program.

>> I would appreciate it if anyone could see if strlen and strlen2
>> perform any better on an amd64.  Although the current C version of
>> strlen() in 7-CURRENT is faster than mine for smaller values, they
>> perform better for larger strings.
>
> I've tested on:
> FreeBSD 6.2-STABLE #28: Fri Jun 22 11:44:13 EST 2007
>    root@turion.vk2pj.dyndns.org:/usr/obj/usr/src/sys/turion
> CPU: AMD Turion(tm) 64 Mobile ML-40                  (2194.52-MHz K8-class CPU)
>  Origin = "AuthenticAMD"  Id = 0x20f42  Stepping = 2
>  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
>  Features2=0x1<SSE3>
>  AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
>  AMD Features2=0x1<LAHF>
>
> There is no asm strlen so libcstrlen and basestrlen should be
> identical (and disassembling [x]strlen() shows that the code _is_
> identical) but there are significant differences for short strings and
> measurable differences for all lengths except 32 bytes.  This
> indicates that your program is not able to accurately compare strlen()
> performance.

I am not sure I understand.  The 32-byte test results show a measurable
difference in your output and mine.

I just switched the program from gettimeofday() to getrusage().  This
should give more accurate results for the 32-byte test and for the 4-
and 8-byte tests below.
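
A rough sketch of the getrusage()-based timing, for anyone who wants to
reproduce it.  This is not the actual test program; the names
user_usec() and time_strlen() are made up for the example.  The point is
that getrusage() reports CPU time charged to the process, whereas
gettimeofday() reports wall-clock time and so includes time spent while
the process was not running.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

/* User CPU time consumed by this process so far, in microseconds. */
static long long
user_usec(void)
{
	struct rusage ru;
	int rc;

	rc = getrusage(RUSAGE_SELF, &ru);
	assert(rc == 0);
	(void)rc;
	return ((long long)ru.ru_utime.tv_sec * 1000000LL +
	    (long long)ru.ru_utime.tv_usec);
}

/*
 * Hypothetical harness: call fn(s) `iterations` times and return the
 * user CPU time spent doing so, in microseconds.
 */
static long long
time_strlen(size_t (*fn)(const char *), const char *s, long iterations)
{
	/* Volatile pointer: forces a real indirect call on every pass. */
	size_t (*volatile call)(const char *) = fn;
	/* Volatile sink: keeps the results from being discarded. */
	volatile size_t sink = 0;
	long long start;
	long i;

	start = user_usec();
	for (i = 0; i < iterations; i++)
		sink += call(s);
	return (user_usec() - start);
}
```

Usage would be along the lines of time_strlen(strlen, str, 10000000)
for each implementation and each string length, comparing the returned
values.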

> I've tried statically linking all the test programs and this removes
> the libcstrlen/basestrlen differences.  The very poor results for 4
> and 8 byte strings are unexpected but (as expected), your unrolled
> strlen() implementations behave better for longer strings.
>
> The attached results all reflect your code with '-static' added to
> every gcc/link step.

I redid my tests with everything compiled statically.  Also, getrusage()
was used instead of gettimeofday().

Sean
-- 
scf@FreeBSD.org