Date: Wed, 11 Jul 2007 18:46:11 -0500 (CDT)
From: "Sean C. Farley" <scf@FreeBSD.org>
To: Peter Jeremy <peterjeremy@optushome.com.au>
Cc: freebsd-arch@FreeBSD.org
Subject: Re: Assembly string functions in i386 libc
Message-ID: <20070711171829.Y2385@thor.farley.org>
In-Reply-To: <20070711221338.GC20178@turion.vk2pj.dyndns.org>
References: <20070711134721.D2385@thor.farley.org> <20070711221338.GC20178@turion.vk2pj.dyndns.org>
On Thu, 12 Jul 2007, Peter Jeremy wrote:

> On 2007-Jul-11 15:24:01 -0500, "Sean C. Farley" <scf@freebsd.org> wrote:
>> libc compared to the version I was writing.  After more testing, I
>> found it was only the assembly version that is really slow.  The C
>> version is fairly quick.  Is there a need to continue to use the
>> assembly versions of string functions on i386?  Does it mainly help
>> slower systems such as those with i386 or i486 CPU's?
>
> The performance of string instructions has varied wildly across
> various x86 implementations.  Definitely, for short strings, the
> overhead in initialising the various registers outweighs any actual
> difference in loop performance.  For any recent CPU, the location of
> the string in the memory hierarchy far outweighs implementation
> issues.  bde@ has done various testing in the past and posted results.
>
> Some comments:
> - comparing the strlen() in a shared libc with a statically linked one
>   is unfair - especially on the i386.

I had been testing with strlen.S linked into the test program, but the
results were the same (at least for me) as linking against libc.

> - Your results don't include non-aligned inputs

I ran the tests again, this time starting at the second byte of each
string.  The results are in a results-non-aligned directory.  The string
given to the program was always one byte longer than before so that the
aligned and non-aligned results match up.

> - Your results don't include non-power-of-2 lengths

I have tested various lengths.  The Makefile in the main directory shows
other values I have tried.  I can post more results, including runs with
the assembly file compiled directly into the program.

>> I would appreciate it if anyone could see if strlen and strlen2
>> perform any better on an amd64.  Although the current C version of
>> strlen() in 7-CURRENT is faster than mine for smaller values, they
>> perform better for larger strings.
>
> I've tested on:
> FreeBSD 6.2-STABLE #28: Fri Jun 22 11:44:13 EST 2007
> root@turion.vk2pj.dyndns.org:/usr/obj/usr/src/sys/turion
> CPU: AMD Turion(tm) 64 Mobile ML-40 (2194.52-MHz K8-class CPU)
> Origin = "AuthenticAMD" Id = 0x20f42 Stepping = 2
> Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
> Features2=0x1<SSE3>
> AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
> AMD Features2=0x1<LAHF>
>
> There is no asm strlen so libcstrlen and basestrlen should be
> identical (and disassembling [x]strlen() shows that the code _is_
> identical) but there are significant differences for short strings and
> measurable differences for all lengths except 32 bytes.  This
> indicates that your program is not able to accurately compare strlen()
> performance.

I am not sure I understand.  The 32-byte test results show a measurable
difference in your output and mine.  I just switched the program from
gettimeofday() to getrusage().  This should give more accurate results
for 32 bytes and for the 4- and 8-byte tests below.

> I've tried statically linking all the test programs and this removes
> the libcstrlen/basestrlen differences.  The very poor results for 4
> and 8 byte strings are unexpected but (as expected), your unrolled
> strlen() implementations behave better for longer strings.
>
> The attached results all reflect your code with '-static' added to
> every gcc/link step.

I redid my tests with everything compiled statically.  Also,
getrusage() was used instead of gettimeofday().

Sean
--
scf@FreeBSD.org
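
The switch from gettimeofday() to getrusage() described above can be sketched roughly as follows. This is not the test program from this thread; the names user_cpu_seconds() and time_strlen() and the harness layout are assumptions made for illustration. The idea is to read the process's user CPU time before and after the loop, so that time spent off the CPU is not counted the way it is with wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

/* User CPU time consumed by this process so far, in seconds. */
double user_cpu_seconds(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);

    assert(rc == 0);    /* getrusage() only fails on invalid arguments */
    (void)rc;
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
}

/*
 * Hypothetical harness (not the code from this thread): call fn(s)
 * iters times and return the user CPU seconds spent in the loop.
 * The summed lengths are stored in *total, and the accumulator is
 * volatile, so the compiler cannot discard the calls.
 */
double time_strlen(size_t (*fn)(const char *), const char *s, long iters,
    size_t *total)
{
    volatile size_t sink = 0;
    double start, end;
    long i;

    assert(iters > 0);
    start = user_cpu_seconds();
    for (i = 0; i < iters; i++)
        sink = sink + fn(s);
    end = user_cpu_seconds();
    *total = sink;
    return end - start;
}
```

Passing buf + 1 instead of buf, with the buffer made one byte longer, gives the non-aligned case for the same measured length, assuming buf itself is suitably aligned.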