Date: Tue, 28 May 1996 18:36:08 +1000
From: Bruce Evans
Message-Id: <199605280836.SAA08598@godzilla.zeta.org.au>
To: charnier@lirmm.fr, hackers@FreeBSD.ORG
Subject: Re: strcpy, strcat: not the same look & feel.

>Which one is faster, the old version or the one with this patch applied?
>Libc uses another one (assembler) but this could at least make libkern
>faster. Or is it even better to use the libc's version? I'm not really sure
>about my results but it seems that the following patch makes strcpy 8% faster
>(-O0), 6% faster (-O) and 0% faster (-O2) on my i486 according to gprof.
>...
>- for (; *to = *from; ++from, ++to);
>+ while (*to++ = *from++);

They are essentially the same, but gcc doesn't recognise this at any
optimization level and generates slightly different code that happens
to be faster or slower depending on the CPU.  I get quite different
results for one test with a short string (of length 5) on a Pentium:

	-O0: 29% faster (16.79s reduced to 11.96s)
	-O1:  5% slower (12.23s increased to 12.85s)
	-O2:  9% slower (11.34s increased to 12.40s)
	-O3: 12% faster (2.57s reduced to 2.27s)

The speed actually depends more on the surrounding code than on the
loop.  Essentially the same code is generated for the loop in all
cases except -O0.  -O3 is much faster because the copy function gets
inlined.  Slightly different setup code in the other tests gives
significantly different results.
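For reference, the two loop forms from the patch can be compiled side by side as standalone functions. This is a minimal sketch, assuming the BSD C strcpy() shape (save the destination pointer and return it); the names copy_for() and copy_while() are hypothetical. The conditions are written with an explicit comparison against '\0', which is equivalent to the original bare assignments and avoids gcc's -Wparentheses warning. Both copy every byte up to and including the terminating NUL. The only difference is where the pointers end up: the for form leaves `to` on the copied NUL, the while form leaves it one byte past. Callers cannot observe this because the pointers are dead after the loop.

```c
/*
 * Two equivalent ways to copy a NUL-terminated string: the original
 * for loop and the patched while loop from this thread.
 * copy_for() and copy_while() are hypothetical names for illustration.
 */

/* Original form: the assignment is the loop test, and both pointers
 * advance in the for-increment.  The loop exits right after the NUL
 * has been copied, with `to` still pointing at that NUL. */
char *
copy_for(char *to, const char *from)
{
	char *save = to;	/* start of destination, returned like strcpy() */

	for (; (*to = *from) != '\0'; ++from, ++to)
		;
	return (save);
}

/* Patched form: the post-increments are folded into the loop test,
 * so on exit both pointers sit one byte past the copied NUL. */
char *
copy_while(char *to, const char *from)
{
	char *save = to;	/* start of destination, returned like strcpy() */

	while ((*to++ = *from++) != '\0')
		;
	return (save);
}
```

Compiling both at each -O level and comparing the assembly is one way to see the small code-generation differences described above. Whether a caller in the same file gets the copy inlined, as in the -O3 case, depends on the gcc version and flags.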
Only the results for the -O0 case are easy to understand: the
unoptimized code for the while loop happens to be less pessimal on
the i386.

Bruce