From owner-freebsd-current  Wed Jul  3  3:35:22 2002
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 982F137B400
	for <current@FreeBSD.ORG>; Wed,  3 Jul 2002 03:35:19 -0700 (PDT)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8B00B43E52
	for <current@FreeBSD.ORG>; Wed,  3 Jul 2002 03:35:18 -0700 (PDT)
	(envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id UAA29094;
	Wed, 3 Jul 2002 20:35:06 +1000
Date: Wed, 3 Jul 2002 20:41:03 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Garance A Drosihn <drosih@rpi.edu>
Cc: Matthew Dillon <dillon@apollo.backplane.com>,
	"David O'Brien" <dev-null@NUXI.com>,
	FreeBSD current users <current@FreeBSD.ORG>
Subject: Re: -current results (was something funny with soft updates?)
In-Reply-To: <p0511171bb948476deea0@[128.113.24.47]>
Message-ID: <20020703201421.B15898-100000@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG

On Wed, 3 Jul 2002, Garance A Drosihn wrote:

> At 11:01 PM -0700 7/2/02, Matthew Dillon wrote:
> >     I get just about the same performance for GCC2 as I
> >     do for GCC3 in the tests I've run so far.  It makes
> >     me wonder what the hell GCC3 is burning all that
> >     cpu *on*.
>
> One of the guys here at RPI (dec, actually) claims he got
> buildworld under current to run at more reasonable speeds
> by explicitly setting the CPUTYPE.  I haven't had the time
> to run any experiments with that yet.

I got some improvements in generated code for a microbenchmark by
compiling with -march=<runtime arch>.  gcc on i386's now likes to
"optimize" "andb $1,%al" and "testb $1,%al" as "andl $1,%eax" and
"testl $1,%eax", respectively.  This tends to give a large pessimization
(50% for the above in a loop) on at least PentiumPro's and PII's due
to a partial register stall.  Compiling with -march=pentium2 regains
the original speed, on a Celeron400 at least, by zero-extending %eax
before using it, but it double-crosses itself by going back to using
%al and not actually using %eax.  Manually changing the code back
to use %eax gave a 5% speedup for the loop relative to the old
version.  The manual change also gave a 5% speedup on an AthlonXP.
AthlonXP's don't have partial register stalls, and all versions
generated by gcc gave the same results there (-march=athlon-xp generated
the same code as -march=pentium2).
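
For illustration only, a loop of roughly the following shape is the sort
of thing that can end up as a byte-sized test of %al.  This is a
hypothetical sketch, not the microbenchmark that was actually run; the
name count_odd and the way the result is used are assumptions made up
for the example.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the microbenchmark (not the original code).
 * Testing bit 0 of an unsigned char is the kind of operation that a
 * compiler for i386 may emit as either "testb $1,%al" or
 * "testl $1,%eax", depending on its version and flags.
 */
static size_t
count_odd(const unsigned char *buf, size_t len)
{
	size_t i, n;

	n = 0;
	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];

		if (c & 1)		/* low-bit test on a byte value */
			n++;
	}
	return (n);
}
```

Compiling something like this with "cc -O -S", once without and once
with -march=pentium2, and comparing the two .s files is one way to see
which form of the test a given compiler picks.  What actually gets
emitted will depend on the compiler version.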

Summary: we can break even on all tested arches with gcc-3 for the
microbenchmark by setting CPUTYPE right.  We can beat gcc-2 by tweaking
the generated code to be what gcc-3 apparently intended.

But I don't like setting CPUTYPE or using -march, since I want to run
the same code on different (i386-sub-)arches.  (I have 2 different ones
on active machines and 3 more on inactive machines.)  Releases need
to run on even more arches.

Bruce

