From owner-freebsd-current  Sat Dec  2 21:06:43 1995
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.6.12/8.6.6) id VAA19728
          for current-outgoing; Sat, 2 Dec 1995 21:06:43 -0800
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19])
          by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id VAA19715
          for <current@FreeBSD.ORG>; Sat, 2 Dec 1995 21:06:25 -0800
Received: (from bde@localhost) by godzilla.zeta.org.au (8.6.9/8.6.9) id QAA09482; Sun, 3 Dec 1995 16:05:43 +1100
Date: Sun, 3 Dec 1995 16:05:43 +1100
From: Bruce Evans <bde@zeta.org.au>
Message-Id: <199512030505.QAA09482@godzilla.zeta.org.au>
To: scrappy@hub.org, wollman@lcs.mit.edu
Subject: Re: -O6/-fstrength-reduce for kernel (Was: Re: changes in -current...)
Cc: current@FreeBSD.ORG
Sender: owner-current@FreeBSD.ORG
Precedence: bulk

>> 	Also, someone mentioned using -fno-strength-reduce?  If I
>> used that, with -O6, would I notice any benefits, or does using
>> -fno-strength-reduce just about take out any benefits to -O6?

>Your Mileage May Vary.

>In the testing I was doing last month on packet send and forwarding
>rates, I found that using any sort of optimization beyond the default
>`-O' resulted in a repeatable 1-3% /decrease/ in maximum packet rate.
>The same was true of `-m486'.  I'll be interested to see the results
>if the Pentium enhancements are ever integrated into the main gcc
>line.

>I suspect there may be some funny cache effects going on, but it's
>next to impossible to tell.

The 1%-3% decrease for -O2 is quite likely due to bogus strength
reductions.  Strength reduction usually requires more pseudo-registers.
Sometimes (more often on i*86s, because there aren't enough real
registers to begin with) the register allocator can't cope and has to
spill.  And because the i*86 has a fancy scaled-index addressing mode,
one common strength reduction (for array indices) gains nothing there:
at best it decreases the speed by a tiny amount.
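
A minimal sketch of the array-index case, written out by hand in C.
The function names are made up for illustration, and this is not
claimed to be what gcc actually emits.  In the first form the address
of a[i] is a + i*4, which the scaled-index mode computes inside the
load itself.  The second form is roughly what strength reduction
turns it into: the multiply becomes a pointer bumped on each
iteration.  Because i is still needed as a value here, the pointer is
an extra induction variable carried alongside it, and in a larger
loop each such extra variable competes for the few real registers.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: a[i] is addressed as a + i*sizeof(int); on i*86 the
   scaling can be folded into the addressing mode of the load. */
long weighted_sum_indexed(const int *a, size_t n)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += (long)a[i] * (long)i;
    return sum;
}

/* Hand-written stand-in for the strength-reduced form: the scaled
   index is replaced by a pointer p that advances each iteration.
   i is still live because its value is used, so the loop now
   carries both i and p. */
long weighted_sum_reduced(const int *a, size_t n)
{
    const int *p = a;
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++, p++)
        sum += (long)*p * (long)i;
    return sum;
}

/* Small self-check: both forms must agree on the same input. */
void check_forms_agree(void)
{
    static const int a[] = { 3, 1, 4, 1, 5 };

    assert(weighted_sum_indexed(a, 5) == weighted_sum_reduced(a, 5));
}
```

The two functions compute the same result; any difference is only in
how many values have to stay live across the loop.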

The decrease for -m486 is probably due to cache effects.  About half of
the "optimizations" for -m486 are for alignment in the text section.
On Pentiums, such alignment mainly wastes cache.  (In the worst case,
usually not in loops, it can introduce up to 15 nops that get
executed.)  gcc-2.7 has -malign-* options that let you control the
alignment.
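
The figure of 15 can be checked with a little arithmetic.  The sketch
below assumes a 16-byte boundary and one-byte nop padding; both are
inferred from the number 15, not stated anywhere above, and the
function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of padding needed to move addr up to the next multiple of
   align.  align must be a nonzero power of two. */
size_t pad_bytes(size_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (addr & (align - 1))) & (align - 1);
}

/* Worst-case padding over every possible starting offset within one
   alignment unit. */
size_t worst_case_pad(size_t align)
{
    size_t addr, worst = 0;

    for (addr = 0; addr < align; addr++)
        if (pad_bytes(addr, align) > worst)
            worst = pad_bytes(addr, align);
    return worst;
}
```

Under those assumptions, code ending one byte past a 16-byte boundary
needs pad_bytes(0x1001, 16) = 15 bytes of filler before the next
aligned label, and every one of those bytes also occupies cache.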

I would be interested in seeing the results if i*86 enhancements are
ever written for the main gcc line :-).

Bruce