Date:      2 Dec 1999 15:00:47 -0000
From:      Ville-Pertti Keinonen <will@iki.fi>
To:        bde@zeta.org.au
Cc:        current@freebsd.org, marcel@scc.nl, dillon@apollo.backplane.com
Subject:   Re: kernel: -mpreferred-stack-boundary=2 ??
Message-ID:  <19991202150047.5370.qmail@ns.demophon.com>
In-Reply-To: <Pine.BSF.4.10.9912030012320.3368-100000@alphplex.bde.org> (message from Bruce Evans on Fri, 3 Dec 1999 00:26:57 +1100 (EST))


> > Note that double-alignment vs. word-alignment can really have >30%
> > performance impact, at least on an Athlon and one meaningless floating
> > point microbenchmark (operations on small, fixed-sized
> > matrices...maybe it isn't even *that* meaningless).
> 
> I verified that the default alignment is a pessimisation of 1% for a
> meaningful benchmark (compiling a RELENG_3 kernel) on a Celeron:

If you haven't patched crt1.c (and assuming random-length command
lines), on average half of the command invocations produce
double-misaligned stacks and 75% produce 16-byte-misaligned stacks.
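
As a rough sketch of where those two figures come from, here is a toy
model in Python.  The names (TOP, initial_sp, misaligned_fraction) and
the top-of-stack address are made up for illustration, and the model
is an assumption: it only says that the argument strings are copied
below a fixed address and the stack pointer is then rounded down to a
4-byte word.  It is not the actual exec or crt1.c code.

```python
WORD = 4            # assumed i386 word size in bytes
TOP = 0xBFC00000    # hypothetical top-of-stack address, 16-byte aligned

def initial_sp(strings_len, top=TOP, word=WORD):
    """Toy model: copy `strings_len` bytes of argument strings below
    `top`, then round the stack pointer down to a word boundary."""
    return (top - strings_len) & ~(word - 1)

def misaligned_fraction(boundary, lengths=range(1, 65)):
    """Fraction of the given string lengths that leave the stack
    pointer off a `boundary`-byte boundary."""
    lengths = list(lengths)
    bad = sum(1 for n in lengths if initial_sp(n) % boundary != 0)
    return bad / len(lengths)

print(misaligned_fraction(8))    # doubles: 0.5
print(misaligned_fraction(16))   # 16-byte boundary: 0.75
```

Under this model a word-aligned stack pointer lands on one of four
offsets within each 16-byte block, only one of which is 16-byte
aligned and two of which are 8-byte aligned.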

The effect might not be significant when there isn't a lot of floating
point code involved, but it may have inadvertent side-effects on the
cache-line locality of local variables.

If the pessimization persists when the initial alignment is fixed,
then there's a trade-off between a small pessimization for typical
code and a big pessimization for less common (but more often
performance-critical) code.

> gcc (current) compiled with gcc:
>        29.00 real        11.42 user         2.28 sys
>       158.47 real       146.48 user        10.35 sys
>        13.58 real        11.31 user         2.20 sys
>       157.99 real       146.16 user        11.19 sys

Either ignore the first run, because you don't have predictable
cache contents at that point (I assume that's the explanation for the
above), or start each sequence of timed runs from a predictable state
(e.g. a fresh boot).

The initial overhead doesn't affect your conclusions, though, since
the further runs are consistent.
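
One way to apply this, sketched in Python under my own assumptions
(the helper names timed_runs and summarize are hypothetical, and
wall-clock time stands in for the real/user/sys split that time(1)
reports): run the command several times, set the first run aside as
the cold-cache run, and summarize only the later ones.

```python
import statistics
import subprocess
import time

def timed_runs(cmd, runs=4):
    """Run `cmd` `runs` times; return the wall-clock seconds of each run."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times

def summarize(times):
    """Report the first (cold-cache) run separately from the rest."""
    warm = times[1:]
    return {
        "cold": times[0],
        "warm_mean": statistics.mean(warm),
        "warm_spread": max(warm) - min(warm),
    }
```

For example, summarize(timed_runs(["make", "depend"])) would report
the cold run on its own.  Fed the two `make depend' times quoted
above, summarize([29.00, 13.58]) gives a cold time of 29.00 and a
warm mean of 13.58.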

> The times are for `time make depend; time make' after `make clean; sync;
> sleep 1' (2 times for each run).  The stack may have been perfectly
> misaligned for the default gcc.

It depends on the command line.  It took me a while to figure out what
was going on the first time I benchmarked a program that ran much
faster when run under one name compared to when it was run using
another name...  ;--)

> Corresponding times for egcs with various PQ_L2_SIZE's a few months ago:

Properly benchmarking page coloring can't be done in a straightforward
manner, since one of the desired effects is to prevent the page
selection behavior from degrading over time.  Did you set up the
machine specially for those runs?

