Date: 2 Dec 1999 15:00:47 -0000
From: Ville-Pertti Keinonen <will@iki.fi>
To: bde@zeta.org.au
Cc: current@freebsd.org, marcel@scc.nl, dillon@apollo.backplane.com
Subject: Re: kernel: -mpreferred-stack-boundary=2 ??
Message-ID: <19991202150047.5370.qmail@ns.demophon.com>
In-Reply-To: <Pine.BSF.4.10.9912030012320.3368-100000@alphplex.bde.org> (message from Bruce Evans on Fri, 3 Dec 1999 00:26:57 +1100 (EST))

> > Note that double-alignment vs. word-alignment can really have >30%
> > performance impact, at least on an Athlon and one meaningless floating
> > point microbenchmark (operations on small, fixed-sized
> > matrices...maybe it isn't even *that* meaningless).
>
> I verified that the default alignment is a pessimisation of 1% for a
> meaningful benchmark (compiling a RELENG_3 kernel) on a Celeron:

If you haven't patched crt1.c (and assuming random-length command
lines), on average half of the command invocations produce
double-misaligned stacks and 75% produce 16-byte-misaligned stacks.

The effect might not be significant when there isn't a lot of floating
point code involved, but it may have inadvertent side-effects on the
cache-line locality of local variables.

If the pessimization persists when the initial alignment is fixed,
then there's a trade-off between a small pessimization for typical
code and a big pessimization for less common (but more often
performance-critical) code.

> gcc (current) compiled with gcc:
>        29.00 real        11.42 user         2.28 sys
>       158.47 real       146.48 user        10.35 sys
>        13.58 real        11.31 user         2.20 sys
>       157.99 real       146.16 user        11.19 sys

The first run should be ignored because you don't have predictable
cache contents at that point (I assume that's the explanation for the
above), or each sequence of timed runs should be started in a
predictable state (e.g. fresh boot).

The initial overhead doesn't affect your conclusions, though, since
the further runs are consistent.

> The times are for `time make depend; time make' after `make clean; sync;
> sleep 1' (2 times for each run). The stack may have been perfectly
> misaligned for the default gcc.

It depends on the command line.

It took me a while to figure out what was going on the first time I
benchmarked a program that ran much faster when run under one name
compared to when it was run using another name... ;--)

> Corresponding times for egcs with various PQ_L2_SIZE's a few months ago:

Properly benchmarking page coloring can't be done in a straightforward
manner, since one of the desired effects is to prevent the page
selection behavior from degrading over time. Did you set up the
machine specially for those runs?


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
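
The 50% and 75% figures in the message above follow from simple
arithmetic if the initial stack pointer is assumed to be word-aligned
(4 bytes) and otherwise uniformly distributed. The sketch below checks
that arithmetic and adds a deliberately simplified toy model of how the
length of the argument strings could shift the stack pointer. The
function names, the 0xC0000000 stack top, and the model itself are
illustrative assumptions, not the actual FreeBSD exec or crt1.c logic
(a real process image also holds the environment, argument pointers
and other data).

```python
from fractions import Fraction

WORD = 4  # assumption: the stack pointer is only guaranteed 4-byte alignment


def misaligned_fraction(alignment, word=WORD):
    """Fraction of word-aligned offsets that are not multiples of `alignment`."""
    offsets = range(0, alignment, word)  # every word-aligned offset in one window
    bad = sum(1 for off in offsets if off % alignment != 0)
    return Fraction(bad, len(offsets))


def toy_stack_pointer(arg_bytes, top=0xC0000000, word=WORD):
    """Toy model: strings are copied below `top`, then sp is rounded down to a word.

    `top` is a hypothetical address chosen only for illustration.
    """
    return (top - arg_bytes) & ~(word - 1)


if __name__ == "__main__":
    print("double (8-byte) misaligned:", misaligned_fraction(8))
    print("16-byte misaligned:", misaligned_fraction(16))
    # Two hypothetical program names of different lengths (plus a NUL byte each).
    for name in ("bench", "benchmark"):
        sp = toy_stack_pointer(len(name) + 1)
        print(f"{name!r}: sp % 8 = {sp % 8}")
```

Under this toy model, merely renaming the program changes the
alignment of the stack, which is consistent with the anecdote about a
program running faster under one name than under another.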