Date:      Sat, 12 Mar 2011 11:02:23 +0100
From:      Martin Matuska <mm@FreeBSD.org>
To:        Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc:        freebsd-performance@FreeBSD.org, freebsd-current@FreeBSD.org
Subject:   Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang
Message-ID:  <4D7B44AF.7040406@FreeBSD.org>
In-Reply-To: <98496.1299861978@critter.freebsd.dk>
References:  <98496.1299861978@critter.freebsd.dk>

Hi Poul-Henning,

I have redone the test for the majority of the processors, this time
taking 5 samples of each whole test run and calculating the average,
standard deviation, relative standard deviation, standard error and
relative standard error.
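
For reference, here is a minimal sketch of how these five statistics
could be computed for one test. The summarize() helper and the sample
scores are hypothetical, and it assumes the sample (n - 1) standard
deviation; the message does not say which variant was used.

```python
import math
import statistics


def summarize(samples):
    """Return the five statistics named above for one test (a list of scores)."""
    n = len(samples)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation (n - 1), an assumption
    sem = stdev / math.sqrt(n)         # standard error of the mean
    return {
        "mean": mean,
        "stdev": stdev,
        "rsd_pct": 100.0 * stdev / mean,  # relative standard deviation in percent
        "sem": sem,
        "rse_pct": 100.0 * sem / mean,    # relative standard error in percent
    }


# Hypothetical scores from 5 runs of one test; these are not measured data.
runs = [82.1, 83.4, 82.9, 81.8, 83.7]
stats = summarize(runs)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With 5 samples the relative standard error is the relative standard
deviation divided by sqrt(5), so it is always the smaller of the two.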

The relative standard error is below 0.25% for ~91%, between 0.25% and
0.5% for ~7%, between 0.5% and 1.0% for ~1%, and between 1.0% and 2.0%
for <1% of the tests. By a "test" I mean 5 runs with the same settings
of the same compiler on the same processor.

So let's say I now have the string/base64 test for a Core i7 showing
the following (score +/- standard deviation):
gcc421: 82.7892 points +/- 0.8314 (1.00%)
gcc45-nocona: 96.0882 points +/- 1.1652 (1.21%)

For a relative comparison of two settings of the same test I could
calculate the difference of averages = 13.299 points (16.06%) and the
sum of standard deviations = 2.4834 points (3.00%).

Therefore, assuming normally distributed results, I could say that
with 95% probability gcc45-nocona is faster than gcc421 by at least
10.18% (16.06 - 1.96 x 3.00), or with 99.9% probability by at least
6.19% (16.06 - 3.2906 x 3.00).

So I should probably pick a significance level (e.g. 95%, 99% or 99.9%)
and normalize all the test scores for this level. Results that fall
outside the interval (the lower bound of the difference is below zero)
are then not significant.
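
The lower-bound arithmetic and the cut-off rule described above could
be sketched roughly as follows. The function names are hypothetical,
the inputs are the percentages quoted in this message, and the
two-sided normal quantile is an assumption that matches the 1.96 and
3.2906 factors used above.

```python
from statistics import NormalDist


def lower_bound_pct(diff_pct, spread_pct, confidence):
    """Difference in percent minus the two-sided normal quantile times the spread."""
    # For confidence = 0.95 this quantile is about 1.96, for 0.999 about 3.2905.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return diff_pct - z * spread_pct


def is_significant(diff_pct, spread_pct, confidence):
    """A result counts as significant only if its lower bound stays above zero."""
    return lower_bound_pct(diff_pct, spread_pct, confidence) > 0.0


# Figures quoted in this message for string/base64 on the Core i7.
diff_pct = 16.06    # difference of averages, relative to gcc421
spread_pct = 3.00   # sum of standard deviations, as stated in the message

for confidence in (0.95, 0.99, 0.999):
    bound = lower_bound_pct(diff_pct, spread_pct, confidence)
    verdict = "significant" if is_significant(diff_pct, spread_pct, confidence) else "not significant"
    print(f"{confidence:.1%}: faster by at least {bound:.2f}% ({verdict})")
```

A stricter level widens the interval, so fewer differences stay above
zero; that trade-off is what the choice of level comes down to.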

What significance level should I take?

I hope this approach is better :)

On 11.03.2011 17:46, Poul-Henning Kamp wrote:
> In message <4D7A42CC.8020807@FreeBSD.org>, Martin Matuska writes:
> 
>> But what I can say, e.g. for the Intel Atom processor, if there are
>> performance gains in all but one test (that falls 2% behind), generic
>> perl code (the routines benchmarked) on this processor is very likely to
>> run faster with that setup.
> 
> No, actually you cannot say that, unless you run all the tests at
> least three times for each compiler(+flag), calculate the average
> and standard deviation of all the tests, and see which, if any of
> the results are statistically significant.
> 
> Until you do that, your numbers are meaningless, because we have no
> idea what the signal/noise ratio is.
> 


