From: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To: Martin Matuska
Cc: Poul-Henning Kamp, freebsd-current@freebsd.org, freebsd-performance@freebsd.org
Date: Sat, 12 Mar 2011 06:11:26 -0500
Subject:
Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang
List-Id: Discussions about the use of FreeBSD-current

2011/3/12 Martin Matuska:

> Hi Poul-Henning,
>
> I have redone the test for the majority of the processors, this time
> taking 5 samples of each whole test run, calculating the average,
> standard deviation, relative standard deviation, standard error and
> relative standard error.
>
> The relative standard error is below 0.25% for ~91% of the tests,
> between 0.25% and 0.5% for ~7%, between 0.5% and 1.0% for ~1%, and
> between 1.0% and 2.0% for <1%. By a "test" I mean 5 runs with the
> same setting of the same compiler on the same processor.
>
> So let's say I now have the string/base64 test for a Core i7 showing
> the following (score +/- standard deviation):
> gcc421: 82.7892 points +/- 0.8314 (1%)
> gcc45-nocona: 96.0882 points +/- 1.1652 (1.21%)
>
> For a relative comparison of two settings of the same test I could
> calculate the difference of averages = 13.299 points (16.06%) and the
> sum of standard deviations = 2.4834 points (3.00%).
>
> Therefore, assuming a normal distribution, I could say that with 95%
> probability gcc45-nocona is faster than gcc421 by at least 10.18%
> (16.06 - 1.96x3.00), or with 99.9% probability by at least 6.12%
> (16.06 - 3.2906x3.00).
>
> So I should probably pick a significance level (e.g. 95%, 99% or
> 99.9%) and normalize all the test scores for that level. Results
> outside the interval (difference below zero) are then not significant.
>
> What significance level should I take?
>
> I hope this approach is better :)
>
> On 11.03.2011 17:46, Poul-Henning Kamp wrote:
> > In message <4D7A42CC.8020807@FreeBSD.org>, Martin Matuska writes:
> >
> >> But what I can say, e.g.
> >> for the Intel Atom processor, if there are performance gains in
> >> all but one test (that falls 2% behind), generic perl code (the
> >> routines benchmarked) on this processor is very likely to run
> >> faster with that setup.
> >
> > No, actually you cannot say that, unless you run all the tests at
> > least three times for each compiler(+flag), calculate the average
> > and standard deviation of all the tests, and see which, if any, of
> > the results are statistically significant.
> >
> > Until you do that, your numbers are meaningless, because we have no
> > idea what the signal/noise ratio is.

In addition to a possible answer by Poul-Henning Kamp, you may consider
the following pages, because the strength (sensitivity) of hypothesis
tests is determined by statistical power computations:

http://en.wikipedia.org/wiki/Statistical_power
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
http://en.wikipedia.org/wiki/Category:Hypothesis_testing
http://en.wikipedia.org/wiki/Category:Statistical_terminology

Thank you very much.

Mehmet Erol Sanliturk
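[As a postscript to the thread: the per-test summary and the lower-bound
comparison Martin describes can be sketched as below. This is an
illustration only, not part of the original benchmark tooling; the
function names and the 5-run sample values are hypothetical, and the
constants 1.96 and 3.2906 are the two-sided standard-normal quantiles for
the 95% and 99.9% levels quoted in the thread.]

```python
from statistics import mean, stdev

def summarize(samples):
    """Average, sample standard deviation, and relative std. dev. (%)
    for one test (5 runs of one compiler setting on one processor)."""
    m = mean(samples)
    s = stdev(samples)
    return m, s, 100.0 * s / m

def min_speedup_pct(diff_pct, spread_pct, z):
    """Lower bound on the relative speedup at normal quantile z:
    difference of averages minus z times the spread, both in percent."""
    return diff_pct - z * spread_pct

# Hypothetical 5-run sample for one test:
runs = [82.1, 83.4, 82.5, 83.9, 82.0]
avg, sd, rel_sd = summarize(runs)

# Figures quoted in the thread for the string/base64 test (Core i7):
diff_pct = 16.06    # difference of averages, as a percentage
spread_pct = 3.00   # sum of standard deviations, as a percentage

print(min_speedup_pct(diff_pct, spread_pct, 1.96))    # 95% level, approx. 10.18
print(min_speedup_pct(diff_pct, spread_pct, 3.2906))  # 99.9% level, approx. 6.19
```

With these inputs the 99.9% bound comes out near 6.19%, slightly above
the 6.12% quoted in the thread, presumably because the quoted figure was
computed from unrounded percentages.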