From owner-freebsd-performance@FreeBSD.ORG  Sat Mar 12 12:43:09 2011
Return-Path: <owner-freebsd-performance@FreeBSD.ORG>
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ED7A9106564A;
	Sat, 12 Mar 2011 12:43:09 +0000 (UTC)
	(envelope-from m.e.sanliturk@gmail.com)
Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com
	[209.85.220.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 747FE8FC13;
	Sat, 12 Mar 2011 12:43:09 +0000 (UTC)
Received: by vxc34 with SMTP id 34so3748120vxc.13
	for <multiple recipients>; Sat, 12 Mar 2011 04:43:08 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=/MMRaNQwdG3xZ26aqPd4D0HaKhVa1hKsf3U9Kqe+hqM=;
	b=SMcZLUPgR2tB6ryWpyTb656USnujpID2G6Ojt6mZ8Vo6HTulEoY3qacAL6U8HFnIwR
	OTOZIao5NfkrJ8rZljVoH3847r1mahvgaf8EU/XFmQaq17EoV0CfT9C1Vw5iYWi9NqZy
	WzaT/iIgfO3lJ8uRmgISp015yPF/W0ihs5fN8=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=x37xyCRFjrlqhlwQELV0JXlCHHMOEXdZU4J4IN/IIUrPzmA1Oao/sWSBRBMSROsGEW
	oixWCmMDtGKLgJct6di+G1Uv43/VLeZfjA89iIN+Z+xS/ASPN0rhx30t4JMzDcaF0KzK
	X21j9e8IUSFLVaWS2pIkhuA33bQMg2TbvsxoA=
MIME-Version: 1.0
Received: by 10.52.161.197 with SMTP id xu5mr3377137vdb.46.1299933788595; Sat,
	12 Mar 2011 04:43:08 -0800 (PST)
Received: by 10.52.169.165 with HTTP; Sat, 12 Mar 2011 04:43:08 -0800 (PST)
In-Reply-To: <4D7B44AF.7040406@FreeBSD.org>
References: <98496.1299861978@critter.freebsd.dk>
	<4D7B44AF.7040406@FreeBSD.org>
Date: Sat, 12 Mar 2011 07:43:08 -0500
Message-ID: <AANLkTi=opRnJz1xouXy_24iAVsS=emnfXWV5kxvOy_Hc@mail.gmail.com>
From: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To: Martin Matuska <mm@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, freebsd-current@freebsd.org,
	freebsd-performance@freebsd.org
Subject: Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning <freebsd-performance.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-performance>
List-Post: <mailto:freebsd-performance@freebsd.org>
List-Help: <mailto:freebsd-performance-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-performance>,
	<mailto:freebsd-performance-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Mar 2011 12:43:10 -0000

2011/3/12 Martin Matuska <mm@freebsd.org>

> Hi Poul-Henning,
>
> I have redone the test for majority of the processors, this time taking
> 5 samples of each whole testrun, calculating the average, standard
> deviation, relative standard deviation, standard error and relative
> standard error.
>
> The relative standard error is below 0.25% for ~91%, between 0.25% and
> 0.5% for ~7%, 0.5%-1.0% for ~1% and between 1.0%-2.0% for <1% of the
> tests.



...


> Under a "test" I mean 5 runs for the same setting of the same
> compiler on the same processor.
>
>
...


To have VALID test results, it is NECESSARY to obtain the results by using
DIFFERENT computers.
(This point is NOT mentioned in your message. I am assuming that the SAME
computer was used to get the results.)

If you repeat the same computations on the SAME computer, the values are
CORRELATED, and the t-test is NOT valid, because you are computing the mean
and standard deviation of CORRELATED values, where the correlation is
introduced by the SAME processor.

To obtain a proper set of test values, you may use the following setup
(the Clang and GCC versions and the compilation parameters must be the same
on all of the computers):

                Clang     GCC
                ------    ------
Computer 1      v(1,1)    v(1,2)
Computer 2      v(2,1)    v(2,2)
.
.
.
Computer n      v(n,1)    v(n,2)

If you do NOT have that many computers, you may obtain test results from
other reliable sources that use the same compilation parameters.

Now it is possible to use a t-test on PAIRED values.
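
As a minimal sketch of the paired computation on such a table, assuming one
row per computer: the function name is hypothetical and the run times are
made-up illustration values, NOT measurements. It returns only the t
statistic and the degrees of freedom; the p-value still has to be looked up
in a t table or obtained from a statistics package.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a[i], b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # The paired test works on the per-computer differences only.
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical run times in seconds, one entry per computer (not real data).
clang_times = [10.2, 11.5, 9.8, 12.1, 10.9]
gcc_times = [10.0, 11.1, 9.9, 11.6, 10.5]

t, df = paired_t_statistic(clang_times, gcc_times)
print(f"t = {t:.3f}, df = {df}")
```

Because only the differences v(i,1) - v(i,2) enter the statistic, the speed
differences between the computers themselves cancel out.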

To determine the sample size, it is necessary to make power computations
BEFORE the experiment is run, by specifying the required values a priori.
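
A rough sketch of such a power computation for the paired design, assuming
the normal approximation n = ((z_(1-alpha/2) + z_power) * sd / delta)^2. The
function name, the default alpha and power, and the example numbers are all
assumptions chosen for illustration. The approximation slightly
underestimates n when n is small, so an exact t-based computation is
preferable for the final figure.

```python
import math
from statistics import NormalDist


def pairs_needed(delta, sd_diff, alpha=0.05, power=0.80):
    """Approximate number of pairs (computers) for a two-sided paired test.

    delta   -- smallest mean difference worth detecting (assumed a priori)
    sd_diff -- expected standard deviation of the paired differences
    """
    if delta <= 0 or sd_diff <= 0:
        raise ValueError("delta and sd_diff must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sd_diff / delta) ** 2)


# Hypothetical targets: detect a 0.5 s difference when the differences
# are expected to have a standard deviation of 1.0 s.
n_pairs = pairs_needed(delta=0.5, sd_diff=1.0)
print(n_pairs)
```

Halving delta roughly quadruples the required number of computers, which is
why the values have to be fixed before the experiment.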


If you want to compare MORE than TWO compilers at the same time, for example
(Clang version x) ... (Clang version y), (GCC version x) ... (GCC version y),
etc., it is necessary to use MULTIPLE COMPARISONS.
Using two-by-two t-tests in isolation from the rest of the results (with the
compilers as the variables) will give distorted results unless the
differences are significant at the 0.001 level (in which case the actual
significance level will be greater than 0.001, but very likely less than
0.05).
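
To illustrate why isolated pairwise tests distort the significance level,
here is a small sketch. It assumes the m tests are independent, which
pairwise compiler comparisons are not exactly, so the first function is only
an approximation. The Holm step-down adjustment is one common correction,
swapped in here as an example; the function names and the p-values are
hypothetical.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive in m independent tests."""
    return 1 - (1 - alpha) ** m


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        value = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted


# Four compilers give 6 pairwise comparisons.
print(familywise_error_rate(0.05, 6))   # well above the nominal 0.05
print(familywise_error_rate(0.001, 6))  # above 0.001 but far below 0.05

# Hypothetical raw p-values from three pairwise tests (not real results).
print(holm_adjust([0.01, 0.04, 0.03]))
```

A difference is then declared significant only if its adjusted p-value is
below the chosen level.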

Such computations (paired t-test, power, multiple comparisons and others)
are available in the R statistical package, which is in the Ports.

It is my opinion that using different processor models with roughly
comparable speeds will not distort the results very much. Personally, I
prefer such a mixed-processor setup. With it, it will be possible to test
the performance of the compilers on a mixture of processors (likely
independently of the processor model).


Thank you very much.


Mehmet Erol Sanliturk