Date: Sat, 12 Mar 2011 07:43:08 -0500
From: Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To: Martin Matuska <mm@freebsd.org>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, freebsd-current@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang
Message-ID: <AANLkTi=opRnJz1xouXy_24iAVsS=emnfXWV5kxvOy_Hc@mail.gmail.com>
In-Reply-To: <4D7B44AF.7040406@FreeBSD.org>
References: <98496.1299861978@critter.freebsd.dk> <4D7B44AF.7040406@FreeBSD.org>

2011/3/12 Martin Matuska <mm@freebsd.org>

> Hi Poul-Henning,
>
> I have redone the test for majority of the processors, this time taking
> 5 samples of each whole testrun, calculating the average, standard
> deviation, relative standard deviation, standard error and relative
> standard error.
>
> The relative standard error is below 0.25% for ~91%, between 0.25% and
> 0.5% for ~7%, 0.5%-1.0% for ~1% and between 1.0%-2.0% for <1% of the
> tests. ...
> Under a "test" I mean 5 runs for the same setting of the same
> compiler on the same processor.
>
> ...

To have VALID test results, it is NECESSARY to obtain the results by using
DIFFERENT computers. (This point is NOT mentioned in your message. I am
assuming that the SAME computer was used to get the results.)

If you repeat the same computations on the SAME computer, the values are
CORRELATED, and the t-test is NOT valid, because you are computing the mean
and standard deviation of CORRELATED values, where the correlation is
introduced by the SAME processor.

To obtain a proper set of test values, you may use the following setup
(the Clang and GCC versions and the compilation parameters must be the same
on all of the computers):

               Clang      GCC
               ---------  -------
Computer 1     v(1,1)     v(1,2)
Computer 2     v(2,1)     v(2,2)
.
.
.
Computer n     v(n,1)     v(n,2)

If you do NOT have that many computers, you may obtain test results from
other reliable sources that use the same compilation parameters.

With this setup it is possible to use a t-test on PAIRED values.

To determine the sample size, it is necessary to make power computations
BEFORE running the experiment, specifying the required values a priori.

If you want to compare MORE than TWO compilers at the same time, such as

( Clang Version x ) ... ( Clang Version y )
( GCC Version x ) ... ( GCC Version y )
... etc.

it is necessary to use MULTIPLE COMPARISONS.
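
As a minimal sketch of the paired t-test on the Clang/GCC layout described above, the following Python computes the t statistic from the per-computer differences. The function name and the timing values are hypothetical placeholders, not measurements from this thread. In R the same test is available as t.test(x, y, paired = TRUE), which also reports the p-value.

```python
import math
import statistics


def paired_t_statistic(clang_times, gcc_times):
    """Return (t, degrees_of_freedom) for a paired t-test.

    Each index is one computer: clang_times[i] and gcc_times[i] are
    v(i,1) and v(i,2) from the layout above.
    """
    if len(clang_times) != len(gcc_times):
        raise ValueError("both samples need one value per computer")
    n = len(clang_times)
    if n < 2:
        raise ValueError("at least two computers are required")

    # The test works on the per-computer differences, not on the raw values.
    diffs = [c - g for c, g in zip(clang_times, gcc_times)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


if __name__ == "__main__":
    # Made-up run times in seconds for five computers (illustration only).
    clang = [10.2, 11.5, 9.8, 12.1, 10.9]
    gcc = [10.0, 11.1, 9.9, 11.6, 10.5]
    t, df = paired_t_statistic(clang, gcc)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t is then compared against the t distribution with n - 1 degrees of freedom to obtain a p-value.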

Running pairwise t-tests in isolation from the rest of the results (with
the compilers as the variables) will give distorted results unless the
differences are significant at the 0.001 level (in which case the actual
significance level will be greater than 0.001, but very likely less than
0.05).

Such computations (paired t-test, power, multiple comparisons and others)
are available in the R statistical package, which is in the Ports.

In my opinion, using different processor models with roughly similar speeds
will not distort the results very much. Personally, I prefer such a
mixed-processor setup, because it makes it possible to test the performance
of the compilers on a mixture of processors (likely independent of the
processor model).

Thank you very much.

Mehmet Erol Sanliturk
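
To illustrate the remark about isolated pairwise tests, here is a rough sketch of how the overall (family-wise) significance level grows with the number of compilers compared. It assumes the pairwise tests are independent, which is a simplification not stated in the message, and the function names are hypothetical.

```python
from math import comb


def pairwise_comparisons(num_compilers):
    """Number of distinct compiler pairs, k * (k - 1) / 2."""
    return comb(num_compilers, 2)


def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across num_tests tests.

    Assumes the tests are independent (a simplifying assumption).
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    compilers = 4  # e.g. two Clang versions and two GCC versions
    m = pairwise_comparisons(compilers)
    for alpha in (0.05, 0.001):
        rate = familywise_error_rate(alpha, m)
        print(f"per-test alpha={alpha}: {m} tests, family-wise rate={rate:.4f}")
```

With four compilers there are six pairwise tests. Under the independence assumption, a per-test level of 0.001 gives a family-wise level of about 0.006, which stays well below 0.05, while a per-test level of 0.05 gives about 0.26.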