From: Martin Matuska <mm@FreeBSD.org>
Date: Fri, 11 Mar 2011 16:42:04 +0100
To: Poul-Henning Kamp
Cc: freebsd-performance@FreeBSD.org, freebsd-current@FreeBSD.org
Subject: Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang
Message-ID: <4D7A42CC.8020807@FreeBSD.org>
In-Reply-To: <90325.1299852096@critter.freebsd.dk>

I don't take this personally and fully understand your point.

But even if all the conditions you described are met, I am still not
able to say "this is better", because I am not running a microbenchmark.
The +x% score is just an average of all test scores with equal weight
(factor 1), and that does not reflect any real application out there:
real applications do not use the tested functions in that exact
weighting ratio. If one function had a score of 0%, a program would
actually stall forever when executing that function, but the averaged
score would still look promising :-) (see the worked sketch at the end
of this message).

What I can say, e.g. for the Intel Atom processor, is this: if there
are performance gains in all tests but one (which falls 2% behind),
generic perl code (the routines benchmarked) is very likely to run
faster on that processor with that setup. On the other hand, if
clang-generated code falls short in all tests, I can say it is very
likely to run slower. But again, I am benchmarking just a subset of
generic perl functions.

Cheers,
mm

On 11.03.2011 15:01, Poul-Henning Kamp wrote:
> In message <4D7943B1.1030604@FreeBSD.org>, Martin Matuska writes:
>
>> More information, detailed test results and test configuration are at
>> our blog:
>> http://blog.vx.sk/archives/25-FreeBSD-Compiler-Benchmark-gcc-base-vs-gcc-ports-vs-clang.html
>
> Please don't take this personally, Martin, but you have triggered
> my periodic rant about the proper running, evaluation and reporting
> of benchmarks.
>
> These results are not published at a level of detail that allows
> anybody to draw any kind of conclusion from them.
>
> In particular, your use of "overall best" result selection is totally
> bogus from a statistical point of view.
>
> At the very least, we need to see standard deviations on your numbers,
> and preferably, when you claim that "X is N% better than Y", you should
> also provide the confidence interval on that judgment, "Student's t"
> being the canonical test.
>
> The ministat(1) program does both of these things, and is now in
> FreeBSD/src, so there is absolutely no excuse for not using it.
>
> In practice this means that you have to run each test at least three
> times, to get a standard deviation, and you have to make sure that
> your test conditions are as identical as possible.
>
> Therefore, proper benchmarking procedure is something like:
>
>     (boot machine single-user       // Improves reproducibility)
>     (mount md(4)/malloc filesystem  // ditto)
>     (newfs test-partition           // ditto)
>     for at least 4 iterations:
>         run test A
>         run test B
>         run test C
>         ...
>     Throw the first result away for all tests
>     Run the remaining results through ministat(1)
>
> This was a public service announcement.
>
> Poul-Henning
>
> PS: Recommended reading: http://www.larrygonick.com/html/pub/books/sci7.html
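
Concretely, a minimal sh sketch of the procedure above (test_A, test_B,
A.times and B.times are illustrative placeholders, not the actual
benchmark commands): each iteration appends one wall-clock figure per
test to a data file, the first (warm-up) result is discarded as
recommended, and ministat(1) reports means, standard deviations and the
Student's t comparison at the chosen confidence level.

    #!/bin/sh
    # Start with clean data files.
    rm -f A.times B.times

    # Collect one "real" (wall-clock) time per run for each test.
    # /usr/bin/time -p writes its POSIX-format timing to stderr.
    for i in 1 2 3 4; do
        /usr/bin/time -p test_A 2>&1 >/dev/null |
            awk '/^real/ { print $2 }' >> A.times
        /usr/bin/time -p test_B 2>&1 >/dev/null |
            awk '/^real/ { print $2 }' >> B.times
    done

    # Throw the first result away for both tests (BSD sed syntax).
    sed -i '' 1d A.times B.times

    # Compare the remaining samples at a 95% confidence level.
    ministat -c 95 A.times B.times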
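
And, to illustrate the averaging problem mentioned near the top of this
message: a sketch with five made-up per-test scores (percent of baseline
throughput; the numbers are hypothetical), showing how an unweighted
arithmetic mean hides a test that never completes, while a geometric
mean does not.

    $ echo "120 115 110 105 0" | awk 'BEGIN { prod = 1 }
        { for (i = 1; i <= NF; i++) { sum += $i; prod *= $i / 100; n++ } }
        END { printf "arithmetic mean: %.1f%%\n", sum / n
              printf "geometric mean:  %.1f%%\n", 100 * prod ^ (1 / n) }'
    arithmetic mean: 90.0%
    geometric mean:  0.0%

The arithmetic mean of 90% still "looks promising", even though the
fifth function never finishes; the geometric mean of 0% exposes it.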