From: Mehmet Erol Sanliturk
To: freebsd-current
Date: Wed, 16 Mar 2011 02:00:46 -0400
Subject: Comparison of quality of generated code by the compilers

One important attribute of a compiler is the quality of the code it
generates.

To assess the difference in generated code quality between compilers,
an experimental design may be used.

Assume the following design is used.

Select n distinct programs ( as large as possible ) in such a way that
no source file in one program appears in another program ( except
compiler libraries ). This prevents correlation between programs; the
programs should be independent of each other.

If the sample size is not computed from a power-of-the-test formula,
select a sample size greater than 15. A sample size greater than 60 is
extremely valuable.

Only two compilers are compared.

All of the programs can be compiled by both compilers.

Execute the programs and record their success or failure in the
following structure:

Program         Clang        GCC
------------    ---------    ---------
1               0 or 1       0 or 1
2               0 or 1       0 or 1
.
.
.
n               0 or 1       0 or 1

where

0 is success ( only correct results, without a crash )
1 is failure ( crash or incorrect results ).

When there are failures, generate a cross tabulation of the above
table, counting ( Clang , GCC ) result pairs:

                       GCC                    GCC
                       Success ( 0 )          Failure ( 1 )
               |----------------------|----------------------|
Clang Success  | count of ( 0 , 0 )   | count of ( 0 , 1 )   |
               | pairs                | pairs                |
               |----------------------|----------------------|
Clang Failure  | count of ( 1 , 0 )   | count of ( 1 , 1 )   |
               | pairs                | pairs                |
               |----------------------|----------------------|

Depending on the table structure ( especially the number of programs ),
one of the following tests may be applied:

http://en.wikipedia.org/wiki/Barnard%27s_exact_test
( Barnard's test )

http://en.wikipedia.org/wiki/Fisher%27s_exact_test
( Fisher's exact test )

http://en.wikipedia.org/wiki/Chi-square_test
( Chi-square test )

http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test
( Pearson's chi-square test )

If the difference ( the contingency coefficient ) is significant, one
compiler is better ( fewer failures ) and the other is worse ( more
failures ).

----------------------------------------------------------

Assume there are no failures and execution times are available:

Program         Clang        GCC
------------    ---------    ---------
1               t            t
2               t            t
.
.
.
n               t            t

where t is the execution time of the program.

Apply a paired t test.

If the paired differences are significant, one compiler is better
( shorter execution time, smaller mean ) and the other is worse
( longer execution time, larger mean ).

---------------------------------------------------------

The same paired t test may be used for the sizes of the generated
programs.

If the paired differences are significant, one compiler is better
( smaller program size, smaller mean ) and the other is worse
( larger program size, larger mean ).

Thank you very much.

Mehmet Erol Sanliturk
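
The procedure described in this message could be sketched roughly as
follows in Python, using only the standard library. This is a minimal
illustration under stated assumptions, not a definitive implementation:
the function names ( cross_tab, fisher_exact_two_sided, paired_t ) and
all sample data are hypothetical and are not measurements. The sketch
builds the cross tabulation, computes a two-sided p-value for Fisher's
exact test ( one of the tests listed above ), and computes the paired t
statistic.

```python
import math
import statistics


def cross_tab(clang, gcc):
    """Count (Clang, GCC) outcome pairs; 0 = success, 1 = failure.

    table[i][j] is the number of programs with Clang result i
    and GCC result j.
    """
    table = [[0, 0], [0, 0]]
    for c, g in zip(clang, gcc):
        table[c][g] += 1
    return table


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, margins fixed.
        return (math.comb(row1, x) * math.comb(row2, col1 - x)
                / math.comb(total, col1))

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(low, high + 1))
               if p <= p_observed * (1 + 1e-9))


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err, n - 1


if __name__ == "__main__":
    # Hypothetical outcomes for 8 programs (0 = success, 1 = failure).
    clang_result = [0, 0, 1, 0, 1, 0, 0, 1]
    gcc_result = [0, 1, 1, 0, 0, 0, 0, 1]
    table = cross_tab(clang_result, gcc_result)
    print("cross tabulation:", table)
    print("Fisher two-sided p:", fisher_exact_two_sided(table))

    # Hypothetical execution times in seconds for the same programs.
    clang_time = [1.20, 3.45, 0.98, 2.10, 5.60, 4.40, 2.75, 1.05]
    gcc_time = [1.25, 3.30, 1.02, 2.30, 5.90, 4.35, 2.80, 1.10]
    t_stat, dof = paired_t(clang_time, gcc_time)
    print("paired t:", t_stat, "with", dof, "degrees of freedom")
```

The standard library has no t distribution, so the t statistic would
be compared against a critical value from a t table for the returned
degrees of freedom. The same paired_t call applies unchanged to
generated program sizes in place of execution times.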