From: Dimitry Andric
Date: Fri, 21 Sep 2012 23:39:40 +0200
To: freebsd-current@FreeBSD.org, freebsd-toolchain@FreeBSD.org
Subject: More kernel performance tests on FreeBSD 10.0-CURRENT
Message-ID: <505CDE9C.3060504@andric.com>
List-Id: Maintenance of FreeBSD's integrated toolchain

Hi all,

As a followup to my previous post about the performance of FreeBSD 10.0
kernels compiled with different compilers (clang and gcc), I did another
series of tests, now on a more modern machine (Core i5-based). I also
tested the performance with different compiler optimization settings.

The attached text file[1] contains more information about these tests,
performance data, and my conclusions. Any errors and omissions are also my
fault, so if you notice them, please let me know.

The executive summary: GENERIC kernels compiled with clang 3.2 are again a
little faster than those compiled with gcc 4.2.1. For gcc, compiling with
-O2 also gives a slightly faster kernel than with -O1, but for clang there
is no measurable difference between those flags.

Again, many thanks to Gavin Atkinson for providing the required hardware.

-Dimitry

[1]: Also available at:

Attachment: perftest-kernel-2012-09-21a.txt

KERNEL PERFORMANCE TESTS ON FREEBSD 10.0-CURRENT, SEPTEMBER 2012, PART 2
========================================================================

INTRODUCTION
------------

These tests aim to give an indication of the runtime performance of FreeBSD
kernels compiled with different compilers, at various optimization levels.

The compilers tested were:

- gcc 4.2.1, the system compiler in FreeBSD.
- clang 3.2 (trunk 162107), which is the default version of clang in
  FreeBSD 10.0-CURRENT, after r239462.

All tests were run on a machine graciously provided by Gavin Atkinson, which
is based on an Intel DQ57TM desktop board, with a 3.20 GHz Intel Core i5 650
CPU (id=0x20652; two cores with two SMT threads each, so four logical CPUs),
and 4 GB RAM. It runs FreeBSD/amd64 10.0-CURRENT as of Tue Sep 11 19:11:00
UTC 2012.
An excerpt of dmesg follows:

    CPU: Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz (3192.08-MHz K8-class CPU)
      Origin = "GenuineIntel"  Id = 0x20652  Family = 6  Model = 25  Stepping = 2
      Features=0xbfebfbff
      Features2=0x298e3ff
      AMD Features=0x28100800
      AMD Features2=0x1
      TSC: P-state invariant, performance statistics
    real memory  = 4294967296 (4096 MB)
    avail memory = 3882647552 (3702 MB)
    Event timer "LAPIC" quality 600
    ACPI APIC Table:
    FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
    FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 SMT threads
     cpu0 (BSP): APIC ID: 0
     cpu1 (AP): APIC ID: 1
     cpu2 (AP): APIC ID: 4
     cpu3 (AP): APIC ID: 5

With each compiler, stock GENERIC kernels for amd64 were built from head as
of r240384, for each of the following optimization flags:

    -O2 -frename-registers -pipe -fno-strict-aliasing
    -O1 -pipe
    -O0 -pipe

Note that clang does not support -frename-registers, so it was omitted for
the corresponding kernel builds. No CPU-specific optimization flags
(-march=) were used.

Each kernel was installed into a separate kernel installation directory
under /boot. The system was then booted with each of these kernels, without
modifying anything else, and multiple runs of "make -j8 buildworld" were
done. Between each run, the /usr/obj directory was fully cleaned out, and
filesystems were synced.

The timing results, processed with ministat(1), are below.
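
For readers who want to repeat the measurement, the clean/sync/build cycle
described above could be sketched roughly as follows. This is only an
illustration and not the script used for these tests: the function names,
the /usr/src path, the use of "rm -rf" to clean /usr/obj, and timing with
time.monotonic() are all assumptions.

```python
import subprocess
import time

# Assumed paths; the report only says /usr/obj was fully cleaned between runs.
SRC_DIR = "/usr/src"
OBJ_DIR = "/usr/obj"


def prepare_commands(obj_dir=OBJ_DIR):
    """Commands run before each timed build: clean the object tree, then sync."""
    return [
        ["rm", "-rf", obj_dir],  # hypothetical cleaning step
        ["sync"],
    ]


def build_command(jobs=8, src_dir=SRC_DIR):
    """The timed command, equivalent to 'make -j8 buildworld' run in src_dir."""
    return ["make", "-C", src_dir, "-j{}".format(jobs), "buildworld"]


def run_benchmark(runs, runner=subprocess.run, clock=time.monotonic):
    """Repeat the clean/sync/build cycle; return wall-clock seconds per build."""
    timings = []
    for _ in range(runs):
        for cmd in prepare_commands():
            runner(cmd, check=True)
        # Only the build itself is timed, not the cleanup.
        start = clock()
        runner(build_command(), check=True)
        timings.append(clock() - start)
    return timings
```

Calling run_benchmark(6) once per booted kernel would give six wall-clock
samples per kernel. It does not collect the resource usage counters (maxrss,
minflt, nvcsw, and so on) that appear in the results.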

Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O0
-----------------------------------------------------------------------------
          N            Min            Max         Median            Avg         Stddev
real      6        6503.62        6527.84        6520.49      6517.2817      8.3845558
user      6       12534.49       12576.55       12555.29      12555.547      14.079771
sys       6         9655.1        9733.92         9716.1      9709.9533      28.981809
maxrss    6         758208         758248         758224      758222.67      13.779213
ixrss     6           4396           4401           4397      4397.1667      1.9407902
idrss     6            523            523            523            523              0
isrss     6            126            126            126            126              0
minflt    6  6.6264519e+08  6.6337812e+08  6.6297908e+08  6.6299306e+08      249092.49
majflt    6           4354          10457           5722      6207.8333      2208.4725
nswap     6             40             56             42      44.333333      6.1210021
inblock   6          25167          44267          29212      31042.667      6677.3727
oublock   6          32801          34666          33500      33635.167      692.27897
msgsnd    6              0              0              0              0              0
msgrcv    6              0              0              0              0              0
nsignals  6          60495          60504          60502          60500      3.5213634
nvcsw     6        1750409        1759010        1754971      1754668.8      3641.3163
nivcsw    6        1867335        1943885        1924258      1909641.2      30495.366

Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O1
-----------------------------------------------------------------------------
          N            Min            Max         Median            Avg         Stddev
real      6        4788.59        4831.96        4798.01       4802.305       15.48322
user      6       12239.94        12285.9       12268.91        12263.5      17.190572
sys       6        4041.05         4100.4        4083.92       4076.235      21.374684
maxrss    6         758212         758256         758256      758242.67      18.532855
ixrss     6           4963           4971           4964      4964.6667      3.1411251
idrss     6            589            590            589      589.16667     0.40824829
isrss     6            132            132            132            132              0
minflt    6   6.617985e+08  6.6339562e+08   6.629315e+08  6.6272587e+08      574835.78
majflt    6           7935          23481          17450      16901.667       5324.564
nswap     6             40             52             48      47.333333      3.9327683
inblock   6          25121          44292          29173      30980.667      6715.0864
oublock   6          24867          28037          26579      26667.167       1162.513
msgsnd    6              0              0              0              0              0
msgrcv    6              0              0              0              0              0
nsignals  6          60492          60500          60498      60496.667      3.4448028
nvcsw     6        1559857        1576788        1562507      1565002.8      6454.8513
nivcsw    6        1632143        1721204        1688209        1682830       35836.46

Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O2
-----------------------------------------------------------------------------
          N            Min            Max         Median            Avg         Stddev
real      6        4780.24        4819.77        4801.98      4798.5867      14.236627
user      6       12242.91       12275.04       12256.37      12255.905      11.676621
sys       6        4052.75        4118.65        4104.76      4096.2217      22.874298
maxrss    6         758220         758256         758256      758244.67      17.603031
ixrss     6           4960           4970           4964      4963.8333      3.4880749
idrss     6            589            590            589      589.16667     0.40824829
isrss     6            132            132            132            132              0
minflt    6  6.6248246e+08  6.6340936e+08  6.6300404e+08  6.6293496e+08      324940.82
majflt    6           4300          22493          14128      12176.833      6396.7734
nswap     6             40             52             48             46      4.8989795
inblock   6          29120          44375          29277          31760      6180.4181
oublock   6          24915          28157          25984      26315.333       1251.164
msgsnd    6              0              0              0              0              0
msgrcv    6              0              0              0              0              0
nsignals  6          60490          60499          60497      60495.667      3.2041639
nvcsw     6        1559291        1575794        1570626      1569117.3       5467.274
nivcsw    6        1593865        1678135        1654604        1640246      31701.067

Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O0
-----------------------------------------------------------------------------
          N            Min            Max         Median            Avg         Stddev
real      6        6083.69        6101.08        6096.85      6094.4383      6.5165003
user      6       12424.93       12462.24       12438.63       12441.97      12.975073
sys       6        8305.66        8394.45        8377.26      8366.4767      32.469675
maxrss    6         758208         758256         758224      758225.33       16.52473
ixrss     6           4481           4491           4484      4484.6667      3.3862467
idrss     6            533            534            533      533.16667     0.40824829
isrss     6            127            127            127            127              0
minflt    6  6.6241224e+08  6.6339646e+08  6.6301629e+08  6.6292507e+08      336924.37
majflt    6           4357           9603           6231      6667.8333      1812.2422
nswap     6             40             48             40      41.666667       3.204164
inblock   6          29162          44302          29272      31759.333      6145.0026
oublock   6          30081          32816          31538        31281.5      1163.8237
msgsnd    6              0              0              0              0              0
msgrcv    6              0              0              0              0              0
nsignals  6          60500          60501          60500      60500.333     0.51639753
nvcsw     6        1701009        1713077        1709140        1707903      3975.4753
nivcsw    6        1854572        1936195        1896858      1894873.2      26725.543

Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O1
-----------------------------------------------------------------------------
          N            Min            Max         Median            Avg         Stddev
real      6        4943.74        4965.28        4955.62        4953.78      7.2888627
user      6       12274.46       12334.13       12322.13      12314.472      21.858036
sys       6        4576.99        4621.09        4617.21        4609.75      16.658918
maxrss    6         758208         758256         758224         758232      19.595918
ixrss     6           4897           4902           4898      4898.6667      1.9663842
idrss     6            581            582            581      581.33333     0.51639778
isrss     6            131            131            131            131              0
minflt    6   6.626435e+08   6.634147e+08  6.6301953e+08   6.629835e+08      279004.88
majflt    6           6092          11215           9188      8755.1667      1849.3565
nswap     6             40             62             48      49.333333      7.1180522
inblock   6          29076          44462          29163          31697      6253.6444
oublock   6          25415          28495          28175      27508.167      1179.5914
msgsnd    6              0              0              0              0              0
msgrcv    6              0              0              0              0              0
nsignals  6          60488          60499          60495      60494.333      3.9832984
nvcsw     6        1575048        1588567        1584504      1582316.7      5705.6913
nivcsw    6        1682902        1745827        1730506      1722802.3      24060.717

Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O2
-----------------------------------------------------------------------------
          N            Min            Max         Median            Avg         Stddev
real      6        4876.16        4901.55        4895.24      4888.7583      10.598318
user      6       12241.35       12306.04       12283.94      12278.767      23.922356
sys       6        4400.43        4452.62        4446.22      4438.0117      19.231095
maxrss    6         758212         758256         758224      758229.33      17.095809
ixrss     6           4899           4905           4900      4900.6667      2.2509257
idrss     6            581            582            582      581.83333     0.40824829
isrss     6            131            131            131            131              0
minflt    6  6.6214332e+08  6.6334997e+08  6.6298766e+08  6.6278723e+08      436172.22
majflt    6           6055          12473           9169         8895.5      2381.6063
nswap     6             40             54             48             48      4.5607017
inblock   6          29193          44443          29313          31804      6192.0071
oublock   6          25113          28152          26770      26490.167      1254.3383
msgsnd    6              0              0              0              0              0
msgrcv    6              0              0              0              0              0
nsignals  6          60496          60501          60499      60498.667      2.2509257
nvcsw     6        1566521        1592140        1579251      1578889.5       9354.883
nivcsw    6        1686675        1809406        1785290      1756283.7      50719.325

Summary:
--------

On a kernel compiled with clang 3.2 -O2, building world in multi-threaded
mode is ~1.9% faster in real time than on a kernel compiled with gcc 4.2.1
-O2, and ~8.3% faster in system time.

On a kernel compiled with clang 3.2 -O1, building world in multi-threaded
mode is ~3.2% faster in real time than on a kernel compiled with gcc 4.2.1
-O1, and ~13.1% faster in system time.

On a kernel compiled with gcc 4.2.1 -O2, building world in multi-threaded
mode is ~1.3% faster in real time than on a kernel compiled with gcc 4.2.1
-O1, and ~3.9% faster in system time.

The difference between building world in multi-threaded mode on kernels
compiled with clang 3.2 -O2 and -O1 is not significant (to within 1
standard deviation).

Conclusion:
-----------

At -O1 and -O2, kernels compiled with clang are a little faster in real time
for building world, and in system time the difference is even larger,
roughly 10%. For clang, the difference between -O1 and -O2 is not
measurable, but for gcc, -O2 is slightly faster than -O1.

================================================================================
Copyright (c) 2012 Dimitry Andric
Verbatim copying and redistribution of this entire text are permitted,
provided this notice is preserved.
================================================================================
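
As a cross-check on the Summary section, the percentages quoted there appear
to be consistent with taking the ratio of the Avg values in the tables, and
the "not significant" remark with comparing the difference of two averages
against their Stddev. The sketch below is one possible reading, not
necessarily how the figures were originally derived: the function names and
the choice of the smaller Stddev as the threshold are assumptions.

```python
# (Avg, Stddev) for the "real" and "sys" rows, copied from the tables above.
RESULTS = {
    "clang -O1": {"real": (4802.305, 15.48322), "sys": (4076.235, 21.374684)},
    "clang -O2": {"real": (4798.5867, 14.236627), "sys": (4096.2217, 22.874298)},
    "gcc -O1": {"real": (4953.78, 7.2888627), "sys": (4609.75, 16.658918)},
    "gcc -O2": {"real": (4888.7583, 10.598318), "sys": (4438.0117, 19.231095)},
}


def percent_faster(fast, slow, metric):
    """How much faster `fast` is than `slow`: (slow_avg / fast_avg - 1) * 100."""
    fast_avg = RESULTS[fast][metric][0]
    slow_avg = RESULTS[slow][metric][0]
    return (slow_avg / fast_avg - 1.0) * 100.0


def within_one_stddev(a, b, metric):
    """True if the two averages differ by no more than the smaller Stddev."""
    avg_a, sd_a = RESULTS[a][metric]
    avg_b, sd_b = RESULTS[b][metric]
    return abs(avg_a - avg_b) <= min(sd_a, sd_b)


for fast, slow in [("clang -O2", "gcc -O2"),
                   ("clang -O1", "gcc -O1"),
                   ("gcc -O2", "gcc -O1")]:
    print("{} vs {}: real {:.1f}%, sys {:.1f}%".format(
        fast, slow,
        percent_faster(fast, slow, "real"),
        percent_faster(fast, slow, "sys")))

print("clang -O1 vs -O2 within 1 stddev (real):",
      within_one_stddev("clang -O1", "clang -O2", "real"))
```

With the values above, the three comparisons round to the same figures as
quoted in the Summary (1.9/8.3, 3.2/13.1 and 1.3/3.9 percent), and the clang
-O1 and -O2 averages fall within one standard deviation of each other for
both real and system time.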