Date: Tue, 04 Sep 2012 22:39:40 +0200
From: Dimitry Andric <dimitry@andric.com>
To: freebsd-current@FreeBSD.org
Subject: Compiler performance tests on FreeBSD 10.0-CURRENT
Message-ID: <5046670C.6050500@andric.com>
This is a multi-part message in MIME format.
--------------010504020809040102000509
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi all,

I recently performed a series of compiler performance tests on FreeBSD
10.0-CURRENT, particularly comparing gcc 4.2.1 and gcc 4.7.1 against
clang 3.1 and clang 3.2.

The attached text file[1] contains more information about the tests,
some semi-cooked performance data, and my conclusions. Any errors and
omissions are also my fault, so if you notice them, please let me know.

The executive summary: clang compiles mostly faster than gcc (sometimes
much faster), and uses significantly less memory.

Finally, please note these tests were purely about compilation speed,
not about the performance of the resulting executables. This still
needs to be tested.

-Dimitry

[1]: Also available at:
<http://www.andric.com/freebsd/perftest/perftest-2012-09-01a.txt>

--------------010504020809040102000509
Content-Type: text/plain; charset=windows-1252;
 name="perftest-2012-09-01a.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="perftest-2012-09-01a.txt"

COMPILER PERFORMANCE TESTS ON FREEBSD 10.0-CURRENT, SEPTEMBER 2012
==================================================================

INTRODUCTION
------------

The compilers tested were:

- gcc 4.2.1, the system compiler in FreeBSD, which is compiled by gcc
  4.2.1.
- gcc 4.7.1, from the official gcc.gnu.org release, compiled via a
  three-stage bootstrap, so the final compiler has been compiled by gcc
  4.7.1.
- clang 3.1 (branches/release_31 156863), which is the default version
  of clang in FreeBSD 10-CURRENT before r239462. The used executable
  was compiled by a previous copy of itself.
- clang 3.2 (trunk 162107), which is the default version of clang in
  FreeBSD 10.0-CURRENT, after r239462. The used executable was compiled
  by a previous copy of itself.

All tests were run on ref10-amd64.freebsd.org, which is a Dell 2950,
1.86GHz Core2 Xeon, 2x4 Core, 16G RAM. It runs FreeBSD/amd64
10.0-CURRENT #0 r231914: Sun Feb 19 17:24:37 UTC 2012.

Each build was repeated 6 times, after cleaning out the object
directories, and syncing. Each build was timed using the system time(1)
command, using the -l argument to obtain rusage information.

The programs tested by compilation were:

- A large C++ program: clang 3.2, as it occurs in the FreeBSD
  10.0-CURRENT source tree as of r239532.
- A medium-large C program: gcc 4.2.1, as it occurs in the FreeBSD
  10.0-CURRENT source tree as of r239532.
- A large C++ library: boost 1.50.0, the officially released version
  from <http://www.boost.org/>.


Building a large C++ program (clang 3.2) single-threaded
========================================================

Using clang 3.1:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6     2283.69     2288.46     2285.74    2285.505   1.6470064
user     6      2145.2      2147.2     2146.18   2146.0567  0.68266146
sys      6       128.3      132.08      130.65   130.54833    1.256653
maxrss   6      179264      179264      179264      179264           0
ixrss    6       21407       21436       21420   21419.833   9.6211572
idrss    6        3628        3632        3630   3629.8333   1.3291601
isrss    6         252         252         252         252           0
minflt   6    12485556    12485556    12485556    12485556           0
majflt   6           0           0           0           0           0
nswap    6           0           0           0           0           0
inblock  6           0           0           0           0           0
oublock  6        2058        2106        2103   2081.3333   25.216397
msgsnd   6          18          18          18          18           0
msgrcv   6           0           0           0           0           0
nsignals 6        1878        1878        1878        1878           0
nvcsw    6       16288       16357       16333   16320.667   29.615311
nivcsw   6     2071535     3998751     3057756     2966314   635381.66

Using clang 3.2:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6     2358.61     2362.84     2362.67     2361.22   1.7831321
user     6     2215.33     2221.13     2218.72     2218.57   2.0094278
sys      6      130.78      134.63      133.41   132.99833   1.4702301
maxrss   6      177796      177796      177796      177796           0
ixrss    6       21388       21413       21408   21400.833   11.052903
idrss    6        3702        3707        3706   3704.6667   2.2509257
isrss    6         253         253         253         253           0
minflt   6    12583827    12583827    12583827    12583827           0
majflt   6           0           0           0           0           0
nswap    6           0           0           0           0           0
inblock  6           0           0           0           0           0
oublock  6        2036        2074        2071   2054.8333   19.589963
msgsnd   6          18          26          26   23.333333   4.1311822
msgrcv   6           0           0           0           0           0
nsignals 6        1878        1878        1878        1878           0
nvcsw    6       16266       16391       16354   16327.667   53.909801
nivcsw   6     2118900     3891231     3534528   3168715.7   673236.29

Using gcc 4.2.1:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6     4238.49     4241.76     4240.78   4240.1867   1.2375244
user     6     3903.48      3908.6     3907.58   3906.5583   1.8932661
sys      6      358.38      361.43      359.94   359.94667   1.1494984
maxrss   6      568592      568592      568592      568592           0
ixrss    6        6348        6353        6350   6350.3333   1.6329932
idrss    6        3495        3498        3497      3496.5   1.0488088
isrss    6         146         146         146         146           0
minflt   6    47304156    47304184    47304175    47304172   10.545141
majflt   6           0           0           0           0           0
nswap    6           0           0           0           0           0
inblock  6           0           0           0           0           0
oublock  6        2620        2754        2732   2683.6667   68.730391
msgsnd   6           0           0           0           0           0
msgrcv   6           0           0           0           0           0
nsignals 6        1878        1878        1878        1878           0
nvcsw    6       67561       67763       67674       67648   75.198404
nivcsw   6     3994087     5821442     4846679   4810028.2   597366.52

Using gcc 4.7.1:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6     3818.41     3974.54     3820.49   3846.7417   62.715466
user     6     3506.86     3591.97     3509.96   3522.8283   33.896088
sys      6      333.58      364.34      338.93   340.70833   11.839839
maxrss   6      480724      480736      480724   480727.33    5.316639
ixrss    6       12173       12198       12194   12188.333   9.4375138
idrss    6        1520        1523        1522   1521.8333   1.1690452
isrss    6         134         134         134         134           0
minflt   6    38406568    38406673    38406592    38406599   38.768544
majflt   6           0          90           0   20.333333    36.45088
nswap    6           0           0           0           0           0
inblock  6           0        4775           0   1233.3333   2028.0327
oublock  6        2266        2301        2286   2284.3333   13.662601
msgsnd   6          30          31          30   30.166667  0.40824829
msgrcv   6           0           0           0           0           0
nsignals 6        1878        1878        1878        1878           0
nvcsw    6       59792       67936       60369   61859.833   3186.3204
nivcsw   6     2867702     4546665     4361653   3753550.8   769382.51

Summary:
--------
For building this specific large C++ program, gcc 4.2.1 is ~86% slower
than clang 3.1 in real time, ~82% slower in user time, and ~176% slower
in system time. The maximum resident set size during building is ~217%
larger, and it causes ~279% more page reclaims.

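For reference, the percentages in this summary can be reproduced from
the Avg columns with a few lines of Python. This is only a sketch: the
helper name percent_more is made up for illustration, and the numbers
are copied by hand from the clang 3.1 and gcc 4.2.1 tables above.

```python
# Avg values copied by hand from the clang 3.1 and gcc 4.2.1 tables
# for the large C++ program (times in seconds, maxrss and minflt as
# reported by time -l).
clang31 = {
    "real": 2285.505,
    "user": 2146.0567,
    "sys": 130.54833,
    "maxrss": 179264,
    "minflt": 12485556,  # page reclaims
}
gcc421 = {
    "real": 4240.1867,
    "user": 3906.5583,
    "sys": 359.94667,
    "maxrss": 568592,
    "minflt": 47304172,  # page reclaims
}


def percent_more(value, baseline):
    """Return how much larger value is than baseline, in percent."""
    return (value / baseline - 1.0) * 100.0


# Print the relative overhead of gcc 4.2.1, with clang 3.1 as baseline.
for metric, baseline in clang31.items():
    extra = percent_more(gcc421[metric], baseline)
    print(f"{metric:7s} gcc 4.2.1 is ~{extra:.0f}% above clang 3.1")
```

The same helper applies to any other pair of tables in this text, as
long as the compiler used as the baseline goes in the second argument.
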
Though gcc 4.7.1 is faster than its older version, it is still ~68%
slower than clang 3.1 in real time, ~64% slower in user time, and ~161%
slower in system time. The maximum resident set size during building is
~168% larger, and it causes ~208% more page reclaims.

Finally, clang 3.2 is ~3% slower than clang 3.1 in both real time and
user time, and ~2% slower in system time. The maximum resident set size
and the number of page reclaims during building are approximately
equal.

Conclusion:
-----------
Clang 3.1 is clearly the fastest compiler for building this specific
large C++ program, with clang 3.2 trailing closely behind. Both are
significantly faster, and use much less memory than either version of
gcc.


Building a medium-large C program (gcc 4.2.1) single-threaded
=============================================================

Using clang 3.1:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6      303.31      304.06      303.65   303.67167  0.24991332
user     6      275.42      277.09      275.99   276.11167  0.57766484
sys      6       24.92       26.15        25.6   25.656667  0.44643775
maxrss   6      177876      177876      177876      177876           0
ixrss    6       20529       20559       20544   20542.833    12.38413
idrss    6        3618        3623        3621   3620.3333   1.9663842
isrss    6         247         247         247         247           0
minflt   6     2214250     2214250     2214250     2214250           0
majflt   6           0           0           0           0           0
nswap    6           0           0           0           0           0
inblock  6           0           0           0           0           0
oublock  6         677         677         677         677           0
msgsnd   6          18          18          18          18           0
msgrcv   6           0           0           0           0           0
nsignals 6         883         883         883         883           0
nvcsw    6        5705        5837        5819   5793.6667    49.33018
nivcsw   6      205418      467152      449398   371699.67   114414.58

Using clang 3.2:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6      330.22      331.23      330.95   330.69833  0.43687145
user     6      301.29      302.59       302.3   302.05667  0.49649438
sys      6       26.12       27.19       27.06      26.875  0.39747956
maxrss   6      186260      186260      186260      186260           0
ixrss    6       20639       20674       20660   20656.833   14.469508
idrss    6        3699        3705        3703   3702.3333   2.1602469
isrss    6         316         319         318   317.33333   1.2110601
minflt   6     2290933     2290934     2290934   2290933.7  0.51614557
majflt   6           0           0           0           0           0
nswap    6           0           0           0           0           0
inblock  6           0           0           0           0           0
oublock  6         668         669         668   668.16667  0.40824829
msgsnd   6          18          18          18          18           0
msgrcv   6           0           0           0           0           0
nsignals 6         883         883         883         883           0
nvcsw    6        5783        5822        5801        5799   17.944358
nivcsw   6      111115      520961      396082   316725.33   164041.32

Using gcc 4.2.1:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6      422.68      425.44      423.23   423.47333   1.0273396
user     6       389.1      391.67      390.58   390.41333  0.82734918
sys      6       36.85        39.2       38.65       38.23  0.83840324
maxrss   6      392560      392560      392560      392560           0
ixrss    6        5529        5542        5534      5534.5   6.0580525
idrss    6        3915        3924        3919        3919   4.1472883
isrss    6         142         142         142         142           0
minflt   6     4055461     4055464     4055463   4055462.7     1.21063
majflt   6           0           4           0  0.66666667   1.6329932
nswap    6           0           0           0           0           0
inblock  6           0         730           0   121.66667   298.02125
oublock  6         659         693         662         667   12.884099
msgsnd   6           0           0           0           0           0
msgrcv   6           0           0           0           0           0
nsignals 6         883         883         883         883           0
nvcsw    6       15645       16454       15874       15888   298.36019
nivcsw   6      121293      661776      414611   363556.83   207101.28

Using gcc 4.7.1:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6      461.58      462.55      462.01   461.98333  0.40287302
user     6      425.22      426.36      425.92     425.835  0.43825791
sys      6       40.83       42.94       41.99      41.925  0.71034499
maxrss   6      445624      445624      445624      445624           0
ixrss    6       10781       10816       10801     10797.5   12.405644
idrss    6        2427        2433        2430        2430   2.1908902
isrss    6         178         178         178         178           0
minflt   6     3883735     3883740     3883739     3883738   2.3664319
majflt   6           0           0           0           0           0
nswap    6           0           0           0           0           0
inblock  6           0          14           0   2.3333333   5.7154761
oublock  6         677         681         679         679   1.4142136
msgsnd   6          20          20          20          20           0
msgrcv   6           0           0           0           0           0
nsignals 6         883         883         883         883           0
nvcsw    6       16411       16660       16532   16542.333   98.544744
nivcsw   6      284414      901533      384379   449845.33   241447.41

Summary:
--------
For building this specific medium-large C program, gcc 4.2.1 is ~40%
slower than clang 3.1 in real time, ~41% slower in user time, and ~49%
slower in system time. The maximum resident set size during building is
~121% larger, and it causes ~83% more page reclaims.

For C, gcc 4.7.1 is even slower than its older version; it is ~52%
slower than clang 3.1 in real time, ~54% slower in user time, and ~63%
slower in system time.
The maximum resident set size during building is ~151% larger, and it
causes ~75% more page reclaims.

Finally, clang 3.2 is ~9% slower than clang 3.1 in both real time and
user time, and ~5% slower in system time. The maximum resident set size
during building is ~5% larger, and it causes ~4% more page reclaims.

Conclusion:
-----------
Clang 3.1 is clearly the fastest compiler for building this specific
medium-large C program, with clang 3.2 somewhat behind. Both are
significantly faster, and use much less memory than either version of
gcc.


Building a large C++ library (boost 1.50.0) single-threaded
===========================================================

Using clang 3.1:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6     1056.69     1060.49     1059.09   1058.6783   1.5028695
user     6      975.49      978.88      978.53      977.55   1.4653464
sys      6       73.75       76.42       74.87       74.95   1.0609618
maxrss   6      212324      216712      213668   214260.67   1774.6309
ixrss    6       22472       22549       22525     22514.5   31.232995
idrss    6        3793        3806        3802   3800.1667    5.492419
isrss    6         276         277         277       276.5  0.54772256
minflt   6     9543701     9543702     9543701   9543701.3  0.51234754
majflt   6           0           0           0           0           0
nswap    6           0           0           0           0           0
inblock  6           0           0           0           0           0
oublock  6        1453        1461        1457   1455.8333   3.3714487
msgsnd   6         115         115         115         115           0
msgrcv   6           0           0           0           0           0
nsignals 6           0           0           0           0           0
nvcsw    6        7352        7834        7576   7567.1667   167.70023
nivcsw   6       27478     2350999     1699745   1337037.8     1040439

Using clang 3.2:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6     1075.33     1077.94     1076.39   1076.4267  0.93958856
user     6      995.14      997.61      995.43   995.88833   0.9489661
sys      6       72.34       74.67       74.23   73.843333  0.81563881
maxrss   6      208552      211148      210436      209936   921.08458
ixrss    6       22437       22484       22458       22459   19.768662
idrss    6        3869        3878        3873   3873.3333   3.8815804
isrss    6         275         275         275         275           0
minflt   6     9351477     9351478     9351478   9351477.5  0.54772256
majflt   6           0           0           0           0           0
nswap    6           0           0           0           0           0
inblock  6           0           0           0           0           0
oublock  6        1448        1454        1449   1449.6667   2.4221203
msgsnd   6         115         115         115         115           0
msgrcv   6           0           0           0           0           0
nsignals 6           0           0           0           0           0
nvcsw    6       10481       12934       11049   11105.333    936.9249
nivcsw   6      975292     2383586     1633650   1615797.3   605542.76

Using gcc 4.2.1:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6     1037.86     1047.78     1039.71     1040.21   3.8054592
user     6      938.74      944.49      941.52   941.55667   1.8382999
sys      6       86.37       92.84       89.89       89.57   2.1105639
maxrss   6      560256      560316      560272      560274   21.428952
ixrss    6        6435        6453        6441        6443   7.2663608
idrss    6        3563        3573        3566      3567.5   4.0373258
isrss    6         136         136         136         136           0
minflt   6    12360490    12360492    12360491    12360491  0.63245553
majflt   6           0           0           0           0           0
nswap    6           0           0           0           0           0
inblock  6           0        4283           0   713.83333   1748.5274
oublock  6        2648        2656        2655   2653.3333   2.9439203
msgsnd   6          44          51          51   49.833333    2.857738
msgrcv   6           0           0           0           0           0
nsignals 6           0           0           0           0           0
nvcsw    6        7897       12281        8004        8696   1757.1741
nivcsw   6       19915     1989580      897003    957452.5   764999.83

Using gcc 4.7.1:
----------------
         N         Min         Max      Median         Avg      Stddev
real     6     1038.13     1041.29     1040.98     1039.92   1.3837774
user     6      937.73      943.59      941.35   941.14167   2.0323919
sys      6       89.19        95.1       91.61   91.588333   2.0745739
maxrss   6      361268      361268      361268      361268           0
ixrss    6       12431       12474       12469   12457.333   17.385818
idrss    6        1547        1552        1551   1549.8333   1.9407902
isrss    6         129         129         129         129           0
minflt   6    10455489    10455489    10455489    10455489           0
majflt   6           0           0           0           0           0
nswap    6           0           0           0           0           0
inblock  6           0         162           0          27   66.136223
oublock  6        2537        2540        2539      2538.5   1.0488088
msgsnd   6         113         113         113         113           0
msgrcv   6           0           0           0           0           0
nsignals 6           0           0           0           0           0
nvcsw    6        7778        7975        7880   7874.1667   78.036957
nivcsw   6       27055     2302383     2187401   1478808.3   946760.86

Summary:
--------
For building this specific large C++ library, clang 3.1 is ~2% slower
than gcc 4.2.1 in real time, ~4% slower in user time, but ~20% faster
in system time. The maximum resident set size during building is ~62%
smaller, and it causes ~23% fewer page reclaims.

As before, clang 3.2 is slower than its older version; it is ~3% slower
than gcc 4.2.1 in real time, ~6% slower in user time, but ~21% faster
in system time. The maximum resident set size is ~63% smaller, and it
causes ~24% fewer page reclaims.

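Note that "X% larger" and "Y% smaller" are not interchangeable: a value
cannot shrink by more than 100%, so the result depends on which
compiler is taken as the baseline. The sketch below shows both
directions of the comparison. The helper names are made up for
illustration, and the Avg maxrss values are copied by hand from the
clang 3.1 and gcc 4.2.1 boost tables above.

```python
def percent_larger(value, baseline):
    """How much larger value is than baseline, in percent of baseline."""
    return (value / baseline - 1.0) * 100.0


def percent_smaller(value, baseline):
    """How much smaller value is than baseline, in percent of baseline."""
    return (1.0 - value / baseline) * 100.0


# Avg maxrss from the boost 1.50.0 tables.
clang31_maxrss = 214260.67
gcc421_maxrss = 560274.0

# Same two numbers, two different baselines, two different percentages.
larger = percent_larger(gcc421_maxrss, clang31_maxrss)
smaller = percent_smaller(clang31_maxrss, gcc421_maxrss)

print(f"gcc 4.2.1 maxrss is {larger:.1f}% larger than clang 3.1")
print(f"clang 3.1 maxrss is {smaller:.1f}% smaller than gcc 4.2.1")
```

For positive measurements the "smaller" figure always stays below 100%,
while the "larger" figure has no upper bound.
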
Finally, gcc 4.7.1 is as fast as gcc 4.2.1 in real time and user time,
and ~2% slower in system time. The maximum resident set size is ~36%
smaller, and it causes ~15% fewer page reclaims.

Conclusion:
-----------
Both gcc 4.2.1 and 4.7.1 are the fastest compilers for building this
specific large C++ library, but both versions of clang are not far
behind. Both versions of gcc use quite a bit more memory than either
version of clang.

================================================================================
Copyright (c) 2012 Dimitry Andric <dimitry@andric.com>

Verbatim copying and redistribution of this entire text are permitted,
provided this notice is preserved.
================================================================================

--------------010504020809040102000509--