Date: Sun, 16 Sep 2012 00:34:45 +0200
From: Dimitry Andric <dimitry@andric.com>
To: freebsd-current@FreeBSD.org, freebsd-toolchain@FreeBSD.org
Subject: Compiler performance tests on FreeBSD 10.0-CURRENT
Message-ID: <50550285.4040203@andric.com>
This is a multi-part message in MIME format.
--------------090906020004070601070605
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi all,

By request, I performed a series of kernel performance tests on FreeBSD
10.0-CURRENT, particularly comparing the runtime performance of GENERIC
kernels compiled by gcc 4.2.1 and by clang 3.2.

The attached text file[1] contains more information about the tests,
some semi-cooked performance data, and my conclusions.  Any errors and
omissions are also my fault, so if you notice them, please let me know.

The executive summary: GENERIC kernels compiled with clang 3.2 are
slightly faster than those compiled by gcc 4.2.1, though the difference
will not be very noticeable in practice.

Last but not least, thanks to Gavin Atkinson for providing the required
hardware.

-Dimitry

[1]: Also available at:
<http://www.andric.com/freebsd/perftest/perftest-kernel-2012-09-14a.txt>

--------------090906020004070601070605
Content-Type: text/plain; charset=windows-1252; name="perftest-kernel-2012-09-14a.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="perftest-kernel-2012-09-14a.txt"

KERNEL PERFORMANCE TESTS ON FREEBSD 10.0-CURRENT, SEPTEMBER 2012
================================================================

INTRODUCTION
------------

These tests aim to give an indication of the runtime performance of
FreeBSD kernels compiled with different compilers.  The compilers tested
were:

- gcc 4.2.1, the system compiler in FreeBSD.
- clang 3.2 (trunk 162107), which is the default version of clang in
  FreeBSD 10.0-CURRENT, after r239462.

All tests were run on a machine graciously provided by Gavin Atkinson,
which is a Dell PowerEdge 2850, with two 2.80 GHz Xeon-class CPUs
(id=0xf41), and 4 GB RAM.  It runs FreeBSD/amd64 10.0-CURRENT as of Tue
Sep 11 19:11:00 UTC 2012.

With each compiler, a stock GENERIC kernel for amd64 was built from head
as of r240384, with the default optimization flags for this
architecture, which are for gcc:

  -O2 -frename-registers -pipe -fno-strict-aliasing

and for clang:

  -O2 -pipe -fno-strict-aliasing

Each kernel was installed into /boot/kernel.${compilername}.

The system was then booted with each of these kernels, without modifying
anything else, and multiple runs of "make buildworld" were done; first
in single-threaded mode, next in multi-threaded mode, using the -j8
flag.  Between runs, the /usr/obj directory was fully cleaned out, and
filesystems were synced.

The timing results are below.

Building world, single-threaded, on a GENERIC kernel compiled by clang 3.2
--------------------------------------------------------------------------
          N            Min            Max         Median            Avg       Stddev
real      3       26589.27       26680.48       26653.58       26641.11    46.866211
user      3       20449.52       20472.88        20463.4      20461.933    11.748861
sys       3        7809.87        7837.94        7830.35      7826.0533    14.519891
maxrss    3         759420         759420         759420         759420            0
ixrss     3           4923           4926           4924      4924.3333    1.5275252
idrss     3            584            584            584            584            0
isrss     3            131            131            131            131            0
minflt    3  6.5828088e+08  6.5855089e+08  6.5828258e+08  6.5837145e+08     155402.8
majflt    3              0           2573           2568      1713.6667     1484.081
nswap     3              0              0              0              0            0
inblock   3           2176          30252          30170          20866    16186.067
oublock   3          28370          28377          28375          28374    3.6055513
msgsnd    3              0              5              2      2.3333333    2.5166115
msgrcv    3              0              3              2      1.6666667    1.5275252
nsignals  3          74107          74107          74107          74107            0
nvcsw     3        1086164        1107104        1106650      1099972.7     11960.81
nivcsw    3         604641         658906         616307         626618     28564.14

Building world, single-threaded, on a GENERIC kernel compiled by gcc 4.2.1
--------------------------------------------------------------------------
          N            Min            Max         Median            Avg       Stddev
real      3       26986.71       27080.38       26992.54      27019.877    52.478445
user      3       20506.89        20516.1       20511.66       20511.55    4.6059852
sys       3        8245.69        8285.79        8253.04      8261.5067    21.348673
maxrss    3         759420         759420         759420         759420            0
ixrss     3           4894           4900           4898      4897.3333    3.0550505
idrss     3            581            581            581            581            0
isrss     3            131            131            131            131            0
minflt    3  6.5855245e+08  6.5855409e+08  6.5855253e+08  6.5855302e+08     922.2581
majflt    3              0           2566              0      855.33333    1481.4808
nswap     3              0              0              0              0            0
inblock   3           1619          29805           2008          11144     16162.07
oublock   3          28652          28747          28662          28687    52.201533
msgsnd    3              0              2              0     0.66666667    1.1547005
msgrcv    3              0              2              0     0.66666667    1.1547005
nsignals  3          74107          74107          74107          74107            0
nvcsw     3        1088827        1110096        1089758        1096227    12019.924
nivcsw    3         631463         668779         638421         646221    19843.159

Summary:
--------
On a kernel compiled with gcc 4.2.1, building world in single-threaded
mode is ~1.4% slower in real time than on a kernel compiled with clang
3.2, practically equally fast in user time (within ~0.2%), and ~5.6%
slower in system time.

Conclusion:
-----------
The difference in real time is rather minimal, and even negligible in
user time, but in system time it is much more pronounced.  Since system
time can be attributed to the kernel proper, a kernel compiled with
clang 3.2 is clearly faster than a kernel compiled with gcc 4.2.1, by a
margin of just over 5 percent.

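As a cross-check, the single-threaded percentages above can be
reproduced from the table data.  The sketch below is only an
illustration in Python and was not part of the original test setup.  It
relies on the observation that with N = 3 the Min, Median and Max
columns are simply the three individual runs; the names RUNS,
percent_slower and summarize are made up for this example.

```python
import statistics

# With N = 3, Min / Median / Max are the three individual run times.
# Values are copied from the single-threaded "real" and "sys" rows.
RUNS = {
    "clang 3.2": {
        "real": [26589.27, 26653.58, 26680.48],
        "sys": [7809.87, 7830.35, 7837.94],
    },
    "gcc 4.2.1": {
        "real": [26986.71, 26992.54, 27080.38],
        "sys": [8245.69, 8253.04, 8285.79],
    },
}


def percent_slower(slow, fast):
    """Percentage by which the mean of `slow` exceeds the mean of `fast`."""
    return (statistics.mean(slow) / statistics.mean(fast) - 1.0) * 100.0


def summarize():
    """Relative slowdown of the gcc-built kernel versus the clang-built one."""
    return {
        metric: percent_slower(RUNS["gcc 4.2.1"][metric],
                               RUNS["clang 3.2"][metric])
        for metric in ("real", "sys")
    }


if __name__ == "__main__":
    for metric, pct in summarize().items():
        print(f"{metric}: gcc-built kernel is ~{pct:.1f}% slower")
    # The Stddev column matches the sample standard deviation (n - 1).
    print(f"stddev(real, clang): {statistics.stdev(RUNS['clang 3.2']['real']):.6f}")
```

The same helper can be applied to any other row of the tables by adding
its Min, Median and Max values to the dictionary.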
Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2
-------------------------------------------------------------------------
          N            Min            Max         Median            Avg       Stddev
real      3       13832.75       13875.24       13871.47       13859.82    23.518969
user      3       33658.54       33743.43       33730.26      33710.743    45.686467
sys       3       14704.76       14775.59       14744.45        14741.6    35.500903
maxrss    3         758256         758256         758256         758256            0
ixrss     3           4829           4831           4830           4830            1
idrss     3            573            574            574      573.66667   0.57735027
isrss     3            130            130            130            130            0
minflt    3  6.6259374e+08  6.6304066e+08  6.6288552e+08  6.6283997e+08    226911.43
majflt    3           3160           4003           3801      3654.6667    440.13899
nswap     3             40             40             40             40            0
inblock   3          27763          28008          27853      27874.667    123.92874
oublock   3          55003          58725          57061      56929.667    1864.4724
msgsnd    3              0              0              0              0            0
msgrcv    3              0              0              0              0            0
nsignals  3          60496          60506          60499      60500.333    5.1316014
nvcsw     3        1891074        1894870        1893148      1893030.7    1900.7181
nivcsw    3        3095468        3126475        3116877        3112940    15873.988

Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1
-------------------------------------------------------------------------
          N            Min            Max         Median            Avg       Stddev
real      3       14017.65       14046.35       14042.26       14035.42    15.524552
user      3       33596.19       33687.03        33661.9      33648.373    46.906337
sys       3       15347.75       15438.63       15436.98      15407.787    51.999823
maxrss    3         758228         758248         758244         758240    10.583005
ixrss     3           4808           4809           4809      4808.6667   0.57735027
idrss     3            571            571            571            571            0
isrss     3            130            130            130            130            0
minflt    3  6.6301232e+08  6.6339175e+08  6.6312437e+08  6.6317615e+08    194941.64
majflt    3           3715           5509           3812      4345.3333    1008.9313
nswap     3             40             40             40             40            0
inblock   3          28327          43672          28374      33457.667    8845.9034
oublock   3          50661          57892          56870          55141    3913.3005
msgsnd    3              0              0              0              0            0
msgrcv    3              0              0              0              0            0
nsignals  3          60501          60506          60504      60503.667    2.5166114
nvcsw     3        1882397        1910610        1895448      1896151.7    14119.657
nivcsw    3        2747620        2856552        2788778        2797650    55005.267

Summary:
--------
On a kernel compiled with gcc 4.2.1, building world in multi-threaded
mode is ~1.3% slower in real time than on a kernel compiled with clang
3.2, practically equally fast in user time (within ~0.2%), and ~4.5%
slower in system time.

Conclusion:
-----------
As with single-threaded mode, the difference in real time is rather
minimal, and even negligible in user time, but in system time it is much
more pronounced.  Since system time can be attributed to the kernel
proper, a kernel compiled with clang 3.2 is clearly faster than a kernel
compiled with gcc 4.2.1, by a margin of just over 4 percent.

================================================================================
Copyright (c) 2012 Dimitry Andric <dimitry@andric.com>

Verbatim copying and redistribution of this entire text are permitted,
provided this notice is preserved.
================================================================================

--------------090906020004070601070605--