Date:      Sun, 16 Sep 2012 00:34:45 +0200
From:      Dimitry Andric <dimitry@andric.com>
To:        freebsd-current@FreeBSD.org, freebsd-toolchain@FreeBSD.org
Subject:   Compiler performance tests on FreeBSD 10.0-CURRENT
Message-ID:  <50550285.4040203@andric.com>


Hi all,

By request, I performed a series of kernel performance tests on FreeBSD
10.0-CURRENT, particularly comparing the runtime performance of GENERIC
kernels compiled by gcc 4.2.1 and by clang 3.2.

The attached text file[1] contains more information about the tests,
some semi-cooked performance data, and my conclusions.  Any errors and
omissions are also my fault, so if you notice them, please let me know.

The executive summary: GENERIC kernels compiled with clang 3.2 are
slightly faster than those compiled by gcc 4.2.1, though the difference
will not be very noticeable in practice.

Last but not least, thanks to Gavin Atkinson for providing the required
hardware.

-Dimitry

[1]: Also available at:
<http://www.andric.com/freebsd/perftest/perftest-kernel-2012-09-14a.txt>


KERNEL PERFORMANCE TESTS ON FREEBSD 10.0-CURRENT, SEPTEMBER 2012
================================================================

INTRODUCTION
------------

These tests aim to give an indication of the runtime performance of FreeBSD
kernels compiled with different compilers.  The compilers tested were:

- gcc 4.2.1, the system compiler in FreeBSD.
- clang 3.2 (trunk 162107), which is the default version of clang in FreeBSD
  10.0-CURRENT, after r239462.

All tests were run on a machine graciously provided by Gavin Atkinson, which is
a Dell PowerEdge 2850, with two 2.80 GHz Xeon-class CPUs (id=0xf41), and 4 GB
RAM.  It runs FreeBSD/amd64 10.0-CURRENT as of Tue Sep 11 19:11:00 UTC 2012.

With each compiler, a stock GENERIC kernel for amd64 was built from head as of
r240384, with the default optimization flags for this architecture.  For gcc,
these are:

  -O2 -frename-registers -pipe -fno-strict-aliasing

and for clang:

  -O2 -pipe -fno-strict-aliasing

Each kernel was installed into /boot/kernel.${compilername}.  The system was
then booted with each of these kernels in turn, without modifying anything else,
and multiple runs of "make buildworld" were done: first in single-threaded mode,
then in multi-threaded mode, using the -j8 flag.  Between runs, the /usr/obj
directory was fully cleaned out, and filesystems were synced.
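
The row names in the result tables (real, user, sys, maxrss, ... nivcsw)
correspond to wall-clock time plus the fields of getrusage(2).  As a rough
illustration of how one such run could be measured, here is a minimal Python
sketch.  It is an assumption for illustration only, not the harness actually
used for these tests; the function name timed_run and the FIELD lists are
hypothetical.

```python
import resource
import subprocess
import sys
import time

# Event counters, reported as the difference before/after the run.
COUNTER_FIELDS = ["minflt", "majflt", "nswap", "inblock", "oublock",
                  "msgsnd", "msgrcv", "nsignals", "nvcsw", "nivcsw"]
# Memory figures, taken as reported for waited-for children after the run.
MEMORY_FIELDS = ["maxrss", "ixrss", "idrss", "isrss"]


def timed_run(argv):
    """Run argv to completion and return wall time plus rusage figures."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=True)
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    result = {
        "real": real,
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
    }
    for name in MEMORY_FIELDS:
        result[name] = getattr(after, "ru_" + name)
    for name in COUNTER_FIELDS:
        result[name] = (getattr(after, "ru_" + name)
                        - getattr(before, "ru_" + name))
    return result
```

Under these assumptions, a single-threaded run would be something like
timed_run(["make", "buildworld"]) and a multi-threaded one
timed_run(["make", "-j8", "buildworld"]), each started from a clean /usr/obj.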

The timing results are below.

Building world, single-threaded, on a GENERIC kernel compiled by clang 3.2
--------------------------------------------------------------------------
          N           Min           Max        Median           Avg        Stddev
real      3      26589.27      26680.48      26653.58      26641.11     46.866211
user      3      20449.52      20472.88       20463.4     20461.933     11.748861
sys       3       7809.87       7837.94       7830.35     7826.0533     14.519891
maxrss    3        759420        759420        759420        759420             0
ixrss     3          4923          4926          4924     4924.3333     1.5275252
idrss     3           584           584           584           584             0
isrss     3           131           131           131           131             0
minflt    3 6.5828088e+08 6.5855089e+08 6.5828258e+08 6.5837145e+08      155402.8
majflt    3             0          2573          2568     1713.6667      1484.081
nswap     3             0             0             0             0             0
inblock   3          2176         30252         30170         20866     16186.067
oublock   3         28370         28377         28375         28374     3.6055513
msgsnd    3             0             5             2     2.3333333     2.5166115
msgrcv    3             0             3             2     1.6666667     1.5275252
nsignals  3         74107         74107         74107         74107             0
nvcsw     3       1086164       1107104       1106650     1099972.7      11960.81
nivcsw    3        604641        658906        616307        626618      28564.14
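
With N = 3, the Min, Median and Max columns of each row are simply the three
samples, so the Avg and Stddev columns can be re-derived from them.  A small
Python sketch for the "real" row above; the use of Python's statistics module
is an assumption for illustration, not necessarily how the tables were
produced:

```python
import statistics

# The three "real" samples (Min, Median, Max) from the clang 3.2,
# single-threaded table.
real_samples = [26589.27, 26653.58, 26680.48]

avg = statistics.mean(real_samples)
# Sample standard deviation (divides by N - 1), which is what the
# Stddev column matches.
stddev = statistics.stdev(real_samples)

print(f"avg={avg:.2f} stddev={stddev:.6f}")
```

The same check works for any other row, since every row in these tables has
exactly three samples.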

Building world, single-threaded, on a GENERIC kernel compiled by gcc 4.2.1
--------------------------------------------------------------------------
          N           Min           Max        Median           Avg        Stddev
real      3      26986.71      27080.38      26992.54     27019.877     52.478445
user      3      20506.89       20516.1      20511.66      20511.55     4.6059852
sys       3       8245.69       8285.79       8253.04     8261.5067     21.348673
maxrss    3        759420        759420        759420        759420             0
ixrss     3          4894          4900          4898     4897.3333     3.0550505
idrss     3           581           581           581           581             0
isrss     3           131           131           131           131             0
minflt    3 6.5855245e+08 6.5855409e+08 6.5855253e+08 6.5855302e+08      922.2581
majflt    3             0          2566             0     855.33333     1481.4808
nswap     3             0             0             0             0             0
inblock   3          1619         29805          2008         11144      16162.07
oublock   3         28652         28747         28662         28687     52.201533
msgsnd    3             0             2             0    0.66666667     1.1547005
msgrcv    3             0             2             0    0.66666667     1.1547005
nsignals  3         74107         74107         74107         74107             0
nvcsw     3       1088827       1110096       1089758       1096227     12019.924
nivcsw    3        631463        668779        638421        646221     19843.159

Summary:
--------
On a kernel compiled with gcc 4.2.1, building world in single-threaded mode is
~1.4% slower in real time than on a kernel compiled with clang 3.2, equally fast
in user time, and ~5.6% slower in system time.
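
The percentages in this summary follow directly from the Avg columns of the
two single-threaded tables.  A short Python sketch of the arithmetic (the
helper name percent_slower is made up for this example):

```python
def percent_slower(gcc_avg, clang_avg):
    """How much slower (in percent) the gcc figure is than the clang one."""
    return (gcc_avg / clang_avg - 1.0) * 100.0


# (gcc 4.2.1 Avg, clang 3.2 Avg) in seconds, single-threaded buildworld.
single_threaded = {
    "real": (27019.877, 26641.11),
    "user": (20511.55, 20461.933),
    "sys": (8261.5067, 7826.0533),
}

for name, (gcc_avg, clang_avg) in single_threaded.items():
    print(f"{name}: {percent_slower(gcc_avg, clang_avg):+.1f}%")
```

This prints +1.4% for real, +0.2% for user and +5.6% for sys.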

Conclusion:
-----------
The difference in real time is rather minimal, and even negligible in user time,
but in system time it is much more pronounced.

Since system time can be attributed to the kernel proper, a kernel compiled with
clang 3.2 is clearly faster than a kernel compiled with gcc 4.2.1, by a margin
of just over 5 percent.

Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2
-------------------------------------------------------------------------
          N           Min           Max        Median           Avg        Stddev
real      3      13832.75      13875.24      13871.47      13859.82     23.518969
user      3      33658.54      33743.43      33730.26     33710.743     45.686467
sys       3      14704.76      14775.59      14744.45       14741.6     35.500903
maxrss    3        758256        758256        758256        758256             0
ixrss     3          4829          4831          4830          4830             1
idrss     3           573           574           574     573.66667    0.57735027
isrss     3           130           130           130           130             0
minflt    3 6.6259374e+08 6.6304066e+08 6.6288552e+08 6.6283997e+08     226911.43
majflt    3          3160          4003          3801     3654.6667     440.13899
nswap     3            40            40            40            40             0
inblock   3         27763         28008         27853     27874.667     123.92874
oublock   3         55003         58725         57061     56929.667     1864.4724
msgsnd    3             0             0             0             0             0
msgrcv    3             0             0             0             0             0
nsignals  3         60496         60506         60499     60500.333     5.1316014
nvcsw     3       1891074       1894870       1893148     1893030.7     1900.7181
nivcsw    3       3095468       3126475       3116877       3112940     15873.988

Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1
-------------------------------------------------------------------------
          N           Min           Max        Median           Avg        Stddev
real      3      14017.65      14046.35      14042.26      14035.42     15.524552
user      3      33596.19      33687.03       33661.9     33648.373     46.906337
sys       3      15347.75      15438.63      15436.98     15407.787     51.999823
maxrss    3        758228        758248        758244        758240     10.583005
ixrss     3          4808          4809          4809     4808.6667    0.57735027
idrss     3           571           571           571           571             0
isrss     3           130           130           130           130             0
minflt    3 6.6301232e+08 6.6339175e+08 6.6312437e+08 6.6317615e+08     194941.64
majflt    3          3715          5509          3812     4345.3333     1008.9313
nswap     3            40            40            40            40             0
inblock   3         28327         43672         28374     33457.667     8845.9034
oublock   3         50661         57892         56870         55141     3913.3005
msgsnd    3             0             0             0             0             0
msgrcv    3             0             0             0             0             0
nsignals  3         60501         60506         60504     60503.667     2.5166114
nvcsw     3       1882397       1910610       1895448     1896151.7     14119.657
nivcsw    3       2747620       2856552       2788778       2797650     55005.267

Summary:
--------
On a kernel compiled with gcc 4.2.1, building world in multi-threaded mode is
~1.3% slower in real time than on a kernel compiled with clang 3.2, equally fast
in user time, and ~4.5% slower in system time.

Conclusion:
-----------
As with single-threaded mode, the difference in real time is rather minimal, and
even negligible in user time, but in system time it is much more pronounced.

Since system time can be attributed to the kernel proper, a kernel compiled with
clang 3.2 is clearly faster than a kernel compiled with gcc 4.2.1, by a margin
of just over 4 percent.
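
Because each table is based on only three runs, it can be useful to check
whether a difference is large compared to the run-to-run spread.  The sketch
below applies Welch's t-test to the Avg and Stddev columns of the two
multi-threaded tables.  This test was not part of the original measurements;
it is added here as one possible sanity check, and the helper name welch_t is
hypothetical.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# gcc 4.2.1 versus clang 3.2, multi-threaded buildworld, N = 3 each.
sys_t, sys_df = welch_t(15407.787, 51.999823, 3, 14741.6, 35.500903, 3)
user_t, user_df = welch_t(33648.373, 46.906337, 3, 33710.743, 45.686467, 3)

# Two-sided 95% critical values of Student's t: 3.182 for 3 degrees of
# freedom, 2.776 for 4 degrees of freedom.
print(f"sys:  t={sys_t:.2f} df={sys_df:.2f}")
print(f"user: t={user_t:.2f} df={user_df:.2f}")
```

For system time, t comes out at roughly 18, far above either critical value,
which is consistent with a real difference.  For user time, |t| is roughly
1.6, below the critical value, which is consistent with the "equally fast"
reading above.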

================================================================================
Copyright (c) 2012 Dimitry Andric <dimitry@andric.com>

Verbatim copying and redistribution of this entire text are permitted, provided
this notice is preserved.
================================================================================
