Date:      Fri, 21 Sep 2012 23:39:40 +0200
From:      Dimitry Andric <dimitry@andric.com>
To:        freebsd-current@FreeBSD.org, freebsd-toolchain@FreeBSD.org
Subject:   More kernel performance tests on FreeBSD 10.0-CURRENT
Message-ID:  <505CDE9C.3060504@andric.com>

Hi all,

As a follow-up to my previous post about the performance of FreeBSD 10.0
kernels compiled with different compilers (clang and gcc), I did another
series of tests, now on a more modern machine (Core i5-based).  I also
tested the performance with different compiler optimization settings.

The attached text file[1] contains more information about these tests,
performance data, and my conclusions.  Any errors and omissions are also
my fault, so if you notice them, please let me know.

The executive summary: GENERIC kernels compiled with clang 3.2 are again
a little faster than those compiled with gcc 4.2.1.  For gcc, compiling
with -O2 also gives a slightly faster kernel than with -O1, but for
clang there is no measurable difference between those flags.

Again, many thanks to Gavin Atkinson for providing the required
hardware.

-Dimitry

[1]: Also available at:
<http://www.andric.com/freebsd/perftest/perftest-kernel-2012-09-21a.txt>

KERNEL PERFORMANCE TESTS ON FREEBSD 10.0-CURRENT, SEPTEMBER 2012, PART 2
========================================================================

INTRODUCTION
------------
These tests aim to give an indication of the runtime performance of FreeBSD
kernels compiled with different compilers, at various optimization levels.  The
compilers tested were:

- gcc 4.2.1, the system compiler in FreeBSD.
- clang 3.2 (trunk 162107), which is the default version of clang in FreeBSD
  10.0-CURRENT, after r239462.

All tests were run on a machine graciously provided by Gavin Atkinson, which is
based on an Intel DQ57TM desktop board, with a dual-core 3.20 GHz Intel Core i5
CPU (id=0x20652; four hardware threads with SMT), and 4 GB RAM.  It runs
FreeBSD/amd64 10.0-CURRENT as of Tue Sep 11 19:11:00 UTC 2012.  An excerpt of
dmesg follows:

CPU: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz (3192.08-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x20652  Family = 6  Model = 25  Stepping = 2
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,
                      CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,
                      TM,PBE>
  Features2=0x298e3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,
                      CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,AESNI>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 4294967296 (4096 MB)
avail memory = 3882647552 (3702 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <INTEL  DQ57TM  >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 SMT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  4
 cpu3 (AP): APIC ID:  5

With each compiler, stock GENERIC kernels for amd64 were built from head as of
r240384, for each of the following optimization flags:

  -O2 -frename-registers -pipe -fno-strict-aliasing
  -O1 -pipe
  -O0 -pipe

Note that clang does not support -frename-registers, so it was omitted for the
corresponding kernel builds.  No CPU-specific optimization flags (-march=) were
used.

Each kernel was installed into a separate kernel installation directory under
/boot.  The system was then booted with each of these kernels in turn, without
modifying anything else, and six runs of "make -j8 buildworld" were done per
kernel.  Between runs, the /usr/obj directory was fully cleaned out, and
filesystems were synced.
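
The harness used for these runs is not part of this text.  As a rough sketch
only, assuming a POSIX system and Python 3, a single run could be timed as
below.  The helper name timed_run is hypothetical; the rusage fields are the
ones Python exposes from getrusage(2), which is where names such as maxrss,
majflt, nvcsw and nivcsw in the tables come from.

```python
import os
import subprocess
import time


def timed_run(cmd):
    """Run cmd to completion; return wall-clock time and the child's rusage."""
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    # wait4() reaps this specific child and returns its resource usage,
    # including usage of any descendants the child itself waited for.
    _, status, ru = os.wait4(proc.pid, 0)
    real = time.monotonic() - start
    # We reaped the child ourselves, so record the exit status on the
    # Popen object to keep it consistent.
    proc.returncode = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return {
        "exit": proc.returncode,
        "real": real,            # seconds, wall clock
        "user": ru.ru_utime,     # seconds, user CPU time
        "sys": ru.ru_stime,      # seconds, system CPU time
        "maxrss": ru.ru_maxrss,
        "majflt": ru.ru_majflt,
        "nvcsw": ru.ru_nvcsw,
        "nivcsw": ru.ru_nivcsw,
    }


# Hypothetical usage for one benchmark run (not executed here):
#   result = timed_run(["make", "-j8", "buildworld"])
```

Cleaning /usr/obj and syncing between runs would be done outside the timed
call, so that they do not count towards the measurement.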

The timing results, processed with ministat(1), are below.
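
To illustrate what the columns mean, here is a small sketch that computes the
same six statistics for one row.  The per-run values are not included in this
text; the sample below is inferred from the idrss row of the clang 3.2 -O1
table (N=6, min 589, max 590, average 589.16667), assuming integer values,
which implies five runs at 589 and one at 590.  The function name summarize is
hypothetical, and the sample (n-1) standard deviation is used because it
reproduces the Stddev value printed for that row.

```python
import statistics


def summarize(samples):
    """Return the statistics shown in each table row for one measurement."""
    return {
        "N": len(samples),
        "Min": min(samples),
        "Max": max(samples),
        "Median": statistics.median(samples),
        "Avg": statistics.mean(samples),
        # Sample standard deviation (divides by N - 1).
        "Stddev": statistics.stdev(samples),
    }


# Inferred from the idrss row of the clang 3.2 -O1 table (see lead-in).
idrss_clang_o1 = [589, 589, 589, 589, 589, 590]
row = summarize(idrss_clang_o1)
print(row)
```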

Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O0
-----------------------------------------------------------------------------
          N           Min           Max        Median           Avg        Stddev
real      6       6503.62       6527.84       6520.49     6517.2817     8.3845558
user      6      12534.49      12576.55      12555.29     12555.547     14.079771
sys       6        9655.1       9733.92        9716.1     9709.9533     28.981809
maxrss    6        758208        758248        758224     758222.67     13.779213
ixrss     6          4396          4401          4397     4397.1667     1.9407902
idrss     6           523           523           523           523             0
isrss     6           126           126           126           126             0
minflt    6 6.6264519e+08 6.6337812e+08 6.6297908e+08 6.6299306e+08     249092.49
majflt    6          4354         10457          5722     6207.8333     2208.4725
nswap     6            40            56            42     44.333333     6.1210021
inblock   6         25167         44267         29212     31042.667     6677.3727
oublock   6         32801         34666         33500     33635.167     692.27897
msgsnd    6             0             0             0             0             0
msgrcv    6             0             0             0             0             0
nsignals  6         60495         60504         60502         60500     3.5213634
nvcsw     6       1750409       1759010       1754971     1754668.8     3641.3163
nivcsw    6       1867335       1943885       1924258     1909641.2     30495.366

Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O1
-----------------------------------------------------------------------------
          N           Min           Max        Median           Avg        Stddev
real      6       4788.59       4831.96       4798.01      4802.305      15.48322
user      6      12239.94       12285.9      12268.91       12263.5     17.190572
sys       6       4041.05        4100.4       4083.92      4076.235     21.374684
maxrss    6        758212        758256        758256     758242.67     18.532855
ixrss     6          4963          4971          4964     4964.6667     3.1411251
idrss     6           589           590           589     589.16667    0.40824829
isrss     6           132           132           132           132             0
minflt    6  6.617985e+08 6.6339562e+08  6.629315e+08 6.6272587e+08     574835.78
majflt    6          7935         23481         17450     16901.667      5324.564
nswap     6            40            52            48     47.333333     3.9327683
inblock   6         25121         44292         29173     30980.667     6715.0864
oublock   6         24867         28037         26579     26667.167      1162.513
msgsnd    6             0             0             0             0             0
msgrcv    6             0             0             0             0             0
nsignals  6         60492         60500         60498     60496.667     3.4448028
nvcsw     6       1559857       1576788       1562507     1565002.8     6454.8513
nivcsw    6       1632143       1721204       1688209       1682830      35836.46

Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O2
-----------------------------------------------------------------------------
          N           Min           Max        Median           Avg        Stddev
real      6       4780.24       4819.77       4801.98     4798.5867     14.236627
user      6      12242.91      12275.04      12256.37     12255.905     11.676621
sys       6       4052.75       4118.65       4104.76     4096.2217     22.874298
maxrss    6        758220        758256        758256     758244.67     17.603031
ixrss     6          4960          4970          4964     4963.8333     3.4880749
idrss     6           589           590           589     589.16667    0.40824829
isrss     6           132           132           132           132             0
minflt    6 6.6248246e+08 6.6340936e+08 6.6300404e+08 6.6293496e+08     324940.82
majflt    6          4300         22493         14128     12176.833     6396.7734
nswap     6            40            52            48            46     4.8989795
inblock   6         29120         44375         29277         31760     6180.4181
oublock   6         24915         28157         25984     26315.333      1251.164
msgsnd    6             0             0             0             0             0
msgrcv    6             0             0             0             0             0
nsignals  6         60490         60499         60497     60495.667     3.2041639
nvcsw     6       1559291       1575794       1570626     1569117.3      5467.274
nivcsw    6       1593865       1678135       1654604       1640246     31701.067

Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O0
-----------------------------------------------------------------------------
          N           Min           Max        Median           Avg        Stddev
real      6       6083.69       6101.08       6096.85     6094.4383     6.5165003
user      6      12424.93      12462.24      12438.63      12441.97     12.975073
sys       6       8305.66       8394.45       8377.26     8366.4767     32.469675
maxrss    6        758208        758256        758224     758225.33      16.52473
ixrss     6          4481          4491          4484     4484.6667     3.3862467
idrss     6           533           534           533     533.16667    0.40824829
isrss     6           127           127           127           127             0
minflt    6 6.6241224e+08 6.6339646e+08 6.6301629e+08 6.6292507e+08     336924.37
majflt    6          4357          9603          6231     6667.8333     1812.2422
nswap     6            40            48            40     41.666667      3.204164
inblock   6         29162         44302         29272     31759.333     6145.0026
oublock   6         30081         32816         31538       31281.5     1163.8237
msgsnd    6             0             0             0             0             0
msgrcv    6             0             0             0             0             0
nsignals  6         60500         60501         60500     60500.333    0.51639753
nvcsw     6       1701009       1713077       1709140       1707903     3975.4753
nivcsw    6       1854572       1936195       1896858     1894873.2     26725.543

Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O1
-----------------------------------------------------------------------------
          N           Min           Max        Median           Avg        Stddev
real      6       4943.74       4965.28       4955.62       4953.78     7.2888627
user      6      12274.46      12334.13      12322.13     12314.472     21.858036
sys       6       4576.99       4621.09       4617.21       4609.75     16.658918
maxrss    6        758208        758256        758224        758232     19.595918
ixrss     6          4897          4902          4898     4898.6667     1.9663842
idrss     6           581           582           581     581.33333    0.51639778
isrss     6           131           131           131           131             0
minflt    6  6.626435e+08  6.634147e+08 6.6301953e+08  6.629835e+08     279004.88
majflt    6          6092         11215          9188     8755.1667     1849.3565
nswap     6            40            62            48     49.333333     7.1180522
inblock   6         29076         44462         29163         31697     6253.6444
oublock   6         25415         28495         28175     27508.167     1179.5914
msgsnd    6             0             0             0             0             0
msgrcv    6             0             0             0             0             0
nsignals  6         60488         60499         60495     60494.333     3.9832984
nvcsw     6       1575048       1588567       1584504     1582316.7     5705.6913
nivcsw    6       1682902       1745827       1730506     1722802.3     24060.717

Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O2
-----------------------------------------------------------------------------
          N           Min           Max        Median           Avg        Stddev
real      6       4876.16       4901.55       4895.24     4888.7583     10.598318
user      6      12241.35      12306.04      12283.94     12278.767     23.922356
sys       6       4400.43       4452.62       4446.22     4438.0117     19.231095
maxrss    6        758212        758256        758224     758229.33     17.095809
ixrss     6          4899          4905          4900     4900.6667     2.2509257
idrss     6           581           582           582     581.83333    0.40824829
isrss     6           131           131           131           131             0
minflt    6 6.6214332e+08 6.6334997e+08 6.6298766e+08 6.6278723e+08     436172.22
majflt    6          6055         12473          9169        8895.5     2381.6063
nswap     6            40            54            48            48     4.5607017
inblock   6         29193         44443         29313         31804     6192.0071
oublock   6         25113         28152         26770     26490.167     1254.3383
msgsnd    6             0             0             0             0             0
msgrcv    6             0             0             0             0             0
nsignals  6         60496         60501         60499     60498.667     2.2509257
nvcsw     6       1566521       1592140       1579251     1578889.5      9354.883
nivcsw    6       1686675       1809406       1785290     1756283.7     50719.325

Summary:
--------
On a kernel compiled with clang 3.2 -O2, building world in multi-threaded mode
is ~1.9% faster in real time than on a kernel compiled with gcc 4.2.1 -O2, and
~8.3% faster in system time.

On a kernel compiled with clang 3.2 -O1, building world in multi-threaded mode
is ~3.2% faster in real time than on a kernel compiled with gcc 4.2.1 -O1, and
~13.1% faster in system time.

On a kernel compiled with gcc 4.2.1 -O2, building world in multi-threaded mode
is ~1.3% faster in real time than on a kernel compiled with gcc 4.2.1 -O1, and
~3.9% faster in system time.
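
The percentages above can be recomputed from the Avg column of the tables.
The sketch below assumes the difference is expressed relative to the faster
kernel's average; that baseline is an inference from the numbers, not
something stated in the text.  The function name percent_faster is
hypothetical.

```python
def percent_faster(fast_avg, slow_avg):
    """Difference between two averages, as a percentage of the faster one."""
    return (slow_avg - fast_avg) / fast_avg * 100.0


# Avg values for "real" and "sys", copied from the tables above.
avg = {
    "clang -O1": {"real": 4802.305, "sys": 4076.235},
    "clang -O2": {"real": 4798.5867, "sys": 4096.2217},
    "gcc -O1": {"real": 4953.78, "sys": 4609.75},
    "gcc -O2": {"real": 4888.7583, "sys": 4438.0117},
}

comparisons = [
    ("clang -O2", "gcc -O2"),
    ("clang -O1", "gcc -O1"),
    ("gcc -O2", "gcc -O1"),
]

results = {}
for fast, slow in comparisons:
    for metric in ("real", "sys"):
        pct = percent_faster(avg[fast][metric], avg[slow][metric])
        results[(fast, slow, metric)] = round(pct, 1)
        print(f"{fast} vs {slow}, {metric}: {pct:.1f}% faster")
```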

The difference between building world in multi-threaded mode on kernels compiled
with clang 3.2 -O2 and -O1 is not significant: the averages differ by less than
one standard deviation, in both real and system time.
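
As a cross-check on the significance statements, the sketch below applies a
textbook pooled-variance Student's t-test to the "real" rows, using only the
Avg, Stddev and N values from the tables.  This is a different criterion from
the one-standard-deviation comparison used in the summary, and it is not a
copy of ministat's code.  The function name pooled_t is hypothetical; 2.228 is
the two-sided 95% critical value of Student's t for 10 degrees of freedom.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t statistic, degrees of freedom) for two summarized samples."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, df


T_CRIT_95_DF10 = 2.228  # two-sided, 95% confidence, 10 degrees of freedom

# "real" rows: (Avg, Stddev, N), copied from the tables above.
clang_o1 = (4802.305, 15.48322, 6)
clang_o2 = (4798.5867, 14.236627, 6)
gcc_o1 = (4953.78, 7.2888627, 6)
gcc_o2 = (4888.7583, 10.598318, 6)

t_clang, df_clang = pooled_t(*clang_o1, *clang_o2)
t_gcc, df_gcc = pooled_t(*gcc_o1, *gcc_o2)

print(f"clang -O1 vs -O2: t = {t_clang:.2f}, "
      f"significant = {abs(t_clang) > T_CRIT_95_DF10}")
print(f"gcc -O1 vs -O2:   t = {t_gcc:.2f}, "
      f"significant = {abs(t_gcc) > T_CRIT_95_DF10}")
```

Under these assumptions the clang -O1 and -O2 averages are not
distinguishable, while the gcc -O1 and -O2 averages are, which agrees with the
summary.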

Conclusion:
-----------
Kernels compiled with clang are a little faster in real time for building world,
and in system time the difference is even larger, roughly 10%.  For clang, the
difference between -O1 and -O2 is not measurable, but for gcc, -O2 is slightly
faster than -O1.

================================================================================
Copyright (c) 2012 Dimitry Andric <dimitry@andric.com>

Verbatim copying and redistribution of this entire text are permitted, provided
this notice is preserved.
================================================================================
