Date:      Mon, 7 Oct 2013 10:28:04 -0700
From:      David Wolfskill <david@catwhisker.org>
To:        performance@freebsd.org
Subject:   Apparent performance regression 8.3@ -> 8.4@r255966?
Message-ID:  <20131007172804.GA7641@albert.catwhisker.org>

At work, we have a bunch of machines that developers use to build some
software.  The machines presently run FreeBSD/amd64 8.3-STABLE @rxxxxxx
(with a few local patches, which have since been committed to stable/8),
and the software is built within a 32-bit jail.

The hardware includes 2 packages of 6 physical cores each @3.47GHz
(Intel X5690); SMT is enabled (so the scheduler sees hw.ncpu ==
24).  The memory on the machines was recently increased from 6GB
to 96GB.

I am trying to set up a replacement host environment on my test machine;
the current environment there is FreeBSD/amd64 8.4-STABLE @r255966; this
environment achieves a couple of objectives:

* It has no local patches.
* The known problems (e.g., with mfiutil failing to report battery
  status accurately) are believed to be addressed appropriately.

However: when I do comparison software builds, the new environment is
taking about 12% longer to perform the same work (comparing against a
fair sample of the deployed machines):


Now, when I do these builds, I do so under /usr/bin/time, as well
as using a bit of "scaffolding" I cobbled up (a few years back)
that basically samples a bunch of sysctl OIDs periodically (by
default, every 10 seconds).  Once the build is done, I grab the
file that has the sampled OID data and bring it to my desktop machine
to post-process it; I generate graphs showing (aggregate and per-core)
CPU utilization, as well as Load Averages over the course of the
build.  I can also generate graphs that show how the memory statistics
that "top" displays vary during the course of the build, as well as just
about any univariate OID, and quite a few simple multivariate OIDs
(e.g., kern.cp_time, kern.cp_times, and vm.loadavg).
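
For anyone who wants to collect similar data, here is a minimal Python
sketch of this kind of sampling. It is not the scaffolding described
above (which is not shown in this message). It assumes sysctl(8) is on
the PATH and that kern.cp_time reports five cumulative tick counters in
the order user, nice, system, interrupt, idle; the function names and
the output layout are made up for illustration.

```python
import subprocess
import time

# Order of the counters in kern.cp_time (assumed: the usual FreeBSD layout).
CP_STATES = ("user", "nice", "system", "interrupt", "idle")


def read_oid(oid):
    """Return the value of one sysctl OID as a string (needs sysctl(8))."""
    result = subprocess.run(["sysctl", "-n", oid], check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def parse_cp_time(text):
    """Turn a kern.cp_time string into a dict of cumulative tick counts."""
    ticks = [int(field) for field in text.split()]
    if len(ticks) != len(CP_STATES):
        raise ValueError("expected %d counters, got %d"
                         % (len(CP_STATES), len(ticks)))
    return dict(zip(CP_STATES, ticks))


def utilization(prev, cur):
    """Percentage of time spent in each state between two samples."""
    delta = {state: cur[state] - prev[state] for state in CP_STATES}
    total = sum(delta.values())
    if total <= 0:
        return {state: 0.0 for state in CP_STATES}
    return {state: 100.0 * delta[state] / total for state in CP_STATES}


def sample(oids, interval=10, count=3):
    """Print one timestamped line per OID every 'interval' seconds."""
    for _ in range(count):
        now = int(time.time())
        for oid in oids:
            print("%d\t%s\t%s" % (now, oid, read_oid(oid)))
        time.sleep(interval)
```

On the build host, something like
sample(["kern.cp_time", "vm.loadavg", "vfs.ufs.dirhash_mem"]) would
produce the raw data, and utilization() applied to two consecutive
parsed kern.cp_time samples gives the per-state percentages to graph.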

After seeing the above results and poking around looking for
somewhat-recent tuning information, I ran across a suggestion that the
default of 2MB for vfs.ufs.dirhash_maxmem was probably on the low side.
So I started sampling both vfs.ufs.dirhash_maxmem (mostly to make
documentation of the configuration for a test run easier) and
vfs.ufs.dirhash_mem (to see what we were actually using).  And I tried
quadrupling vfs.ufs.dirhash_maxmem (to 8MB).

The next time I tried a test build, I found that vfs.ufs.dirhash_mem
started at about 3.8MB, climbed fairly steadily, then "clipped" at
8MB, so I quadrupled it again (to 32MB), and found that it climbed
to almost 12MB, then dropped precipitously to about 400KB (and
oscillated between about 400KB & 20MB for the rest of the build,
which appears to be the "packaging" phase).

However, increasing vfs.ufs.dirhash_maxmem does not appear to have
measurably affected the build times.
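
One way to spot the "clipping" described above in the sampled data is
to count how many samples sit at or near the configured ceiling. The
following Python sketch assumes the samples have already been parsed
into byte counts; the function name, the 2% tolerance, and the sample
values are made up for illustration and are not measurements from
these builds.

```python
def clipped_fraction(samples, maxmem, tolerance=0.02):
    """Fraction of samples within 'tolerance' of the maxmem ceiling."""
    if not samples:
        return 0.0
    threshold = maxmem * (1.0 - tolerance)
    at_ceiling = sum(1 for value in samples if value >= threshold)
    return at_ceiling / len(samples)


MB = 1024 * 1024

# Hypothetical vfs.ufs.dirhash_mem readings, in bytes.
readings = [int(3.8 * MB), 6 * MB, 8 * MB, 8 * MB]

print(clipped_fraction(readings, 8 * MB))    # against a limit of 8MB
print(clipped_fraction(readings, 32 * MB))   # against a limit of 32MB
```

A fraction well above zero suggests the limit is constraining dirhash;
a fraction of zero suggests the limit is no longer the bottleneck.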

In examining the CPU utilization graphs, the CPU generally looks
about 5% busy for the first 15 minutes; this would be bmake determining
dependency graphs, I expect.  Over the next 1:20, the CPU is about 70%
busy (~15% system; ~65% user/nice) for about 20 minutes, then drops
to about 45% busy (~25% system; ~20% user/nice) for the next 20
minutes, and that pattern repeats once.

We then see overall CPU use climb to about 60% (~20% system; ~40%
user/nice) for about 1:20.

Then there's a period of about 2:00 where overall CPU is at about 40%
(~30% system; ~10% user/nice).
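
Taking the approximate figures above at face value, the phases can be
totalled to see how much of the wall-clock time each one accounts for
and what the time-weighted CPU utilization comes to. This is only a
back-of-the-envelope Python sketch; the phase labels are made up, and
the result is no better than the estimates read off the graphs.

```python
# (label, minutes, approximate overall CPU busy %), as described above.
phases = [
    ("dependency graphs", 15, 5),
    ("busy", 20, 70),
    ("less busy", 20, 45),
    ("busy (repeat)", 20, 70),
    ("less busy (repeat)", 20, 45),
    ("middle", 80, 60),
    ("final", 120, 40),
]

total_minutes = sum(minutes for _, minutes, _ in phases)
weighted_busy = sum(minutes * busy for _, minutes, busy in phases) / total_minutes
final_share = 100.0 * phases[-1][1] / total_minutes

print("total: %d:%02d" % divmod(total_minutes, 60))
print("time-weighted CPU busy: %.1f%%" % weighted_busy)
print("final phase share of wall-clock time: %.1f%%" % final_share)
```

With these estimates the phases add up to about 4:55, and the final
2:00 phase accounts for roughly two fifths of the elapsed time while
the CPUs are more idle than busy.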

Based on earlier work I did, where I was able to do a similar build in a
native FreeBSD/i386 (no PAE) environment on the same hardware (but when
it still only had 6GB RAM), and I managed to get the build done in 2:47,
I believe that getting more work done in parallel in this 2:00 period is
a key to improving performance: the 2:47 result showed that period to be
a very busy one for the CPU.

But I am at a loss to understand what might be preventing the work from
getting done (in a timely fashion).

I believe that there were some commits made to stable/9 (MFCed from
head) a few months ago to significantly reduce the overhead of using
jails or using nullfs (or both).  And I'm looking forward to being able
to test that -- but I need to get a "fixed" 8.x environment deployed
first, and a 12% increase in build times is not something that is likely
to be well-received.

Help?

Peace,
david
-- 
David H. Wolfskill				david@catwhisker.org
Taliban: Evil cowards with guns afraid of truth from a 14-year old girl.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.



