Date:      Tue, 17 Apr 2012 12:50:33 -0700
From:      David Wolfskill <david@catwhisker.org>
To:        Gary Jennejohn <gljennjohn@googlemail.com>
Cc:        freebsd-hackers <freebsd-hackers@freebsd.org>
Subject:   Re: CAM disk I/O starvation
Message-ID:  <20120417195033.GP1437@albert.catwhisker.org>
In-Reply-To: <20120417211558.4793b705@ernst.jennejohn.org>
References:  <CADC0LV=-e+7PshRQdc69e2-Vktf6XFpVrqiMpx=QL4m_+9hSnw@mail.gmail.com> <20120403193124.46ad9de9@ernst.jennejohn.org> <CADC0LVm1HY2Dz+Vk_GK35szRS6ySviLhMiL1TSRBOnPwQnBwRg@mail.gmail.com> <20120411192153.5672b62c@ernst.jennejohn.org> <CAJ-VmokwR+VHmup6OLN+BGHvoAeLvJ9+BeZ9Fm6xM7Pio73pzQ@mail.gmail.com> <20120417211558.4793b705@ernst.jennejohn.org>



On Tue, Apr 17, 2012 at 09:15:58PM +0200, Gary Jennejohn wrote:
> ...
> I still have the old problem kernel around, but it's probably not
> instrumented for any meaningful diagnoses.
> ...

Several months ago, I was running a set of measurements (to determine how
performance for a certain task varied when I changed the hardware
configuration of the machine in question).

While it turned out that disk I/O was (surprisingly) not very
significant, I did find that extending the mechanism I had been using to
graph (aggregate) CPU utilization to graph the utilization of each core
was enlightening.

I don't know whether I can release the code or not -- I'll ask -- but
the basic idea is to look at the CPU state counters (cumulative tick
counts kept as an ordered quintuple: user, nice, system, interrupt, and
idle).  Use the sysctl OID kern.cp_time for the aggregate of all CPUs,
and kern.cp_times for an array of them, one quintuple per core.  I
graphed them using stacked bar charts; I used math/R to handle the
graphing.
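As a minimal sketch (not the Perl code described here), the flat list of integers printed by `sysctl -n kern.cp_times` can be split into one (user, nice, system, interrupt, idle) quintuple per core. `CpTime` and `parse_cp_times` are hypothetical names; the sketch assumes the output is whitespace-separated integers, five per core, in that order.

```python
from typing import List, NamedTuple


class CpTime(NamedTuple):
    """One core's cumulative tick counters, in kern.cp_time order."""
    user: int
    nice: int
    system: int
    interrupt: int
    idle: int


def parse_cp_times(text: str) -> List[CpTime]:
    """Split `sysctl -n kern.cp_times` output into per-core quintuples."""
    values = [int(field) for field in text.split()]
    if len(values) % 5 != 0:
        raise ValueError("expected a multiple of 5 counters, got %d" % len(values))
    # Each consecutive group of five counters belongs to one core.
    return [CpTime(*values[i:i + 5]) for i in range(0, len(values), 5)]
```

On FreeBSD the input could come from `subprocess.run(["sysctl", "-n", "kern.cp_times"], capture_output=True, text=True, check=True).stdout`. Feeding it the singular `kern.cp_time` instead yields a one-element list holding the aggregate.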

Since CPU (or any other) utilization only makes sense over an interval,
you also need to choose one; I used 10-second intervals by default.
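Because the counters only ever grow, utilization over an interval is the per-state difference between two samples divided by the total difference. A minimal sketch for one CPU (or the aggregate); `utilization` is a hypothetical helper, and the two quintuples are assumed to be read about one interval apart (say, 10 seconds).

```python
from typing import Dict, Sequence

# Order of the counters in kern.cp_time and in each kern.cp_times quintuple.
STATES = ("user", "nice", "system", "interrupt", "idle")


def utilization(before: Sequence[int], after: Sequence[int]) -> Dict[str, float]:
    """Percentage of ticks spent in each state between two samples of one CPU."""
    if len(before) != len(STATES) or len(after) != len(STATES):
        raise ValueError("expected one five-counter quintuple per sample")
    deltas = [new - old for old, new in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("no ticks elapsed between the two samples")
    return {state: 100.0 * d / total for state, d in zip(STATES, deltas)}
```

For example, `utilization((100, 0, 50, 10, 840), (160, 0, 80, 20, 1740))` gives `{'user': 6.0, 'nice': 0.0, 'system': 3.0, 'interrupt': 1.0, 'idle': 90.0}`. The interval length itself never enters the formula; it only decides which ticks are compared, so the five percentages for each interval map directly onto one stacked bar.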

In any case, even under 7.1, I noticed that one of the cores got the
vast bulk of the interrupt processing.  (I also saw some of the cores go
quite a bit more idle than others, which was fairly curious.)
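Spotting that kind of skew needs only two kern.cp_times samples: take each core's interrupt-tick delta and divide it by the total across all cores. A minimal sketch; `interrupt_share` and `busiest_core` are hypothetical helpers, and each sample is assumed to be a list of per-core quintuples in (user, nice, system, interrupt, idle) order.

```python
from typing import List, Sequence

INTR = 3  # index of the interrupt counter within each quintuple


def interrupt_share(before: Sequence[Sequence[int]],
                    after: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of all interrupt ticks in the interval handled by each core."""
    if len(before) != len(after):
        raise ValueError("samples cover different numbers of cores")
    deltas = [new[INTR] - old[INTR] for old, new in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return [0.0] * len(deltas)  # no interrupt ticks in this interval
    return [d / total for d in deltas]


def busiest_core(shares: Sequence[float]) -> int:
    """Index of the core with the largest interrupt share."""
    return max(range(len(shares)), key=lambda i: shares[i])
```

A result such as `[0.9, 0.1]` would mean core 0 took 90% of the interval's interrupt ticks.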

Anyway: the point of the above rambling is that it isn't necessary
to actually "instrument" the kernel itself: the work I did was
deliberately designed to be able to run on an unmodified FreeBSD
system with no ports, packages, or other 3rd-party software installed
except for lang/perl.  I also tried comparing running under
/usr/bin/time vs. running under my Perl script (which invokes
/usr/bin/time to get the getrusage() info) several times, and found no
statistically significant difference in resource usage -- even when I
reduced the sampling interval to 1 second.

(A sufficiently motivated & talented individual could probably replace
the Perl script with a shell script.  As it is, the Perl script
fork/execs a shell script to do the interval-sampling.)
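For comparison, here is a rough Python stand-in for that harness (not the Perl/shell pair described above). It spawns the command, reads the counters every `interval` seconds until the command exits, and then takes the child's resource usage from `os.wait4()`, which returns the same kind of struct rusage that time(1) reports. `run_and_sample` and `read_cp_times` are hypothetical names; `read_cp_times` assumes FreeBSD's sysctl(8) is on the PATH, and the sketch needs Python 3.9+ for `os.waitstatus_to_exitcode`.

```python
import os
import subprocess
import sys
import time
from typing import Callable, List, Sequence


def read_cp_times() -> List[int]:
    """Raw kern.cp_times counters on FreeBSD (needs the sysctl(8) binary)."""
    out = subprocess.run(["sysctl", "-n", "kern.cp_times"],
                         capture_output=True, text=True, check=True).stdout
    return [int(field) for field in out.split()]


def run_and_sample(cmd: Sequence[str],
                   read_counters: Callable[[], List[int]] = read_cp_times,
                   interval: float = 10.0,
                   poll: float = 0.05):
    """Run cmd, sampling counters every `interval` seconds until it exits.

    Returns (exit_code, samples, rusage). The rusage comes from wait4() on
    this one child, so sysctl(8) processes spawned for sampling are excluded.
    """
    samples = [read_counters()]                    # baseline sample
    pid = os.posix_spawnp(cmd[0], list(cmd), os.environ)
    next_sample = time.monotonic() + interval
    while True:
        done, status, usage = os.wait4(pid, os.WNOHANG)
        if done == pid:                            # child has exited
            break
        if time.monotonic() >= next_sample:
            samples.append(read_counters())
            next_sample += interval
        time.sleep(poll)
    samples.append(read_counters())                # final (possibly short) interval
    return os.waitstatus_to_exitcode(status), samples, usage


if __name__ == "__main__" and len(sys.argv) > 1:
    code, samples, usage = run_and_sample(sys.argv[1:])
    print("exit code:", code, "samples:", len(samples))
    print("user s:", usage.ru_utime, "system s:", usage.ru_stime)
```

Using wait4() on the single child, rather than getrusage(RUSAGE_CHILDREN), keeps the sampling's own sysctl(8) invocations out of the measured resource usage.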

Peace,
david
-- 
David H. Wolfskill				david@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
