Date: Tue, 17 Apr 2012 12:50:33 -0700
From: David Wolfskill <david@catwhisker.org>
To: Gary Jennejohn <gljennjohn@googlemail.com>
Cc: freebsd-hackers <freebsd-hackers@freebsd.org>
Subject: Re: CAM disk I/O starvation
Message-ID: <20120417195033.GP1437@albert.catwhisker.org>
In-Reply-To: <20120417211558.4793b705@ernst.jennejohn.org>
References: <CADC0LV=-e%2B7PshRQdc69e2-Vktf6XFpVrqiMpx=QL4m_%2B9hSnw@mail.gmail.com>
    <20120403193124.46ad9de9@ernst.jennejohn.org>
    <CADC0LVm1HY2Dz%2BVk_GK35szRS6ySviLhMiL1TSRBOnPwQnBwRg@mail.gmail.com>
    <20120411192153.5672b62c@ernst.jennejohn.org>
    <CAJ-VmokwR%2BVHmup6OLN%2BBGHvoAeLvJ9%2BBeZ9Fm6xM7Pio73pzQ@mail.gmail.com>
    <20120417211558.4793b705@ernst.jennejohn.org>

On Tue, Apr 17, 2012 at 09:15:58PM +0200, Gary Jennejohn wrote:
> ...
> I still have the old problem kernel around, but it's probably not
> instrumented for any meaningful diagnoses.
> ...

Several months ago, I was running a set of measurements (to determine
how performance for a certain task varied when I changed the hardware
configuration of the machine in question).

While it turned out that disk I/O was (surprisingly) not very
significant, I did find that extending the mechanism I had been using
to graph (aggregate) CPU utilization to graph the utilization of each
core was enlightening.

I don't know whether I can release the code or not -- I'll ask -- but
the basic idea is to look at the CPU state counters (they are an
ordered quintuple, for user, nice, system, interrupt, and idle) -- use
the sysctl OID kern.cp_time for the aggregate of all CPUs; use
kern.cp_times for an array of them, one quintuple per core.

I graphed them using stacked barcharts; I used math/R to handle the
graphing. Since CPU (or any other) utilization only makes sense over
an interval, you also need to choose one; I used 10-second intervals
by default.

In any case, even under 7.1, I noticed that one of the cores got the
vast bulk of the interrupt processing. (I also saw some of the cores
go quite a bit more idle than others, which was fairly curious.)

Anyway: the point of the above rambling is that it isn't necessary to
actually "instrument" the kernel itself: the work I did was
deliberately designed to be able to run on an unmodified FreeBSD
system with no ports, packages, or other 3rd-party software installed
except for lang/perl.

I also tried comparing running under /usr/bin/time vs. running under
my Perl script (which invokes /usr/bin/time to get the getrusage()
info) several times, and found no statistically significant difference
in resource usage -- even when I reduced the sampling interval down to
1/second.

(A sufficiently motivated & talented individual could probably replace
the Perl script with a shell script. As it is, the Perl script
fork/execs a shell script to do the interval-sampling.)

Peace,
david
-- 
David H. Wolfskill                              david@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
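
The interval-sampling idea described in the message (read the CPU state
counters twice, then take the difference) can be sketched roughly as
follows. This is not the author's Perl script, which was not posted; it
is a minimal Python illustration. The function names (parse_counters,
utilization, read_counters, sample) are hypothetical, and the sketch
assumes that "sysctl -n kern.cp_times" prints whitespace-separated
integer tick counts, five per core, in the order user, nice, system,
interrupt, idle.

```python
import subprocess
import time

# Order of the counters in each quintuple, as described in the message.
STATES = ("user", "nice", "system", "interrupt", "idle")


def parse_counters(text):
    """Split sysctl output into one 5-tuple of tick counts per CPU."""
    values = [int(v) for v in text.split()]
    n = len(STATES)
    if not values or len(values) % n != 0:
        raise ValueError("expected a non-empty multiple of 5 counters")
    return [tuple(values[i:i + n]) for i in range(0, len(values), n)]


def utilization(before, after):
    """Percentage of the interval each CPU spent in each state."""
    result = []
    for old, new in zip(before, after):
        deltas = [b - a for a, b in zip(old, new)]
        total = sum(deltas)
        if total <= 0:
            # No ticks elapsed between samples; report zeros.
            result.append(dict.fromkeys(STATES, 0.0))
        else:
            result.append(
                {s: 100.0 * d / total for s, d in zip(STATES, deltas)}
            )
    return result


def read_counters(oid="kern.cp_times"):
    """Read the counters via sysctl(8); assumes a FreeBSD host."""
    out = subprocess.run(
        ["sysctl", "-n", oid],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_counters(out)


def sample(interval=10.0, oid="kern.cp_times"):
    """Per-CPU utilization over one interval (10 s by default)."""
    before = read_counters(oid)
    time.sleep(interval)
    return utilization(before, read_counters(oid))
```

On a FreeBSD machine, sample() would return one dictionary per core;
passing oid="kern.cp_time" should instead yield a single dictionary for
the aggregate of all CPUs. A core that handles most of the interrupts
would show up with a much larger "interrupt" share than its peers.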