From owner-freebsd-stable@FreeBSD.ORG Mon Dec  6 21:39:29 2010
Date: Tue, 7 Dec 2010 08:39:25 +1100
From: Peter Jeremy
To: freebsd-stable@freebsd.org
Subject: Re: idprio processes slowing down system
Message-ID: <20101206213925.GA61049@server.vk2pj.dyndns.org>
References: <20101128072624.GA76358@server.vk2pj.dyndns.org>
User-Agent: Mutt/1.5.20 (2009-06-14)
List-Id: Production branch of FreeBSD source code
On 2010-Nov-28 02:24:21 -0600, Adam Vande More wrote:
>On Sun, Nov 28, 2010 at 1:26 AM, Peter Jeremy wrote:
>> Since all the boinc processes are running at i31, why are they impacting
>> a buildkernel that runs with 0 nicety?
>
>With the setup you presented you're going to have a lot of context switches
>as the buildworld is going to give plenty of opportunities for boinc
>processes to get some time.

Agreed.

> When it does switch out, the CPU cache is
>invalidated, then invalidated again when the buildworld preempts back.

Not quite.  The amd64 uses physically addressed caches (see [1] 7.6.1)
so there's no need to flush the caches on a context switch.  (Though
the TLB _will_ need to be flushed since it does virtual-to-physical
mapping (see [1] 5.5)).  OTOH, whilst the boinc code is running, it
will occupy space in the caches, thus reducing the effective cache
size and presumably reducing the effective cache hit rate.

> This is what makes it slow.

Unfortunately, I don't think this explains the difference.  My system
doesn't have hyperthreading, so any memory stalls will block the
affected core and the stall time will be added to the currently
running process.  My timing figures show that the user and system time
is unaffected by boinc - which is inconsistent with the slowdown being
due to the impact of boinc on caching.

I've done some further investigations following a suggestion from a
friend.  In particular, an idprio process should only be occupying
idle time, so the time used by boinc plus the system idle task whilst
boinc is running should be the same as the system idle time whilst
boinc is not running.
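The comparison above amounts to summing the cumulative CPU time of the
idle threads and of all boinc processes.  A minimal sketch of how one
might total those times (the `sum_times` helper and the process-name
patterns are my own illustration; on FreeBSD the real input would come
from something like `ps -axH -o time,comm`):

```shell
# Hypothetical helper: sum ps(1) TIME fields (e.g. "12:34.56" or
# "1:02:03") per process group and report whole seconds.
sum_times() {
  awk '
    # Convert a colon-separated time field to seconds.
    function tosecs(t,   a, n, i, s) {
      n = split(t, a, ":"); s = 0
      for (i = 1; i <= n; i++) s = s * 60 + a[i]
      return s
    }
    $2 ~ /idle/  { idle  += tosecs($1) }
    $2 ~ /boinc/ { boinc += tosecs($1) }
    END { printf "idle=%.0f boinc=%.0f\n", idle, boinc }
  '
}

# Canned example input (the numbers are made up):
printf '8:20.00 idle\n0:40.00 boinc_client\n0:20.00 boinc_client\n' | sum_times
# prints: idle=500 boinc=60
```

Sampling that before and after each run and differencing gives the
per-run totals used below.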
Re-running the tests and additionally monitoring process times gives
me the following idle time stats:

x /tmp/boinc_running
+ /tmp/boinc_stopped
+------------------------------------------------------------------------+
|    +      +           +        +                         xx   x       x|
|  |__________A_M_______|                                    |__AM|      |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   4         493.3        507.78        501.69       499.765     6.3722759
+   4        332.35        392.08        361.84       356.885     26.514364
Difference at 95.0% confidence
        -142.88 +/- 33.364
        -28.5894% +/- 6.67595%
        (Student's t, pooled s = 19.2823)

The numbers represent seconds of CPU time charged to [idle] (+) or to
[idle] plus all boinc processes (x).  This shows that when boinc is
running, it is using time that would not otherwise be idle - which
isn't what idprio processes should be doing.

My suspicion is that idprio processes are not being preempted
immediately a higher priority process becomes ready, but are being
allowed to continue running for a short period (possibly until their
current timeslice expires).  Unfortunately, I haven't yet worked out
how to prove or disprove this.  I was hoping that someone more
familiar with the scheduler behaviour would comment.

[1] "AMD64 Architecture Programmer's Manual Volume 2: System Programming"
    http://support.amd.com/us/Processor_TechDocs/24593.pdf

-- 
Peter Jeremy