From: Alexander Motin
Date: Wed, 18 Jul 2012 23:47:25 +0300
To: Ryan Stone
Cc: FreeBSD Hackers
Subject: Re: ULE scheduler miscalculates CPU utilization for threads that run in short bursts
Message-ID: <500720DD.8090605@FreeBSD.org>

On 18.07.2012 23:29, Ryan Stone wrote:
> At $WORK we use a modification of DEVICE_POLLING instead of running
> our NICs in interrupt mode.  With the ULE scheduler we are seeing that
> CPU utilization (e.g. in top -SH) is completely wrong: the polling
> threads always end up being reported at a utilization of 0%.
>
> I see problems both with the CPU utilization algorithm introduced in
> r232917 as well as the original one.  The problem with the original
> algorithm is pretty easy to explain: ULE was sampling for CPU usage in
> hardclock(), which also kicks off the polling threads, so samples are
> never taken when the polling thread was running.
>
> It appears that r232917 attempts to do time-based CPU accounting
> instead of sampling based.
> sched_pctcpu_update() is called at various
> places to update the CPU usage of each thread:
>
> static void
> sched_pctcpu_update(struct td_sched *ts, int run)
> {
>         int t = ticks;
>
>         if (t - ts->ts_ltick >= SCHED_TICK_TARG) {
>                 ts->ts_ticks = 0;
>                 ts->ts_ftick = t - SCHED_TICK_TARG;
>         } else if (t - ts->ts_ftick >= SCHED_TICK_MAX) {
>                 ts->ts_ticks = (ts->ts_ticks / (ts->ts_ltick - ts->ts_ftick)) *
>                     (ts->ts_ltick - (t - SCHED_TICK_TARG));
>                 ts->ts_ftick = t - SCHED_TICK_TARG;
>         }
>         if (run)
>                 ts->ts_ticks += (t - ts->ts_ltick) << SCHED_TICK_SHIFT;
>         ts->ts_ltick = t;
> }
>
> The problem with it is that it only seems to work at the granularity
> of 1 tick.  My polling threads get woken up at each hardclock()
> invocation and stop running before the next hardclock() invocation, so
> ticks is (almost) never incremented while the polling thread is
> running.  This means that when sched_pctcpu_update is called when the
> polling thread is going to sleep, run=1 but ts->ts_ltick == ticks, so
> ts_ticks is incremented by 0.  When the polling thread is woken up
> again, ticks has been incremented in the meantime and
> sched_pctcpu_update is called with run=0, so ts_ticks is not
> incremented but ltick is set to ticks.  The effect is that ts_ticks is
> never incremented so CPU usage is always reported as 0.
>
> I think that you'll see the same effect with the softclock threads, too.

It is obviously impossible to measure pctcpu for hardclock-synchronized
threads using hardclock as the only time source. The mentioned change
made things neither more nor less broken than they already were.

> I've experimented with reverting r232917 and instead moving the
> sampling code from sched_tick() to sched_clock(), and that seems to
> give me reasonably accurate results (for my workload, anyway).

That approach will fix pctcpu accounting for threads synchronized to
hardclock, but I worry it can make things worse for threads switching
context around statclock.

> The other option would be to use a timer with a higher granularity than
> ticks in sched_pctcpu_update().

That would be great if we had reliable and cheap timers on x86. Right
now the ones that are fast (TSC) are unreliable, and the ones that are
reliable are too slow for this use.

-- 
Alexander Motin