From: Ryan Stone
To: FreeBSD Hackers
Cc: Alexander Motin
Date: Wed, 18 Jul 2012 16:29:38 -0400
Subject: ULE scheduler miscalculates CPU utilization for threads that run in short bursts

At $WORK we use a modification of DEVICE_POLLING instead of running our NICs in interrupt mode. With the ULE scheduler, the CPU utilization we see (e.g. in top -SH) is completely wrong: the polling threads are always reported at 0% utilization.

I see problems both with the CPU utilization algorithm introduced in r232917 and with the original one.

The problem with the original algorithm is pretty easy to explain. ULE sampled CPU usage in hardclock(), which is also what kicks off the polling threads, so a sample is never taken while a polling thread is running.

It appears that r232917 attempts to do time-based CPU accounting instead of sampling-based accounting. sched_pctcpu_update() is called at various places to update the CPU usage of each thread:

```c
static void
sched_pctcpu_update(struct td_sched *ts, int run)
{
	int t = ticks;

	if (t - ts->ts_ltick >= SCHED_TICK_TARG) {
		/* No update for a whole window: discard the history. */
		ts->ts_ticks = 0;
		ts->ts_ftick = t - SCHED_TICK_TARG;
	} else if (t - ts->ts_ftick >= SCHED_TICK_MAX) {
		/* Window too long: rescale ts_ticks to the last SCHED_TICK_TARG ticks. */
		ts->ts_ticks = (ts->ts_ticks / (ts->ts_ltick - ts->ts_ftick)) *
		    (ts->ts_ltick - (t - SCHED_TICK_TARG));
		ts->ts_ftick = t - SCHED_TICK_TARG;
	}
	/* Credit whole ticks elapsed since the last update, in fixed point. */
	if (run)
		ts->ts_ticks += (t - ts->ts_ltick) << SCHED_TICK_SHIFT;
	ts->ts_ltick = t;
}
```

The problem is that it only works at a granularity of one tick. My polling threads are woken up at each hardclock() invocation and stop running before the next one, so ticks is (almost) never incremented while a polling thread is running. When sched_pctcpu_update() is called as the polling thread goes to sleep, run=1 but ts->ts_ltick == ticks, so ts_ticks is incremented by 0. When the polling thread is woken up again, ticks has been incremented in the meantime and sched_pctcpu_update() is called with run=0, so ts_ticks is not incremented but ts_ltick is set to ticks. The net effect is that ts_ticks is never incremented, so CPU usage is always reported as 0. I think you'll see the same effect with the softclock threads, too.
I've experimented with reverting r232917 and instead moving the sampling code from sched_tick() to sched_clock(), and that seems to give me reasonably accurate results (for my workload, anyway). The other option would be to use a timer with finer granularity than ticks in sched_pctcpu_update().