From owner-freebsd-current@FreeBSD.ORG  Thu Aug 12 22:39:59 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5E25A16A4CE;
	Thu, 12 Aug 2004 22:39:59 +0000 (GMT)
Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 25CCE43D2F;
	Thu, 12 Aug 2004 22:39:58 +0000 (GMT)
	(envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
	by gw.catspoiler.org (8.12.11/8.12.11) with ESMTP id i7CMddjc021563;
	Thu, 12 Aug 2004 15:39:44 -0700 (PDT)
	(envelope-from truckman@FreeBSD.org)
Message-Id: <200408122239.i7CMddjc021563@gw.catspoiler.org>
Date: Thu, 12 Aug 2004 15:39:39 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: rwatson@FreeBSD.org
In-Reply-To: <200408121944.i7CJib4f021202@gw.catspoiler.org>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
cc: jroberson@chesapeake.net
cc: freebsd-current@FreeBSD.org
Subject: nice handling in ULE (was: Re: SCHEDULE and high load situations)
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Thu, 12 Aug 2004 22:39:59 -0000

On 12 Aug, Don Lewis wrote:
> Here's a case where there is not much fork()/exec() activity (last pid
> is stable), and there doesn't appear to be much I/O, but the niced
> process is getting more than 50% of the CPU.  I noticed this during a
> portupgrade of editors/openoffice-1.1.  The CPU% for setiathome seems to
> stay right around in this range +/- a percent or so, with spikes higher
> when the other processes normally competing for CPU time are waiting for
> I/O.
>
> 57 processes:  3 running, 53 sleeping, 1 stopped
> CPU states: 42.8% user, 56.0% nice,  1.2% system,  0.0% interrupt,  0.0% idle
> Mem: 99M Active, 628M Inact, 156M Wired, 31M Cache, 111M Buf, 83M Free
> Swap: 2055M Total, 84K Used, 2055M Free
> Seconds to delay:
>
>   PID USERNAME   PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
>   504 setiathome 139   15 17744K 16884K RUN    719:22 54.69% 54.69% setiathome
> 64008 root       139    0  4232K  3832K RUN     11:50 41.41% 41.41% dmake
>   568 dl          76    0  6084K  2324K select   5:08  0.00%  0.00% sshd
>   483 uucp        76    0  1336K   988K select   2:36  0.00%  0.00% newapc
> 16995 root        76    0  1284K   760K select   2:21  0.00%  0.00% script
> 12823 root        -8    0  1200K   552K piperd   2:03  0.00%  0.00% tee
>   485 uucp        76    0  1308K   944K select   0:33  0.00%  0.00% upsd
> 53855 root         8    0  4956K  4212K wait     0:13  0.00%  0.00% perl
>   489 uucp         8    0  1316K  1012K nanslp   0:12  0.00%  0.00% upsmon
>   410 root        76    0  2876K  1500K select   0:10  0.00%  0.00% ntpd
> 59764 dl          76    0  2400K  1604K RUN      0:07  0.00%  0.00% top
>   437 root        76    0  3436K  2300K select   0:05  0.00%  0.00% sendmail
> 12822 root         8    0 21348K 20792K wait     0:02  0.00%  0.00% ruby18
>   490 uucp         8    0  1272K   908K nanslp   0:01  0.00%  0.00% upslog
>  1101 dl          76    0  6084K  2324K select   0:01  0.00%  0.00% sshd
>   454 root         8    0  1356K  1004K nanslp   0:01  0.00%  0.00% cron
>   657 root        20    0  2440K  1940K pause    0:01  0.00%  0.00% csh

I did some experimentation, and the problem I'm seeing appears to just be
related to how nice values are handled by ULE.
I'm running two copies of the following program, one at nice +15, and the
other not niced:

hairball:~ 102>cat sponge.c
int
main(int argc, char **argv)
{
        while (1)
                ;
}

The niced process was started second, but it has accumulated more CPU time
and is getting a larger percentage of the CPU time according to top.

last pid:   662;  load averages:  2.00,  1.95,  1.45   up 0+00:22:35  15:14:27
31 processes:  3 running, 28 sleeping
CPU states: 45.3% user, 53.1% nice,  1.2% system,  0.4% interrupt,  0.0% idle
Mem: 22M Active, 19M Inact, 44M Wired, 28K Cache, 28M Buf, 408M Free
Swap: 1024M Total, 1024M Free
Seconds to delay:

  PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  599 dl       139   15  1180K   448K RUN      8:34 53.91% 53.91% sponge
  598 dl       139    0  1180K   448K RUN      7:22 42.97% 42.97% sponge
  587 dl        76    0  2288K  1580K RUN      0:03  0.00%  0.00% top
  462 root      76    0 56656K 46200K select   0:02  0.00%  0.00% Xorg
  519 gdm       76    0 11252K  8564K select   0:01  0.00%  0.00% gdmlogin
  579 dl        76    0  6088K  2968K select   0:00  0.00%  0.00% sshd

I thought it might have something to do with grouping by niceness, which
would group the un-niced process with a bunch of other processes that wake
up every now and then for a little bit of CPU time, so I tried the
experiment again with nice +1 and nice +15.  This gave a rather interesting
result.  Top reports the nice +15 process as getting a higher %CPU, but the
nice +1 process has slowly accumulated a bit more total CPU time.  The
difference in total CPU time was initially seven seconds or less.

last pid:   745;  load averages:  2.00,  1.99,  1.84   up 0+00:43:30  15:35:22
31 processes:  3 running, 28 sleeping
CPU states:  0.0% user, 99.6% nice,  0.4% system,  0.0% interrupt,  0.0% idle
Mem: 22M Active, 19M Inact, 44M Wired, 28K Cache, 28M Buf, 408M Free
Swap: 1024M Total, 1024M Free
Seconds to delay:

  PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  675 dl       139   15  1180K   448K RUN      9:48 52.34% 52.34% sponge
  674 dl       139    1  1180K   448K RUN     10:03 44.53% 44.53% sponge
  587 dl        76    0  2288K  1580K RUN      0:06  0.00%  0.00% top
  462 root      76    0 56656K 46200K select   0:03  0.00%  0.00% Xorg
  519 gdm       76    0 11252K  8564K select   0:02  0.00%  0.00% gdmlogin
  579 dl        76    0  6088K  2968K select   0:00  0.00%  0.00% sshd
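
For what it's worth, here is a rough sketch of how the same comparison could
be automated in a single program rather than read off top's WCPU column:
fork two copies of the busy loop, renice one of them, let them compete for a
while, then kill both and compare the user CPU time reported in wait4()'s
rusage.  The +15 nice value and the ten-minute run time below are just the
values picked for illustration, not anything ULE-specific.

/*
 * Rough sketch: two busy-loop children, one reniced, competing for
 * the CPU; compare the user time each accumulates.
 */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static pid_t
spawn(int niceval)
{
        pid_t pid;

        if ((pid = fork()) == -1)
                exit(1);
        if (pid == 0) {
                if (niceval != 0)
                        (void)setpriority(PRIO_PROCESS, 0, niceval);
                for (;;)                /* same busy loop as sponge.c */
                        ;
        }
        return (pid);
}

int
main(void)
{
        struct rusage ru0, ru15;
        pid_t pid0, pid15;
        int status;

        pid0 = spawn(0);                /* not niced */
        pid15 = spawn(15);              /* nice +15, started second */

        sleep(600);                     /* let them fight for the CPU */

        kill(pid0, SIGKILL);
        kill(pid15, SIGKILL);
        wait4(pid0, &status, 0, &ru0);
        wait4(pid15, &status, 0, &ru15);

        printf("nice  0: %ld seconds of user CPU\n", (long)ru0.ru_utime.tv_sec);
        printf("nice 15: %ld seconds of user CPU\n", (long)ru15.ru_utime.tv_sec);
        return (0);
}

Pulling the numbers out of rusage avoids any dependence on how top decays
its WCPU estimate; the TIME column above already hints at the same skew,
since the nice +15 copy accumulated more total time even though it was
started second.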