From owner-freebsd-current@FreeBSD.ORG  Thu Aug 12 22:39:59 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5E25A16A4CE;
	Thu, 12 Aug 2004 22:39:59 +0000 (GMT)
Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 25CCE43D2F;
	Thu, 12 Aug 2004 22:39:58 +0000 (GMT)
	(envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
	by gw.catspoiler.org (8.12.11/8.12.11) with ESMTP id i7CMddjc021563;
	Thu, 12 Aug 2004 15:39:44 -0700 (PDT)
	(envelope-from truckman@FreeBSD.org)
Message-Id: <200408122239.i7CMddjc021563@gw.catspoiler.org>
Date: Thu, 12 Aug 2004 15:39:39 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: rwatson@FreeBSD.org
In-Reply-To: <200408121944.i7CJib4f021202@gw.catspoiler.org>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
cc: jroberson@chesapeake.net
cc: freebsd-current@FreeBSD.org
Subject: nice handling in ULE (was: Re: SCHEDULE and high load situations)
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Thu, 12 Aug 2004 22:39:59 -0000

On 12 Aug, Don Lewis wrote:
> Here's a case where there is not much fork()/exec() activity (last pid
> is stable), and there doesn't appear to be much I/O, but the niced
> process is getting more than 50% of the CPU.  I noticed this during a
> portupgrade of editors/openoffice-1.1.  The CPU% for setiathome seems to
> stay right around in this range +/- a percent or so, with spikes higher
> when the other processes normally competing for CPU time are waiting for
> I/O.
>
> 57 processes:  3 running, 53 sleeping, 1 stopped
> CPU states: 42.8% user, 56.0% nice,  1.2% system,  0.0% interrupt,  0.0% idle
> Mem: 99M Active, 628M Inact, 156M Wired, 31M Cache, 111M Buf, 83M Free
> Swap: 2055M Total, 84K Used, 2055M Free
> Seconds to delay:
>
>   PID USERNAME   PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
>   504 setiathome 139   15 17744K 16884K RUN    719:22 54.69% 54.69% setiathome
> 64008 root       139    0  4232K  3832K RUN     11:50 41.41% 41.41% dmake
>   568 dl          76    0  6084K  2324K select   5:08  0.00%  0.00% sshd
>   483 uucp        76    0  1336K   988K select   2:36  0.00%  0.00% newapc
> 16995 root        76    0  1284K   760K select   2:21  0.00%  0.00% script
> 12823 root        -8    0  1200K   552K piperd   2:03  0.00%  0.00% tee
>   485 uucp        76    0  1308K   944K select   0:33  0.00%  0.00% upsd
> 53855 root         8    0  4956K  4212K wait     0:13  0.00%  0.00% perl
>   489 uucp         8    0  1316K  1012K nanslp   0:12  0.00%  0.00% upsmon
>   410 root        76    0  2876K  1500K select   0:10  0.00%  0.00% ntpd
> 59764 dl          76    0  2400K  1604K RUN      0:07  0.00%  0.00% top
>   437 root        76    0  3436K  2300K select   0:05  0.00%  0.00% sendmail
> 12822 root         8    0 21348K 20792K wait     0:02  0.00%  0.00% ruby18
>   490 uucp         8    0  1272K   908K nanslp   0:01  0.00%  0.00% upslog
>  1101 dl          76    0  6084K  2324K select   0:01  0.00%  0.00% sshd
>   454 root         8    0  1356K  1004K nanslp   0:01  0.00%  0.00% cron
>   657 root        20    0  2440K  1940K pause    0:01  0.00%  0.00% csh

I did some experimentation, and the problem I'm seeing appears to just be
related to how nice values are handled by ULE.
I'm running two copies of the following program, one at nice +15, and the
other not niced:

hairball:~ 102>cat sponge.c
int
main(int argc, char **argv)
{
        while (1)
                ;
}

The niced process was started second, but it has accumulated more CPU time
and is getting a larger percentage of the CPU time according to top.

last pid:   662;  load averages:  2.00,  1.95,  1.45   up 0+00:22:35  15:14:27
31 processes:  3 running, 28 sleeping
CPU states: 45.3% user, 53.1% nice,  1.2% system,  0.4% interrupt,  0.0% idle
Mem: 22M Active, 19M Inact, 44M Wired, 28K Cache, 28M Buf, 408M Free
Swap: 1024M Total, 1024M Free
Seconds to delay:

  PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  599 dl       139   15  1180K   448K RUN      8:34 53.91% 53.91% sponge
  598 dl       139    0  1180K   448K RUN      7:22 42.97% 42.97% sponge
  587 dl        76    0  2288K  1580K RUN      0:03  0.00%  0.00% top
  462 root      76    0 56656K 46200K select   0:02  0.00%  0.00% Xorg
  519 gdm       76    0 11252K  8564K select   0:01  0.00%  0.00% gdmlogin
  579 dl        76    0  6088K  2968K select   0:00  0.00%  0.00% sshd

I thought it might have something to do with grouping by niceness, which
would group the un-niced process with a bunch of other processes that wake
up every now and then for a little bit of CPU time, so I tried the
experiment again with nice +1 and nice +15.  This gave a rather interesting
result.  Top reports the nice +15 process as getting a higher %CPU, but the
nice +1 process has slowly accumulated a bit more total CPU time.  The
difference in total CPU time was initially seven seconds or less.

last pid:   745;  load averages:  2.00,  1.99,  1.84   up 0+00:43:30  15:35:22
31 processes:  3 running, 28 sleeping
CPU states:  0.0% user, 99.6% nice,  0.4% system,  0.0% interrupt,  0.0% idle
Mem: 22M Active, 19M Inact, 44M Wired, 28K Cache, 28M Buf, 408M Free
Swap: 1024M Total, 1024M Free
Seconds to delay:

  PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  675 dl       139   15  1180K   448K RUN      9:48 52.34% 52.34% sponge
  674 dl       139    1  1180K   448K RUN     10:03 44.53% 44.53% sponge
  587 dl        76    0  2288K  1580K RUN      0:06  0.00%  0.00% top
  462 root      76    0 56656K 46200K select   0:03  0.00%  0.00% Xorg
  519 gdm       76    0 11252K  8564K select   0:02  0.00%  0.00% gdmlogin
  579 dl        76    0  6088K  2968K select   0:00  0.00%  0.00% sshd
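
For what it's worth, here is a rough sketch of how the same comparison could
be automated in a single program rather than read off top's WCPU column:
fork two copies of the busy loop, renice one of them, let them compete for a
while, then kill both and compare the user CPU time reported in wait4()'s
rusage.  The +15 nice value and the ten-minute run time below are just the
values picked for illustration, not anything ULE-specific.

/*
 * Rough sketch: two busy-loop children, one reniced, competing for
 * the CPU; compare the user time each accumulates.
 */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static pid_t
spawn(int niceval)
{
        pid_t pid;

        if ((pid = fork()) == -1)
                exit(1);
        if (pid == 0) {
                if (niceval != 0)
                        (void)setpriority(PRIO_PROCESS, 0, niceval);
                for (;;)                /* same busy loop as sponge.c */
                        ;
        }
        return (pid);
}

int
main(void)
{
        struct rusage ru0, ru15;
        pid_t pid0, pid15;
        int status;

        pid0 = spawn(0);                /* not niced */
        pid15 = spawn(15);              /* nice +15, started second */

        sleep(600);                     /* let them fight for the CPU */

        kill(pid0, SIGKILL);
        kill(pid15, SIGKILL);
        wait4(pid0, &status, 0, &ru0);
        wait4(pid15, &status, 0, &ru15);

        printf("nice  0: %ld seconds of user CPU\n", (long)ru0.ru_utime.tv_sec);
        printf("nice 15: %ld seconds of user CPU\n", (long)ru15.ru_utime.tv_sec);
        return (0);
}

Pulling the numbers out of rusage avoids any dependence on how top decays
its WCPU estimate; the TIME column above already hints at the same skew,
since the nice +15 copy accumulated more total time even though it was
started second.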