From: Bruce Evans <brde@optusnet.com.au>
Date: Fri, 25 Feb 2011 09:22:38 +1100 (EST)
To: Bruce Evans
Cc: Remko Lodder, John Baldwin, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, davidxu@FreeBSD.org, svn-src-head@FreeBSD.org
Subject: Re: svn commit: r219003 - head/usr.bin/nice
Message-ID: <20110225085508.O1276@besplex.bde.org>
In-Reply-To: <20110225070237.F983@besplex.bde.org>
References: <201102241613.p1OGDXpM047076@svn.freebsd.org> <201102241347.39267.jhb@freebsd.org> <5965E5EC-A725-423A-9420-B84AD09993DC@elvandar.org> <201102241435.09011.jhb@freebsd.org> <20110225070237.F983@besplex.bde.org>

On Fri, 25 Feb 2011, Bruce Evans wrote:

> On Thu, 24 Feb 2011, John Baldwin wrote:
>
>> On Thursday, February 24, 2011 2:03:33 pm Remko Lodder wrote:
>
> [context restored:
> +A priority of
19 or 20 will prevent a process from taking any cycles from
> +others at nice 0 or better.]
>
>>> On Feb 24, 2011, at 7:47 PM, John Baldwin wrote:
>>>
>>>> Are you sure that this statement applies to both ULE and 4BSD?  The two
>>>> schedulers treat nice values a bit differently.
>>>
>>> No, I am not sure that the statement applies; given your response I
>>> understand that both schedulers work differently.  Can you or David tell
>>> me what the difference is so that I can properly document it?  I thought
>>> that the tool is doing the same for all schedulers, but that the backend
>>> might treat it differently.
>
> I'm sure that testing would show that it doesn't apply in FreeBSD.  It is
> supposed to apply only approximately in FreeBSD, but niceness handling in
> FreeBSD is quite broken, so it doesn't apply at all.  Also, the magic
> numbers of 19 and 20 probably don't apply in FreeBSD.  They were chosen
> because nicenesses that are the same mod 2 (maybe after adding 1) have the
> same effect: priorities that are the same mod RQ_PPQ = 4 have the same
> effect, and the niceness space was scaled to the priority space by
> multiplying by NICE_WEIGHT = 2.  But NICE_WEIGHT has been broken to be 1
> in FreeBSD with SCHED_4BSD, and doesn't apply with SCHED_ULE.  With
> SCHED_4BSD, there are 4 (not 2) nice values near 20 that give the same
> behaviour.
>
> It strictly only applies to broken schedulers.  Preventing a process
> from taking *any* cycles gives priority-inversion livelock.  FreeBSD
> has priority propagation to prevent this.

I just tried it with SCHED_4BSD, on a multi-CPU system (ref9-i386), but I
think I used cpuset correctly to emulate 1 CPU.
% last pid: 85392;  load averages:  1.71,  0.86,  0.38   up 94+01:00:36  21:55:59
% 66 processes:  3 running, 63 sleeping
% CPU:  6.9% user,  3.7% nice,  2.0% system,  0.0% interrupt, 87.3% idle
% Mem: 268M Active, 4969M Inact, 310M Wired, 50M Cache, 112M Buf, 2413M Free
% Swap: 8192M Total, 580K Used, 8191M Free
%
%   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
% [... system is not nearly idle, but plenty of CPUs to spare]
% 85368 bde         1 111    0  9892K  1312K RUN     1   1:07 65.67% sh
% 85369 bde         1 123   20  9892K  1312K CPU1    1   0:35 37.89% sh

This shows the bogus 1:2 ratio even for a niceness difference of 20.  I've
seen too much of this ratio.  IIRC, before FreeBSD-4 was fixed, the various
nonlinearities caused by not even clamping, combined with the broken
scaling, gave a ratio of about this size.  Then FreeBSD-5 restored a
similarly bogus ratio.  Apparently, the algorithm for decaying p_estcpu in
SCHED_4BSD tends to generate this ratio.  SCHED_ULE uses a completely
different algorithm, and I think it has more control over the scaling, so
it is surprising that it duplicates this brokenness so perfectly.
And here is what it does with more nice values.  This was generated by:

% for i in 0 2 4 6 8 10 12 14 16 18 20
% do
%     cpuset -l 1 nice -$i sh -c "while :; do echo -n; done" &
% done
% top -o time

% last pid: 85649;  load averages: 10.99,  9.06,  5.35   up 94+01:19:33  22:14:56
% 74 processes:  12 running, 62 sleeping
%
% Mem: 270M Active, 4969M Inact, 310M Wired, 50M Cache, 112M Buf, 2411M Free
% Swap: 8192M Total, 580K Used, 8191M Free
%
%   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
% 85581 bde         1  98    0  9892K  1312K RUN     1   0:48 11.47% sh
% 85582 bde         1 100    2  9892K  1312K RUN     1   0:45 10.69% sh
% 85583 bde         1 102    4  9892K  1312K RUN     1   0:42 10.35% sh
% 85584 bde         1 104    6  9892K  1312K CPU1    1   0:40  9.47% sh
% 85585 bde         1 106    8  9892K  1312K RUN     1   0:38  8.79% sh
% 85586 bde         1 108   10  9892K  1312K RUN     1   0:36  8.06% sh
% 85587 bde         1 110   12  9892K  1312K RUN     1   0:34  8.40% sh
% 85588 bde         1 111   14  9892K  1312K RUN     1   0:33  8.50% sh
% 85589 bde         1 113   16  9892K  1312K RUN     1   0:31  7.67% sh
% 85590 bde         1 115   18  9892K  1312K RUN     1   0:30  7.28% sh
% 85591 bde         1 117   20  9892K  1312K RUN     1   0:29  6.69% sh

This is OK except for the far-too-small dynamic range of 29:48 (even worse
than 1:2).
My version spaces out things nicely according to its table:

% last pid:  1374;  load averages: 11.02,  8.74,  4.93   up 0+02:26:12  09:16:47
% 43 processes:  12 running, 31 sleeping
% CPU: 14.0% user, 85.7% nice,  0.0% system,  0.4% interrupt,  0.0% idle
% Mem: 35M Active, 23M Inact, 67M Wired, 24K Cache, 61M Buf, 876M Free
% Swap:
%
%   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
%  1325 root        1 120    0   856K   572K RUN      2:18 28.52% sh
%  1326 root        1 120    2   856K   572K RUN      1:39 19.97% sh
%  1327 root        1 120    4   856K   572K RUN      1:10 13.96% sh
%  1328 root        1 120    6   856K   572K RUN      0:50  9.72% sh
%  1329 root        1 123    8   856K   572K RUN      0:36  7.18% sh
%  1330 root        1 123   10   856K   572K RUN      0:25  5.03% sh
%  1331 root        1 124   12   856K   572K RUN      0:18  2.93% sh
%  1332 root        1 124   14   856K   572K RUN      0:13  1.86% sh
%  1333 root        1 124   16   856K   572K RUN      0:09  0.98% sh
%  1334 root        1 124   18   856K   572K RUN      0:06  1.07% sh
%  1335 root        1 123   20   856K   572K RUN      0:05  0.15% sh

The dynamic range here is 5:138.  Not as close to the table's 1:32 as I
would like.

Bruce