From owner-svn-src-head@FreeBSD.ORG Fri Feb 25 02:44:49 2011
Date: Fri, 25 Feb 2011 13:44:24 +1100 (EST)
From: Bruce Evans
To: Bruce Evans
Cc: Remko Lodder, John Baldwin, svn-src-all@FreeBSD.org,
    src-committers@FreeBSD.org, davidxu@FreeBSD.org, svn-src-head@FreeBSD.org
Subject: Re: svn commit: r219003 - head/usr.bin/nice
In-Reply-To: <20110225085508.O1276@besplex.bde.org>
Message-ID: <20110225131532.W938@besplex.bde.org>
References: <201102241613.p1OGDXpM047076@svn.freebsd.org>
    <201102241347.39267.jhb@freebsd.org>
    <5965E5EC-A725-423A-9420-B84AD09993DC@elvandar.org>
    <201102241435.09011.jhb@freebsd.org>
    <20110225070237.F983@besplex.bde.org>
    <20110225085508.O1276@besplex.bde.org>

On Fri, 25 Feb 2011, Bruce Evans wrote:

> On Fri, 25 Feb 2011, Bruce Evans wrote:
>
>> On Thu, 24 Feb 2011, John Baldwin wrote:
>>
>>> On Thursday, February 24, 2011 2:03:33 pm
Remko Lodder wrote:
>>>>
>> [context restored:
>> +A priority of 19 or 20 will prevent a process from taking any cycles from
>> +others at nice 0 or better.]
>>
>>>> On Feb 24, 2011, at 7:47 PM, John Baldwin wrote:
>>>>
>>>>> Are you sure that this statement applies to both ULE and 4BSD?  The two
>>>>> schedulers treat nice values a bit differently.
>>>>
>>>> No I am not sure that the statement applies, given your response I
>>>> understand that both schedulers work differently. Can you or David tell
>>>> me what the difference is so that I can properly document it? I thought
>>>> that the tool is doing the same for all schedulers, but that the backend
>>>> might treat it differently.
>>
>> I'm sure that testing would show that it doesn't apply in FreeBSD.  It is
>> supposed to apply only approximately in FreeBSD, but niceness handling in
>> FreeBSD is quite broken so it doesn't apply at all.  Also, the magic
>> numbers of 19 and 20 probably don't apply in FreeBSD.  These were because
>> there nicenesses that are the same mod 2 (maybe after adding 1) have the
>> same effect, since priorities that are the same mod RQ_PPQ = 4 have the
>> same effect and the niceness space was scaled to the priority space by
>> multiplying by NICE_WEIGHT = 2.  But NICE_WEIGHT has been broken to be 1
>> in FreeBSD with SCHED_4BSD and doesn't apply with SCHED_ULE.  With
>> SCHED_4BSD, there are 4 (not 2) nice values near 20 that give the same
>> behaviour.
>>
>> It strictly only applies to broken schedulers.  Preventing a process
>> from taking *any* cycles gives priority inversion livelock.  FreeBSD
>> has priority propagation to prevent this.
>
> Just tried it with SCHED_4BSD.  On a multi-CPU system (ref9-i386), but
> I think I used cpuset correctly to emulate 1 CPU.
>
> % last pid: 85392;  load averages:  1.71,  0.86,  0.38  up 94+01:00:36  21:55:59
> % 66 processes:  3 running, 63 sleeping
> % CPU:  6.9% user,  3.7% nice,  2.0% system,  0.0% interrupt, 87.3% idle
> % Mem: 268M Active, 4969M Inact, 310M Wired, 50M Cache, 112M Buf, 2413M Free
> % Swap: 8192M Total, 580K Used, 8191M Free
> %
> %   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> % [... system is not nearly idle, but plenty of CPUs to spare]
> % 85368 bde         1 111    0  9892K  1312K RUN     1   1:07 65.67% sh
> % 85369 bde         1 123   20  9892K  1312K CPU1    1   0:35 37.89% sh
>
> This shows the bogus 1:2 ratio even for a niceness difference of 20.  I've
> seen too much of this ratio.  IIRC, before FreeBSD-4 was fixed, the

More tests:

FreeBSD-5 with 4BSD on a 1-CPU system:

% last pid:  1875;  load averages: 11.94, 11.87, 10.76  up 0+00:36:11  10:45:09
% 35 processes:  13 running, 22 sleeping
% CPU: 87.2% user, 12.1% nice,  0.0% system,  0.8% interrupt,  0.0% idle
% Mem: 15M Active, 15M Inact, 21M Wired, 20K Cache, 9472K Buf, 950M Free
% Swap:
%
%   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
%  1229 root        1 112  -20   856K   576K RUN     12:03 49.85% sh
%  1231 root        1 114  -16   856K   576K RUN      2:27  8.94% sh
%  1233 root        1 114  -12   856K   576K RUN      2:09  7.91% sh
%  1235 root        1 115   -8   856K   576K RUN      1:53  6.64% sh
%  1237 root        1 115   -4   856K   576K RUN      1:32  5.91% sh
%  1239 root        1 115    0   856K   576K RUN      1:24  4.93% sh
%  1241 root        1 115    4   856K   576K RUN      1:13  3.96% sh
%  1243 root        1 116    8   856K   576K RUN      0:45  1.95% sh
%  1251 root        1 115   12   856K   576K RUN      0:35  1.86% sh
%  1253 root        1 116   16   856K   576K RUN      0:22  0.05% sh
%  1255 root        1 116   20   856K   576K RUN      0:00  0.00% sh

I reduced the tests to only every 4 values after confirming that the other
3 don't have much different behaviour (but the behaviour is not exactly
dependent on the value mod 4).  The "nice -20" process really does seem to
get 0% of the CPU.  It takes a niceness difference of 40 to completely
starve the low-priority process.
So a swing of 20 for doing this is about right with the unbroken
NICE_WEIGHT of 2.  However, I think complete starvation is accidental and
a bug.  The low priority process should be allowed to run for maybe 0.01%
of the time; this automatically avoids priority inversion bugs, and lets it
become interactive transiently if it needs to, at no significant cost.

Note the nonlinear && non-geometric %CPU for the "nice --20" process only.
This is almost certainly caused by the nonlinearity of the scaling given by
the clamping.  Otherwise %CPU is sort of linear in the niceness.  The
dynamic range is too small, but otherwise the %CPU is a reasonable function
of niceness.

Removing just the "nice --20" process from the mix allows the "nice -20"
process to get some cycles (about 1%).  I don't remember if the
nonlinearity is transferred to the "nice --16" process.

FreeBSD-8 with 4BSD from 3 years ago:

% last pid:  1899;  load averages: 10.99, 10.97,  9.81  up 0+00:35:16  11:31:15
% 35 processes:  12 running, 23 sleeping
% CPU states: 89.6% user, 10.4% nice,  0.0% system,  0.0% interrupt,  0.0% idle
% Mem: 15M Active, 14M Inact, 16M Wired, 368K Cache, 9472K Buf, 952M Free
% Swap:
%
%   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
%   745 root        1 112  -20   856K   572K RUN     14:23 45.95% sh
%   747 root        1 112  -16   856K   572K RUN      3:59 11.91% sh
%   749 root        1 112  -12   856K   572K RUN      3:19  9.91% sh
%   751 root        1 112   -8   856K   572K RUN      2:57  8.98% sh
%   753 root        1 112   -4   856K   572K RUN      2:28  6.98% sh
%   756 root        1 112    0   856K   572K RUN      1:57  5.91% sh
%   759 root        1 112    4   856K   572K RUN      1:35  4.98% sh
%   764 root        1 112    8   856K   572K RUN      0:58  2.98% sh
%   767 root        1 112   12   856K   572K RUN      0:37  1.71% sh
%   769 root        1 112   16   856K   572K RUN      0:00  0.00% sh
%   771 root        1 116   20   856K   572K RUN      0:00  0.00% sh

Similar to FreeBSD-5 behaviour, but now the "nice -16" process also gets
no CPU.

FreeBSD-8 with ULE from 3 years ago:

Tests hung, since even a single shell loop wasn't preempted properly.
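
The NICE_WEIGHT and RQ_PPQ arithmetic mentioned earlier in this thread can be
sketched roughly as follows.  This is only an illustration, not the kernel
code: it models just the niceness term of the priority (the CPU-usage term
and the base user priority are ignored, which is an assumption), and
nice_groups() is a hypothetical helper name.  NICE_WEIGHT, RQ_PPQ and the
values 2, 1 and 4 come from the discussion above.

```python
from collections import defaultdict

PRIO_MIN, PRIO_MAX = -20, 20   # niceness range
RQ_PPQ = 4                     # priorities per run queue, as quoted in the thread


def nice_groups(nice_weight, rq_ppq=RQ_PPQ):
    """Group niceness values by the run queue their priority offset lands in.

    Assumption: only the niceness contribution is modelled, so the exact
    group boundaries in a real kernel may be shifted by the base priority.
    """
    groups = defaultdict(list)
    for nice in range(PRIO_MIN, PRIO_MAX + 1):
        offset = nice_weight * (nice - PRIO_MIN)   # niceness scaled to priority
        groups[offset // rq_ppq].append(nice)      # priorities sharing a queue
    return dict(groups)


if __name__ == "__main__":
    for weight in (2, 1):
        largest = max(len(v) for v in nice_groups(weight).values())
        print(f"NICE_WEIGHT={weight}: up to {largest} niceness values per queue")
```

With a weight of 2 the sketch puts 2 adjacent niceness values in each queue,
and with a weight of 1 it puts 4, which matches the "4 (not 2)" remark for
SCHED_4BSD.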

bde FreeBSD-~5.2 with 4BSD:

% last pid:  1178;  load averages: 10.99,  9.12,  5.09  up 0+00:11:56  11:44:39
% 37 processes:  12 running, 25 sleeping
% CPU: 95.3% user,  4.7% nice,  0.0% system,  0.0% interrupt,  0.0% idle
% Mem: 15M Active, 14M Inact, 20M Wired, 80K Cache, 9072K Buf, 952M Free
% Swap:
%
%   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
%   841 root        1  94  -20   856K   572K RUN      4:13 49.80% sh
%   843 root        1  93  -16   856K   572K RUN      2:11 24.71% sh
%   845 root        1  92  -12   856K   572K RUN      1:09 10.79% sh
%   847 root        1  93   -8   856K   572K RUN      0:35  5.66% sh
%   849 root        1  93   -4   856K   572K RUN      0:18  2.69% sh
%   851 root        1  93    0   856K   572K RUN      0:09  0.98% sh
%   853 root        1  96    4   856K   572K RUN      0:05  0.00% sh
%   855 root        1  96    8   856K   572K RUN      0:02  0.00% sh
%   857 root        1  98   12   856K   572K RUN      0:01  0.00% sh
%   859 root        1  98   16   856K   572K RUN      0:01  0.00% sh
%   861 root        1 108   20   856K   572K RUN      0:00  0.00% sh

The mapping from niceness to %CPU is geometric -- each reduction in
niceness of 4 or 5 gives about twice as much CPU.  The "nice -20" process
contending with the "nice --20" process gets a tiny but nonzero amount of
CPU.

Other things about my version that were very noticeable in these tests:
- the shell used to start them and shells used to control them don't need
  to have equal or larger negative niceness so as to run promptly, provided
  these shells haven't been hogs and don't bogusly become hogs, since
  niceness doesn't affect priority unless a process is using too much CPU.
- similarly for the processes.  They all start up fast since they all start
  up with equal minimal priority since they haven't used any CPU to begin
  with (modulo bogus p_estcpu inheritance in sched_fork()).
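
The geometric mapping described above for the bde kernel can be compared with
a simple model.  This is a sketch under stated assumptions, not anything
taken from the scheduler: it assumes the share doubles for every 4 steps of
niceness (the text says "4 or 5"), and geometric_shares() is a hypothetical
helper.  The observed figures are the WCPU column of the bde FreeBSD-~5.2
table.

```python
NICE_VALUES = range(-20, 21, 4)   # niceness values used in the tables
DOUBLING_STEP = 4                 # assumed: share doubles per 4 steps of niceness


def geometric_shares(nice_values=NICE_VALUES, step=DOUBLING_STEP):
    """Return {niceness: percent of CPU} for a purely geometric model."""
    weights = {n: 2.0 ** (-n / step) for n in nice_values}
    total = sum(weights.values())
    return {n: 100.0 * w / total for n, w in weights.items()}


# WCPU column from the bde FreeBSD-~5.2 table, keyed by niceness.
OBSERVED = {-20: 49.80, -16: 24.71, -12: 10.79, -8: 5.66, -4: 2.69,
            0: 0.98, 4: 0.00, 8: 0.00, 12: 0.00, 16: 0.00, 20: 0.00}

if __name__ == "__main__":
    for nice, share in geometric_shares().items():
        print(f"nice {nice:+3d}: model {share:6.2f}%  observed {OBSERVED[nice]:5.2f}%")
```

The model puts about half of the CPU on the niceness -20 process and halves
the share at each step of 4, which is close to the measured column down to
niceness 0.  Below that, the measured WCPU figures round to 0.00%.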

FreeBSD-9 with 4BSD from a few months ago:

% last pid:   894;  load averages: 11.99, 11.89,  9.78  up 0+00:25:32  12:45:53
% 40 processes:  13 running, 27 sleeping
% CPU states: 90.6% user,  7.0% nice,  2.3% system,  0.0% interrupt,  0.0% idle
% Mem: 78M Active, 13M Inact, 18M Wired, 440K Cache, 9328K Buf, 886M Free
% Swap:
%
%   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
%   708 root        1 112  -20   856K   572K RUN     12:31 54.93% sh
%   710 root        1 112  -16   856K   572K RUN      2:15  7.76% sh
%   712 root        1 112  -12   856K   572K RUN      1:49  6.79% sh
%   714 root        1 112   -8   856K   572K RUN      1:32  5.96% sh
%   716 root        1 112   -4   856K   572K RUN      1:21  4.98% sh
%   718 root        1 112    0   856K   572K RUN      1:12  3.91% sh
%   720 root        1 112    4   856K   572K RUN      0:45  2.83% sh
%   722 root        1 112    8   856K   572K RUN      0:29  1.90% sh
%   724 root        1 112   12   856K   572K RUN      0:28  0.98% sh
%   726 root        1 112   16   856K   572K RUN      0:00  0.00% sh
%   728 root        1 116   20   856K   572K RUN      0:00  0.00% sh

Same as FreeBSD-8, except the nonlinearity for "nice --20" is even larger
and the dynamic range for the others is even smaller.

Bruce