Date: Fri, 25 Feb 2011 08:47:21 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: John Baldwin <jhb@freebsd.org>
Cc: Remko Lodder, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org,
    davidxu@FreeBSD.org, svn-src-head@FreeBSD.org
Subject: Re: svn commit: r219003 - head/usr.bin/nice

On Thu, 24 Feb 2011, John Baldwin wrote:

> On Thursday, February 24, 2011 2:03:33 pm Remko Lodder wrote:
>> [context restored:
>> +A priority of 19 or 20 will prevent a process from taking any cycles from
>> +others at nice 0 or better.]
>> On Feb 24, 2011, at 7:47 PM, John Baldwin wrote:
>>
>>> Are you sure that this statement applies to both ULE and 4BSD?  The two
>>> schedulers treat nice values a bit differently.
>>
>> No, I am not sure that the statement applies; given your response I
>> understand that both schedulers work differently.  Can you or David tell
>> me what the difference is so that I can properly document it?  I thought
>> that the tool is doing the same for all schedulers, but that the backend
>> might treat it differently.

I'm sure that testing would show that it doesn't apply in FreeBSD.  It is
supposed to apply only approximately in FreeBSD, but niceness handling in
FreeBSD is quite broken, so it doesn't apply at all.

Also, the magic numbers of 19 and 20 probably don't apply in FreeBSD.
They arose because nicenesses that are the same mod 2 (maybe after adding
1) have the same effect: priorities that are the same mod RQ_PPQ = 4 have
the same effect, and the niceness space was scaled to the priority space
by multiplying by NICE_WEIGHT = 2.  But NICE_WEIGHT has been broken to be
1 in FreeBSD with SCHED_4BSD and doesn't apply with SCHED_ULE.  With
SCHED_4BSD, there are 4 (not 2) nice values near 20 that give the same
behaviour.

The claim strictly applies only to broken schedulers.  Preventing a
process from taking *any* cycles gives priority inversion livelock.
FreeBSD has priority propagation to prevent this.
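To make the bucket arithmetic concrete, here is a minimal sketch (userland
toy code, not the kernel's: PUSER, NICE_WEIGHT and RQ_PPQ take the
FreeBSD-4-era values discussed here, the estcpu term is omitted, and
nice_to_queue() is an invented helper):

#include <stdio.h>

#define	PUSER		50	/* FreeBSD-4-era base user priority */
#define	NICE_WEIGHT	2	/* niceness-to-priority expansion */
#define	RQ_PPQ		4	/* priorities per run queue bucket */

/*
 * Priorities in the same RQ_PPQ-sized bucket schedule identically, so
 * with NICE_WEIGHT = 2 only every second niceness value changes
 * anything.  The estcpu term is omitted to isolate the nice term.
 */
static int
nice_to_queue(int nice)
{
	return ((PUSER + NICE_WEIGHT * nice) / RQ_PPQ);
}

int
main(void)
{
	printf("nice 18 -> bucket %d\n", nice_to_queue(18));	/* 21 */
	printf("nice 19 -> bucket %d\n", nice_to_queue(19));	/* 22 */
	printf("nice 20 -> bucket %d\n", nice_to_queue(20));	/* 22 */
	return (0);
}

With these values, nicenesses 19 and 20 collapse into one bucket (the
"adding 1" bias) while 18 lands in the bucket below.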
> In the case of ULE, ULE decides first if you are interactive or not.  If
> a thread is interactive, nice is completely ignored.  For non-interactive
> threads, ULE computes a priority based on how CPU hoggish the thread is.
> The nice value is then added to that priority.  Thus, a nice value applied
> to a slightly hoggish process might still end up with a lower priority
> value (and thus "more" important) than a very hoggish process with a nice
> value of 0.

I don't know much about ULE, but it never worked right for me, especially
in my tests of niceness.

> In 4BSD it is somewhat similar in that when you sleep on a socket
> ('sbwait') or select, etc. in the kernel, the nice value is effectively
> ignored.  4BSD relies on the priority values passed to tsleep(), etc. to
> mark interactive processes whereas ULE uses its own set of heuristics.
> The effect though is that nice is also ignored for interactive processes
> under 4BSD and is then added to the resulting 'user priority' (which for
> 4BSD is always based on how CPU hoggish a process is).  I think for 4BSD
> it might be true that certain nice values will never yield to certain
> other nice values, but I'm not sure that '0' and '19' are the right
> numbers there.

Niceness isn't really ignored for interactive processes: to obtain the
priority boost on waking up after blocking, they need to actually run
enough to block, and large differences in niceness tend to prevent this.
It should be large differences in niceness, and not just the difference
between 0 and 19 or 20, that prevent the lower-priority process from
running (except via priority propagation and boosts).

In FreeBSD-4 or FreeBSD-3, I imported fixes from NetBSD which among other
things made niceness sort of work.  There was still a large problem with
clamping of the hoggishness variable (p_estcpu is clamped by
ESTCPULIM()).  This gives nonlinearities in the scaling from hoggishness
to priority.  NetBSD had the same problem.

There was a relatively small problem: congestion in priority space,
combined with the limit on hoggishness, kept the mapping from niceness
space to priority space from giving the desired separation between
processes of different niceness.  The mapping used NICE_WEIGHT = 2 to
expand from niceness space to priority space.  NetBSD apparently still
uses this, since this is what makes the magic numbers of 19 and 20 behave
the same -- priority space has buckets of size RQ_PPQ = 4, with
priorities that are the same mod RQ_PPQ making little difference to
scheduling.  We would have liked to expand niceness by NICE_WEIGHT =
RQ_PPQ so that every niceness value actually has an effect, but priority
space was too congested to allow this, so we settled for NICE_WEIGHT = 2.
RQ_PPQ / NICE_WEIGHT was then 2, so it took differences in niceness of 2
to have an effect.  Apparently there is a bias of 1, so that it is
nicenesses of 19 and 20, and not 18 and 19, which end up in the same
priority bucket.
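The FreeBSD-4-era calculation can be sketched as follows (a from-memory
simplification, not the real kernel code: ESTCPU_SCALE and the exact
bound used in ESTCPULIM() are hypothetical, while PUSER, NICE_WEIGHT and
the 50..255 user priority range come from the discussion here):

#define	PUSER		50	/* top of user priority space */
#define	MAXPRI		255	/* bottom of user priority space */
#define	NICE_WEIGHT	2	/* niceness-to-priority expansion */
#define	ESTCPU_SCALE	8	/* hypothetical hoggishness divisor */

/* Clamp estcpu so the result cannot fall out of user priority space. */
#define	ESTCPULIM(e)							\
	((e) < (MAXPRI - PUSER - NICE_WEIGHT * 20) * ESTCPU_SCALE ?	\
	    (e) : (MAXPRI - PUSER - NICE_WEIGHT * 20) * ESTCPU_SCALE)

/*
 * The clamp is the source of the nonlinearity: once estcpu saturates
 * ESTCPULIM(), additional hoggishness is discarded, so the mapping
 * from real CPU usage to priority flattens out at the top instead of
 * staying linear.
 */
static int
user_priority(int estcpu, int nice)	/* nice in [-20, 20] */
{
	return (PUSER + ESTCPULIM(estcpu) / ESTCPU_SCALE +
	    NICE_WEIGHT * nice);
}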
The problem with congestion became relatively large in FreeBSD-5 and is
still large.  The priority space became more congested so as to fit
interrupt threads and rtprio threads in the same space (rtprio threads
used to use separate queues, and priorities didn't apply to them in the
normal way).  This more than doubled the congestion.  It takes 2*32 slots
for rtprio and about 64 for ithreads.  PZERO only changed from 22 to 84
(which I think was not enough, and is related to rtprio priorities not
being mapped very well, which was recently improved by you (jhb)), but
PUSER changed from 50 to 160.  The nice space of size 41 (-20 to +20)
must now be mapped to the user priority space of size 96 (160 to 255),
where it was previously mapped to a space of size 206 (50 to 255).
Expansion by a factor of 4 is even more impossible than before (4*41
would fit in 206, but would leave insufficient space for normal operation
without niceness).  NICE_WEIGHT was reduced to 1 so as to fit.  This
reduced the dynamic range of the effect of niceness significantly.  It
now takes a niceness difference of 20 to get the same effect as a
niceness difference of 10 did in FreeBSD-4 and presumably still does in
NetBSD.

Apart from small differences in niceness not having enough effect to be
very useful, there is no way to reach a %CPU difference of 1:infinity for
a niceness difference of a mere 20.  I think "nice 20" didn't give
anywhere near this ratio even in FreeBSD-4.  Now it is further away from
giving this.  If we really want a ratio of 1:infinity, it could be
implemented by special handling of niceness values near 20, but the
nonlinearity gets in the way of this and FreeBSD never had any special
handling.

In my version of SCHED_4BSD, the relative effects of niceness are given
by a table.  I normally use a geometric scale:

static int niceweights[PRIO_MAX - PRIO_MIN + 1] = {
#if 1
	/*
	 * Geometric niceness.  The weight at index i is
	 * floor(2 * 3 * pow(2.0, i / 4.0) + 0.5).
	 */
	6, 7, 8, 10, 12, 14, 17, 20, 24, 29, 34, 40, 48, 57, 68, 81,
	96, 114, 136, 161, 192, 228, 272, 323, 384, 457, 543, 646,
	768, 913, 1086, 1292, 1536, 1827, 2172, 2583, 3072, 3653,
	4344, 5166, 6144,
#else
	/*
	 * Arithmetic niceness.  The weight at index i is
	 * 2 * 2 * 2 * 3 * 3 * 5 * 7 / (40 - i)
	 * (except the one at index 40 is an approximation for infinity).
	 */
	63, 64, 66, 68, 70, 72, 74, 76, 78, 81, 84, 86, 90, 93, 96,
	100, 105, 109, 114, 120, 126, 132, 140, 148, 157, 168, 180,
	193, 210, 229, 252, 280, 315, 360, 420, 504, 630, 840, 1260,
	2520, 20000,
#endif
};

So with 1 process at nice 20 and another at nice 0, the %CPU ratio is
192:6144 = 1:32 with geometric niceness.  A ratio actually achieved was
144:4471 ~= 1:31:

% last pid:  1228;  load averages:  2.00,  2.00,  1.93   up 0+01:46:56  08:37:31
% 32 processes:  3 running, 29 sleeping
% CPU:  96.5% user,  3.1% nice,  0.0% system,  0.4% interrupt,  0.0% idle
% Mem: 28M Active, 21M Inact, 54M Wired, 16K Cache, 58M Buf, 899M Free
% Swap:
%
%   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
%  1030 root        1 122    0   856K   572K RUN     74:31 95.02% sh
%  1031 root        1 126   20   856K   572K RUN      2:24  2.05% sh

This is implemented mainly by incrementing td_estcpu by
niceweights[... ->p_nice - PRI_MIN] instead of by 1 (or 0 after clamping)
in sched_clock().  Clamping and its nonlinearity are also avoided/fixed.
td_estcpu can grow very large, and must be scaled to a priority according
to its maximum across all threads instead of according to the buggy
maximum given by clamping.  No one cares about this since they never use
niceness :-).  Niceness is even less useful on multi-CPU systems.

Bruce
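For illustration, a minimal sketch of the table-driven accumulation
described above (the struct and function names here are hypothetical
stand-ins; the real change lives in sched_clock() and in the code that
scales td_estcpu to a priority):

#define	PRIO_MIN	(-20)

extern int niceweights[];	/* the table quoted above */

struct thread_sketch {		/* stand-in for struct thread */
	int	td_estcpu;	/* accumulated hoggishness */
	int	td_nice;	/* -20 .. 20 */
};

/*
 * On each scheduler clock tick, charge the running thread by its
 * niceness weight instead of by 1, so hoggishness accumulates 32
 * times faster at nice 20 than at nice 0 with the geometric table.
 */
static void
sched_clock_sketch(struct thread_sketch *td)
{
	/*
	 * No ESTCPULIM() clamp here; the scaling to a priority must
	 * instead divide by the maximum td_estcpu across all threads.
	 */
	td->td_estcpu += niceweights[td->td_nice - PRIO_MIN];
}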