Date: Fri, 25 Feb 2011 08:47:21 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: John Baldwin <jhb@freebsd.org>
Cc: Remko Lodder, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org,
    davidxu@FreeBSD.org, svn-src-head@FreeBSD.org
Subject: Re: svn commit: r219003 - head/usr.bin/nice

On Thu, 24 Feb 2011, John Baldwin wrote:

> On Thursday, February 24, 2011 2:03:33 pm Remko Lodder wrote:
>> [context restored:
>> +A priority of 19 or 20 will prevent a process from taking any cycles from
>> +others at nice 0 or better.]
>> On Feb 24, 2011, at 7:47 PM, John Baldwin wrote:
>>
>>> Are you sure that this statement applies to both ULE and 4BSD?  The two
>>> schedulers treat nice values a bit differently.
>>
>> No, I am not sure that the statement applies; given your response I
>> understand that both schedulers work differently.  Can you or David tell
>> me what the difference is so that I can properly document it?  I thought
>> that the tool is doing the same for all schedulers, but that the backend
>> might treat it differently.

I'm sure that testing would show that it doesn't apply in FreeBSD.  It is
supposed to apply only approximately in FreeBSD, but niceness handling in
FreeBSD is quite broken, so it doesn't apply at all.

Also, the magic numbers of 19 and 20 probably don't apply in FreeBSD.
They arose because nicenesses that are the same mod 2 (maybe after adding
1) have the same effect: priorities that are the same mod RQ_PPQ = 4 have
the same effect, and the niceness space was scaled to the priority space
by multiplying by NICE_WEIGHT = 2.  But NICE_WEIGHT has been broken to be
1 in FreeBSD with SCHED_4BSD and doesn't apply with SCHED_ULE.  With
SCHED_4BSD, there are 4 (not 2) nice values near 20 that give the same
behaviour.

The claim strictly applies only to broken schedulers.  Preventing a
process from taking *any* cycles gives priority inversion livelock.
FreeBSD has priority propagation to prevent this.
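To make the bucket arithmetic concrete, here is a minimal sketch (userland
toy code, not the kernel's: PUSER, NICE_WEIGHT and RQ_PPQ take the
FreeBSD-4-era values discussed here, the estcpu term is omitted, and
nice_to_queue() is an invented helper):

#include <stdio.h>

#define	PUSER		50	/* FreeBSD-4-era base user priority */
#define	NICE_WEIGHT	2	/* niceness-to-priority expansion */
#define	RQ_PPQ		4	/* priorities per run queue bucket */

/*
 * Priorities in the same RQ_PPQ-sized bucket schedule identically, so
 * with NICE_WEIGHT = 2 only every second niceness value changes
 * anything.  The estcpu term is omitted to isolate the nice term.
 */
static int
nice_to_queue(int nice)
{
	return ((PUSER + NICE_WEIGHT * nice) / RQ_PPQ);
}

int
main(void)
{
	printf("nice 18 -> bucket %d\n", nice_to_queue(18));	/* 21 */
	printf("nice 19 -> bucket %d\n", nice_to_queue(19));	/* 22 */
	printf("nice 20 -> bucket %d\n", nice_to_queue(20));	/* 22 */
	return (0);
}

With these values, nicenesses 19 and 20 collapse into one bucket (the
"adding 1" bias) while 18 lands in the bucket below.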
> In the case of ULE, ULE decides first if you are interactive or not.  If
> a thread is interactive, nice is completely ignored.  For non-interactive
> threads, ULE computes a priority based on how CPU hoggish the thread is.
> The nice value is then added to that priority.  Thus, a nice value applied
> to a slightly hoggish process might still end up with a lower priority
> value (and thus "more" important) than a very hoggish process with a nice
> value of 0.

I don't know much about ULE, but it never worked right for me, especially
in my tests of niceness.

> In 4BSD it is somewhat similar in that when you sleep on a socket
> ('sbwait') or select, etc. in the kernel, the nice value is effectively
> ignored.  4BSD relies on the priority values passed to tsleep(), etc. to
> mark interactive processes whereas ULE uses its own set of heuristics.
> The effect though is that nice is also ignored for interactive processes
> under 4BSD and is then added to the resulting 'user priority' (which for
> 4BSD is always based on how CPU hoggish a process is).  I think for 4BSD
> it might be true that certain nice values will never yield to certain
> other nice values, but I'm not sure that '0' and '19' are the right
> numbers there.

Niceness isn't really ignored for interactive processes: to obtain the
priority boost on waking up after blocking, they need to actually run
enough to block, and large differences in niceness tend to prevent this.
It should be large differences in niceness, and not just the difference
between 0 and 19 or 20, that prevent the lower-priority process from
running (except via priority propagation and boosts).

In FreeBSD-4 or FreeBSD-3, I imported fixes from NetBSD which among other
things made niceness sort of work.  There was still a large problem with
clamping of the hoggishness variable (p_estcpu is clamped by
ESTCPULIM()).  This gives nonlinearities in the scaling from hoggishness
to priority.  NetBSD had the same problem.

There was a relatively small problem: congestion in priority space,
combined with the limit on hoggishness, kept the mapping from niceness
space to priority space from giving the desired separation between
processes of different niceness.  The mapping used NICE_WEIGHT = 2 to
expand from niceness space to priority space.  NetBSD apparently still
uses this, since this is what makes the magic numbers of 19 and 20 behave
the same -- priority space has buckets of size RQ_PPQ = 4, with
priorities that are the same mod RQ_PPQ making little difference to
scheduling.  We would have liked to expand niceness by NICE_WEIGHT =
RQ_PPQ so that every niceness value actually has an effect, but priority
space was too congested to allow this, so we settled for NICE_WEIGHT = 2.
RQ_PPQ / NICE_WEIGHT was then 2, so it took differences in niceness of 2
to have an effect.  Apparently there is a bias of 1, so that it is
nicenesses of 19 and 20, and not 18 and 19, which end up in the same
priority bucket.
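The FreeBSD-4-era calculation can be sketched as follows (a from-memory
simplification, not the real kernel code: ESTCPU_SCALE and the exact
bound used in ESTCPULIM() are hypothetical, while PUSER, NICE_WEIGHT and
the 50..255 user priority range come from the discussion here):

#define	PUSER		50	/* top of user priority space */
#define	MAXPRI		255	/* bottom of user priority space */
#define	NICE_WEIGHT	2	/* niceness-to-priority expansion */
#define	ESTCPU_SCALE	8	/* hypothetical hoggishness divisor */

/* Clamp estcpu so the result cannot fall out of user priority space. */
#define	ESTCPULIM(e)							\
	((e) < (MAXPRI - PUSER - NICE_WEIGHT * 20) * ESTCPU_SCALE ?	\
	    (e) : (MAXPRI - PUSER - NICE_WEIGHT * 20) * ESTCPU_SCALE)

/*
 * The clamp is the source of the nonlinearity: once estcpu saturates
 * ESTCPULIM(), additional hoggishness is discarded, so the mapping
 * from real CPU usage to priority flattens out at the top instead of
 * staying linear.
 */
static int
user_priority(int estcpu, int nice)	/* nice in [-20, 20] */
{
	return (PUSER + ESTCPULIM(estcpu) / ESTCPU_SCALE +
	    NICE_WEIGHT * nice);
}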
The problem with congestion became relatively large in FreeBSD-5 and is
still large.  The priority space became more congested so as to fit
interrupt threads and rtprio threads in the same space (rtprio threads
used to use separate queues, and priorities didn't apply to them in the
normal way).  This more than doubled the congestion.  It takes 2*32 slots
for rtprio and about 64 for ithreads.  PZERO only changed from 22 to 84
(which I think was not enough, and is related to rtprio priorities not
being mapped very well, which was recently improved by you (jhb)), but
PUSER changed from 50 to 160.  The nice space of size 41 (-20 to +20)
must now be mapped to the user priority space of size 96 (160 to 255),
where it was previously mapped to a space of size 206 (50 to 255).
Expansion by a factor of 4 is even more impossible than before (4*41
would fit in 206, but would leave insufficient space for normal operation
without niceness).  NICE_WEIGHT was reduced to 1 so as to fit.  This
reduced the dynamic range of the effect of niceness significantly.  It
now takes a niceness difference of 20 to get the same effect as a
niceness difference of 10 did in FreeBSD-4 and presumably still does in
NetBSD.

Apart from small differences in niceness not having enough effect to be
very useful, there is no way to reach a %CPU difference of 1:infinity for
a niceness difference of a mere 20.  I think "nice 20" didn't give
anywhere near this ratio even in FreeBSD-4.  Now it is further away from
giving this.  If we really want a ratio of 1:infinity, it could be
implemented by special handling of niceness values near 20, but the
nonlinearity gets in the way of this and FreeBSD never had any special
handling.

In my version of SCHED_4BSD, the relative effects of niceness are given
by a table.  I normally use a geometric scale:

static int niceweights[PRIO_MAX - PRIO_MIN + 1] = {
#if 1
	/*
	 * Geometric niceness.  The weight at index i is
	 * floor(2 * 3 * pow(2.0, i / 4.0) + 0.5).
	 */
	6, 7, 8, 10, 12, 14, 17, 20, 24, 29, 34, 40, 48, 57, 68, 81,
	96, 114, 136, 161, 192, 228, 272, 323, 384, 457, 543, 646,
	768, 913, 1086, 1292, 1536, 1827, 2172, 2583, 3072, 3653,
	4344, 5166, 6144,
#else
	/*
	 * Arithmetic niceness.  The weight at index i is
	 * 2 * 2 * 2 * 3 * 3 * 5 * 7 / (40 - i)
	 * (except the one at index 40 is an approximation for infinity).
	 */
	63, 64, 66, 68, 70, 72, 74, 76, 78, 81, 84, 86, 90, 93, 96,
	100, 105, 109, 114, 120, 126, 132, 140, 148, 157, 168, 180,
	193, 210, 229, 252, 280, 315, 360, 420, 504, 630, 840, 1260,
	2520, 20000,
#endif
};

So with 1 process at nice 20 and another at nice 0, the %CPU ratio is
192:6144 = 1:32 with geometric niceness.  A ratio actually achieved was
144:4471 ~= 1:31:

% last pid:  1228;  load averages:  2.00,  2.00,  1.93   up 0+01:46:56  08:37:31
% 32 processes:  3 running, 29 sleeping
% CPU:  96.5% user,  3.1% nice,  0.0% system,  0.4% interrupt,  0.0% idle
% Mem: 28M Active, 21M Inact, 54M Wired, 16K Cache, 58M Buf, 899M Free
% Swap:
%
%   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
%  1030 root        1 122    0   856K   572K RUN     74:31 95.02% sh
%  1031 root        1 126   20   856K   572K RUN      2:24  2.05% sh

This is implemented mainly by incrementing td_estcpu by
niceweights[... ->p_nice - PRI_MIN] instead of by 1 (or 0 after clamping)
in sched_clock().  Clamping and its nonlinearity are also avoided/fixed.
td_estcpu can grow very large, and must be scaled to a priority according
to its maximum across all threads instead of according to the buggy
maximum given by clamping.  No one cares about this since they never use
niceness :-).  Niceness is even less useful on multi-CPU systems.

Bruce
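For illustration, a minimal sketch of the table-driven accumulation
described above (the struct and function names here are hypothetical
stand-ins; the real change lives in sched_clock() and in the code that
scales td_estcpu to a priority):

#define	PRIO_MIN	(-20)

extern int niceweights[];	/* the table quoted above */

struct thread_sketch {		/* stand-in for struct thread */
	int	td_estcpu;	/* accumulated hoggishness */
	int	td_nice;	/* -20 .. 20 */
};

/*
 * On each scheduler clock tick, charge the running thread by its
 * niceness weight instead of by 1, so hoggishness accumulates 32
 * times faster at nice 20 than at nice 0 with the geometric table.
 */
static void
sched_clock_sketch(struct thread_sketch *td)
{
	/*
	 * No ESTCPULIM() clamp here; the scaling to a priority must
	 * instead divide by the maximum td_estcpu across all threads.
	 */
	td->td_estcpu += niceweights[td->td_nice - PRIO_MIN];
}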