From owner-freebsd-current  Sun Feb  4  6:46:20 2001
Delivered-To: freebsd-current@freebsd.org
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by hub.freebsd.org (Postfix) with ESMTP id 61BB037B4EC
	for <freebsd-current@FreeBSD.ORG>; Sun,  4 Feb 2001 06:45:55 -0800 (PST)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id BAA12281;
	Mon, 5 Feb 2001 01:45:27 +1100
Date: Mon, 5 Feb 2001 01:45:08 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-Sender: bde@besplex.bde.org
To: Cejka Rudolf <cejkar@dcse.fee.vutbr.cz>
Cc: Sheldon Hearn <sheldonh@uunet.co.za>, freebsd-current@FreeBSD.ORG
Subject: Re: Does task scheduler work correctly? (... nice bug fix)
In-Reply-To: <20010201201413.A55503@dcse.fee.vutbr.cz>
Message-ID: <Pine.BSF.4.21.0102042312190.9576-100000@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 1 Feb 2001, Cejka Rudolf wrote:

> Sheldon Hearn wrote (2001/02/01):
> > What I remember of the discussions that surrounded this one, your
> > summary is correct.  The only thing is that nice isn't so much _broken_
> > as it just isn't doing what you'd expect it to. :-)
> 
> Ok, scheduler in -current is not broken. But I'm afraid that in -stable

Nah, it is broken.

> it is - can (niced) process cause a lock of machine?. Currently, we
> have dual processor box with 4.2-STABLE and it silently locks too often.
> With current scheduler in mind, it is hard to say if I should search in
> HW or SW for a potential fix...

I think only processes with niceness less than -5 can cause lockups.
This is because they get assigned priorities below some kernel priorities,
so priority inversion is possible.  I think priority inversions between
processes running in user mode are impossible (because low-priority
(higher priority numbered) processes can't block in user mode).

> > I don't think any of the FreeBSD manual pages suggest that nice 20
> > processes aren't supposed to get _any_ CPU time.
> 
> Maybe. But there are some conventions and two-process sensitivity
> 2.5:1 is not very big (is low). Solaris and Linux have much bigger
> ratio (sufficient). So why FreeBSD has to be so much different?

Just a bug.  The relative weights of the priorities are mostly determined
accidentally by the following formula in resetpriority():

		newpriority = PUSER + p->p_estcpu / INVERSE_ESTCPU_WEIGHT +
		    NICE_WEIGHT * (p->p_nice - PRIO_MIN);

For process mixes consisting of only long-running cpu hog processes, the
priorities tend to a common limit (this is fairly obvious -- lower
priority processes get run more; this increases their priority until
it is not lower, so the priority of all the processes is similar; also,
correct design of the algorithm that decays p_estcpu should ensure that
the common priority doesn't hunt).

For the simple case of 2 long-running cpu hog processes with nicenesses
n1 and n2, the priority for both tends towards a limit of approximately
104 (see ps output).  This corresponds to the (p_estcpu / weight) term
for the process with the lowest nice value tending towards its maximum
value of (PRIO_MAX - PRIO_MIN - PPQ == 36); the maximum is nearly reached
because of correct design of of the decay algorithm, and actually reached
because of bugs).  The p_estcpu's for the two processes can be calculated
from this:

    104 = 48 + p->p_estcpu / 8 + 1 * (p->p_nice + 20)
    p->p_estcpu = (36 - p->p_nice) * 8

In the limit, the runtimes are to a first approximation proportional to
p_estcpu (higher terms depend on the decay algorithm), so for processes
with niceness n0 and n1, the relative runtimes are approximately:

    (36 - n0) : (36 - n1)

This formula is neither simple nor good.  For n0 = 0 and n1 = 20, it gives
the ratio of 36:16, which is close to the 2.5:1 ratio observed by Cejka.
I observe a ratio closer to 3:1 for the runtimes, and the formula gives
almost the exact ratio for the p_estcpu's.  The formula breaks down near
(n0 = -16, n1 = 20), since the p_estcpu term can never raise the priority
by more than 36, so the ratio is 1:0 for n0 < - 16 and n1 = 20.  This is
a bug.

> Insensitivity of nice were problem in the past and it is going back.

For RELENG_4 before I imported the fixes from NetBSD, the corresponding
calculations are:

    127 = 50 + p->p_estcpu / 4 + 2 * p->p_nice
    p->p_estcpu = (38.5 - p->p_nice) * 8
    ratio = (38.5 - n0) : (38.5 - n1)
    ratio for nicenesses (0, 20) = 38.5:18.5

so -current is almost perfectly bug for bug compatible in this area with
RELENG_4! :-(

In RELENG_4, the corresponding calculations are:

    86 = 50 + p->p_estcpu / 8 + 2 * p->p_nice
    p->p_estcpu = (18 - p->p_nice) * 16
    ratio = (18 - n0) : (18 - n1)
    ratio for nicenesses (0, 20) = 18:-2 (formula invalid, actually 1:0)
    ratio for nicenesses (0, 17) = 18:1

Quick fix:

---
Index: kern/kern_synch.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.124
diff -c -2 -r1.124 kern_synch.c
*** kern/kern_synch.c	2001/01/31 04:29:51	1.124
--- kern/kern_synch.c	2001/02/03 15:16:44
***************
*** 1039,1044 ****
  	mtx_enter(&sched_lock, MTX_SPIN);
  	if (p->p_rtprio.type == RTP_PRIO_NORMAL) {
! 		newpriority = PUSER + p->p_estcpu / INVERSE_ESTCPU_WEIGHT +
! 		    NICE_WEIGHT * (p->p_nice - PRIO_MIN);
  		newpriority = min(newpriority, MAXPRI);
  		p->p_usrpri = newpriority;
--- 1039,1046 ----
  	mtx_enter(&sched_lock, MTX_SPIN);
  	if (p->p_rtprio.type == RTP_PRIO_NORMAL) {
! 		newpriority = PUSER +
! 		    (PRIO_MAX - PRIO_MIN + 1) *
! 		    (p->p_estcpu / INVERSE_ESTCPU_WEIGHT) /
! 		    (PRIO_MAX - p->p_nice + 1);
  		newpriority = min(newpriority, MAXPRI);
  		p->p_usrpri = newpriority;
Index: sys/proc.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/proc.h,v
retrieving revision 1.145
diff -c -2 -r1.145 proc.h
*** sys/proc.h	2001/01/31 04:29:52	1.145
--- sys/proc.h	2001/02/03 15:56:49
***************
*** 493,505 ****
   * XXX macros for scheduler.  Shouldn't be here, but currently needed for
   * bounding the dubious p_estcpu inheritance in wait1().
   * INVERSE_ESTCPU_WEIGHT is only suitable for statclock() frequencies in
   * the range 100-256 Hz (approximately).
   */
  #define	ESTCPULIM(e) \
!     min((e), INVERSE_ESTCPU_WEIGHT * (NICE_WEIGHT * (PRIO_MAX - PRIO_MIN) - \
! 	     PPQ) + INVERSE_ESTCPU_WEIGHT - 1)
! #define	INVERSE_ESTCPU_WEIGHT	8	/* 1 / (priorities per estcpu level). */
! #define	NICE_WEIGHT	1		/* Priorities per nice level. */
! #define	PPQ		(128 / NQS)	/* Priorities per queue. */
  
  struct mtx;
--- 493,505 ----
   * XXX macros for scheduler.  Shouldn't be here, but currently needed for
   * bounding the dubious p_estcpu inheritance in wait1().
+  * XXX PPQ no longer needed for that.
   * INVERSE_ESTCPU_WEIGHT is only suitable for statclock() frequencies in
   * the range 100-256 Hz (approximately).
   */
  #define	ESTCPULIM(e) \
!     min((e), (INVERSE_ESTCPU_WEIGHT * (MAXPRI + 1 - PUSER) - 1) * \
! 	     (PRIO_MAX + 1) / (PRIO_MAX - PRIO_MIN + 1))
! #define	INVERSE_ESTCPU_WEIGHT	16	/* Resol. of estcpu for priorities. */
! #define	PPQ	((MAXPRI + 1) / NQS)	/* Priorities per queue. */
  
  struct mtx;
---

This works by making the steady-state value of p_estcpu proportional to
(PRIO_MAX - p->p_nice + 1) = (21 - p->p_nice).  This gives interestingly
different ramping up of the priorities: the initial priority is now
independent of p_nice, so niced processes aren't completely locked out
by short-lived unniced processes.

I have a better fix that uses a table of multipliers instead of the
divisor (PRIO_MAX - p->p_nice + 1) in the above.  It is unfinished in
different ways.  Correctly limiting p_estcpu and/or scaling is a
problem in all versions.

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message