From owner-freebsd-current Sun Feb 4 6:46:20 2001
Delivered-To: freebsd-current@freebsd.org
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by hub.freebsd.org (Postfix) with ESMTP id 61BB037B4EC
	for ; Sun, 4 Feb 2001 06:45:55 -0800 (PST)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id BAA12281;
	Mon, 5 Feb 2001 01:45:27 +1100
Date: Mon, 5 Feb 2001 01:45:08 +1100 (EST)
From: Bruce Evans
X-Sender: bde@besplex.bde.org
To: Cejka Rudolf
Cc: Sheldon Hearn, freebsd-current@FreeBSD.ORG
Subject: Re: Does task scheduler work correctly? (... nice bug fix)
In-Reply-To: <20010201201413.A55503@dcse.fee.vutbr.cz>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 1 Feb 2001, Cejka Rudolf wrote:

> Sheldon Hearn wrote (2001/02/01):
> > What I remember of the discussions that surrounded this one, your
> > summary is correct. The only thing is that nice isn't so much _broken_
> > as it just isn't doing what you'd expect it to. :-)
>
> Ok, scheduler in -current is not broken. But I'm afraid that in -stable

Nah, it is broken.

> it is - can (niced) process cause a lock of machine?. Currently, we
> have dual processor box with 4.2-STABLE and it silently locks too often.
> With current scheduler in mind, it is hard to say if I should search in
> HW or SW for a potential fix...

I think only processes with niceness less than -5 can cause lockups.
This is because they get assigned priorities below some kernel
priorities, so priority inversion is possible.  I think priority
inversions between processes running in user mode are impossible
(because low-priority (higher priority numbered) processes can't block
in user mode).

> > I don't think any of the FreeBSD manual pages suggest that nice 20
> > processes aren't supposed to get _any_ CPU time.
>
> Maybe.
> But there are some conventions and two-process sensitivity
> 2.5:1 is not very big (is low). Solaris and Linux have much bigger
> ratio (sufficient). So why FreeBSD has to be so much different?

Just a bug.  The relative weights of the priorities are mostly
determined accidentally by the following formula in resetpriority():

	newpriority = PUSER + p->p_estcpu / INVERSE_ESTCPU_WEIGHT +
	    NICE_WEIGHT * (p->p_nice - PRIO_MIN);

For process mixes consisting of only long-running cpu hog processes, the
priorities tend to a common limit (this is fairly obvious: processes
with a numerically lower priority get run more; this increases their
priority number until it is no longer lower, so the priority of all the
processes is similar; also, correct design of the algorithm that decays
p_estcpu should ensure that the common priority doesn't hunt).

For the simple case of 2 long-running cpu hog processes with nicenesses
n0 and n1, the priority for both tends towards a limit of approximately
104 (see ps output).  This corresponds to the (p_estcpu / weight) term
for the process with the lowest nice value tending towards its maximum
value of (PRIO_MAX - PRIO_MIN - PPQ == 36) (the maximum is nearly
reached because of correct design of the decay algorithm, and actually
reached because of bugs).  The p_estcpu's for the two processes can be
calculated from this:

	104 = 48 + p->p_estcpu / 8 + 1 * (p->p_nice + 20)
	p->p_estcpu = (36 - p->p_nice) * 8

In the limit, the runtimes are to a first approximation proportional to
p_estcpu (higher-order terms depend on the decay algorithm), so for
processes with nicenesses n0 and n1, the relative runtimes are
approximately:

	(36 - n0) : (36 - n1)

This formula is neither simple nor good.  For n0 = 0 and n1 = 20, it
gives the ratio of 36:16, which is close to the 2.5:1 ratio observed by
Cejka.  I observe a ratio closer to 3:1 for the runtimes, and the
formula gives almost the exact ratio for the p_estcpu's.
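
The arithmetic above can be checked with a small standalone C sketch.
It is not kernel code: the DEMO_* constants are simply the numbers
quoted in this message (PUSER = 48, PRIO_MIN = -20, an estcpu weight of
8, a nice weight of 1, and the observed common priority of about 104),
and the name demo_steady_estcpu() is made up for illustration.

```c
#include <stdio.h>

/* Values quoted in the message for -current; assumptions, not kernel headers. */
#define DEMO_PUSER          48
#define DEMO_PRIO_MIN       (-20)
#define DEMO_ESTCPU_WEIGHT  8    /* INVERSE_ESTCPU_WEIGHT */
#define DEMO_NICE_WEIGHT    1    /* NICE_WEIGHT */
#define DEMO_COMMON_PRI     104  /* approximate common priority limit */

/*
 * Solve
 *   DEMO_COMMON_PRI = PUSER + estcpu / weight + NICE_WEIGHT * (nice - PRIO_MIN)
 * for estcpu.  Hypothetical helper; only meaningful for the two-hog
 * case with the lower niceness equal to 0, as in the message.
 */
static int
demo_steady_estcpu(int nice)
{
	return ((DEMO_COMMON_PRI - DEMO_PUSER -
	    DEMO_NICE_WEIGHT * (nice - DEMO_PRIO_MIN)) * DEMO_ESTCPU_WEIGHT);
}

#ifdef DEMO_MAIN
int
main(void)
{
	int n0 = 0, n1 = 20;

	/* To a first approximation, runtimes are proportional to p_estcpu. */
	printf("estcpu(nice %d) = %d\n", n0, demo_steady_estcpu(n0));
	printf("estcpu(nice %d) = %d\n", n1, demo_steady_estcpu(n1));
	printf("approximate runtime ratio = %.2f:1\n",
	    (double)demo_steady_estcpu(n0) / demo_steady_estcpu(n1));
	return (0);
}
#endif
```

Compiled with -DDEMO_MAIN, this reports p_estcpu values of 288 and 128
for nicenesses 0 and 20, which is the 36:16 (2.25:1) ratio.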

The formula breaks down near (n0 = -16, n1 = 20), since the p_estcpu
term can never raise the priority by more than 36, so the ratio is 1:0
for n0 < -16 and n1 = 20.  This is a bug.

> Insensitivity of nice were problem in the past and it is going back.

For RELENG_4 before I imported the fixes from NetBSD, the corresponding
calculations are:

	127 = 50 + p->p_estcpu / 4 + 2 * p->p_nice
	p->p_estcpu = (38.5 - p->p_nice) * 8
	ratio = (38.5 - n0) : (38.5 - n1)
	ratio for nicenesses (0, 20) = 38.5:18.5

so -current is almost perfectly bug for bug compatible in this area with
that pre-fix RELENG_4! :-(

In RELENG_4, the corresponding calculations are:

	86 = 50 + p->p_estcpu / 8 + 2 * p->p_nice
	p->p_estcpu = (18 - p->p_nice) * 16
	ratio = (18 - n0) : (18 - n1)
	ratio for nicenesses (0, 20) = 18:-2 (formula invalid, actually 1:0)
	ratio for nicenesses (0, 17) = 18:1

Quick fix:

---
Index: kern/kern_synch.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.124
diff -c -2 -r1.124 kern_synch.c
*** kern/kern_synch.c	2001/01/31 04:29:51	1.124
--- kern/kern_synch.c	2001/02/03 15:16:44
***************
*** 1039,1044 ****
  	mtx_enter(&sched_lock, MTX_SPIN);
  	if (p->p_rtprio.type == RTP_PRIO_NORMAL) {
! 		newpriority = PUSER + p->p_estcpu / INVERSE_ESTCPU_WEIGHT +
! 		    NICE_WEIGHT * (p->p_nice - PRIO_MIN);
  		newpriority = min(newpriority, MAXPRI);
  		p->p_usrpri = newpriority;
--- 1039,1046 ----
  	mtx_enter(&sched_lock, MTX_SPIN);
  	if (p->p_rtprio.type == RTP_PRIO_NORMAL) {
! 		newpriority = PUSER +
! 		    (PRIO_MAX - PRIO_MIN + 1) *
! 		    (p->p_estcpu / INVERSE_ESTCPU_WEIGHT) /
! 		    (PRIO_MAX - p->p_nice + 1);
  		newpriority = min(newpriority, MAXPRI);
  		p->p_usrpri = newpriority;
Index: sys/proc.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/proc.h,v
retrieving revision 1.145
diff -c -2 -r1.145 proc.h
*** sys/proc.h	2001/01/31 04:29:52	1.145
--- sys/proc.h	2001/02/03 15:56:49
***************
*** 493,505 ****
   * XXX macros for scheduler.  Shouldn't be here, but currently needed for
   * bounding the dubious p_estcpu inheritance in wait1().
   * INVERSE_ESTCPU_WEIGHT is only suitable for statclock() frequencies in
   * the range 100-256 Hz (approximately).
   */
  #define	ESTCPULIM(e) \
!     min((e), INVERSE_ESTCPU_WEIGHT * (NICE_WEIGHT * (PRIO_MAX - PRIO_MIN) - \
! 	PPQ) + INVERSE_ESTCPU_WEIGHT - 1)
! #define	INVERSE_ESTCPU_WEIGHT	8	/* 1 / (priorities per estcpu level). */
! #define	NICE_WEIGHT	1		/* Priorities per nice level. */
! #define	PPQ		(128 / NQS)	/* Priorities per queue. */
  
  struct mtx;
--- 493,505 ----
   * XXX macros for scheduler.  Shouldn't be here, but currently needed for
   * bounding the dubious p_estcpu inheritance in wait1().
+  * XXX PPQ no longer needed for that.
   * INVERSE_ESTCPU_WEIGHT is only suitable for statclock() frequencies in
   * the range 100-256 Hz (approximately).
   */
  #define	ESTCPULIM(e) \
!     min((e), (INVERSE_ESTCPU_WEIGHT * (MAXPRI + 1 - PUSER) - 1) * \
! 	(PRIO_MAX + 1) / (PRIO_MAX - PRIO_MIN + 1))
! #define	INVERSE_ESTCPU_WEIGHT	16	/* Resol. of estcpu for priorities. */
! #define	PPQ		((MAXPRI + 1) / NQS)	/* Priorities per queue. */
  
  struct mtx;
---

This works by making the steady-state value of p_estcpu proportional to
(PRIO_MAX - p->p_nice + 1) = (21 - p->p_nice).  This gives interestingly
different ramping up of the priorities: the initial priority is now
independent of p_nice, so niced processes aren't completely locked out
by short-lived unniced processes.

I have a better fix that uses a table of multipliers instead of the
divisor (PRIO_MAX - p->p_nice + 1) in the above.
It is unfinished in different ways.  Correctly limiting p_estcpu and/or
scaling is a problem in all versions.

Bruce