Date: Sat, 22 Dec 2001 23:14:04 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Jake Burkholder <jake@locore.ca>
Cc: Luigi Rizzo <rizzo@aciri.org>, John Baldwin <jhb@FreeBSD.ORG>, <current@FreeBSD.ORG>, Peter Wemm <peter@wemm.org>
Subject: Re: vm_zeropage priority problems.
Message-ID: <20011222213623.J7890-100000@gamplex.bde.org>
In-Reply-To: <20011222031349.B62219@locore.ca>
On Sat, 22 Dec 2001, Jake Burkholder wrote:

> Apparently, On Sat, Dec 22, 2001 at 06:48:26PM +1100,
> Bruce Evans said words to the effect of;
> > Index: kern_synch.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
> > retrieving revision 1.167
> > diff -u -2 -r1.167 kern_synch.c
> > --- kern_synch.c	18 Dec 2001 00:27:17 -0000	1.167
> > +++ kern_synch.c	19 Dec 2001 16:01:26 -0000
> > @@ -936,18 +1058,18 @@
> >  	struct thread *td;
> >  {
> > -	struct kse *ke = td->td_kse;
> > -	struct ksegrp *kg = td->td_ksegrp;
> > +	struct ksegrp *kg;
> >
> > -	if (td) {
> > -		ke->ke_cpticks++;
> > -		kg->kg_estcpu = ESTCPULIM(kg->kg_estcpu + 1);
> > -		if ((kg->kg_estcpu % INVERSE_ESTCPU_WEIGHT) == 0) {
> > -			resetpriority(td->td_ksegrp);
> > -			if (kg->kg_pri.pri_level >= PUSER)
> > -				kg->kg_pri.pri_level = kg->kg_pri.pri_user;
> > -		}
> > -	} else {
> > +	if (td == NULL)
> >  		panic("schedclock");
> > -	}
> > +	td->td_kse->ke_cpticks++;
> > +	kg = td->td_ksegrp;
> > +#ifdef NEW_SCHED
> > +	kg->kg_estcpu += niceweights[kg->kg_nice - PRIO_MIN];
> > +#else
> > +	kg->kg_estcpu++;
> > +#endif
> > +	resetpriority(kg);
> > +	if (kg->kg_pri.pri_level >= PUSER)
> > +		kg->kg_pri.pri_level = kg->kg_pri.pri_user;
> >  }
>
> I'm curious why you removed the ESTCPULIM and INVERSE_ESTCPU_WEIGHT
> calculations even in the OLD_SCHED case.  Do these turn out to have
> no effect in general?

ESTCPULIM basically breaks scheduling if it is hit (clipping to it
prevents accumulation of hog points that would cause cpu hogs to be run
less).  This is a problem in practice.  I use dynamic limits even in the
!NEW_SCHED case.  I forgot that I did this or I would have included more
context to show it (see below).  kg->kg_estcpu is allowed to grow without
explicit limit and scaled to fit in the priority range.  This requires
fixing sorcerer's-apprentice growth of kg_estcpu in fork() and exit().
kg_estcpu has natural limits but they are quite large (a constant multiple
of the load average).

INVERSE_ESTCPU_WEIGHT is not used because it goes with static scaling,
and "% INVERSE_ESTCPU_WEIGHT" optimization (which depends on the internals
of resetpriority()) is not so easy to do.

Here are the corresponding changes for resetpriority():

%%%
Index: kern_synch.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.167
diff -u -2 -r1.167 kern_synch.c
--- kern_synch.c	18 Dec 2001 00:27:17 -0000	1.167
+++ kern_synch.c	22 Dec 2001 07:34:15 -0000
@@ -844,15 +949,32 @@
 	register struct ksegrp *kg;
 {
+	u_int estcpu;
 	register unsigned int newpriority;

 	mtx_lock_spin(&sched_lock);
 	if (kg->kg_pri.pri_class == PRI_TIMESHARE) {
-		newpriority = PUSER + kg->kg_estcpu / INVERSE_ESTCPU_WEIGHT +
+		estcpu = kg->kg_estcpu;
+		if (estcpu > estcpumax)
+			estcpu = estcpumax;
+#ifdef NEW_SCHED
+		newpriority = PUSER +
+		    (((u_int64_t)estcpu * estcpumul) >> ESTCPU_SHIFT);
+#else
+		newpriority = PUSER +
+		    (((u_int64_t)estcpu * estcpumul) >> ESTCPU_SHIFT) +
 		    NICE_WEIGHT * (kg->kg_nice - PRIO_MIN);
-		newpriority = min(max(newpriority, PRI_MIN_TIMESHARE),
-		    PRI_MAX_TIMESHARE);
+#endif
+		if (newpriority < PUSER)
+			newpriority = PUSER;
+		if (newpriority > PRI_MAX_TIMESHARE) {
+			Debugger("newpriority botch");
+			newpriority = PRI_MAX_TIMESHARE;
+		}
 		kg->kg_pri.pri_user = newpriority;
-	}
-	maybe_resched(kg);
+		maybe_resched(kg, newpriority);
+	} else
+		/* XXX doing anything here is dubious. */
+		/* XXX was: need_resched(). */
+		maybe_resched(kg, kg->kg_pri.pri_user);
 	mtx_unlock_spin(&sched_lock);
 }
%%%

> > Most of the changes here are to fix style bugs.  In the NEW_SCHED case,
> > the relative weights for each priority are determined by the niceweights[]
> > table.  kg->kg_estcpu is limited only by INT_MAX and priorities are
> > assigned according to relative values of kg->kg_estcpu (code for this is
> > not shown).
> > The NEW_SCHED case has not been tried since before SMPng
> > broke scheduling some more by compressing the priority ranges.
>
> It is relatively easy to uncompress the priority ranges if that is
> desirable.  What range is best?

The original algorithm works best with something close to the old range
of 50-127 (PUSER = 50, MAXPRI = 127) for positively niced processes alone.
This gives unniced processes a priority range of 50-127 and permits
nice -20'ed processes to have a much larger (numerically) base priority
than unniced ones while still allowing room for their priority to grow
(range 90-127).  Negatively niced processes were handled dubiously at
best (they ran into the kernel priorities).  Brian Feldman reduced the
priority range for unniced processes to 68-127 and you reduced it some
more to 180-223.

The main problem with the reduced ranges is that the algorithm gives
approximately an exponential dependency of the cpu cycles allocated to
a process on the process's niceness.  The base for the exponential is
invisible and hard to change, so decreasing the range by a factor of
78/44 significantly reduces the effects of niceness.

I think my niceweights[] algorithm can handle this.  It supports almost
any dependency of cycles on niceness.  However, I don't know how it can
be made to work right for the entire priority range.  An exponential
dependency would grow too fast for the range 0-255 if it grows fast
enough for the user range 180-223.

I used the following program to generate (old) niceweights[] tables.
Defining EXP gives an exponential table with niceness 0 having 32 times
as much priority as niceness 20.  The default approximates the old
-current behaviour (which isn't actually exponential).
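
%%%
#include <math.h>
#include <stdio.h>

/* Print a 41-entry niceweights[] table, one entry per niceness -20..20. */
int
main(void)
{
	int i;

	for (i = 0; i <= 40; i++) {
		if (i % 8 == 0)
			printf("\t");
#ifdef EXP
		/* Exponential table: the weight doubles every 4 niceness steps. */
		printf("%d,", (int)floor(2 * 3 * pow(2.0, i / 4.0) + 0.5));
#else
		/* Entry 40 would divide by zero, so it gets a fixed large weight. */
		if (i == 40)
			printf("65536\n");
		else
			printf("%d,", 2 * 2 * 2 * 3 * 3 * 5 * 7 / (40 - i));
#endif
		if (i % 8 == 7)
			printf("\n");
		else
			printf(" ");
	}
	return (0);
}
%%%

Bruce

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message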
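
Editor's sketch: the dynamic scaling described for resetpriority() in
this message (clip kg_estcpu to estcpumax, multiply by estcpumul, shift
by ESTCPU_SHIFT, then clamp into the timeshare range) might look roughly
like the following in isolation.  The message does not show how
estcpumul or ESTCPU_SHIFT are defined, so the shift of 16 and the helper
estcpu_multiplier() are assumptions made only for illustration; the
180-223 range is the one quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Timeshare range quoted in the message (180-223). */
#define PUSER              180
#define PRI_MAX_TIMESHARE  223
/* Assumed fixed-point precision; the real ESTCPU_SHIFT is not shown. */
#define ESTCPU_SHIFT       16

/*
 * Hypothetical helper: a multiplier that maps [0, estcpumax] onto
 * [0, PRI_MAX_TIMESHARE - PUSER] in fixed point.  How the real
 * estcpumul is derived is not shown in the message.
 */
static uint32_t
estcpu_multiplier(uint32_t estcpumax)
{
	assert(estcpumax > 0);
	return ((uint32_t)(((uint64_t)(PRI_MAX_TIMESHARE - PUSER) <<
	    ESTCPU_SHIFT) / estcpumax));
}

/* Scale a CPU estimate into the timeshare range, as in the diff. */
static unsigned int
scaled_priority(uint32_t estcpu, uint32_t estcpumax, uint32_t estcpumul)
{
	unsigned int newpriority;

	/* Clip to the dynamic limit instead of a static ESTCPULIM. */
	if (estcpu > estcpumax)
		estcpu = estcpumax;
	/* 64-bit intermediate so the product cannot overflow. */
	newpriority = PUSER +
	    (unsigned int)(((uint64_t)estcpu * estcpumul) >> ESTCPU_SHIFT);
	/* Same clamps as the diff, without the Debugger() call. */
	if (newpriority < PUSER)
		newpriority = PUSER;
	if (newpriority > PRI_MAX_TIMESHARE)
		newpriority = PRI_MAX_TIMESHARE;
	return (newpriority);
}
```

With this arrangement a group with no accumulated estcpu sits at PUSER,
and anything at or beyond estcpumax lands on the same clamped priority,
so the spread across the range depends on the dynamic limit and not on
a static divisor such as INVERSE_ESTCPU_WEIGHT.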