Date: Sat, 26 Feb 2011 18:30:42 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: John Baldwin <jhb@freebsd.org>
Cc: Remko Lodder <remko@elvandar.org>, svn-src-all@freebsd.org, src-committers@freebsd.org, davidxu@freebsd.org, Bruce Evans <brde@optusnet.com.au>, svn-src-head@freebsd.org, Remko Lodder <remko@freebsd.org>
Subject: Re: svn commit: r219003 - head/usr.bin/nice
Message-ID: <20110226174551.Y1199@besplex.bde.org>
In-Reply-To: <201102250737.43182.jhb@freebsd.org>
References: <201102241613.p1OGDXpM047076@svn.freebsd.org> <20110225070237.F983@besplex.bde.org> <F83465F5-A56B-461D-A9E6-52CB38A4FA25@elvandar.org> <201102250737.43182.jhb@freebsd.org>
On Fri, 25 Feb 2011, John Baldwin wrote:

> On Friday, February 25, 2011 5:23:04 am Remko Lodder wrote:
>>
>> On Feb 24, 2011, at 10:47 PM, Bruce Evans wrote:
>>
>>> On Thu, 24 Feb 2011, John Baldwin wrote:
>>>
>>>> On Thursday, February 24, 2011 2:03:33 pm Remko Lodder wrote:
>>>>>
>>> [context restored:
>>> +A priority of 19 or 20 will prevent a process from taking any cycles from
>>> +others at nice 0 or better.]
>>
>> [strip information overload].
>>
>> So, what is the consensus instead of overwhelming me with too much
>> information?
>
> Take that sentence out.  It is not accurate for our schedulers.

And an accurate sentence would require condensing even more information :-).

I just remembered another difference for the multi-CPU case that is probably
very large.  Long ago, I hacked SCHED_4BSD to scale td_estcpu by the number
of CPUs (INVERSE_ESTCPU_WEIGHT = 8 for !SMP but (8 * smp_cpus) for SMP).
Without this, td_estcpu built up too fast (really, failed to decay fast
enough due to the decay not being scaled by the number of CPUs) in some
common cases, so the nonlinear region of the mapping from td_estcpu to
td_priority was often reached.  With this, td_priority is too insensitive
to td_estcpu in some hopefully-less-common cases.

I haven't tested SCHED_4BSD on a multi-CPU system recently, but the test
method of pinning all the CPU hogs to 1 CPU gives an uncommon case for
which the hack works especially badly.  The pinning gives only 1 active
CPU, so the !SMP scale factor should be applied, but it isn't.  So
td_estcpu decays 8 times faster than it should ((?) -- I think the scaling
is linear), and may even decay to 0 before it is used.  If there is still
some left when the thread is rescheduled, then the sensitivity is just
reduced by a factor of 8, but if there is none then the scheduling becomes
unfair.
OTOH, the sensitivity to niceness isn't changed, so in the best case where
the decay is not to 0, niceness becomes smp_cpus times more sensitive than
ordinary scheduling relative to the !SMP case.  And this is a relatively
simple case to understand and fix.  The load average and/or
INVERSE_ESTCPU_WEIGHT probably need to be per-CPU to give the old algorithm
a chance of working, but only if there is some manual scheduling by pinning
threads to CPUs.  Otherwise, assignment of threads to CPUs should make
things average out -- non-affinity is required to make things sort of work.

In my version of SCHED_4BSD, the scale factor is calculated dynamically
according to the maximum td_estcpu.  This too gives insensitivity of
scheduling to td_estcpu, but only when the maximum td_estcpu is large, in
which case there is not much that can be done (when you have a large
dynamic range of td_estcpu's, it is just impossible to map them linearly
or logarithmically into the small priority space, and it may be impossible
to map them uniquely).  I don't try to support SMP, and my maximum
td_estcpu only does the right thing if the maximum td_estcpu for threads
that will run on each CPU is almost the same for all CPUs.

Bruce