Date: Sat, 12 Apr 2003 17:42:14 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Jeff Roberson
Cc: cvs-src@FreeBSD.org, Jeff Roberson, src-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/kern sched_ule.c
In-Reply-To: <20030411145419.E37530-100000@mail.chesapeake.net>
Message-ID: <20030412162452.N2065@gamplex.bde.org>

On Fri, 11 Apr 2003, Jeff Roberson wrote:

> On Fri, 11 Apr 2003, Bruce Evans wrote:
>
> > On Thu, 10 Apr 2003, Jeff Roberson wrote:
> >
> > > In this benchmark I have many processes that each run for 10ms and
> > > then sleep for 10ms.  Each of them has a different nice value.  I
> > > record many statistics, but below is the amount of cpu time given
> > > to each.
> > >
> > > nice   -20       -15       -10       -5        -1        0
> > > ULE    5.753623  5.282634  4.856530  3.450129  3.126798  2.423626
> > > 4BSD   5.737184  4.694133  4.185013  3.363824  2.711405  2.260881
> > >
> > > nice   1         5         10        15        20
> > > ULE    2.105255  0.657852  0.429098  0.411582  0.501759
> > > 4BSD   2.452399  1.506065  0.859527  0.681111  0.427333
> > >
> > > You can see that ule is quite regular until it gets to the positive
> > > nice values, where it is not as smooth as 4bsd.  I'll fix this.
> >
> > The poor dynamic range of niceness for the 4bsd scheduler in -current
> > should not be aimed for.  In the above it is about 5.74:0.43 = 13:1.
> > It should be more like 100:1 or 1000:1.
>
> I don't think this is so bad considering that they were only attempting
> to use 50% of the cpu.  That leaves a lot of idle time for other threads.

I'm not sure how much using only 50% of the CPU affects the results, but
I think it is not much if there are many processes.  If there are more
than 2 un-nice ones, then the combination of those 2 will attempt to use
100% of the CPU, and similarly for 2 nice ones.  OTOH, with only 1
un-nice process and 1 nice one, each should get the 50% that it wants.

> Although in ULE it is currently very easy to tune by adjusting your min
> and max slice.  Since each tick is 10ms on x86, ule currently does 10:1
> for processes of nice 0 and 19.  If hz was 1000 it could easily be made
> 100:1.  I'm not sure if this is really desirable though.

I think having some very large ratios is useful for processes like
setiathome and the kernel pagezero.  You really want them to get as
little CPU as possible, perhaps without using a completely different
scheduling policy like idprio or an infinite ratio, which gives much the
same thing.
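To put numbers on the 10:1 and 100:1 figures above: the smallest slice
that can be handed out is one clock tick, so the achievable ratio is
bounded by the maximum slice divided by the tick length.  A standalone
userland sketch of that arithmetic follows; the 100ms maximum slice and
the linear nice-to-slice mapping are assumptions for illustration here,
not ULE's actual constants or code.

#include <stdio.h>

#define SLICE_MAX_MS	100	/* assumed maximum slice (nice 0), in ms */

/*
 * Illustration only: interpolate linearly from the maximum slice at
 * nice 0 down to the one-tick minimum at nice 19, in units of clock
 * ticks (1000/hz ms each).
 */
static int
nice_to_slice_ticks(int nice, int hz)
{
	int max_ticks = SLICE_MAX_MS * hz / 1000;
	int min_ticks = 1;		/* one tick is the floor */

	if (nice <= 0)
		return (max_ticks);
	if (nice >= 19)
		return (min_ticks);
	return (max_ticks - nice * (max_ticks - min_ticks) / 19);
}

int
main(void)
{
	printf("hz=100:  nice 0 gets %d ticks, nice 19 gets %d tick(s)\n",
	    nice_to_slice_ticks(0, 100), nice_to_slice_ticks(19, 100));
	printf("hz=1000: nice 0 gets %d ticks, nice 19 gets %d tick(s)\n",
	    nice_to_slice_ticks(0, 1000), nice_to_slice_ticks(19, 1000));
	return (0);
}

That prints 10 ticks vs. 1 at hz = 100 and 100 vs. 1 at hz = 1000, i.e.
the ratio is capped by how finely hz divides the largest slice, which is
exactly the granularity problem below.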
Dividing up 1 second into slices doesn't work so well when there are
more than 2 processes.  With N times as many (long-lived) processes the
scheduling decisions would have to be made over N times as long a period
to get the same ratios.  Or they can be made over N times as long a
period for the same number of processes, to reduce the granularity by a
factor of 1/N.  The 4BSD scheduler in -current sort of does the former,
and my version of it sort of does both.  I'm a bit concerned about the
extra inertia this causes in my version, but I haven't noticed any
problems in practice.

> ...
> What I could do is implement cpu distribution vs nice as a curve that
> tapers off considerably once you get past 19 nice values.  This would
> allow really nice programs to slowly creep along.  The problem is that
> my scheduler can only adjust the amount of cpu time a process gets by
> adjusting its slice size.  The minimum is actually quite high.  I may
> introduce some divisor for the slice that essentially causes this
> thread to get skipped over x times before it gets to run for the
> minimum amount.  This would give more granularity than 10ms for every
> pass through the run queue.

I think you need something like that for unusual loads.  I don't see how
more than HZ processes can be scheduled properly without a multiplier,
even if they all have the same nice value.

> > I think interactivity is mostly a different problem.  When we give
> > some CPU to niced processes, it is important that we only give them a
> > tiny amount.  Another problem with the small dynamic range is that it
> > always gives a non-tiny amount.  Scheduling granularity may be a
> > problem, especially in the 4BSD scheduler - we can't give processes
> > less than the quantum (if they want it), and with a quantum of 1/10
> > second just 10 niced processes getting the minimal amount would give
> > a delay of 1 second for interactive processes.  These problems are
> > limited by old bugfeatures in BSD for at least the 4BSD scheduler
> > (interactive processes wake up at a kernel priority and normally
> > essentially keep that priority when they return to userland, since we
> > neglect to reschedule them).
>
> ULE solves this differently.  Interactivity is determined by the ratio
> of voluntary sleep time to actual runtime.  If it figures out that a
> process is interactive, it always puts it on the current queue, which
> means it doesn't have to wait for the current queue to complete and the
> next queue to be switched to.

It still seems to be missing code to force the actual switch, mainly
here:

% void
% sched_userret(struct thread *td)
% {
% 	struct ksegrp *kg;
%
% 	kg = td->td_ksegrp;
%
% 	if (td->td_priority != kg->kg_user_pri) {
% 		mtx_lock_spin(&sched_lock);
% 		td->td_priority = kg->kg_user_pri;
% 		mtx_unlock_spin(&sched_lock);
% 	}
% }

If td->td_priority is actually changed here, then we should run another
process if the change reduces it below the priority of another runnable
thread.  Every assignment to td_priority that doesn't set TDF_NEEDRESCHED
and/or consider an immediate switch is suspect.  I think technically
correct code would set td_priority to a "base" priority (== kg_user_pri
for user processes) after waking up, switch then if necessary, and only
assert that the priority is the user priority here.
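A minimal sketch of the missing check, assuming the coarse test that any
numerically worse priority warrants a reschedule (a real fix would
compare against the best runnable thread, which needs a run-queue lookup
under sched_lock):

void
sched_userret(struct thread *td)
{
	struct ksegrp *kg;
	u_char oldpri;

	kg = td->td_ksegrp;

	if (td->td_priority != kg->kg_user_pri) {
		mtx_lock_spin(&sched_lock);
		oldpri = td->td_priority;
		td->td_priority = kg->kg_user_pri;
		/*
		 * Sketch only: if the priority just got numerically
		 * worse, some other runnable thread may now deserve the
		 * CPU, so request a reschedule instead of silently
		 * keeping it.
		 */
		if (td->td_priority > oldpri)
			td->td_flags |= TDF_NEEDRESCHED;
		mtx_unlock_spin(&sched_lock);
	}
}

Setting the flag only defers the switch to the next reschedule point;
whether that is enough, or an immediate switch is wanted as well, is the
"and/or" above.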
> High negative nice values can still delay interactive processes that
> are very nice.  I think this is what the user is indicating with
> negative nice values though, isn't he?  Or do you think that nice
> should not impact interactive processes?

It's usually a user error to nice interactive processes.  I'm thinking
of some corner case where you want to control (perhaps just terminate) a
niced process.  It would be, er, nice to not have to wait long to do it.
Maybe I just want something like renice to a higher priority to work for
users.

> It's difficult for me to grasp what behavior is most desirable.  I have
> a framework in place with ULE that would allow me to achieve most
> anything.
>
> I guess my primary questions are these:
> 1)  What distribution of cpu time vs nice is desirable?

Depends on the user's requirements.  Most users don't use nice :-).

> 2)  Should nice processes starve interactive jobs?

No.

Bruce