From owner-freebsd-arch  Mon Nov 12 16: 0:26 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from InterJet.elischer.org (c421509-a.pinol1.sfba.home.com [24.7.86.9])
	by hub.freebsd.org (Postfix) with ESMTP id A175C37B417
	for <arch@freebsd.org>; Mon, 12 Nov 2001 16:00:20 -0800 (PST)
Received: from localhost (localhost.elischer.org [127.0.0.1])
	by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id PAA95484
	for <arch@freebsd.org>; Mon, 12 Nov 2001 15:48:49 -0800 (PST)
Date: Mon, 12 Nov 2001 15:48:47 -0800 (PST)
From: Julian Elischer <julian@elischer.org>
To: arch@freebsd.org
Subject: Thread scheduling in the kernel
Message-ID: <Pine.BSF.4.21.0111121507240.94926-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-arch.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-arch>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-arch>
X-Loop: FreeBSD.ORG


In an attempt to get the next part of the KSE work designed (design before
code you know.. a strange new concept) I've been trying to work out
the "correct" scheduling methods for such a system.

There are a few 'tricks' that need to be taken into account..

a few notes..


1/ Since threads running a syscall hit 'sleep' events
the entities on teh sleep queues must be the  threads.

2/ the entity that is scheduled onto the run queues is the KSE.
(as the name suggests).

3/ If we have only one run queue, then KSEs for several processors
from the same process, may be on the same queue.

4/  If threads 'wake up' they are hung of a list of runnable threads
somewhere. This list could be hanging off the process, or the KSE.
(actually more likely the KSEgroup than the process but...)

5/ If a KSE reaches teh front of the queue, but the process
that is running is not that for which that KSE has some affinity,
does it get out of the way to allow another KSE in the queue
to get run? or does it just run and 'switch' everything over to the new
available processor? Maybe the scheduler looks for the KSE from the same
group, that was assigned to that processor, and runs that, leaving
the original KSE at the head of the queue? 
Maybe that happens until all the KSEs in the queue
that were from that group have been run? In this case it becomes possible
to always have a KSE from that group ready...

Maybe the KSE-GROUP is what is put unto the run queue, and KSEs from that
group are put on all processors that look for work, until all of them 
have been run? (this would ensure that threads from the same process
would all be run at the same time which is sometimes good, and sometimes
bad, depending on the application.

6/ When a Thread is made runnable it gets (in the present system) a
priority. What priority does a KSE in the run queues have when it has
threads of several differnt priorities? Do we sort them in priority order
and drop the priority of the KSE(group) as we go through them
until we have less priority than some other kse?

7/ when a KSE runs out of work, how does it decide whether there is work
that should be stolen from a fellow KSE? How does processor affinity
effect this?

8/ If we had per-processor scheduling queues, How would that effect it?
Which element get's put on the queues? Does a KSE
stay on the run queue if it has un=run threads, even when it's running?
How do we handle the arrival of new runnable threads with a KSE
when it's running but a fellow KSE is not runnable. Do we 
bump the priority of the other KSE and hand it the new threads?


remember: here are the 4 structures:

proc  -   owner of all resources (FDs, memory, user creds) except cpu

Ksegroup -  owner of all scheduler controlling characteristics
	(e.g. nice, realtime, number of processors),  N per process.
	Owner of stats used for scheduling calculations.	

kse -	kind of a placeholder.  It gets scheduled onto 
	a processor (by a yet un-named mechaninsm) and provides
	cpu-cycles for the execution of 'threads' (see next).
	Max. of one per processor per KSE-group.

thread -  The in-kernel incarnation of a user thread that is presently
	in the kernel for some reason (e.g. syscall, pagefault, etc)
	Holds ALL the state needed to resume after sleeping, and is the
	entity that is suspended when the thread hits a 'sleep'.
	"unlimmitted" per KSEgroup. probably have a short-term
	"favourite" KSE/processor.


When a thread blocks, the KSE looks for another thread to run, and if it
doesn't find one, it will create one, and upcall back to the 
userland to see if there are more userland threads to run.
(if not, it returns to yield the processor)

The question that has been giving me headaches is the 
relationship between these elements, and
the definitions of how these structures are linked up and moved
around to provide fair efficient scheduling.

If a KSE has a high priority thread and a low priority thread
runnable in the kernel, but in reverse order, should it take
the high priority from the higher prio. thread and process both,
or should it order the threads and run teh high prio one first.
In this case what happens whan a higher prio. thread becomes runnable
while one is already running, and if the highest prio thread returns to
userland, should teh processor move to userland to follow it, or
switch to the next priority thread in the kernel.?
Do all threads in the kernel have priority over all threads in userland?
(this might be a reasonable decision).

These and other questions are in need of real discussion here on -arch.
We need to somewhere develope a document as to how we want this to work.

If we can have a good discussion here on these topics over a coupel of
days I'll attempt to produce such a document
and submit it for comment as the basis of a second round of discussions.

Julian


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message