From owner-freebsd-arch Mon Nov 12 16: 0:26 2001 Delivered-To: freebsd-arch@freebsd.org Received: from InterJet.elischer.org (c421509-a.pinol1.sfba.home.com [24.7.86.9]) by hub.freebsd.org (Postfix) with ESMTP id A175C37B417 for ; Mon, 12 Nov 2001 16:00:20 -0800 (PST) Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id PAA95484 for ; Mon, 12 Nov 2001 15:48:49 -0800 (PST) Date: Mon, 12 Nov 2001 15:48:47 -0800 (PST) From: Julian Elischer To: arch@freebsd.org Subject: Thread scheduling in the kernel Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In an attempt to get the next part of the KSE work designed (design before code you know.. a strange new concept) I've been trying to work out the "correct" scheduling methods for such a system. There are a few 'tricks' that need to be taken into account.. a few notes.. 1/ Since threads running a syscall hit 'sleep' events the entities on teh sleep queues must be the threads. 2/ the entity that is scheduled onto the run queues is the KSE. (as the name suggests). 3/ If we have only one run queue, then KSEs for several processors from the same process, may be on the same queue. 4/ If threads 'wake up' they are hung of a list of runnable threads somewhere. This list could be hanging off the process, or the KSE. (actually more likely the KSEgroup than the process but...) 5/ If a KSE reaches teh front of the queue, but the process that is running is not that for which that KSE has some affinity, does it get out of the way to allow another KSE in the queue to get run? or does it just run and 'switch' everything over to the new available processor? Maybe the scheduler looks for the KSE from the same group, that was assigned to that processor, and runs that, leaving the original KSE at the head of the queue? Maybe that happens until all the KSEs in the queue that were from that group have been run? In this case it becomes possible to always have a KSE from that group ready... Maybe the KSE-GROUP is what is put unto the run queue, and KSEs from that group are put on all processors that look for work, until all of them have been run? (this would ensure that threads from the same process would all be run at the same time which is sometimes good, and sometimes bad, depending on the application. 6/ When a Thread is made runnable it gets (in the present system) a priority. What priority does a KSE in the run queues have when it has threads of several differnt priorities? Do we sort them in priority order and drop the priority of the KSE(group) as we go through them until we have less priority than some other kse? 7/ when a KSE runs out of work, how does it decide whether there is work that should be stolen from a fellow KSE? How does processor affinity effect this? 8/ If we had per-processor scheduling queues, How would that effect it? Which element get's put on the queues? Does a KSE stay on the run queue if it has un=run threads, even when it's running? How do we handle the arrival of new runnable threads with a KSE when it's running but a fellow KSE is not runnable. Do we bump the priority of the other KSE and hand it the new threads? remember: here are the 4 structures: proc - owner of all resources (FDs, memory, user creds) except cpu Ksegroup - owner of all scheduler controlling characteristics (e.g. nice, realtime, number of processors), N per process. Owner of stats used for scheduling calculations. kse - kind of a placeholder. It gets scheduled onto a processor (by a yet un-named mechaninsm) and provides cpu-cycles for the execution of 'threads' (see next). Max. of one per processor per KSE-group. thread - The in-kernel incarnation of a user thread that is presently in the kernel for some reason (e.g. syscall, pagefault, etc) Holds ALL the state needed to resume after sleeping, and is the entity that is suspended when the thread hits a 'sleep'. "unlimmitted" per KSEgroup. probably have a short-term "favourite" KSE/processor. When a thread blocks, the KSE looks for another thread to run, and if it doesn't find one, it will create one, and upcall back to the userland to see if there are more userland threads to run. (if not, it returns to yield the processor) The question that has been giving me headaches is the relationship between these elements, and the definitions of how these structures are linked up and moved around to provide fair efficient scheduling. If a KSE has a high priority thread and a low priority thread runnable in the kernel, but in reverse order, should it take the high priority from the higher prio. thread and process both, or should it order the threads and run teh high prio one first. In this case what happens whan a higher prio. thread becomes runnable while one is already running, and if the highest prio thread returns to userland, should teh processor move to userland to follow it, or switch to the next priority thread in the kernel.? Do all threads in the kernel have priority over all threads in userland? (this might be a reasonable decision). These and other questions are in need of real discussion here on -arch. We need to somewhere develope a document as to how we want this to work. If we can have a good discussion here on these topics over a coupel of days I'll attempt to produce such a document and submit it for comment as the basis of a second round of discussions. Julian To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message