From owner-freebsd-hackers Sat Jul 29 15:30:39 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.11/8.6.6) id PAA01061 for hackers-outgoing; Sat, 29 Jul 1995 15:30:39 -0700
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by freefall.cdrom.com (8.6.11/8.6.6) with SMTP id PAA01055 for ; Sat, 29 Jul 1995 15:30:36 -0700
Received: by cs.weber.edu (4.1/SMI-4.1.1) id AA08430; Sat, 29 Jul 95 16:22:20 MDT
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9507292222.AA08430@cs.weber.edu>
Subject: Re: Scheduling Algorithms (was: Re: panic in brelse() ... )
To: kwong@fathergoose.net6c.io.org (Ken Wong)
Date: Sat, 29 Jul 95 16:22:20 MDT
Cc: hackers@freebsd.org
In-Reply-To: <199507291324.JAA00264@fathergoose.net6c.io.org> from "Ken Wong" at Jul 29, 95 09:24:35 am
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@freebsd.org
Precedence: bulk

> > The main issue in both of these tasks that are on the table is a
> > divorce of the ready-to-run state from the act of running the
> > code.  I think that this really implies separate queues based
> > on priority.
>
> I think a single queue is best served here, if we want realtime and
> SMP.  I know some RT SMP systems do it by dividing priority into 2
> sections:
>
> 1) RT 0-7
> 2) The rest is timesharing
>
> Within each group, round robin is used to dispatch processes with
> the same priority.  Within group 2) some other algorithm can still
> be used to change the priority, but it never goes up into 1).  1) is
> fixed priority and preemptive.  The process table can include the
> CPU on which the process will execute.

SVR4 ES/MP runs multiple queues, as does Solaris.

The big issue here is per-processor queue equivalency.  It buys you
the ability to have one processor pull a process off the ready-to-run
queue and run it without hitting a global mutex.  For SMP performance,
the ability to avoid global synchronizations is the biggest win you
can have.  The RT issue is addressed by queue prioritization.
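As a rough illustration of that point (hypothetical structure and
function names, not code from any of the systems mentioned), each
processor dequeues from its own ready queue under a per-queue lock,
so dequeues on different CPUs never serialize on one global mutex:

```c
#include <stddef.h>
#include <pthread.h>

/* Hypothetical per-processor ready queue: one of these per CPU.
 * The lock is per queue, not global, so two CPUs pulling work at
 * the same time do not contend with each other. */

struct proc {
    int pid;
    struct proc *next;
};

struct runq {
    pthread_mutex_t lock;   /* per-queue lock, not a global mutex */
    struct proc *head;
};

void runq_init(struct runq *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = NULL;
}

/* Make a process ready to run on this queue's CPU. */
void runq_put(struct runq *q, struct proc *p)
{
    pthread_mutex_lock(&q->lock);
    p->next = q->head;
    q->head = p;
    pthread_mutex_unlock(&q->lock);
}

/* A CPU pulls the next ready process off its own queue only;
 * returns NULL if the queue is empty. */
struct proc *runq_get(struct runq *q)
{
    pthread_mutex_lock(&q->lock);
    struct proc *p = q->head;
    if (p != NULL)
        q->head = p->next;
    pthread_mutex_unlock(&q->lock);
    return p;
}
```

The RT side then falls out of giving each CPU an array of these,
one per priority band, and scanning from the highest band down.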
In effect, this is the same as your queue insertion model, with the
increased concurrency on get coming from multiplying the available
semaphore hierarchy by the number of queues.

In terms of RT, the divorce of the processor as a resource (which is
what divorcing the ready-to-run state setting from the context switch
process itself is, in effect) turns RT events into kernel preemption
points, with processors as anonymous resources.

The per-processor queue ownership also has the side effect of saving
the processor instruction and data caches, and TLB settings, from
undue thrashing (as a process migrates from one processor to another,
causing the thrash).

Finally, a multiple queue model allows queueing to be order
independent on the basis of priority.  This is an important effect,
since an RT system must invoke priority lending if the system
resources are to be shared instead of block-committed for RT tasks.
What that means is that the scheduling priority that would normally
determine the queue onto which the process was pushed when ready to
run is not the sole determiner of which queue the process can go on.

If a non-RT process holds a resource that a high priority RT process
needs while a ready-to-run medium priority RT process exists, the
medium priority RT process will block the non-RT process, in effect
blocking the high priority RT process.  To avoid this, the act of the
high priority RT process blocking on the resource held by the lower
priority process will cause the low priority process to be "lent"
the higher priority process's priority until such time as it unblocks
the needed resource.  This prevents the high priority process from
suffering starvation at the hands of a medium priority process
because a low priority process holds a resource that the high
priority process wants.
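The lending rule can be sketched in a few lines (again with
hypothetical names; this is a minimal model of the idea, not an
implementation from any of the systems discussed).  Higher number
means higher priority; a process runs at the maximum of its own
priority and any priority lent to it by a blocked waiter:

```c
#include <stddef.h>

/* Hypothetical priority-lending model.  Higher number = higher
 * priority. */

struct lproc {
    int base_pri;   /* priority assigned by scheduling policy */
    int lent_pri;   /* highest priority lent by a blocked waiter */
};

/* The priority the process actually competes with. */
int effective_pri(const struct lproc *p)
{
    return p->lent_pri > p->base_pri ? p->lent_pri : p->base_pri;
}

/* "waiter" blocks on a resource held by "holder": lend the
 * waiter's effective priority to the holder if it is higher. */
void lend_priority(struct lproc *holder, const struct lproc *waiter)
{
    int w = effective_pri(waiter);
    if (w > holder->lent_pri)
        holder->lent_pri = w;
}

/* Holder releases the resource: the loan is returned and the
 * holder drops back to its base priority. */
void release_priority(struct lproc *holder)
{
    holder->lent_pri = 0;
}
```

With a high priority RT waiter at 90 blocked on a holder whose base
priority is 10, the holder temporarily runs at 90, so a ready medium
priority process at 50 can no longer keep it off the processor; once
the resource is released, the holder falls back to 10.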
> > It's necessary because multiple processes can be in the run state
> > (for SMP, or even for kernel preemption), and RT processes have to
> > be able to be pigs as much as they want (if they are supported).
>
> The above would solve this problem.

I think it's both a scheduling and a concurrency issue.  I think that
the ordered queue mechanism would have to have multiple priority-based
entry order pointers to support priority lending, and that that's
really a more complex design at a higher concurrency cost than you'd
want if you only have one queue in which to implement the ordering.

Look at the complexity of ordered insertion into the LRU list in page
replacement algorithms (to implement per-vnode working sets, for
instance, to avoid cache thrashing) just to avoid staging the queues.
It's really not worth it.


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.