Date:      Thu, 11 Oct 2001 12:35:58 -0700 (PDT)
From:      Julian Elischer <julian@elischer.org>
To:        Orran Y Krieger <okrieg@us.ibm.com>
Cc:        Bryan S Rosenburg <rosnbrg@us.ibm.com>, Marc Auslander <Marc_Auslander@us.ibm.com>, Paul McKenney <Paul.McKenney@us.ibm.com>, Greg Lehey <grog@lemis.com>, Matt Dillon <dillon@blackplane.com>, peter@freebsd.org, arch@freebsd.org
Subject:   Re: Julian Elischer: Re: FreeBSD KSE
Message-ID:  <Pine.BSF.4.21.0110111215040.37124-100000@InterJet.elischer.org>
In-Reply-To: <OFE14D96EE.16A80FF8-ON85256AE1.00823CFD@pok.ibm.com>

Quick answers.. (I'm at work)

On Wed, 10 Oct 2001, Orran Y Krieger wrote:

> Cool stuff.  A few stream-of-consciousness comments, since it is late at
> night :-)
> 
> For page faults, the Exokernel system pinned resources and guaranteed that
> those were the only resources used by the user-level scheduler for thread
> scheduling. This seemed too tough for us, since text is a problem (they did
> all paging in the app as well). Instead, K42 has a mode we call disabled
> (logically equivalent to interrupt-disabled mode in the kernel) where a bit
> is shared between the kernel and the app. The kernel enters the app disabled
> (i.e., with the bit set), and page faults that occur while the bit is set are
> handled directly by the kernel. Are you using one of these techniques for
> handling page faults that occur in the user-level scheduler, or
> something else?
> 


Each KSE (think virtual processor) has a separate mailbox
allocated to it in user space. When the UTS (Userland Thread Scheduler)
schedules a thread, the last thing it does before jumping into that thread
(loading the PC) is to set a pointer in the mailbox to point to a
thread control block whose format is known to the kernel. If that pointer
is NULL, then we are "IN" the UTS; otherwise we are in a thread. When a
thread (running within the KSE) does a system call, the KSE takes note of
the address in that pointer, and if the thread blocks for any reason,
the kernel state is saved, including that pointer. The pointer is then cleared
and an upcall is made to the UTS, which notices that the thread has been
blocked (the pointer is NULL).

When the blocked thread becomes runnable again in the kernel,
then at the beginning of the next (or present) quantum for
that KSE, the syscall is first completed, and the completion state is
stored within the control block whose address was saved with the blocked
thread. The state of the thread is made to look as though the syscall
completed but was immediately (atomically) followed by a yield().
All completed threads are linked together on a list hanging off the
KSE mailbox. After all syscalls that can be completed are completed,
the remainder of the quantum is handed to the UTS in the form of an
upcall. The UTS looks in the mailbox, sees the linked list of
returned threads, adds them to its own runnable list, and then selects
the next thread to run, choosing from all runnable threads, including
those that just returned from syscalls.
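The mailbox protocol above can be sketched in C as follows. The structure layouts, field names, and the `kernel_block`/`kernel_complete` helpers (which stand in for the kernel's side) are hypothetical and only illustrate the idea; they are not the FreeBSD KSE ABI. The sketch omits register save/restore and assumes the kernel only touches a KSE's mailbox while that KSE is not running user code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts: field names are illustrative, not FreeBSD's KSE ABI. */
struct thread_ctl {
    int id;                       /* UTS-private thread id */
    long syscall_result;          /* completion state filled in by the kernel */
    struct thread_ctl *next;      /* link for the completed list / run queue */
};

struct kse_mailbox {
    struct thread_ctl *current;   /* NULL => executing "IN" the UTS */
    struct thread_ctl *completed; /* threads whose syscalls have finished */
};

struct run_queue { struct thread_ctl *head, *tail; };  /* UTS FIFO */

static void rq_push(struct run_queue *rq, struct thread_ctl *t) {
    t->next = NULL;
    if (rq->tail) rq->tail->next = t; else rq->head = t;
    rq->tail = t;
}

static struct thread_ctl *rq_pop(struct run_queue *rq) {
    struct thread_ctl *t = rq->head;
    if (t) {
        rq->head = t->next;
        if (rq->head == NULL) rq->tail = NULL;
        t->next = NULL;
    }
    return t;
}

/* UTS: last step before jumping into a thread is publishing it in the mailbox. */
static void uts_switch_to(struct kse_mailbox *mbx, struct thread_ctl *t) {
    mbx->current = t;
    /* ...load the thread's registers and PC here... */
}

/* Kernel stand-in: thread blocks; pointer is saved with it, then cleared. */
static struct thread_ctl *kernel_block(struct kse_mailbox *mbx) {
    struct thread_ctl *t = mbx->current;
    mbx->current = NULL;
    return t;
}

/* Kernel stand-in: syscall finished; store result, link onto completed list. */
static void kernel_complete(struct kse_mailbox *mbx, struct thread_ctl *t, long result) {
    t->syscall_result = result;
    t->next = mbx->completed;
    mbx->completed = t;
}

/* UTS upcall entry: absorb returned threads, then choose the next to run. */
static struct thread_ctl *uts_upcall(struct kse_mailbox *mbx, struct run_queue *rq) {
    assert(mbx->current == NULL);    /* an upcall always lands in the UTS */
    struct thread_ctl *t = mbx->completed;
    mbx->completed = NULL;           /* take the whole list */
    while (t) {
        struct thread_ctl *next = t->next;
        rq_push(rq, t);
        t = next;
    }
    return rq_pop(rq);               /* NULL means nothing is runnable */
}
```

Because `kernel_complete` pushes onto the head of the list, completed threads arrive in reverse order; a real UTS would apply its own priority policy when choosing the next thread rather than relying on list order.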

(It's conceivable that all this may happen on the return from a hardware
interrupt if the correct KSE is already running and the mailbox indicates
that the current thread, running in userland, is preemptable at this
time. In this case the kernel will save its state along with the
completed syscalls, so that the UTS may choose whether to keep running it
or switch to a higher-priority thread that completed a syscall.)
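A sketch of the kernel-side decision in this interrupt-return case, assuming a hypothetical `KMF_PREEMPTABLE` bit that the UTS sets in the mailbox. The names and layout are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: the UTS marks the running thread as safe to preempt. */
#define KMF_PREEMPTABLE 0x1u

struct thread_ctl { struct thread_ctl *next; };

struct kse_mailbox {
    unsigned flags;
    struct thread_ctl *current;    /* NULL => in the UTS */
    struct thread_ctl *completed;  /* threads with finished syscalls */
};

enum intr_action { RESUME_USER, UPCALL_TO_UTS };

/* Called on return from a hardware interrupt while this KSE is running. */
static enum intr_action intr_return(struct kse_mailbox *mbx, bool syscalls_completed) {
    assert(mbx != NULL);
    if (!syscalls_completed || mbx->current == NULL)
        return RESUME_USER;          /* nothing new, or already in the UTS */
    if (!(mbx->flags & KMF_PREEMPTABLE))
        return RESUME_USER;          /* UTS asked not to interrupt this thread */
    /* Save the interrupted thread alongside the completed syscalls so the UTS
       can choose between it and a higher-priority thread that just returned. */
    struct thread_ctl *t = mbx->current;
    t->next = mbx->completed;
    mbx->completed = t;
    mbx->current = NULL;
    return UPCALL_TO_UTS;
}
```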



> By the way, we use the disabled mode to traverse per-processor data
> structures atomically, i.e., without requiring any locks. This lets you
> do various things efficiently on an MP in app space, just like disabling
> interrupts in the kernel does. This was important, because the very cheap thread
> creation that you get out of a user-level thread model was used by us to
> enable cheap upcalls, event notification, a cool IPC facility...  We use
> this to move lots of other functionality out of the kernel and into the app, e.g.,
> timers (so cancelling a timeout is very cheap), user-level IP, user-level pipes,


Basically we will use flags in the mailbox to control how the kernel
treats that particular KSE (Kernel Schedulable Entity). Obviously,
flags in one mailbox will not affect another KSE in the same process,
so they cannot be used to control inter-KSE communication.
Each KSE will, however, have private structures, and we can use
the mailbox flags to ensure, for example, that a thread running on a KSE
will not be preempted by another thread if it is in a critical region.

Inter-KSE communication will require the usual safety measures (locks or atomic operations).
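A minimal sketch of a per-KSE critical-region flag, assuming hypothetical `in_crit` and `preempt_pending` mailbox fields. Only the thread on this KSE and the kernel acting for this KSE touch them, so no lock is needed: the kernel sets `preempt_pending` only while `in_crit` is set, so clearing `in_crit` first makes the later read-and-clear safe. A production version would use the platform's atomic or barrier primitives rather than relying on `volatile`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-KSE mailbox fields. */
struct kse_mailbox {
    volatile int in_crit;          /* set/cleared only by the thread on this KSE */
    volatile int preempt_pending;  /* set by the kernel, cleared by the thread */
};

/* Kernel side, on quantum expiry: may we switch away from this thread now? */
static bool kernel_should_preempt(struct kse_mailbox *mbx) {
    if (mbx->in_crit) {
        mbx->preempt_pending = 1;  /* defer: let the critical region finish */
        return false;
    }
    return true;
}

static void crit_enter(struct kse_mailbox *mbx) {
    assert(!mbx->in_crit);         /* this sketch does not support nesting */
    mbx->in_crit = 1;
}

/* Returns true if the thread should now yield to honour a deferred preemption. */
static bool crit_exit(struct kse_mailbox *mbx) {
    assert(mbx->in_crit);
    mbx->in_crit = 0;              /* clear first: kernel stops setting pending */
    if (mbx->preempt_pending) {
        mbx->preempt_pending = 0;
        return true;
    }
    return false;
}
```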




> ...
> 
> I think there is something missing in your forward progress guarantee. Let's
> say that the kernel will allow 50 threads of an application to be blocked,
> and then not run any more application threads. If the application has 51
> threads, each communicating with the next via a huge chain of pipes (i.e.,
> thread 1 writes to a pipe that thread 2 reads from, thread 2 writes to a
> pipe that thread 3 reads from...), and let's assume that threads 2-51
> are all blocked on reading. Then the kernel won't run thread 1, which is
> the thread needed to unblock all the other threads. Clearly this is a
> contrived example, but we could think of more realistic ones with threads
> blocked on locks...   In any case, with a monolithic kernel, once you
> allocate threads they are guaranteed to run.  With a user-level thread
> model, we can have these kind of forward progress problems if the number
> that can be blocked is less than the total number.   The way we get around
> this in K42 is to push the full thread state into application level, and
> have a fixed set of resources in the kernel irrespective of the number of
> threads currently blocked.  That is, threads block for page faults and
> system calls in their own address space.  The only time a thread is blocked
> in the kernel is if it faults when "disabled" and in this case the kernel
> handles the page fault without reflecting any state up to the application.

I haven't had time to fully think about this right now; however, our scheme is
that there will be SOME limit on the number of threads that can be
suspended in syscalls (or page faults or whatever) at one time, and
that the Nth one will just block as would happen without KSEs enabled.
(The N-1th will do an upcall with a 'danger' flag set, telling the UTS
to get its act together.)
In this case you COULD possibly create some unlikely deadlock situation,
but I don't think it's worth changing the design for it.
(pilot error)
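A sketch of the per-process limit described above, using a hypothetical `max_blocked` (N) counter; the structure and outcome names are illustrative, not FreeBSD's. It also shows why the chained-pipe case can still deadlock: with N = 50, once 50 readers are suspended the KSE itself sleeps, and the writer that could unblock them gets no upcall in which to run.

```c
#include <assert.h>

enum block_outcome {
    UPCALL,         /* normal: UTS gets the rest of the quantum */
    UPCALL_DANGER,  /* N-1th blocked thread: upcall with the 'danger' flag */
    BLOCK_KSE       /* Nth: no upcall, the KSE sleeps as a non-KSE process would */
};

struct kse_proc {
    int max_blocked;   /* N, the hypothetical per-process limit */
    int nblocked;      /* threads currently suspended in the kernel */
};

/* Kernel: a thread of this process is about to be suspended. */
static enum block_outcome thread_blocks(struct kse_proc *p) {
    p->nblocked++;
    if (p->nblocked >= p->max_blocked)
        return BLOCK_KSE;
    if (p->nblocked == p->max_blocked - 1)
        return UPCALL_DANGER;
    return UPCALL;
}

/* Kernel: a suspended thread's syscall (or fault) has completed. */
static void thread_unblocks(struct kse_proc *p) {
    assert(p->nblocked > 0);
    p->nblocked--;
}
```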


> 
> Anyhow, just a couple of thoughts that came up when we looked at your paper
> (obviously the issues we struggled with in our design).   This is very cool
> stuff, would be fun to bounce around designs in more detail.
>          -- Orran
> ---------------------- Forwarded by Orran Y Krieger/Watson/IBM on
> 10/10/2001 07:42 PM ---------------------------
> 
> David Edelsohn <dje@watson.ibm.com> on 10/10/2001 03:56:20 PM
> 
> To:   Orran Y Krieger/Watson/IBM@IBMUS, Bryan S Rosenburg/Watson/IBM@IBMUS,
>       Marc Auslander/Watson/IBM@IBMUS
> cc:   Paul McKenney/Beaverton/IBM@IBMUS
> Subject:  Julian Elischer: Re: FreeBSD KSE
> 
> 
> 
> 
> ------- Forwarded Message
> 
> Date: Wed, 10 Oct 2001 13:39:27 -0700 (PDT)
> From: Julian Elischer <julian@elischer.org>
> To: David Edelsohn <dje@watson.ibm.com>
> cc: Greg Lehey <grog@lemis.com>, Matt Dillon <dillon@blackplane.com>,
>         peter@freebsd.org
> Subject: Re: FreeBSD KSE
> In-Reply-To: <200110101805.OAA21020@makai.watson.ibm.com>
> 
> Hi!
> 
> On Wed, 10 Oct 2001, David Edelsohn wrote:
> 
> >    One aspect of K42 is user thread scheduling, very much like
> > Scheduler Activations.  While reading over the FreeBSD design, two
> > questions occurred to us:
> 
> The paper you have read is somewhat out of date..
> The 'verbal' design that we are discussing has a lot of refinements
> and a very different interface..
> 
> >    1) The design only seems to mention blocking due to system calls.
> > Will KSE address threads blocking on page faults as well?
> 
> That is the eventual aim (though not page faults in the Userland Thread
> Scheduler itself, of course). (duh) :-)
> 
> >    2) Does the KSE design completely ensure forward progress?
> > This is not necessarily a problem in practice, but it is theoretically
> > possible.
> 
> When the process is in the Thread scheduler, it is guaranteed to return,
> after any kernel entry, to the Thread scheduler at the point where it left
> off. This ensures that the thread scheduler makes forward progress.
> Threads are only run when the Thread scheduler runs them, so it is
> guaranteed to be able to make forward progress on SOME threads in each
> quantum. (At least that's the way I see it.)
> 
> I'd love to discuss some of the 'Cute' ideas that are not yet in the
> paper.  But I'm at work at the moment.
> 
> Feel free to ask me about anything that catches your interest, however.
> 
> Most of these ideas have been discussed on the 'arch'
> mailing list
> 
> e.g.
> http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=31777+36162+/usr/local/www/db/text/2001/freebsd-arch/20010930.freebsd-arch
> 
> http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=38716+42446+/usr/local/www/db/text/2001/freebsd-arch/20010930.freebsd-arch
> 
> http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=62467+69962+/usr/local/www/db/text/2000/freebsd-arch/20001203.freebsd-arch
> 
> 
> ------- End of Forwarded Message





