Date: Mon, 25 Jun 2001 02:00:36 -0700
From: Julian Elischer <julian@elischer.org>
To: Jason Evans
Cc: arch@freebsd.org
Subject: Re: Updated KSEs paper
Message-ID: <3B36FDB4.74C96ACB@elischer.org>
References: <20010622184626.B47186@canonware.com>

Jason Evans wrote:
>
> A number of people are going to discuss various subjects at a meeting
> during USENIX, one of them being KSEs. I won't be able to attend (a
> sister is getting married that day), but wanted to make an updated
> version of the paper available to avoid others having to fix the same
> design problems as I've already fixed.
>
> The paper still is not by any means perfect, but it addresses most of
> the issues that people brought up in previous discussions on this
> mailing list. Feedback and suggestions are welcome.
>
> http://people.freebsd.org/~jasone/refs/freebsd_kse/freebsd_kse.html
> http://people.freebsd.org/~jasone/refs/freebsd_kse.ps
>
> Jason

Here are some comments on the KSE API as I see it....

ksec_new(ksec_id, cpu_id, kseg_id):
  Run a thread on this KSEC (whose ID is ksec_id), on the CPU with ID
  cpu_id, as part of the KSEG with ID kseg_id.

[julian's comment]
The UTS knows it's running. It has a pointer to a mailbox which will
magically be correct for this KSE. It knows what threads are runnable
and can just jump into one of them. If none are runnable, it needs some
type of system call that 'yields' the processor; maybe a variant of the
usleep call? It got here because either something blocked, or a new KSE
was created. These two cases are effectively identical: we have a place
to schedule a thread, and it doesn't matter if it's new or recycled.

ksec_preempt(ksec_id, kse_state):
  The KSEC with ID ksec_id was preempted, with userland execution state
  ksec_state.

[..]
The act of pre-empting this KSEC might write the state into the
userland context storage (with a copyout()). Then the UTS wouldn't need
notification right now, just at the next time it might make a
difference, i.e. when it next goes to schedule something. If we do it
right, it will find this thread on the runnable queue at that time
without us needing to do an explicit notification.

ksec_block(ksec_id):
  The KSEC with ID ksec_id has blocked in the kernel.

[..]
This is treated exactly like the first case. I don't think it needs a
separate upcall.

ksec_unblock(ksec_id, kse_state):
  The KSEC with ID ksec_id has completed in the kernel, with userland
  execution state ksec_state.

[..]
I don't think a separate upcall is always needed here either. On
completion, the state of the returning syscall is written into the
userland context, and the context is written to the "completed and
runnable" queue. The next time the UTS runs it adds the contents of
this queue to its runnable queue, and schedules as per normal.
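To make the mailbox idea concrete, here is roughly what I'm picturing.
All the structure and field names below (kse_mailbox, thread_mailbox,
km_curthread, km_completed) are made up for illustration only; nothing
here is from the paper or any existing code.

/*
 * Rough sketch only: invented names, not a real interface.
 */
#include <ucontext.h>

struct thread_mailbox {                 /* one per user thread (the KSEU) */
	ucontext_t	tm_context;     /* saved register state */
	struct thread_mailbox *tm_next; /* links completed contexts */
};

struct kse_mailbox {                    /* one per KSE, handed to kse_init() */
	struct thread_mailbox *km_curthread;  /* set by the UTS when it runs
	                                       * a thread; NULL while the UTS
	                                       * itself is running */
	struct thread_mailbox *km_completed;  /* contexts whose syscalls
	                                       * finished in the kernel,
	                                       * reaped at the next upcall */
	void		*km_stack;      /* small, bounded-use upcall stack */
	void		(*km_upcall)(struct kse_mailbox *); /* UTS entry */
};

Every upcall then lands in km_upcall() on km_stack with the mailbox in
hand, which is why the UTS needs no other notification.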
signal(sig_t signum):
  The process received a signal numbered signum.

[..]
We haven't decided exactly what this means. This will do as well as
anything else I've seen mentioned, and better than most.

The following system calls are necessary:

void kse_init(struct kseu *context):
  Start using KSEs. context contains the necessary data for the kernel
  to make upcalls. This function appears to return every time an upcall
  is made. Initially, there is only one KSEG (ID 0), which has a
  concurrency level of 1.

[..]
Whenever concurrency is added, the caller must supply a different stack
(or dummy stack) for the system call to return on. An added unit of
concurrency is in effect an added KSE. Each KSE needs a different stack
to upcall on (though the stack may be small, as it will have bounded
use). "context" includes pointers to the mailbox that will be used by
that KSE. The multiple returns of this call will all magically have
that mailbox in their hand, so you can preload it with anything the UTS
will need on an upcall.

int kseg_create(void):
  Create a KSEG and return its KSEG ID (unique within this process), or
  -1 if there is an error (resource limit exceeded).

[..]
I see this as basically an extension of the next call. You
automatically get a KSE with that KSEG, so it does everything that
creating a new KSE does, and needs the 'context' argument that a KSE
would need.

int kseg_concurrency(int kseg_id, int adjust):
  Adjust the concurrency of the KSEG with ID kseg_id. Decrementing the
  concurrency to 0 destroys the KSEG, as soon as there are no more
  active KSECs in the KSEG. If adjust is 0, the KSEG is not modified,
  but the concurrency is still returned. This system call returns the
  KSEG's instantaneous concurrency level after adjusting it.

[..]
If you increase the concurrency, you have created new KSEs. They need
their own separate upcall stacks (maybe only dummy stacks, but still).
In any case you need to allocate them one by one. Just setting the
concurrency to "what it is now + 2" is not going to work, because the
new KSEs don't know where to return to.

int kseg_bind(int kseg_id, int cpu_id):
  Bind the KSEG with ID kseg_id to the CPU with ID cpu_id. This system
  call returns the CPU ID that the KSEG is bound to, or -1 if there is
  an error (invalid CPU ID, or the KSEG's concurrency is greater than
  1).

[..]
I think the KSEG can bind itself. Same for priority: no need to specify
the KSEG, it's implicit.

[..]
We also need a 'yield' version of the usleep call. Note that a
completing syscall that is already sleeping may reawaken the yielded
KSE in order to complete, after which it will upcall again in order to
let the UTS schedule the satisfied thread. We also need a KSE_EXIT()
for when we know we don't need a KSE any more.
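Pulling those together, the syscall surface I'm imagining looks
something like the sketch below. The prototypes (and the kse_yield /
kse_exit names) are invented just to illustrate the shape; they are not
an existing interface.

/*
 * Sketch only: my reading of the proposed API, with invented names.
 */
struct kseu;                            /* per-KSE upcall context/mailbox */

void kse_init(struct kseu *context);    /* "returns" once per upcall, on
                                         * the upcall stack named in
                                         * *context */
int  kseg_create(void);                 /* new KSEG, plus its first KSE */
int  kseg_concurrency(int kseg_id, int adjust); /* +1 == one new KSE, each
                                                 * with its own stack and
                                                 * mailbox */
int  kseg_bind(int kseg_id, int cpu_id);/* or, as argued above, bind the
                                         * caller's own KSEG implicitly */
int  kse_yield(void);                   /* give the CPU back when nothing
                                         * is runnable (usleep variant) */
void kse_exit(void);                    /* discard a KSE we no longer need */

The reason concurrency has to be added one KSE at a time is visible
here: each call that creates a KSE has to be handed its own stack and
mailbox before that KSE can ever upcall.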
I also argue with the following assertion:

  "Additionally, soft processor affinity for KSEs is important to
  performance. KSEs are not generally bound to CPUs, so KSEs that
  belong to the same KSEG can potentially compete with each other for
  the same processor; soft processor affinity tends to reduce such
  competition, in addition to well-known benefits of processor
  affinity."

I would argue that limiting (HARD LIMIT) KSEs to one per KSEG per
processor has no ill effects and simplifies some housekeeping. KSECs
can move between KSEs in the same KSEG in a soft-affinity manner to
achieve the same thing, and being able to guarantee that the KSEs of a
KSEG are never competing for the same processor ensures that they will
never pre-empt each other, which in turn simplifies some other locking
assumptions that must be made both in the kernel and in the UTS. (Not
proven, but my gut feeling.) Thus on a uniprocessor there will only
ever be as many KSEs as there are KSEGs. Since blocking syscalls
return, this has no effect on the threading picture: there are still
multiple KSECs available.

In 3.6.1 you prove that we can have enough storage to store the thread
state of KSECs. I would like to suggest that it can be proven as
follows: every user thread includes a thread control block that
includes enough storage for thread context. Since every system call is
made by a thread, and the 'context' information for the KSE on which
the syscall is being made includes a pointer to that storage, the
blocked and resuming syscalls have that storage available to store
their state. The context structures can be of a fixed, known format and
include a pointer to be used in linking them together in the 'completed
and runnable' queue pointed to by the KSEU structure that is handed to
the UTS by the upcall. Therefore, there is guaranteed to be enough
storage.

3.6.2 Per-upcall event ordering:
Since in my scheme there is only one kind of upcall (well, I think
signals can also be made to look the same), there is no ordering
problem. All information is presented to the UTS at the same time, and
it can decide which it wants to handle first.

In the section:

  "3.7 Upcall parallelism
  This section is not yet adequately fleshed out. Issues to consider:"
  [various issues shown]

Using my scheme this is not an issue. "What is your scheme?" I hear you
ask. Basically an implementation of the above, with a few twists
(there's a sketch of the resulting upcall handler after this list):

1/ Starting a KSE (as above) gives it its mailbox.

2/ The KSE is only runnable on a processor on which there is no KSE
   from that KSEG already running. It tries really hard not to shift
   CPUs. No other KSE will be using that mailbox, and thus no other
   processor in that KSEG.

3/ The mailbox includes a location that the kernel will look at to find
   a pointer to the (userspace) thread context block (KSEU?). When the
   UTS schedules a thread, it fills in this location; until then it is
   NULL, meaning that the UTS itself is running. All the time the
   thread is running this pointer is valid, so even if the thread is
   pre-empted without warning by the kernel, the pointer can be used to
   store its state.

4/ When the running thread blocks and an upcall happens, the kernel
   zeroes out that location and takes a copy of it in the KSEC that
   stores the syscall state.

5/ When a syscall is continued and completes, the location given above
   (which was stored along with the sleeping syscall state) is used to
   store the state of the returning syscall, just as if it had returned
   and then done a yield(). It is then linked onto a list of 'completed
   syscalls' held by the kernel.

6/ When the next upcall into that KSEG is performed, the kernel first
   reaps all the completed syscall blocks and hangs them off the
   mailbox for the upcalling KSE in a known location. The UTS, when it
   runs from the upcall, discovers all the completed syscalls, which to
   it look like a whole list of yield()'d threads, puts them onto its
   run queue according to the priority of each, and then schedules the
   next highest priority thread.
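To show what step 6 looks like from the UTS's side, here is a sketch of
the upcall handler. The trimmed-down mailbox structures and all of the
helper names (runq_insert, runq_remove_highest, thread_switch,
kse_yield, uts_upcall) are invented for illustration; this is not code
from the paper or from any library.

#include <ucontext.h>

/* Trimmed versions of the structures sketched earlier in this mail. */
struct thread_mailbox {
	ucontext_t	tm_context;
	struct thread_mailbox *tm_next;
};
struct kse_mailbox {
	struct thread_mailbox *km_curthread;
	struct thread_mailbox *km_completed;
};

/* Invented helpers: a priority-ordered run queue and a context switch. */
void	runq_insert(struct thread_mailbox *tm);
struct thread_mailbox *runq_remove_highest(void);
void	thread_switch(ucontext_t *ctx);
int	kse_yield(void);

/* The UTS entry point the kernel upcalls into (step 6). */
void
uts_upcall(struct kse_mailbox *km)
{
	struct thread_mailbox *tm, *next;

	for (;;) {
		/*
		 * Reap the completed syscalls the kernel hung off the
		 * mailbox; to the UTS they look like yield()'d threads.
		 * (A real implementation would take this list atomically.)
		 */
		tm = km->km_completed;
		km->km_completed = NULL;
		while (tm != NULL) {
			next = tm->tm_next;
			runq_insert(tm);	/* by thread priority */
			tm = next;
		}

		tm = runq_remove_highest();
		if (tm != NULL)
			break;

		/*
		 * Nothing runnable: note that the UTS itself is running
		 * and give the processor back.  If the yield ever returns
		 * here rather than via a fresh upcall, just look again.
		 */
		km->km_curthread = NULL;
		kse_yield();
	}

	/*
	 * From here on the kernel saves state into this thread's context
	 * block if it blocks or is pre-empted (steps 3 and 4).
	 */
	km->km_curthread = tm;
	thread_switch(&tm->tm_context);		/* does not return */
}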
Enough for now.. more on the whiteboard at USENIX..
(What, you're not going? We'll take notes, OK?)

--
+------------------------------------+       ______ _  __
|   __--_|\  Julian Elischer         |       \     U \/ / hard at work in
|  /       \ julian@elischer.org     +------>x   USA    \ a very strange
| (   OZ    )                                \___   ___ | country !
+- X_.---._/    presently in San Francisco       \_/   \\
          v

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message