Date:      Wed, 27 Jun 2001 14:46:41 -0700
From:      Jason Evans <jasone@canonware.com>
To:        Julian Elischer <julian@elischer.org>
Cc:        arch@freebsd.org
Subject:   Re: Updated KSEs paper
Message-ID:  <20010627144641.K47186@canonware.com>
In-Reply-To: <3B36FDB4.74C96ACB@elischer.org>; from julian@elischer.org on Mon, Jun 25, 2001 at 02:00:36AM -0700
References:  <20010622184626.B47186@canonware.com> <3B36FDB4.74C96ACB@elischer.org>

On Mon, Jun 25, 2001 at 02:00:36AM -0700, Julian Elischer wrote:
>
> ksec_preempt(ksec_id, ksec_state):
>      The KSEC with ID ksec_id was preempted, with userland execution
>      state ksec_state.
>
> [..]
> 	The action of pre-empting this might write the state into the
> 	userland context storage (with a copyout()). Then the UTS wouldn't
> 	need notification, right now, just at the next time it might
> 	make a difference, i.e. when it next goes to schedule something.
> 	If we do it right, it will find this thread on the runnable queue
> 	at that time without us needing to do an explicit notification.

The UTS needs notification in order to be able to make fully informed
scheduling decisions.  However, your method for storing the state using
copyout() is important, and something I'd like to hear a more detailed
description of.

We might be talking past each other on this... What I'm trying to say is
that the UTS needs to be explicitly informed of certain events, so even
though we may have cool methods of getting the KSEC state to userland, the
UTS still needs to know about it.
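
Just so we're concretely talking about the same thing, here is roughly what
I picture the preemption path doing.  This is a sketch only; ksec_get_mdstate()
and the field names are invented for illustration:

    /*
     * Sketch: on preemption, snapshot the KSEC's machine-dependent state
     * and copyout() it into the userland context block registered for the
     * thread that was running on this KSE.
     */
    static int
    ksec_save_user_state(struct ksec *ksec)
    {
            struct ksec_mdstate st;

            ksec_get_mdstate(ksec, &st);    /* invented helper */
            return (copyout(&st, ksec->ks_ctx_uaddr, sizeof(st)));
    }

Is that more or less what you have in mind?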

> ksec_block(ksec_id):
>      The KSEC with ID ksec_id has blocked in the kernel.
>
> [..]
> 	this is treated exactly like the first case. I don't think it needs a
> 	separate upcall.

Again, the UTS needs notification in order to be able to make fully
informed scheduling decisions.

> ksec_unblock(ksec_id, ksec_state):
>      The KSEC with ID ksec_id has completed in the kernel, with
>      userland execution state ksec_state.
>
> [..]
> 	I don't think a separate upcall is always needed. On completion, the state
> 	of the returning syscall is written into the userland context, and the
> 	context is written to the "completed and runnable queue". The next time
> 	the UTS runs it adds the contents of this queue to its runnable queue, and
> 	schedules as per normal.

Again, the UTS needs notification in order to be able to make fully
informed scheduling decisions.

> The following system calls are necessary:
>
> void kse_init(struct kseu *context):
>      Start using KSEs. context contains the necessary data for the
>      kernel to make upcalls. This function appears to
>      return every time an upcall is made. Initially, there is only
>      one KSEG (ID 0), which has a concurrency level of 1.
>
> [..]
> 	whenever a concurrency is added the caller must supply a different
> 	stack (or dummy stack) for the system call to return on.
> 	an added concurrency is in effect an added KSE. Each KSE needs
> 	a different stack to upcall on (though the stack may be small
> 	as it will have bounded use.) "context" includes pointers to the mailbox
> 	that will be used by that KSE. The multiple returns of this call
> 	will all magically have that mailbox in their hand so you
> 	can preload it with anything the UTS will need on an upcall.

Sounds good.
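
For concreteness, here is roughly how I picture the mailbox.  All of the
names are invented for illustration; "thread_ctx" stands in for the
per-thread context block you talk about later, to avoid overloading "kseu":

    struct thread_ctx;                  /* per-thread context block (below) */

    /*
     * Illustrative sketch only -- not a proposal for the real layout.
     * One of these lives in userland per KSE; "context" in kse_init()
     * would point at (or contain a pointer to) it.
     */
    struct kse_mailbox {
            void              *km_stack;      /* upcall stack for this KSE */
            size_t             km_stack_size;
            struct thread_ctx *km_curthread;  /* context of the thread now
                                               * running; NULL while the
                                               * UTS itself is running     */
            struct thread_ctx *km_completed;  /* contexts whose syscalls
                                               * have completed in kernel  */
            void              *km_udata;      /* whatever the UTS wants to
                                               * see again on each upcall  */
    };

If that matches what you mean by "mailbox", then we agree.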

> int kseg_create(void):
>      Create a KSEG and return its KSEG ID (unique within this process),
>      or -1 if there is an error (resource limit exceeded).
> [..]
> 	I see this as basically an extension of the next call.
> 	You automatically get a KSE with that KSEG so it does everything
> 	that creating a new KSE does, and needs the 'context' variable
> 	that a KSE would need.

Hmm, okay.

> int kseg_concurrency(int kseg_id, int adjust):
>      Adjust the concurrency of the KSEG with ID kseg_id. Decrementing
>      the concurrency to 0 destroys the KSEG, as
>      soon as there are no more active KSECs in the KSEG. If adjust is 0,
>      the KSEG is not modified, but the
>      concurrency is still returned. This system call returns the
>      KSEG's instantaneous concurrency level after adjusting it.
>
> [..]
> 	If you increase the concurrency, you have created new KSEs. They need
> 	their own separate upcall stacks (maybe only dummy stacks, but...).
> 	In any case you need to allocate them one by one. Just setting a
> 	concurrency to "what it is now + 2" is not going to work
> 	because the new KSEs don't know where to return to.

Okay, then we need to split this system call into two separate ones:

int kseg_concurrency_inc(int kseg_id, [mailbox for completion state]);
int kseg_concurrency_dec(int kseg_id);
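
Usage would then be roughly as follows (hypothetical, reusing the mailbox
sketch from above; KSE_STACK_SIZE and extra_kses are invented):

    /* Add one KSE at a time, each with its own mailbox and upcall stack. */
    for (i = 0; i < extra_kses; i++) {
            struct kse_mailbox *km = calloc(1, sizeof(*km));

            km->km_stack = malloc(KSE_STACK_SIZE);
            km->km_stack_size = KSE_STACK_SIZE;
            if (kseg_concurrency_inc(kseg_id, km) == -1)
                    err(1, "kseg_concurrency_inc");
    }

That keeps the "allocate them one by one" property you want, since each
increment carries its own stack and mailbox.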

> int kseg_bind(int kseg_id, int cpu_id):
>      Bind the KSEG with ID kseg_id to the CPU with ID cpu_id. This
>      system call returns the CPU ID that the KSEG is
>      bound to, or -1 if there is an error (invalid CPU ID, or the
>      KSEG's concurrency is greater than 1).
>
> [..]
> 	I think the KSEG can bind itself. Same for priority..
> 	no need to specify KSEG.. It's implicit.

I'm dubious about manual management (in the kernel) of processor binding of
KSEs.  Doing this puts a number of constraints on how we reimplement the
kernel scheduler that I think could make things more complex and
potentially cause correctness issues.  I don't think we need to argue this
particular point much at the moment though; it's a decision that can be put
off until later, and we can change our minds even then. =)

We could conceivably leave CPU binding out entirely during the first pass.

> [..]
> 	We also need a 'yield' version of the usleep call.
> 	Note that a completing syscall that is already sleeping
> 	may reawaken the yielded KSE in order to complete,
> 	after which it will upcall again in order to let the UTS
> 	schedule the satisfied thread.
>
> 	We also need a KSE_EXIT() for when we know we don't need it any more.

That means that the kseg_concurrency_dec() syscall above really needs to
be:

kse_destroy(kse_id kse)

That in turn means that arguments in various places need to explicitly
include KSE IDs, whereas I had completely avoided that in the paper.  I
think your suggestions as to how KSEC state should be handled are pretty
good, so I'm in general agreement with them.
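
So the syscall set probably ends up looking something like this (names and
argument lists are illustrative only, not a final interface):

    int kse_create(int kseg_id, struct kse_mailbox *km); /* replaces _inc();
                                                          * returns a KSE ID */
    int kse_destroy(int kse_id);                         /* replaces _dec() */
    int kse_yield(void);                                 /* the 'yield' usleep
                                                          * you mention; exact
                                                          * arguments TBD    */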

> I also argue with the following assertion:
>
> "Additionally, soft processor affinity for KSEs is important
> to performance. KSEs are not generally bound to CPUs, so
> KSEs that belong to the same KSEG can potentially compete
> with each other for the same processor; soft processor
> affinity tends to reduce such competition, in addition to
> well-known benefits of processor affinity."
>
> I would argue that limiting (HARD LIMIT) one KSE per KSEG per processor
> has no ill effects and simplifies some housekeeping.  KSECs can move between
> KSEs in the same KSEG in a soft-affinity manner to achieve the same thing,
> and being able to guarantee that the KSEs of a KSEG are never competing
> for the same processor ensures that they will never pre-empt each other,
> which in turn simplifies some other locking assumptions that must be made
> both in the kernel and in the UTS. (Not proven, but my gut feeling.)
> Thus on a uniprocessor, there will only ever be as many KSEs as there are
> KSEGs. Since blocking syscalls return, this has no effect on the threading
> picture. There are still multiple KSECs available.

Like I said above, I don't think that this sort of manual management of
KSEs in the scheduler is a good idea, because of the complexity and
correctness issues.

> In 3.6.1 you prove that we can have enough storage to store thread state
> of KSECs.
>
> I would like to suggest that it can be proven as follows:
> Every user thread includes a thread control block that includes enough
> storage for thread context. Since every system call is made by a thread, and
> the 'context' information for the KSE on which the syscall is being made
> includes a pointer to that storage, the blocked and resuming syscalls
> have that storage available to store their state. The context structures
> can be of a fixed known format and include a pointer to be used in linking
> them together in the 'completed and runnable' queue pointed to by the KSEU
> structure that is handed to the UTS by the upcall. Therefore, there is
> guaranteed to be enough storage.

I like this better.  It's much easier to feel confident that it's
correct. =)
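
To make sure I understand, a sketch of the kind of structure you mean might
be the following (I'm calling it thread_ctx here only to avoid overloading
"kseu"; ksec_mdstate and the field names are made up):

    /*
     * Illustrative only.  One of these is embedded in every user thread's
     * control block, so storage for a blocked or completing syscall's
     * state exists by construction -- the kernel never has to allocate.
     */
    struct thread_ctx {
            struct thread_ctx   *tc_next;   /* link used by the kernel to
                                             * chain completed contexts onto
                                             * the mailbox's queue          */
            struct ksec_mdstate  tc_state;  /* fixed-format register/return
                                             * state, written by the kernel
                                             * via copyout()                */
    };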

> in the section:
> "3.7 Upcall parallelism
>
> This section is not yet adequately fleshed out. Issues to consider:"
> [various issues shown]
>
> Using my scheme this is not an issue.

Indeed.  Very good. =)

> "What is your scheme?" I hear you ask.
>=20
> Basically in implementation if the above scheme with a few twists.
>=20
> 1/ Starting a KSE (as above) gives it it's mailbox.
> 2/ The KSE is only runnable on a processor on which there is no KSE from =
that
> KSEG
> already running. It tries really hard not to shift CPUs. No other KSE
> will be using that mailbox, thus no other processor in that KSEG.

As mentioned (twice) above, I don't necessarily think this is a good idea.

> 3/ The mailbox includes a location that the kernel will look at to find a
> pointer to the (in userspace) thread context block (KSEU?). When the UTS
> schedules a thread, it fills in this location. Until then it is NULL,
> meaning that the UTS itself is running. All the time the thread is running
> this pointer is valid, so even if the thread is pre-empted, without warning
> by the kernel, the pointer can be used to store its state.
> 4/ When a process is blocked and an upcall happens, the kernel zeroes out
> that location, and takes a copy of it in the KSEC that stores the syscall
> state.
> 5/ When a syscall is continued, and completes, the location given above
> (which was stored along with the sleeping syscall state) is used to store
> the state of the returning syscall, just as if it had returned and then
> done a yield(). It is then linked onto a list of 'completed syscalls' held
> by the kernel.
> 6/ When the next upcall into that KSEG is performed, it first reaps all the
> completed syscall blocks, and hangs them off the mailbox for the upcalling
> KSE in a known location. The UTS, when it runs from the upcall, discovers
> all the completed syscalls, which to it look like a whole list of yield()'d
> threads, and puts them onto its run-queue according to the priority of
> each, then schedules the next highest priority thread.

Cool, I think your ideas fix almost all of the outstanding problems with
the paper (and it's simpler too!).  We still have a few differences of
opinion, such as KSE-CPU binding, but this design is getting quite close to
what I would consider ready for implementation.
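
For my own benefit, here is a sketch of what the userland half of your step
6 might look like, in terms of the structures sketched earlier.
uts_runq_insert(), uts_runq_choose(), and uts_thread_resume() are invented
UTS internals, and the atomicity of the hand-off is hand-waved:

    /*
     * Upcall entry point in the UTS.  The kernel has already hung any
     * completed syscall contexts off km->km_completed before upcalling.
     */
    void
    uts_upcall(struct kse_mailbox *km)
    {
            struct thread_ctx *tc, *next;

            /* Reap the contexts whose syscalls completed while we were away. */
            for (tc = km->km_completed; tc != NULL; tc = next) {
                    next = tc->tc_next;
                    uts_runq_insert(tc);    /* queue by thread priority */
            }
            km->km_completed = NULL;

            /*
             * Run the highest-priority thread.  Setting km_curthread tells
             * the kernel where to store this thread's state if it blocks or
             * is preempted; it stays NULL while the UTS itself is running.
             */
            tc = uts_runq_choose();
            km->km_curthread = tc;
            uts_thread_resume(tc);          /* does not return */
    }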

I'll try to update the KSE paper in a week or so to fold in your ideas and
any others that come up at USENIX.

Thanks,
Jason

> enough for now..  more on the whiteboard at USENIX.. (what, you're not going?
> We'll take notes, ok?)

Believe me, I really wish I were there. =(
