Date:      Sat, 20 Sep 1997 05:58:01 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        jlemon@americantv.com (Jonathan Lemon)
Cc:        tlambert@primenet.com, nate@mt.sri.com, current@FreeBSD.ORG
Subject:   Re: FYI: regarding our rfork(2)
Message-ID:  <199709200558.WAA20208@usr02.primenet.com>
In-Reply-To: <19970919221431.23526@right.PCS> from "Jonathan Lemon" at Sep 19, 97 10:14:31 pm

> > 	This is an unsatisfying solution, mostly because kernel
> > 	threads block on blocking calls from user threads.  This
> > 	means that I can only ever have N blocking calls outstanding,
> > 	and a total of (M-N) threads which are ready to run will
> > 	not get quantum, regardless of the scheduling class used.
> 
> What about other kernel/user thread implementations?  E.g., scheduler
> activations, as put forth by Anderson et al.  From what they describe,
> there is no limit to the number of blocking calls a user-level process
> can make.  Unfortunately, I feel that they have glossed over some of the
> implementation details in their paper, making it difficult to evaluate.

Yes.

To "ungloss":

A simple version of activations can be had with a split call context
and an async call gate with markers.

An async call gate with markers means that there is an alternate gate
for all system calls.

In the case that the system call won't block, the call proceeds
normally to completion (getpid()).

In the case that the call would block (it has a marker in the sysent[]
structure tagging it as a potentially blocking call), the call proceeds
normally, until such time as it would block.
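
As a rough sketch (the flag and field names here are illustrative,
not actual FreeBSD source), the marker could be a flag in a
sysent-style entry:

	/* Hypothetical per-call marker; names are mine, not FreeBSD's. */
	#define SYF_MAYBLOCK	0x0001	/* call can sleep; may go async */

	struct async_sysent {
		int	sy_narg;	/* argument count */
		int	(*sy_call)();	/* call handler */
		int	sy_flags;	/* SYF_MAYBLOCK, etc. */
	};

	/* getpid() would carry no flag; read() would carry SYF_MAYBLOCK. */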

If it doesn't block, then it proceeds normally to completion (read()
with the page in core).

If it blocks, then you pull a "call context record" off a freelist
and point it at the process environment, so the environment can be
restored on call completion (read() with the page not in core, etc.).

This "call context record" contains the kernel stack for the call;
you replace the process stack with the call stack, and return to
the caller on the new kernel stack with "EASYNC" to flag that the
call was queued rather than completed.  The actual return from the
call is co-opted as an error indication.  All out-of-band returns
are disallowed: an extra 0th parameter is inserted before the actual
call arguments to pass a pointer that is set to "NULL" if the call
completes, and to the address of the context if it's queued.  Ideally,
the call gate sets this to NULL in user space before calling.  This
is OK, since only one call entry on the proc will exist simultaneously
for a given kernel schedulable entity (process/kernel thread/whatever).
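
To make that concrete, here is a sketch of the record itself (every
name below is hypothetical, not an existing FreeBSD structure):

	#include <sys/queue.h>		/* TAILQ_ENTRY() */

	/*
	 * Hypothetical "call context record"; one is pulled off the
	 * freelist only when a marked call actually blocks.
	 */
	struct asynccall {
		TAILQ_ENTRY(asynccall)	 ac_link;	/* freelist / "completed" list */
		struct proc		*ac_proc;	/* process environment to restore */
		caddr_t			 ac_kstack;	/* kernel stack the call sleeps on */
		int			 ac_error;	/* errno-style completion status */
		int			 ac_retval;	/* saved return value */
		struct asynccall	**ac_uctxp;	/* the user's 0th-parameter pointer */
	};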

In the kernel, the sleep is scheduled on the original stack and
the context record, as a "context sleep".

The "EASYNC" can be treated by the cooperative user space scheduler
as an "activation" in the Anderson sense, but has slightly simplified
semantics because of the reduced conditions under which kernel code
must call back to user space.  Unlike the activations described in
the Anderson paper, there are no calls from kernel to user space.

An "EASYNC" return is a request to the user space scheduler to
schedule another thread.
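
In user space, that convention might look like the following wrapper
(a sketch only: async_read(), EASYNC as an errno value, and the
thread_*() helpers are all assumptions, not existing interfaces):

	#include <errno.h>	/* EASYNC would be a new errno value */

	ssize_t
	thr_read(int fd, void *buf, size_t len)
	{
		struct asynccall *ctx = NULL;	/* 0th parameter; gate pre-clears it */
		ssize_t n;

		n = async_read(&ctx, fd, buf, len);	/* the alternate (async) gate */
		if (n == -1 && errno == EASYNC) {
			/* Call queued in the kernel: block only this
			 * user thread, and run another ready one. */
			thread_block_on(ctx);
			n = thread_reaped_retval();	/* filled in at reap time */
		}
		return (n);
	}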

When the queued call completes, it uses the context record to update
the user data.  Since it has the original kernel stack at the time
of the call that went async, it can complete its output.  It does
this by storing the return value(s) in the context record, and doing
copyouts, if necessary, to user parameters.  Because the context
points to the user process, the page table data is available for this
to be successful.

It then queues the completed context record on the process's "completed"
list, hung off the proc struct.
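
A kernel-side sketch of that completion path (asynccall_complete()
and the p_asyncdone list head are hypothetical; copyout() is real):

	void
	asynccall_complete(struct asynccall *ac, int error, int retval)
	{
		ac->ac_error = error;		/* store the return value(s) */
		ac->ac_retval = retval;
		/*
		 * copyout() to user parameters would go here, if needed;
		 * ac->ac_proc keeps the page table data addressable.
		 */
		TAILQ_INSERT_TAIL(&ac->ac_proc->p_asyncdone, ac, ac_link);
	}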

Now the tricky part: notifying the user process that an async gated
call has completed.

Typically, completion notification (and an "activation" for the thread
waiting for the event) wants to occur:

1)	When another call has been made and gone "EASYNC".  This
	case can be handled by another error return, "EASYNCDONE",
	which notifies both that the current call has gone async,
	and that there are one or more completed async calls that want
	to have their status reaped (ie: pending "activations").  A
	separate call is used to return the queued completion contexts
	to user space (sketched after this list).

2)	When another call has been made that *won't* go async.  This
	requires a "fake completion".  To do this, "EASYNCSYNC" is
	returned on call completion, even though the call has not gone
	async.  The context record pointer returned is fake; its only
	purpose is to provide access to the return values from the
	completed call.  But in so doing, the user space scheduler
	receives an "activation" for other async calls which have
	subsequently completed.  The same call is used to reap the
	status as in step #1, using the fake context.  Additional
	contexts are reaped as necessary.

3)	When all user threads are blocking.  Typically, this is handled
	by queueing the blocking operation on a context, as normal, but
	then *not* returning to user space until a completion has been
	queued on the process: ie: actually sleeping the process.  To
	recover from this, the first call to return returns "EASYNCSYNC"
	and recovers as if it were a non-blocking call, providing an
	"activation" for prior blocking calls, as in #2, above.  In other
	words, the process blocks with an effective "poll" or "select"
	on completion events.  This saves an additional call analogous
	to "aiowait".

An analogous call to "aiocancel" is not necessary; normal signal
delivery and kernel process termination handling remain in effect.

This is actually the reason I tend to recommend an async call gate
whenever kernel threading comes up... probably this wasn't very
obvious to most people until now; it might have looked as if I had
gone off at an oblique angle with no justification... 8-).


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


