Date: Sat, 20 Sep 1997 05:58:01 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: jlemon@americantv.com (Jonathan Lemon)
Cc: tlambert@primenet.com, nate@mt.sri.com, current@FreeBSD.ORG
Subject: Re: FYI: regarding our rfork(2)
Message-ID: <199709200558.WAA20208@usr02.primenet.com>
In-Reply-To: <19970919221431.23526@right.PCS> from "Jonathan Lemon" at Sep 19, 97 10:14:31 pm
> > This is an unsatisfying solution, mostly because kernel
> > threads block on blocking calls from user threads.  This
> > means that I can only ever have N blocking calls outstanding,
> > and a total of (M-N) threads which are ready to run will
> > not get quantum, regardless of the scheduling class used.
>
> What about other kernel/user thread implementations?  E.g.: scheduler
> activations, as put forth by Anderson, et al.  From what they describe,
> there is no limit to the number of blocking calls a user-level process
> can make.  Unfortunately, I feel that they have glossed over some of
> the implementation details in their paper, making it difficult to
> evaluate.

Yes.  To "ungloss":

A simple version of activations can be had by having a split call
context and utilizing an async call gate with markers.

An async call gate with markers means that there is an alternate gate
for all system calls.  In the case that the system call won't block,
the call proceeds normally to completion (getpid()).

In the case that the call would block (it has a marker in the sysent[]
structure tagging it as a potentially blocking call), the call proceeds
normally, until such time as it would block.  If it doesn't block, then
it proceeds normally to completion (read() with the page in core).

If it blocks, then you pull a "call context record" off a freelist and
point it at the process environment, so it can be restored on call
completion (read() with the page not in core, etc.).  This "call
context record" contains the kernel stack for the call; you replace the
process stack with the call stack, and return to the caller on the new
kernel stack with "EASYNC" to flag that the call was queued rather than
completed.  The actual return from the call is abrogated to be an error
indication.
All out-of-band returns are disallowed: an extra 0th parameter is
inserted before the actual call arguments to pass a pointer that is set
to NULL if the call completes, and to the address of the context if the
call is queued.  Ideally, the call gate sets this to NULL in user space
before calling.  This is OK, since only one call entry on the proc will
exist simultaneously for a given kernel-schedulable entity
(process/kernel thread/whatever).  In the kernel, the sleep is
scheduled on the original stack and the context record, as a "context
sleep".

The "EASYNC" can be treated by the cooperative user space scheduler as
an "activation" in the Anderson sense, but has slightly simplified
semantics because of the reduced conditions under which kernel code
must call back to user space.  Unlike the activations described in the
Anderson paper, there are no calls from kernel to user space.  An
"EASYNC" return is a request to the user space scheduler to schedule
another thread.

When the queued call completes, it uses the context record to update
the user data.  Since it has the original kernel stack from the time
the call went async, it can complete its output.  It does this by
storing the return value(s) in the context record and doing copyouts,
if necessary, to user parameters.  Because the context points to the
user process, the page table data is available for this to succeed.
It then queues the completed context record on the process's
"completed" list, hung off the proc struct.

Now the tricky part: notifying the user process that an async gated
call has completed.  Typically, completion notification (and an
"activation" for the thread waiting for the event) wants to occur:

1)	When another call has been made and gone "EASYNC".  This case
	can be handled by another error return, "EASYNCDONE", which
	notifies both that the current call has gone async and that
	there are one or more completed async calls that want to have
	their status reaped (i.e.: pending "activations").
	A separate call is used to return the queued completion
	contexts to user space.

2)	When another call has been made that *won't* go async.  This
	requires a "fake completion".  To do this, "EASYNCSYNC" is
	returned on call completion, even though the call has not gone
	async.  The context record pointer returned is fake; its only
	purpose is to provide access to the return values from the
	completed call.  But in so doing, the user space scheduler
	receives an "activation" for other async calls which have
	subsequently completed.  The same call is used to reap the
	status as in case #1, using the fake context.  Additional
	contexts are reaped as necessary.

3)	When all user threads are blocking.  Typically, this is
	handled by queueing the blocking operation on a context, as
	normal, but then *not* returning to user space until a
	completion has been queued on the process: i.e., actually
	sleeping the process.  To recover from this, the first call to
	return returns "EASYNCSYNC" and recovers as if it were a
	non-blocking call, providing an "activation" for prior
	blocking calls, as in #2, above.  In other words, the process
	blocks with an effective "poll" or "select" on completion
	events.  This saves an additional call analogous to "aiowait".

An analogous call to "aiocancel" is not necessary; normal signal and
kernel process termination is in effect.

This is actually the reason I tend to recommend an async call gate
whenever kernel threading comes up... probably this wasn't very
obvious to most people until now; it might have looked as if I had
gone off at an oblique angle with no justification... 8-).


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.