Date: Mon, 25 Jun 2001 02:00:36 -0700
From: Julian Elischer <julian@elischer.org>
To: Jason Evans
Cc: arch@freebsd.org
Subject: Re: Updated KSEs paper
Message-ID: <3B36FDB4.74C96ACB@elischer.org>
References: <20010622184626.B47186@canonware.com>

Jason Evans wrote:
>
> A number of people are going to discuss various subjects at a meeting
> during USENIX, one of them being KSEs. I won't be able to attend (a
> sister is getting married that day), but wanted to make an updated
> version of the paper available to avoid others having to fix the same
> design problems as I've already fixed.
>
> The paper still is not by any means perfect, but it addresses most of
> the issues that people brought up in previous discussions on this
> mailing list. Feedback and suggestions are welcome.
>
> http://people.freebsd.org/~jasone/refs/freebsd_kse/freebsd_kse.html
> http://people.freebsd.org/~jasone/refs/freebsd_kse.ps
>
> Jason

Here are some comments on the KSE API as I see it....

ksec_new(ksec_id, cpu_id, kseg_id):
  Run a thread on this KSEC (whose ID is ksec_id), on the CPU with ID
  cpu_id, as part of the KSEG with ID kseg_id.

[julian's comment]
The UTS knows it's running. It has a pointer to a mailbox which will
magically be correct for this KSE. It knows what threads are runnable
and can just jump into one of them. If none are runnable, it needs some
type of system call that 'yields' the processor; maybe a variant of the
usleep call? It got here because either something blocked, or a new KSE
was created. These two cases are effectively identical: we have a place
to schedule a thread, and it doesn't matter if it's new or recycled.

ksec_preempt(ksec_id, kse_state):
  The KSEC with ID ksec_id was preempted, with userland execution state
  ksec_state.

[..]
The act of pre-empting this KSEC might write the state into the
userland context storage (with a copyout()). Then the UTS wouldn't need
notification right now, just at the next time it might make a
difference, i.e. when it next goes to schedule something. If we do it
right, it will find this thread on the runnable queue at that time
without us needing to do an explicit notification.

ksec_block(ksec_id):
  The KSEC with ID ksec_id has blocked in the kernel.

[..]
This is treated exactly like the first case. I don't think it needs a
separate upcall.

ksec_unblock(ksec_id, kse_state):
  The KSEC with ID ksec_id has completed in the kernel, with userland
  execution state ksec_state.

[..]
I don't think a separate upcall is always needed here either. On
completion, the state of the returning syscall is written into the
userland context, and the context is written to the "completed and
runnable" queue. The next time the UTS runs it adds the contents of
this queue to its runnable queue, and schedules as per normal.
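To make the mailbox idea concrete, here is roughly what I'm picturing.
All the structure and field names below (kse_mailbox, thread_mailbox,
km_curthread, km_completed) are made up for illustration only; nothing
here is from the paper or any existing code.

/*
 * Rough sketch only: invented names, not a real interface.
 */
#include <ucontext.h>

struct thread_mailbox {                 /* one per user thread (the KSEU) */
	ucontext_t	tm_context;     /* saved register state */
	struct thread_mailbox *tm_next; /* links completed contexts */
};

struct kse_mailbox {                    /* one per KSE, handed to kse_init() */
	struct thread_mailbox *km_curthread;  /* set by the UTS when it runs
	                                       * a thread; NULL while the UTS
	                                       * itself is running */
	struct thread_mailbox *km_completed;  /* contexts whose syscalls
	                                       * finished in the kernel,
	                                       * reaped at the next upcall */
	void		*km_stack;      /* small, bounded-use upcall stack */
	void		(*km_upcall)(struct kse_mailbox *); /* UTS entry */
};

Every upcall then lands in km_upcall() on km_stack with the mailbox in
hand, which is why the UTS needs no other notification.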
signal(sig_t signum):
  The process received a signal numbered signum.

[..]
We haven't decided exactly what this means. This will do as well as
anything else I've seen mentioned, and better than most.

The following system calls are necessary:

void kse_init(struct kseu *context):
  Start using KSEs. context contains the necessary data for the kernel
  to make upcalls. This function appears to return every time an upcall
  is made. Initially, there is only one KSEG (ID 0), which has a
  concurrency level of 1.

[..]
Whenever concurrency is added, the caller must supply a different stack
(or dummy stack) for the system call to return on. An added unit of
concurrency is in effect an added KSE. Each KSE needs a different stack
to upcall on (though the stack may be small, as it will have bounded
use). "context" includes pointers to the mailbox that will be used by
that KSE. The multiple returns of this call will all magically have
that mailbox in their hand, so you can preload it with anything the UTS
will need on an upcall.

int kseg_create(void):
  Create a KSEG and return its KSEG ID (unique within this process), or
  -1 if there is an error (resource limit exceeded).

[..]
I see this as basically an extension of the next call. You
automatically get a KSE with that KSEG, so it does everything that
creating a new KSE does, and needs the 'context' argument that a KSE
would need.

int kseg_concurrency(int kseg_id, int adjust):
  Adjust the concurrency of the KSEG with ID kseg_id. Decrementing the
  concurrency to 0 destroys the KSEG, as soon as there are no more
  active KSECs in the KSEG. If adjust is 0, the KSEG is not modified,
  but the concurrency is still returned. This system call returns the
  KSEG's instantaneous concurrency level after adjusting it.

[..]
If you increase the concurrency, you have created new KSEs. They need
their own separate upcall stacks (maybe only dummy stacks, but still).
In any case you need to allocate them one by one. Just setting the
concurrency to "what it is now + 2" is not going to work, because the
new KSEs don't know where to return to.

int kseg_bind(int kseg_id, int cpu_id):
  Bind the KSEG with ID kseg_id to the CPU with ID cpu_id. This system
  call returns the CPU ID that the KSEG is bound to, or -1 if there is
  an error (invalid CPU ID, or the KSEG's concurrency is greater than
  1).

[..]
I think the KSEG can bind itself. Same for priority: no need to specify
the KSEG, it's implicit.

[..]
We also need a 'yield' version of the usleep call. Note that a
completing syscall that is already sleeping may reawaken the yielded
KSE in order to complete, after which it will upcall again in order to
let the UTS schedule the satisfied thread. We also need a KSE_EXIT()
for when we know we don't need a KSE any more.
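Pulling those together, the syscall surface I'm imagining looks
something like the sketch below. The prototypes (and the kse_yield /
kse_exit names) are invented just to illustrate the shape; they are not
an existing interface.

/*
 * Sketch only: my reading of the proposed API, with invented names.
 */
struct kseu;                            /* per-KSE upcall context/mailbox */

void kse_init(struct kseu *context);    /* "returns" once per upcall, on
                                         * the upcall stack named in
                                         * *context */
int  kseg_create(void);                 /* new KSEG, plus its first KSE */
int  kseg_concurrency(int kseg_id, int adjust); /* +1 == one new KSE, each
                                                 * with its own stack and
                                                 * mailbox */
int  kseg_bind(int kseg_id, int cpu_id);/* or, as argued above, bind the
                                         * caller's own KSEG implicitly */
int  kse_yield(void);                   /* give the CPU back when nothing
                                         * is runnable (usleep variant) */
void kse_exit(void);                    /* discard a KSE we no longer need */

The reason concurrency has to be added one KSE at a time is visible
here: each call that creates a KSE has to be handed its own stack and
mailbox before that KSE can ever upcall.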
I also argue with the following assertion:

  "Additionally, soft processor affinity for KSEs is important to
  performance. KSEs are not generally bound to CPUs, so KSEs that
  belong to the same KSEG can potentially compete with each other for
  the same processor; soft processor affinity tends to reduce such
  competition, in addition to well-known benefits of processor
  affinity."

I would argue that limiting (HARD LIMIT) KSEs to one per KSEG per
processor has no ill effects and simplifies some housekeeping. KSECs
can move between KSEs in the same KSEG in a soft-affinity manner to
achieve the same thing, and being able to guarantee that the KSEs of a
KSEG are never competing for the same processor ensures that they will
never pre-empt each other, which in turn simplifies some other locking
assumptions that must be made both in the kernel and in the UTS. (Not
proven, but my gut feeling.) Thus on a uniprocessor there will only
ever be as many KSEs as there are KSEGs. Since blocking syscalls
return, this has no effect on the threading picture: there are still
multiple KSECs available.

In 3.6.1 you prove that we can have enough storage to store the thread
state of KSECs. I would like to suggest that it can be proven as
follows: every user thread includes a thread control block that
includes enough storage for thread context. Since every system call is
made by a thread, and the 'context' information for the KSE on which
the syscall is being made includes a pointer to that storage, the
blocked and resuming syscalls have that storage available to store
their state. The context structures can be of a fixed, known format and
include a pointer to be used in linking them together in the 'completed
and runnable' queue pointed to by the KSEU structure that is handed to
the UTS by the upcall. Therefore, there is guaranteed to be enough
storage.

3.6.2 Per-upcall event ordering:
Since in my scheme there is only one kind of upcall (well, I think
signals can also be made to look the same), there is no ordering
problem. All information is presented to the UTS at the same time, and
it can decide which it wants to handle first.

In the section:

  "3.7 Upcall parallelism
  This section is not yet adequately fleshed out. Issues to consider:"
  [various issues shown]

Using my scheme this is not an issue. "What is your scheme?" I hear you
ask. Basically an implementation of the above, with a few twists
(there's a sketch of the resulting upcall handler after this list):

1/ Starting a KSE (as above) gives it its mailbox.

2/ The KSE is only runnable on a processor on which there is no KSE
   from that KSEG already running. It tries really hard not to shift
   CPUs. No other KSE will be using that mailbox, and thus no other
   processor in that KSEG.

3/ The mailbox includes a location that the kernel will look at to find
   a pointer to the (userspace) thread context block (KSEU?). When the
   UTS schedules a thread, it fills in this location; until then it is
   NULL, meaning that the UTS itself is running. All the time the
   thread is running this pointer is valid, so even if the thread is
   pre-empted without warning by the kernel, the pointer can be used to
   store its state.

4/ When the running thread blocks and an upcall happens, the kernel
   zeroes out that location and takes a copy of it in the KSEC that
   stores the syscall state.

5/ When a syscall is continued and completes, the location given above
   (which was stored along with the sleeping syscall state) is used to
   store the state of the returning syscall, just as if it had returned
   and then done a yield(). It is then linked onto a list of 'completed
   syscalls' held by the kernel.

6/ When the next upcall into that KSEG is performed, the kernel first
   reaps all the completed syscall blocks and hangs them off the
   mailbox for the upcalling KSE in a known location. The UTS, when it
   runs from the upcall, discovers all the completed syscalls, which to
   it look like a whole list of yield()'d threads, puts them onto its
   run queue according to the priority of each, and then schedules the
   next highest priority thread.
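To show what step 6 looks like from the UTS's side, here is a sketch of
the upcall handler. The trimmed-down mailbox structures and all of the
helper names (runq_insert, runq_remove_highest, thread_switch,
kse_yield, uts_upcall) are invented for illustration; this is not code
from the paper or from any library.

#include <ucontext.h>

/* Trimmed versions of the structures sketched earlier in this mail. */
struct thread_mailbox {
	ucontext_t	tm_context;
	struct thread_mailbox *tm_next;
};
struct kse_mailbox {
	struct thread_mailbox *km_curthread;
	struct thread_mailbox *km_completed;
};

/* Invented helpers: a priority-ordered run queue and a context switch. */
void	runq_insert(struct thread_mailbox *tm);
struct thread_mailbox *runq_remove_highest(void);
void	thread_switch(ucontext_t *ctx);
int	kse_yield(void);

/* The UTS entry point the kernel upcalls into (step 6). */
void
uts_upcall(struct kse_mailbox *km)
{
	struct thread_mailbox *tm, *next;

	for (;;) {
		/*
		 * Reap the completed syscalls the kernel hung off the
		 * mailbox; to the UTS they look like yield()'d threads.
		 * (A real implementation would take this list atomically.)
		 */
		tm = km->km_completed;
		km->km_completed = NULL;
		while (tm != NULL) {
			next = tm->tm_next;
			runq_insert(tm);	/* by thread priority */
			tm = next;
		}

		tm = runq_remove_highest();
		if (tm != NULL)
			break;

		/*
		 * Nothing runnable: note that the UTS itself is running
		 * and give the processor back.  If the yield ever returns
		 * here rather than via a fresh upcall, just look again.
		 */
		km->km_curthread = NULL;
		kse_yield();
	}

	/*
	 * From here on the kernel saves state into this thread's context
	 * block if it blocks or is pre-empted (steps 3 and 4).
	 */
	km->km_curthread = tm;
	thread_switch(&tm->tm_context);		/* does not return */
}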
Enough for now.. more on the whiteboard at USENIX..
(What, you're not going? We'll take notes, OK?)

--
+------------------------------------+       ______ _  __
|   __--_|\  Julian Elischer         |       \     U \/ / hard at work in
|  /       \ julian@elischer.org     +------>x   USA    \ a very strange
| (   OZ    )                                \___   ___ | country !
+- X_.---._/    presently in San Francisco       \_/   \\
          v

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message