Date: Mon, 18 Jan 1999 22:26:32 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: dillon@apollo.backplane.com (Matthew Dillon)
Cc: jasone@canonware.com, dillon@apollo.backplane.com, tlambert@primenet.com, hackers@FreeBSD.ORG
Subject: Re: Path to SMP
Message-ID: <199901182226.PAA04448@usr05.primenet.com>
In-Reply-To: <199901171924.LAA51387@apollo.backplane.com> from "Matthew Dillon" at Jan 17, 99 11:24:38 am
> :Actually, I don't think it's enough to have the ability to run multiple
> :"clone()d" processes in supervisor mode, in parallel.  The problem is that
> :these processes can block in the kernel.  For an N->M mapping of
> :threads->processes, this means a process can block, thus preventing other
> :...
> :schedulable user threads from running.  In the LinuxThreads model, every
> :user thread is associated with a clone()d process, but this doesn't scale
> :well, and has the additional problem of slow thread context switches
> :...
>
>     There is nothing magic about a supervisor-supervisor context switch,
>     and the overhead of cloning processes exists only because the full
>     struct proc is being duplicated -- also not necessary.
>
>     All we really need is the concept of a 'task' versus a 'process'.  A
>     process is made up of one or more scheduling entities called 'tasks'.
>     Tasks are trivial to create, switch between, and destroy.  I think it
>     can be that simple.

A "task" is an execution context.  For a thread entering the kernel, this
is, in fact, a system call context that contains:

(1)	A reference to the caller's VM space.
(2)	A kernel stack.
(3)	The program counter.

This is exactly what's needed for an async system call interface, as well.

> The problem UNIXs have now is that the scheduling entity is also the
> resource management entity, so you start to have to go through loops
> to make it efficient.

I think you meant "hoops"?

The problem is that the kernel is doing work on behalf of a user process,
either in the trap or the fault entry mechanisms.

For the interrupt entry, the kernel is doing work on behalf of hardware --
latently tied to a request by a user process, but not the same thing, since
there is a sleep/wakeup mechanism synchronizing the completion notification.

In other words, faults and traps have an associated process context, while
interrupts don't.

I agree that there needs to be the concept of a trap/fault context that is
separate from the idea of a process.
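The three items above can be sketched as a plain C structure.  A minimal
sketch, with invented names (`vmspace_ref`, `task_create`, `KSTACK_SIZE`) --
none of these are actual FreeBSD structures of the era:

```c
#include <assert.h>
#include <stdlib.h>

#define KSTACK_SIZE 8192	/* illustrative kernel stack size */

/* Stand-in for a reference to a VM space; shared, never copied. */
struct vmspace_ref { int vr_refcnt; };

/* The minimal per-"task" state enumerated above. */
struct task {
	struct vmspace_ref *t_vm;	 /* (1) caller's VM space, by reference */
	char		   *t_kstack;	 /* (2) a kernel stack of its own */
	void		  (*t_pc)(void); /* (3) the saved program counter */
};

/*
 * Creating a task duplicates none of struct proc: it bumps a VM-space
 * reference count, allocates a stack, and records an entry point.
 */
struct task *
task_create(struct vmspace_ref *vm, void (*entry)(void))
{
	struct task *t = malloc(sizeof(*t));

	if (t == NULL)
		return NULL;
	vm->vr_refcnt++;		/* share, don't copy, the VM space */
	t->t_vm = vm;
	t->t_kstack = malloc(KSTACK_SIZE);
	t->t_pc = entry;
	return t;
}
```

The point of the sketch is what's *absent*: no credentials, no file table, no
scheduler state -- which is why such tasks are cheap to create and destroy.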
That's what an async call gate gives you, in fact: a method of divorcing
trap context from a process.

> If interrupts are moved into kernel threads, stack overhead becomes much
> more predictable, so the stack utilization resource might come in at, say,
> 8K per thread instead of 16K or 32K per thread.

Moving interrupts into kernel threads would be a mistake.  Interrupts don't
need a context other than the kernel VM space.

On the other hand, interrupts should be low latency; if there is additional
processing required, separate from the act of getting the interrupt freed,
*that* can go into a separate "upper level" soft interrupt mechanism (e.g.,
as queued "work to do").  You see this in the serial driver and the network
stacks, but it really applies to all interrupts.

The virtue of this for SMP is that it allows interrupt processing of shared
interrupts to occur on one processor only (a dispatch implies temporary
ownership of the interrupt "resource"), while other interrupts can be
simultaneously dispatched to other processors.  They are given back to the
"virtual wire" when the interrupt is reenabled (acknowledged).

It's probably useful to think of these as async contexts, with the same
stack and program counter requirements, but a pointer to the kernel, not a
user space, VM.  I think these aren't threads, because anything in the
kernel gets kernel VM space access, and I think they aren't scheduling
entities at all, so scheduler information would be overkill.  They only
execute as a result of being "scheduled" by a hardware event.

You could probably statically allocate one of these for every hardware
interrupt that's possible, and leave it at that.  The interrupt handler
would execute on the interrupt's private stack.

A split interrupt model also has a better chance of meeting the timing
constraints of hard real-time hardware task duration semantics.
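The split interrupt model described above can be sketched in a few lines: a
hard handler that only acknowledges the device and queues "work to do", and
a soft interrupt that drains the queue later.  All names here (`hard_intr`,
`soft_intr`, `workq`) are hypothetical, not a real driver API:

```c
#include <assert.h>

#define WORKQ_LEN 64

/* Queued "work to do", handed from the hard to the soft handler. */
static int workq[WORKQ_LEN];
static unsigned wq_head, wq_tail;

/* Hard interrupt: do the minimum needed to free the interrupt, then queue. */
void
hard_intr(int unit)
{
	/* ...acknowledge the device here, returning the "virtual wire"... */
	workq[wq_head++ % WORKQ_LEN] = unit;	/* defer the rest */
}

/* Soft interrupt: the heavier, deferrable part of the processing. */
int
soft_intr(void)
{
	int handled = 0;

	while (wq_tail != wq_head) {
		int unit = workq[wq_tail++ % WORKQ_LEN];

		(void)unit;	/* ...per-unit protocol/driver work here... */
		handled++;
	}
	return handled;
}
```

Because `hard_intr()` does almost nothing, its worst-case latency is small
and bounded, which is what the hard real-time point below depends on.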
> Also, you can still
> implement a semi-synchronous cross-thread call without having to deal
> with the overhead of a fully-async design.

I don't think you need inter-thread IPC within the kernel, unless you are
considering the sleep/wakeup mechanisms a type of IPC.

Even so, interrupts are the result of a slept context, eventually, and they
don't have to operate in the context of preempting a kernel stack from a
process in order to be able to execute, if done correctly.  So even if you
are communicating a wakeup *to* a task, you aren't communicating it *from*
a task context; you're doing it from an interrupt context.

The issue with the scheduler is to consider kernel execution contexts that
are the result of a trap as preemptible by faults and interrupts.  That is,
CPUs are just another consumable resource, and what you're really
scheduling is which context is currently active in the CPU, not "quantum".

Which leaves us with the problem of how to get the CPUs into user space;
the problem is one of statically associating a context that has no kernel
stack with the user space "process".  This is a pretty trivial problem to
solve, without going to a huge number of kernel threads (which would
seriously damage your cache locality).

So in user space, you have the process's VM, and you have your program
counter, but you don't have a kernel stack.  The only thing you need to
replicate to get multiple CPUs into a single user space process is the
program counter.

You could do this very easily by requesting a "run in me" -- a type of
system call.  But pretty obviously, you don't exceed the number of CPUs,
and the default is probably one: the one implied by the process being
instanced in the first place.  More kernel threads don't buy you more CPU
cycles, unless you are looking at a kernel thread as a means of allocating
N out of every N + M quantums.
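The async call design this argument leads to can be sketched as a simple
submit/service queue: the submitting context gets a slot id back
immediately and keeps its quantum, and whichever CPU's kernel context drains
the queue posts the completion.  Everything here (`async_submit`,
`async_service`, the doubling stand-in for "work") is invented for
illustration:

```c
#include <assert.h>

#define AQ_LEN 32

enum aq_state { AQ_FREE, AQ_PENDING, AQ_DONE };

struct async_call {
	enum aq_state ac_state;
	int	      ac_callno;	/* which call was requested */
	int	      ac_result;	/* posted at completion */
};

static struct async_call aq[AQ_LEN];

/* Enqueue a call and return a slot id immediately; the caller never blocks. */
int
async_submit(int callno)
{
	int i;

	for (i = 0; i < AQ_LEN; i++) {
		if (aq[i].ac_state == AQ_FREE) {
			aq[i].ac_callno = callno;
			aq[i].ac_state = AQ_PENDING;
			return i;
		}
	}
	return -1;	/* queue full; a real design would sleep or grow */
}

/* Drain pending calls; run by whichever CPU's kernel context picks the
 * work up -- not necessarily the CPU that submitted it. */
void
async_service(void)
{
	int i;

	for (i = 0; i < AQ_LEN; i++) {
		if (aq[i].ac_state == AQ_PENDING) {
			aq[i].ac_result = aq[i].ac_callno + 1; /* stand-in */
			aq[i].ac_state = AQ_DONE;
		}
	}
}
```

The submitting process polls (or is woken) for `AQ_DONE`, which is exactly
the CPU #1 enqueue / CPU #4 service picture described below.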
And that's a scheduling policy issue that doesn't belong in the hands of
the user space programmer anyway, but in the hands of the system
administrative policy (or, by proxy, in a credential-restricted scheduling
policy API).

It's pretty obvious from this that the context an interrupt needs vs. the
context a process needs are two very different animals.

One could easily imagine queueing a blocking system call from CPU #1, that
call being serviced by CPU #4, while the enqueueing call itself returns
immediately to user space and the process continues to have its code
executed by CPUs #1 and #3.

Returning to the "run in me" call, we see that it's a component of the
user space call conversion scheduler, and has nothing to do with kernel
threads.  Because the calls are made asynchronously, the process need not
give away its quantum merely to make a system call.

In actuality, the act of blocking a kernel thread on a system call is what
causes unnecessary process context switch overhead, along with the related
"L1 cache busting" and the APIC bus contention overhead from the IPIs used
to implement MESI-based cache coherency, both of which follow from
unbounded migration.

So a design that uses async calls, in fact, neatly sidesteps the cache and
scheduling problems that follow from the idea of blocking a kernel thread
and switching to a different context because of the block.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message