Date: Mon, 18 Jan 1999 22:26:32 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: dillon@apollo.backplane.com (Matthew Dillon)
Cc: jasone@canonware.com, dillon@apollo.backplane.com, tlambert@primenet.com, hackers@FreeBSD.ORG
Subject: Re: Path to SMP
Message-ID: <199901182226.PAA04448@usr05.primenet.com>
In-Reply-To: <199901171924.LAA51387@apollo.backplane.com> from "Matthew Dillon" at Jan 17, 99 11:24:38 am
> :Actually, I don't think it's enough to have the ability to run multiple
> :"clone()d" processes in supervisor mode, in parallel.  The problem is that
> :these processes can block in the kernel.  For an N->M mapping of
> :threads->processes, this means a process can block, thus preventing other
> :...
> :schedulable user threads from running.  In the LinuxThreads model, every
> :user thread is associated with a clone()d process, but this doesn't scale
> :well, and has the additional problem of slow thread context switches
> :...
>
>     There is nothing magic about a supervisor-supervisor context switch,
>     and the overhead of cloning processes exists only because the full
>     struct proc is being duplicated -- also not necessary.
>
>     All we really need is the concept of a 'task' versus a 'process'.  A
>     process is made up of one or more scheduling entities called 'tasks'.
>     Tasks are trivial to create, switch between, and destroy.  I think it
>     can be that simple.

A "task" is an execution context.  For a thread entering the kernel, this
is, in fact, a system call context that contains:

(1)	A reference to the caller's VM space.
(2)	A kernel stack.
(3)	The program counter.

This is exactly what's needed for an async system call interface, as well.

> The problem UNIXs have now is that the scheduling entity is also the
> resource management entity, so you start to have to go through loops
> to make it efficient.

I think you meant "hoops"?

The problem is that the kernel is doing work on behalf of a user process,
either in the trap or the fault entry mechanisms.

For the interrupt entry, the kernel is doing work on behalf of hardware --
latently tied to a request by a user process, but not the same thing, since
there is a sleep/wakeup mechanism synchronizing the completion notification.

In other words, faults and traps have an associated process context, while
interrupts don't.

I agree that there needs to be the concept of a trap/fault context that is
separate from the idea of a process.
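The three items above can be sketched as a plain C structure.  A minimal
sketch, with invented names (`vmspace_ref`, `task_create`, `KSTACK_SIZE`) --
none of these are actual FreeBSD structures of the era:

```c
#include <assert.h>
#include <stdlib.h>

#define KSTACK_SIZE 8192	/* illustrative kernel stack size */

/* Stand-in for a reference to a VM space; shared, never copied. */
struct vmspace_ref { int vr_refcnt; };

/* The minimal per-"task" state enumerated above. */
struct task {
	struct vmspace_ref *t_vm;	 /* (1) caller's VM space, by reference */
	char		   *t_kstack;	 /* (2) a kernel stack of its own */
	void		  (*t_pc)(void); /* (3) the saved program counter */
};

/*
 * Creating a task duplicates none of struct proc: it bumps a VM-space
 * reference count, allocates a stack, and records an entry point.
 */
struct task *
task_create(struct vmspace_ref *vm, void (*entry)(void))
{
	struct task *t = malloc(sizeof(*t));

	if (t == NULL)
		return NULL;
	vm->vr_refcnt++;		/* share, don't copy, the VM space */
	t->t_vm = vm;
	t->t_kstack = malloc(KSTACK_SIZE);
	t->t_pc = entry;
	return t;
}
```

The point of the sketch is what's *absent*: no credentials, no file table, no
scheduler state -- which is why such tasks are cheap to create and destroy.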
That's what an async call gate gives you, in fact: a method of divorcing
trap context from a process.

> If interrupts are moved into kernel threads, stack overhead becomes much
> more predictable, so the stack utilization resource might come in at, say,
> 8K per thread instead of 16K or 32K per thread.

Moving interrupts into kernel threads would be a mistake.  Interrupts don't
need a context other than the kernel VM space.

On the other hand, interrupts should be low latency; if there is additional
processing required, separate from the act of getting the interrupt freed,
*that* can go into a separate "upper level" soft interrupt mechanism (e.g.,
as queued "work to do").  You see this in the serial driver and the network
stacks, but it really applies to all interrupts.

The virtue of this for SMP is that it allows interrupt processing of shared
interrupts to occur on one processor only (a dispatch implies temporary
ownership of the interrupt "resource"), while other interrupts can be
simultaneously dispatched to other processors.  They are given back to the
"virtual wire" when the interrupt is reenabled (acknowledged).

It's probably useful to think of these as async contexts, with the same
stack and program counter requirements, but a pointer to the kernel, not a
user space, VM.  I think these aren't threads, because anything in the
kernel gets kernel VM space access, and I think they aren't scheduling
entities at all, so scheduler information would be overkill.  They only
execute as a result of being "scheduled" by a hardware event.

You could probably statically allocate one of these for every hardware
interrupt that's possible, and leave it at that.  The interrupt handler
would execute on the interrupt's private stack.

A split interrupt model also has a better chance of meeting the timing
constraints of hard real-time hardware task duration semantics.
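The split interrupt model described above can be sketched in a few lines: a
hard handler that only acknowledges the device and queues "work to do", and
a soft interrupt that drains the queue later.  All names here (`hard_intr`,
`soft_intr`, `workq`) are hypothetical, not a real driver API:

```c
#include <assert.h>

#define WORKQ_LEN 64

/* Queued "work to do", handed from the hard to the soft handler. */
static int workq[WORKQ_LEN];
static unsigned wq_head, wq_tail;

/* Hard interrupt: do the minimum needed to free the interrupt, then queue. */
void
hard_intr(int unit)
{
	/* ...acknowledge the device here, returning the "virtual wire"... */
	workq[wq_head++ % WORKQ_LEN] = unit;	/* defer the rest */
}

/* Soft interrupt: the heavier, deferrable part of the processing. */
int
soft_intr(void)
{
	int handled = 0;

	while (wq_tail != wq_head) {
		int unit = workq[wq_tail++ % WORKQ_LEN];

		(void)unit;	/* ...per-unit protocol/driver work here... */
		handled++;
	}
	return handled;
}
```

Because `hard_intr()` does almost nothing, its worst-case latency is small
and bounded, which is what the hard real-time point below depends on.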
> Also, you can still
> implement a semi-synchronous cross-thread call without having to deal
> with the overhead of a fully-async design.

I don't think you need inter-thread IPC within the kernel, unless you are
considering the sleep/wakeup mechanisms a type of IPC.

Even so, interrupts are the result of a slept context, eventually, and they
don't have to operate in the context of preempting a kernel stack from a
process in order to be able to execute, if done correctly.  So even if you
are communicating a wakeup *to* a task, you aren't communicating it *from*
a task context; you're doing it from an interrupt context.

The issue with the scheduler is to consider kernel execution contexts that
are the result of a trap as preemptible by faults and interrupts.  That is,
CPUs are just another consumable resource, and what you're really
scheduling is which context is currently active in the CPU, not "quantum".

Which leaves us with the problem of how to get the CPUs into user space;
the problem is one of statically associating a context that has no kernel
stack with the user space "process".  This is a pretty trivial problem to
solve, without going to a huge number of kernel threads (which would
seriously damage your cache locality).

So in user space, you have the process's VM, and you have your program
counter, but you don't have a kernel stack.  The only thing you need to
replicate to get multiple CPUs into a single user space process is the
program counter.

You could do this very easily by requesting a "run in me" -- a type of
system call.  But pretty obviously, you don't exceed the number of CPUs,
and the default is probably one: the one implied by the process being
instanced in the first place.  More kernel threads don't buy you more CPU
cycles, unless you are looking at a kernel thread as a means of allocating
N out of every N + M quantums.
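The async call design this argument leads to can be sketched as a simple
submit/service queue: the submitting context gets a slot id back
immediately and keeps its quantum, and whichever CPU's kernel context drains
the queue posts the completion.  Everything here (`async_submit`,
`async_service`, the doubling stand-in for "work") is invented for
illustration:

```c
#include <assert.h>

#define AQ_LEN 32

enum aq_state { AQ_FREE, AQ_PENDING, AQ_DONE };

struct async_call {
	enum aq_state ac_state;
	int	      ac_callno;	/* which call was requested */
	int	      ac_result;	/* posted at completion */
};

static struct async_call aq[AQ_LEN];

/* Enqueue a call and return a slot id immediately; the caller never blocks. */
int
async_submit(int callno)
{
	int i;

	for (i = 0; i < AQ_LEN; i++) {
		if (aq[i].ac_state == AQ_FREE) {
			aq[i].ac_callno = callno;
			aq[i].ac_state = AQ_PENDING;
			return i;
		}
	}
	return -1;	/* queue full; a real design would sleep or grow */
}

/* Drain pending calls; run by whichever CPU's kernel context picks the
 * work up -- not necessarily the CPU that submitted it. */
void
async_service(void)
{
	int i;

	for (i = 0; i < AQ_LEN; i++) {
		if (aq[i].ac_state == AQ_PENDING) {
			aq[i].ac_result = aq[i].ac_callno + 1; /* stand-in */
			aq[i].ac_state = AQ_DONE;
		}
	}
}
```

The submitting process polls (or is woken) for `AQ_DONE`, which is exactly
the CPU #1 enqueue / CPU #4 service picture described below.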
And that's a scheduling policy issue that doesn't belong in the hands of
the user space programmer anyway, but in the hands of the system
administrative policy (or, by proxy, in a credential-restricted scheduling
policy API).

It's pretty obvious from this that the context an interrupt needs vs. the
context a process needs are two very different animals.

One could easily imagine queueing a blocking system call from CPU #1, that
call being serviced by CPU #4, while the enqueueing call itself returns
immediately to user space and the process continues to have its code
executed by CPUs #1 and #3.

Returning to the "run in me" call, we see that it's a component of the
user space call conversion scheduler, and has nothing to do with kernel
threads.  Because the calls are made asynchronously, the process need not
give away its quantum merely to make a system call.

In actuality, the act of blocking a kernel thread on a system call is what
causes unnecessary process context switch overhead, along with the related
"L1 cache busting" and the APIC bus contention overhead from the IPIs used
to implement MESI-based cache coherency, both of which follow from
unbounded migration.

So a design that uses async calls, in fact, neatly sidesteps the cache and
scheduling problems that follow from the idea of blocking a kernel thread
and switching to a different context because of the block.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message