From owner-freebsd-arch Sun Nov 28 14:28:42 1999
Date: Sun, 28 Nov 1999 14:28:00 -0800 (PST)
From: Julian Elischer <julian@whistle.com>
To: arch@freebsd.org
Subject: Re: Which is the truth? (sycalls and traps) (fwd)

Peter says:
:I was rather surprised when I found out just how expensive kernel entry was
:some time ago. What I was doing was a reentrant syscall that acquired no
:locks and ran about 5 instructions in kernel context. Anyway, it took
:something like 300 times longer to do that (called via int $0x81) than to
:do a 'call' to equivalent code in userland. Anyway, with overheads on that
:scale, whether we push 5 or 8 or whatever registers in the handler is
:almost lost in the noise.
:
:Cheers,
:-Peter

Matt says:
Well, it could be 300x, but that's like comparing a cache hit to a cache
miss - in real terms a UP syscall takes, what, 1-3 uS? An SMP syscall
takes 6 uS. This on a PIII-450.
Both times can be cut down to less than 500 nS with fairly simple
optimizations. Unless you are doing hundreds of thousands of context
switches a second, the overhead is in the noise in real terms, and
*definitely* in the noise if you tack on a task switch in the middle of
that. Having the kernel do the context switch between threads has a huge
number of advantages that should outweigh, or at least equal, the minor
increase in overhead.

A couple of points that have been brought up in recent emails:

* blockages due to VM faults

All VM faults that do not occur with the SP in the UTS's stack (a quick
way of finding out if the UTS is running) can be telegraphed to the UTS,
which should be able to schedule another thread. (If the UTS is running
then we just block the entire process.)

* blockages due to file I/O (not even network I/O)

There is no need for the kernel to handle this. The UTS can be notified
and can schedule a new task with a lot more knowledge of what is needed
than the kernel has. Of course there is always the case of co-operative
scheduling, where the UTS decides and the kernel 'does'.

* disk parallelism (thread A reads file block from kernel cache, thread B
  reads file block and has a cache miss)

Once again, I don't think this requires the kernel to do the switch.

* event synchronization

What events?

* kernel state

Kernel state? Kernel state is probably going to be associated with the
process, and not with the KSEs that are sharing its quantum.

Even if one were to use an asynchronous call gate, one then has to deal
with the additional overhead imposed by the async call gate when a
syscall could have been run from the disk cache (that is, not blocked).
Personally speaking, I think async call gates are a huge mistake without
a prioritized, vectorable software interrupt mechanism to go along with
them. The current unix signal mechanism is simply not up to the task.

I don't think there is too much overhead... a copyout() of the syscall
return values.
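The "is the SP inside the UTS's stack?" test mentioned for the VM-fault case above amounts to a simple range check. The struct uts_region and sp_in_uts() names below are hypothetical, not from any actual kernel:

```c
/* Sketch of the check described above: a VM fault can be "telegraphed"
 * to the UTS only if the faulting stack pointer is NOT already inside
 * the UTS's own stack.  struct uts_region and sp_in_uts() are
 * hypothetical names for illustration. */
#include <stdint.h>
#include <stddef.h>

struct uts_region {
    uintptr_t base;   /* lowest address of the UTS stack */
    size_t    size;   /* length of the UTS stack in bytes */
};

/* Nonzero if sp falls inside the UTS stack, i.e. the UTS itself is
 * running and the fault must block the entire process. */
int sp_in_uts(const struct uts_region *uts, uintptr_t sp)
{
    return sp >= uts->base && sp - uts->base < uts->size;
}
```

The point of using the SP rather than a flag is that the kernel already has the faulting SP in hand at trap time, so the test costs two compares.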
There are serious issues with async call gates, including potential
resource hogging issues that frankly scare the hell out of me. I would
prefer a kernel stack for each thread, and I would prefer a syscall to
set a thread runnable/not-runnable. Such a syscall could specify an
optional CPU and optional run interval.

You don't need a kernel stack for a thread that is not doing I/O. You
only need to keep one available per thread that enters the kernel. When
the thread enters userspace again, you can keep the same stack hanging
around. If the thread in user space changes, and the new thread does a
syscall, then it comes back and you still have the same stack sitting
around... 1 stack, N threads... until one blocks. Then you grab a new
one (or block if you can't, but you should keep a cache of them sitting
around). Most threading programs that have thousands of threads don't
use them to do I/O but to implement active objects of some sort. They
expect the thread-switch overhead to be minuscule, and they can probably
make do with 10 KSEs for 1000 threads.

There are simply too many things that a UTS does not have access to -
such as knowing whether a syscall can complete without blocking or not -
to allow the UTS to actually perform the context switch.

You don't care if it is GOING to block... you handle that when it
happens.

-Matt
	Matthew Dillon
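The "cache of kernel stacks sitting around" scheme Julian sketches above could look like a simple free list: a thread grabs a stack on kernel entry, and as long as nobody blocks, the same stack serves every thread in turn. The kstack_get()/kstack_put() names, the sizes, and malloc() standing in for a kernel allocator are all illustrative assumptions.

```c
/* Sketch of a per-process cache of kernel stacks.  kstack_get()/
 * kstack_put(), KSTACK_SIZE/KSTACK_CACHE, and malloc() in place of a
 * kernel allocator are made up for illustration. */
#include <stdlib.h>

#define KSTACK_SIZE  16384
#define KSTACK_CACHE 10        /* e.g. ~10 KSEs serving 1000 threads */

struct kstack { struct kstack *next; };

static struct kstack *free_list;   /* stacks parked between uses */
static int cached;

/* A thread entering the kernel takes a stack: reuse a cached one if
 * possible, otherwise allocate (a real kernel might block here). */
void *kstack_get(void)
{
    if (free_list != NULL) {
        struct kstack *ks = free_list;
        free_list = ks->next;
        cached--;
        return ks;
    }
    return malloc(KSTACK_SIZE);
}

/* When the thread using a stack returns to userland without having
 * blocked, park the stack so the next syscall from any thread reuses
 * it: 1 stack, N threads. */
void kstack_put(void *p)
{
    if (cached < KSTACK_CACHE) {
        struct kstack *ks = p;
        ks->next = free_list;
        free_list = ks;
        cached++;
    } else {
        free(p);
    }
}
```

A new stack is only taken off the list (or allocated) when a thread actually blocks in the kernel while holding one, which is what keeps the count near the number of KSEs rather than the number of threads.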