From owner-freebsd-hackers Sat Jul 29 19:16:18 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.11/8.6.6) id TAA10035 for hackers-outgoing; Sat, 29 Jul 1995 19:16:18 -0700
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by freefall.cdrom.com (8.6.11/8.6.6) with SMTP id TAA10029 for ; Sat, 29 Jul 1995 19:16:15 -0700
Received: by cs.weber.edu (4.1/SMI-4.1.1) id AA10428; Sat, 29 Jul 95 20:08:37 MDT
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9507300208.AA10428@cs.weber.edu>
Subject: Re: pthreads
To: julian@ref.tfs.com (Julian Elischer)
Date: Sat, 29 Jul 95 20:08:27 MDT
Cc: bakul@netcom.com, freebsd-hackers@freebsd.org
In-Reply-To: <199507300010.RAA07970@ref.tfs.com> from "Julian Elischer" at Jul 29, 95 05:10:07 pm
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@freebsd.org
Precedence: bulk

> Kirk McKusick and co. had a discussion on this topic
> when I did the BSD4.4 course at UCB..
> they were of the opinion that with recent changes to the
> efficiency of forking, the answer was to create the new
> 'rfork' call, where a forking process can decide what resources
> it wants to share with its child..
> options include:
> text space, data space, stacks, file descriptor tables, etc.

Sequent has a call named "sfork", which I implemented in the UnixWare
kernel using a proc structure change so that the inheritance flag can
be set.  The point is to inherit the per-process open file table
across a fork.  There was code posted here to do that using a new
sfork system call; a sketch of what such an interface might look like
appears below.

Other than global context data (which is separable anyway in an
application written to run in a threaded environment), there is no
reason to run in a threaded environment that supplies effectively
nothing more than system call contexts, unless you like statically
allocating your own limited stacks at thread start.

Stack sharing is a dumb idea; if everything you mentioned were shared,
then what you've invented is vfork without calling exec.  That's
already been invented.

> using this approach, how do you tell two processes that are sharing
> all resources from threads?

If you can't tell two processes from two threads, then that is
*exactly* the supporting argument *against* using threads instead of
simply using processes.  The implementation difference is that the
pointer to your global data needs to point explicitly to shared
memory (instead of implicitly): a tradeoff between "thread_create"
startup code complication and "shmget" startup code complication.
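To make that tradeoff concrete, here is a minimal sketch of the
shmget side (illustrative only, not the code that was posted): two
processes sharing a counter through an explicitly mapped segment,
where a threaded program would have used a plain global.  Error
checking is omitted for brevity.

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	/* Explicitly create and map the shared segment; a threaded
	 * program gets the equivalent sharing implicitly.
	 */
	int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
	int *counter = (int *)shmat(shmid, NULL, 0);

	*counter = 0;
	if (fork() == 0) {
		*counter = 42;	/* child writes through the shared page */
		_exit(0);
	}
	wait(NULL);
	printf("parent sees %d\n", *counter);	/* prints 42 */

	shmdt(counter);
	shmctl(shmid, IPC_RMID, NULL);
	return 0;
}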
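And the rfork/sfork style interface mentioned above might look
roughly like this.  The flag names are hypothetical, loosely modeled
on Plan 9's rfork(); the point is just that sfork falls out as one
combination of inheritance flags.

/*
 * Hypothetical interface; none of these flags exist in the tree.
 */
#define RFPROC	0x0001	/* create a new process */
#define RFMEM	0x0002	/* share the address space with the child */
#define RFFDG	0x0004	/* copy, rather than share, the fd table */

extern int rfork(int flags);

int
sfork(void)
{
	/* New process, copied address space; the open file table
	 * is shared because RFFDG is omitted.
	 */
	return rfork(RFPROC);
}

Passing RFPROC | RFMEM instead would give you the shared address
space case; add shared stacks and you are back at the
vfork-without-exec objection above.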
The other difference, which shows up only in an SMP environment, is
that you can more easily schedule a process independently than a
thread, because of the mutex complications involved.

The point of LWP on SunOS was to cause a process to consume as much
of its scheduling quantum as it possibly could.  Thus it puts off the
process context switch overhead for as long as possible, and that
overhead is worth avoiding, especially since *that* is where you lose
cache locality.

The kernel thread implementation buys you minimal benefit in terms of
TLB flushing over a full context switch (assuming the threads are
otherwise incapable of differentiating themselves).  You still eat
the register set flushing (on SPARC), the stack switch, the L1 cache
invalidation, etc., etc.  A minimal benefit for the added cost, and
one that doesn't require that particular implementation to achieve.

Where are kernel threads good?

1)	As contexts for kernel level tasks and daemons; LFS's cleaner
	and the standard update daemon could benefit.  So could the
	implementation of external pagers and CPU emulation for
	binary compatibility.

2)	To avoid crossing protection domains for system level
	daemons; this is a minimal benefit, since things like nfsd
	and biod have implemented this with alternate technology.

3)	For SMP scalability... *but* only when combined with some
	form of cooperative scheduling: user space sync-to-async
	conversion with thread set internal scheduling.

The real benefit is, and continues to be, avoidance of context switch
overhead.  The use of kernel threads to allow the user space threads
to be scheduled on multiple CPU resources is *not* beneficial unless
it is so combined; otherwise, you might as well be using separate
processes for all the good it will do you.

Really, a general async mechanism would be the best "next step", with
support for turning *any* potentially blocking call into a call queue
entry plus a context switch.  This is relatively easy to implement at
the system call and libc level.
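To give the flavor of it, a libc level sketch might look like the
following.  Every name in it is invented for illustration, and where
a real implementation would hand the queue to the kernel or an async
context and switch the caller away, this toy just drains the queue
synchronously in the wait routine.

#include <sys/types.h>
#include <unistd.h>

struct async_call {
	ssize_t	(*fn)(int, void *, size_t);	/* the blocking call */
	int	fd;
	void	*buf;
	size_t	len;
	ssize_t	result;
	int	done;
};

static struct async_call *queue[32];
static int qhead, qtail;

/* queue the call and return immediately with a handle */
struct async_call *
async_read(struct async_call *c, int fd, void *buf, size_t len)
{
	c->fn = read;
	c->fd = fd;
	c->buf = buf;
	c->len = len;
	c->done = 0;
	queue[qtail++ % 32] = c;
	return c;
}

/* stand-in for "context switch until it completes" */
ssize_t
async_wait(struct async_call *c)
{
	while (!c->done) {
		struct async_call *q = queue[qhead++ % 32];
		q->result = q->fn(q->fd, q->buf, q->len);
		q->done = 1;
	}
	return c->result;
}

/* usage:
 *	struct async_call c;
 *	async_read(&c, 0, buf, sizeof buf);
 *	... do other work ...
 *	n = async_wait(&c);
 */

					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.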