From owner-freebsd-hackers Thu Jul 2 16:16:41 1998
From: Terry Lambert
Message-Id: <199807022316.QAA12148@usr09.primenet.com>
Subject: Re: pthreads
To: rotel@indigo.ie
Date: Thu, 2 Jul 1998 23:16:21 +0000 (GMT)
Cc: tlambert@primenet.com, jabley@clear.co.nz, freebsd-hackers@FreeBSD.ORG
In-Reply-To: <199807021321.OAA00589@indigo.ie> from "Niall Smart" at Jul 2, 98 02:21:53 pm

> > John Birrell rewrote lots of it in -current, with an eye toward
> > bringing the code up to the Draft 10 standard (the ratified
> > standard), and he and John Dyson did a lot of work to support a
> > kernel implementation, also in -current, using rfork() and some
> > rather complicated stack management.
>
> This is basically sharing a number of kernel processes among a set
> of threads, right?  Do you know if any progress was made towards
> an LWP scheme?  If John Dyson's async I/O code is in place, that
> would help a lot in that area, I think.

The async I/O is easily used to implement a call conversion
scheduler (a sketch follows below); it's not much help in an LWP
scheme (in the Solaris, not the SunOS, sense of LWP).  What it buys
you is overlapped I/O, which you don't really get with the current
pthreads implementation (it's more of a "just-in-time" I/O).

Far better than simple async I/O would be an async call gate.  This
would let you make blocking calls that were unrelated to I/O in an
async fashion as well (for example, acquisition of a semaphore).
Alas, according to POSIX, async I/O is the future (though it could
be implemented on top of an async call gate in a library, and then
ignored).

The rfork()-based kernel threading is for SMP scalability.  It is
generally limited to one kernel thread per user space thread.  This
can be multiplexed down to N kernel threads for M user space
threads, M > N, in two ways.  The first is to allow only N blocking
calls to be outstanding, and to starve those threads that are ready
to run but are waiting on a kernel scheduling context (a kernel
thread) in which to run.  The second is to create a new kernel
thread when the blocking threshold is exceeded (generally by
cooperative scheduling: the kernel signals a user space scheduler
thread, which wakes up and spawns a kernel thread to add to the
thread group).

Both of these approaches have problems, but the second has the
highest scalability without starvation of its own threads (though
it can't be throttled in its competition for quantum without some
hard limit that turns it into the first approach when the limit is
enforced).
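For concreteness, here is roughly what the thread-spawning step of
that second approach looks like.  This is a sketch, not working
code: rfork() with RFMEM returns in the child on the parent's
stack, which is exactly the "complicated stack management" problem
mentioned above, so I'm hiding it behind a hypothetical helper,
rfork_thread(), that does the stack-switching glue and calls an
entry point on a fresh stack.

/*
 * Sketch: grow the kernel thread group when every kernel thread
 * is tied up in a blocking call and runnable user threads are
 * starving.  NOT working code; rfork_thread() is an assumed
 * helper, and the library's bookkeeping is elided.
 */
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>		/* RFPROC, RFMEM on FreeBSD */

#define	STACK_SIZE	(64 * 1024)

/* Assumed helper: rfork() the caller onto its own stack. */
extern pid_t	rfork_thread(int flags, void *stack_top,
		    int (*fn)(void *), void *arg);

extern int	user_scheduler(void *);	/* runs user threads */

static int	nkthreads;	/* kernel threads in the group */
static int	nblocked;	/* user threads blocked in kernel */

void
maybe_grow_thread_group(void)
{
	char *stk;

	if (nblocked < nkthreads)
		return;		/* a kernel thread is still free */
	stk = malloc(STACK_SIZE);
	if (stk == NULL)
		return;
	/* New kernel schedulable entity sharing the address space;
	   the stack grows down, so pass the top of the block. */
	if (rfork_thread(RFPROC | RFMEM, stk + STACK_SIZE,
	    user_scheduler, NULL) != -1)
		nkthreads++;
}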
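And backing up to the call conversion point above: here is the
shape of how a userland threads library converts a blocking read(2)
into an async I/O request plus a thread switch, so the process
keeps its quantum while the I/O overlaps with computation.  Again a
sketch only; thread_sched() is a stand-in for the library's
internal scheduler, not a real interface.

/*
 * Sketch: call conversion of read(2) using POSIX aio.  The
 * calling user thread waits, but the process does not; another
 * user thread runs on the same quantum while the I/O proceeds.
 */
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

extern void	thread_sched(void);	/* assumed: run another ready thread */

ssize_t
threaded_read(int fd, void *buf, size_t nbytes)
{
	struct aiocb cb;

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = nbytes;

	if (aio_read(&cb) == -1)
		return (-1);

	/* Instead of blocking the whole process in the kernel,
	   park this user thread and give the remainder of the
	   quantum to another ready thread; re-poll when run. */
	while (aio_error(&cb) == EINPROGRESS)
		thread_sched();

	return (aio_return(&cb));
}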
> > John Dyson did a number of patches for CPU affinity
>
> CPU affinity?  You mean the threading library can pass scheduling
> hints to the kernel for a set of processes?

No.  CPU affinity is for protection of the L1 and L2 cache
contents, by making threads "prefer" one CPU over the other(s).  It
is an important precondition for SMP scaling of multithreaded
applications.  Without it, your effective cache is reduced by the
Nth root of the cache size, for N processors.

Each kernel thread or process is, in the abstract, a kernel
schedulable entity.  You want to minimize the context switching
between kernel schedulable entities.

If you make a blocking call on a kernel thread, that kernel thread
is preempted, and the competition is thrown open to all other
threads/processes to be the next scheduled.  This is far from
optimal.

An optimal implementation would combine an async call gate (meaning
that once you got the quantum, the threaded process would get to
use all of it) with kernel threads for SMP scalability.

There is some merit to kernel threads in terms of saying: there are
a total of E kernel schedulable entities on the system; I want my
threaded process to get N quanta out of every E quanta (N << E).
Basically, this establishes that a threaded process competes as N
processes.

The merit in this approach is very small, however, and is
predicated on two ideas: (1) that a threaded process will be
competing with conventional processes for quantum, and (2) that the
process priority system is not sufficient to make the competition
"fair".  Fairness arguments are always arguments about
transitioning from an old system to a new one.

> Was this threading model an interim measure until someone wrote
> one based on LWP, or intended to be the way that it would always
> be done?

Well, opinions are varied; I've presented mine, above.  If I had to
boil it down to one (long) sentence, I'd say: once the scheduler
gives me a quantum, it's *my* quantum, and I shouldn't be penalized
with context switch overhead and the loss of the remainder of my
partially used quantum just because I want to make a system call.

> There are a number of problems with this approach (outlined in a
> paper called "Scheduler Activations: Effective Kernel Support for
> the User-Level Management of Parallelism"; ask me for a copy if
> you want one), although it is much easier to implement than an
> LWP-based model.

I've read the "activations" paper.  I don't like them; they imply a
message passing architecture.  There are also unaddressed
starvation issues that occur when you are ready to block and some
other thread in your group is ready to run.  Without an overall
accounting of quanta outside your program's virtual machine, there
are problems.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my
present or previous employers.
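P.S.: Since the activations paper came up, here is a sketch of the
message-passing shape I'm objecting to.  Under scheduler
activations, the kernel upcalls into a user space scheduler every
time a thread blocks, unblocks, or is preempted, so every blocking
event becomes a message to the process.  The structure and entry
point names below are invented for illustration; the real interface
is described in the Anderson et al. paper.

/*
 * Sketch of a scheduler-activations style upcall dispatcher.
 * All names are invented; mark_blocked(), mark_runnable(), and
 * run_next_user_thread() stand in for the user level thread
 * library's internals.
 */
struct activation_msg {
	enum { SA_BLOCKED, SA_UNBLOCKED, SA_PREEMPTED } sa_event;
	int	sa_thread_id;	/* user thread the event is about */
};

extern void	mark_blocked(int);
extern void	mark_runnable(int);
extern void	run_next_user_thread(void);	/* never returns */

/*
 * The kernel creates a fresh activation (the vessel for the
 * quantum) and enters the address space here on each event.
 * Note the shape: every blocking system call costs a message
 * plus a trip through this dispatcher, and the quanta consumed
 * here are accounted to nobody in particular -- which is the
 * starvation/accounting complaint above.
 */
void
sa_upcall(struct activation_msg *msg)
{
	switch (msg->sa_event) {
	case SA_BLOCKED:
		mark_blocked(msg->sa_thread_id);
		break;
	case SA_UNBLOCKED:
	case SA_PREEMPTED:
		mark_runnable(msg->sa_thread_id);
		break;
	}
	run_next_user_thread();
}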