From owner-freebsd-current@FreeBSD.ORG  Sun Oct 29 17:33:50 2006
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id AE43816A412
	for <freebsd-current@freebsd.org>; Sun, 29 Oct 2006 17:33:50 +0000 (UTC)
	(envelope-from cswiger@mac.com)
Received: from pi.codefab.com (pi.codefab.com [199.103.21.227])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0108743D49
	for <freebsd-current@freebsd.org>; Sun, 29 Oct 2006 17:33:49 +0000 (GMT)
	(envelope-from cswiger@mac.com)
Received: from localhost (localhost [127.0.0.1])
	by pi.codefab.com (Postfix) with ESMTP id 5D34B5E0A;
	Sun, 29 Oct 2006 12:33:49 -0500 (EST)
X-Virus-Scanned: amavisd-new at codefab.com
Received: from pi.codefab.com ([127.0.0.1])
	by localhost (pi.codefab.com [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id ST5gT4LQw3Bu; Sun, 29 Oct 2006 12:33:47 -0500 (EST)
Received: from [192.168.1.251] (pool-68-161-112-7.ny325.east.verizon.net
	[68.161.112.7])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by pi.codefab.com (Postfix) with ESMTP id EF64A5D82;
	Sun, 29 Oct 2006 12:33:46 -0500 (EST)
Message-ID: <4544E5F8.9050300@mac.com>
Date: Sun, 29 Oct 2006 12:33:44 -0500
From: Chuck Swiger <cswiger@mac.com>
User-Agent: Thunderbird 1.5.0.7 (Windows/20060909)
MIME-Version: 1.0
To: Matthew Dillon <dillon@apollo.backplane.com>
References: <45425D92.8060205@elischer.org>
	<200610281132.21466.davidxu@freebsd.org>
	<20061028105454.S69980@fledge.watson.org>
	<20061028194125.GL30707@riyal.ugcs.caltech.edu>
	<20061028204357.A83519@fledge.watson.org>
	<200610290344.k9T3itAw054920@apollo.backplane.com>
In-Reply-To: <200610290344.k9T3itAw054920@apollo.backplane.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-current@freebsd.org
Subject: Re: Comments on the  KSE option
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 29 Oct 2006 17:33:50 -0000

Matthew Dillon wrote:
>: I think the notion of fairness is orthogonal to M:N threading.  M:N is about 
>: efficiently representing user threading to kernel space, as well as avoiding 
>: kernel involvement in user context switches when not needed.  Fairness is 
>: about how the kernel allocates time slices to user processes/threads. 
>: Fairness can be implemented for both 1:1 and M:N, with the primary differences 
>: being in bookkeeping.
> 
>     Yes, this is precisely what I mean.  Very well said.

Agreed.

>     What we are talking about here is primarily algorithmic complexity
>     and physical resource limitations (e.g. like kernel memory).  Having
>     the kernel scheduler only deal with (N) threads, where N is limited by
>     the number of physical cpus, is a far easier problem for the kernel
>     to solve in all respects then having the kernel deal with (M*N)
>     individual threads.

Yes.  A userspace program would like to assume that it can have as many 
parallel threads of execution as it wants to have, but the kernel only needs 
to schedule N of them at a time from the set of runnable tasks.

Most of the debate comes from whether userland threads are counted in M, or 
only userland processes.

>     I personally see no reason why a program
>     couldn't have 10,000 threads, or 100,000 threads, or a million threads,
>     but the kernel is the wrong place to try to manage them if your system
>     only has N cpus (N=2,4,8,16,32, etc).  You have to ask yourself, what
>     exactly is the kernel accomplishing trying to manage all those threads
>     for a single application when it only has N cpu contexts to work with
>     anyhow?

Well, it is solving the same problem that the kernel needs to solve when 
someone sets up a web hosting machine with a hundred jails (or a thousand), 
times anywhere from ten to 200+ preforked httpd processes.

>  The answer is: The kernel should only have to worry about the
>  N cpu contexts and kernel memory reosurces for those contexts.

Yes.  Of course, counting N is becoming more complicated with Hyperthreading 
and multi-CPU cores...and expected values of N are becoming much larger.

>     (2) Just because the POSIX scheduler implements all sorts of different
>     scopes and priority schemes says NOTHING AT ALL about how programs
>     operating under such a scheduler should be apportioned cpu relative
>     to OTHER PROGRAMS WHICH ARE INDEPENDANTLY RUNNING ON THE SYSTEM.

At one point, Mach attempted to solve this by putting all of the 
(system-scope) threads spawned by a particular user into bins (called 
thread_groups, task_group?, IIRC), creating a list, and putting each runnable 
kernel thread into a separate individual bin, and then allocating each of the 
N available units of CPU execution one at a time onto a bin by selecting the 
(non-blocked) thread in that bin with the highest priority.

One would attempt to sort this list by priority, where the priority of a bin 
initially is the priority of the most important runnable thread contained in 
the thread_group; by putting high-priority bins first (ie, kernel ISRs after 
an interrupt, so as to minimize latency), and then tasks owned by the 
superuser, or user tasks have been granted the privilege to create their own 
thread_group to run at higher priority (ie, "soft realtime"), and then normal 
user processes, and then "idle taskgroups", which get placed behind everything 
else and only run on a CPU if there is no runnable thread anywhere.

Once a bin has been granted a unit of CPU time, it is removed from the head of 
the list; if this particular bin still contains runnable threads it's priority 
is reduced temporarily (if it is not privileged like an ISR or a realtime 
task) so that it will appended at or towards the end of the list.

>  POSIX
>     is an abstraction (or virtualization out of available resources),
>     just like everything else.  If you try to treat it as a hard requirement
>     the only result will be a broken system that might happily run everything
>     else into the ground and stop allowing root ssh logins in order to 
>     accomodate a badly written POSIX program.  There are many third party
>     applications that set POSIX priorities, in particular realtime
>     priorities, that I'd rather they not actually use.

OK.  So what administrative interface would you like to use to restrict this?

Or, did you really mean that you don't believe the kernel should even have the 
capability to honor realtime priority?

Does this still apply if the user is your grandparents and the realtime task 
is a DVD viewer?  Go ahead.  Explain to grandma that you'd prefer her movie to 
skip so you can run SSHd reliably.  :-)

I've always thought that Unix should be configured with a safety to avoid 
footshooting, but if the administrator flips the safety, aims straight down, 
and fires, that they (or privileged userland processes which have asked for 
the gun and flipped the safety to "realtime") are allowed to shoot themselves.

>     All a program can ever really do when requesting POSIX scheduling 
>     resources is compete against itself.  It is the system operator, at a
>     higher level, that must control how those resources compete with 
>     other programs.  That should be clear to everyone it is so obvious.
> 
>     It is a whole lot easier for the kernel to give the system operator
>     this power if the kernel scheduler does not have to juggle thousands
>     of threads.  It is very easy to write a scheduler for threaded
>     applications when the most you have to deal with is N threads
>     (N=ncpus) per application.

Whether a process can create additional KSEs (ie, can create threads in the 
system scope) or whether it is restricted to using only process scope should 
be an administrative tunable, just as whether or not realtime priority is allowed.

Given the timesharing orientation that many people find desirable about Unix, 
it seems entirely reasonable to ship the system with threads defaulting to 
process scope and limited so that realtime is not enabled by default...

>     Now lets consider programs which fork() instead of thread.  The argument
>     that threading is equivalent to forking from a management standpoint
>     is just plain silly.  From a design standpoint, programmers are very
>     well aware of the resources required to fork(), and consequently
>     per-fork tasks are generally much, MUCH better understood by system
>     operators in the management context then per-thread tasks.  per-thread
>     tasks tend to be opaque...  you never know how a threaded program might
>     be written.  You just cannot treat the two as equivalent or even close
>     to equivalent.

...However, if system scope is not forbidden, the kernel ought to be optimized 
to handle threaded Apache-2 just the same way it treats the classic preforked 
child process model.  The kernel ought to be willing to run separate threads 
within an individual process like a threaded Apache or a threaded Java at the 
same time on separate CPUs.

-- 
-Chuck