From owner-freebsd-current@FreeBSD.ORG  Sun Oct 29 03:44:56 2006
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9109A16A407;
	Sun, 29 Oct 2006 03:44:56 +0000 (UTC)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2129F43D5A;
	Sun, 29 Oct 2006 03:44:56 +0000 (GMT)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	by apollo.backplane.com (8.13.7/8.13.4) with ESMTP id k9T3itqx054921;
	Sat, 28 Oct 2006 20:44:55 -0700 (PDT)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.13.7/8.13.4/Submit) id k9T3itAw054920;
	Sat, 28 Oct 2006 20:44:55 -0700 (PDT)
Date: Sat, 28 Oct 2006 20:44:55 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200610290344.k9T3itAw054920@apollo.backplane.com>
To: Robert Watson <rwatson@freebsd.org>
References: <45425D92.8060205@elischer.org>
	<200610281132.21466.davidxu@freebsd.org>
	<20061028105454.S69980@fledge.watson.org>
	<20061028194125.GL30707@riyal.ugcs.caltech.edu>
	<20061028204357.A83519@fledge.watson.org>
Cc: Julian Elischer <julian@elischer.org>, Paul Allen <nospam@ugcs.caltech.edu>,
	David Xu <davidxu@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: Comments on the  KSE option
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
X-List-Received-Date: Sun, 29 Oct 2006 03:44:56 -0000


:I think the notion of fairness is orthogonal to M:N threading.  M:N is about 
:efficiently representing user threading to kernel space, as well as avoiding 
:kernel involvement in user context switches when not needed.  Fairness is 
:about how the kernel allocates time slices to user processes/threads. 
:Fairness can be implemented for both 1:1 and M:N, with the primary differences 
:being in bookkeeping.

    Yes, this is precisely what I mean.  Very well said.

    What we are talking about here is primarily algorithmic complexity
    and physical resource limitations (e.g. like kernel memory).  Having
    the kernel scheduler only deal with (N) threads, where N is limited by
    the number of physical cpus, is a far easier problem for the kernel
    to solve in all respects than having the kernel deal with (M*N)
    individual threads.  I personally see no reason why a program
    couldn't have 10,000 threads, or 100,000 threads, or a million threads,
    but the kernel is the wrong place to try to manage them if your system
    only has N cpus (N=2,4,8,16,32, etc).  You have to ask yourself, what
    exactly is the kernel accomplishing trying to manage all those threads
    for a single application when it only has N cpu contexts to work with
    anyhow?  The answer is: The kernel should only have to worry about the
    N cpu contexts and kernel memory resources for those contexts.

					----

    From the point of view of POSIX threading and resource limits, people
    need to understand two things:

    (1) setrlimit was NEVER designed as a system moderation tool.  It
    was designed to cause runaway programs to fail, period.  setrlimit
    cannot, in fact, be used as a system moderation tool, at least not
    very well.  setrlimit especially breaks down when you have a huge range
    of acceptable values, because higher values tend to multiply out and
    you wind up losing the protection that setrlimit was designed to
    supply.  A good example of this is having a per-process descriptor
    limit (X) AND a per-user process limit (Y).  X*Y often exceeds the size
    of the kernel's global descriptor table.  Oops!

    (2) Just because the POSIX scheduler implements all sorts of different
    scopes and priority schemes says NOTHING AT ALL about how programs
    operating under such a scheduler should be apportioned cpu relative
    to OTHER PROGRAMS WHICH ARE INDEPENDENTLY RUNNING ON THE SYSTEM.  POSIX
    is an abstraction (a virtualization of the available resources),
    just like everything else.  If you try to treat it as a hard requirement,
    the only result will be a broken system that might happily run everything
    else into the ground and stop allowing root ssh logins in order to
    accommodate a badly written POSIX program.  There are many third-party
    applications that set POSIX priorities, in particular realtime
    priorities, that I'd rather they not actually use.  Most of these
    programs set these priorities based on the author's attempt to tune
    them on a single operating system (e.g. Linux) and in a single operating
    environment.

    All a program can ever really do when requesting POSIX scheduling
    resources is compete against itself.  It is the system operator, at a
    higher level, who must control how those resources compete with
    other programs.  That should be obvious to everyone.

    It is a whole lot easier for the kernel to give the system operator
    this power if the kernel scheduler does not have to juggle thousands
    of threads.  It is very easy to write a scheduler for threaded
    applications when the most you have to deal with is N threads
    (N=ncpus) per application.

					--

    Now let's consider programs which fork() instead of thread.  The argument
    that threading is equivalent to forking from a management standpoint
    is just plain silly.  From a design standpoint, programmers are very
    well aware of the resources required to fork(), and consequently
    per-fork tasks are generally much, MUCH better understood by system
    operators in the management context than per-thread tasks.  Per-thread
    tasks tend to be opaque...  you never know how a threaded program might
    be written.  You just cannot treat the two as equivalent or even close
    to equivalent.

						-Matt