From owner-freebsd-arch  Mon Apr 23  8:27:53 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247])
	by hub.freebsd.org (Postfix) with ESMTP id B10DB37B423
	for <arch@FreeBSD.ORG>; Mon, 23 Apr 2001 08:27:46 -0700 (PDT)
	(envelope-from tanimura@r.dl.itc.u-tokyo.ac.jp)
Received: (from uucp@localhost)
	by rina.r.dl.itc.u-tokyo.ac.jp (8.11.3+3.4W/3.7W-rina.r-20010412) with UUCP id f3NFPAV75414 ;
	Tue, 24 Apr 2001 00:25:10 +0900 (JST)
Received: (from root@localhost)
	by sohgo.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (8.11.3+3.4W/3.7W) with UUCP id f3NFHCU01590 ;
	Tue, 24 Apr 2001 00:17:12 +0900 (JST)
Received: from bunko.nkth.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1])
	by bunko.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4W/3.7W) with ESMTP id f3NEsnN26340 ;
	Mon, 23 Apr 2001 23:54:50 +0900 (JST)
Message-Id: <200104231454.f3NEsnN26340@bunko.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
Date: Mon, 23 Apr 2001 23:54:49 +0900
From: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>
To: Matt Dillon <dillon@earth.backplane.com>
Cc: Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp>,
	arch@FreeBSD.ORG, bde@zeta.org.au
Subject: Re: Mmap(2) should start just below stack (was: Re: Bumping up {MAX,DFL}*SIZ in i386)
In-Reply-To: <200104070026.f370QfY50900@earth.backplane.com>
References: <200103191056.f2JAuox00630@rina.r.dl.itc.u-tokyo.ac.jp>
	<Pine.BSF.4.21.0103201350200.41190-100000@besplex.bde.org>
	<200103230517.f2N5HXx08605@rina.r.dl.itc.u-tokyo.ac.jp>
	<200104050506.f3556Xw28400@rina.r.dl.itc.u-tokyo.ac.jp>
	<200104070026.f370QfY50900@earth.backplane.com>
User-Agent: Wanderlust/2.4.1 (Stand By Me) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 14) (Cuyahoga Valley) (i386--freebsd)
Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo
MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu")
Content-Type: text/plain; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, 6 Apr 2001 17:26:41 -0700 (PDT),
  Matt Dillon <dillon@earth.backplane.com> said:

dillon> :|   Process Stack    |
dillon> :+--------------------+ down to 3GB - max of RLIMIT_STACK
dillon> :|Reserved for Process|
dillon> :|       Stack        |
dillon> :+--------------------+ 3GB - max of RLIMIT_STACK
dillon> :|  Mmap(2)ed space   |
dillon> :|  (mmap(2), dynamic | This may be fragmented.
dillon> :|   linker, shared   |
dillon> :|    objects, etc)   |
dillon> :+--------------------+ down to end of bss + max of RLIMIT_DATA
dillon> :|    Mmap(2) Heap    |
dillon> :+--------------------+ end of bss + max of RLIMIT_DATA
dillon> :|Reserved for Malloc |
dillon> :+--------------------+ up to end of bss + max of RLIMIT_DATA (break)
dillon> :|   Malloc(3) Heap   |

dillon>     suid-root programs often adjust resources upwards in order to avoid
dillon>     potential root compromises due to allocation failures at just the
dillon>     wrong time (coupled with a badly written program).  Most commonly this
dillon>     means RLIMIT_DATA will be increased and, of course, many programs will
dillon>     also increase RLIMIT_DATA.  However, the same problem with suid-root
dillon>     programs exists for RLIMIT_STACK as well.

dillon>     So using a RLIMIT_STACK based solution instead of RLIMIT_DATA only
dillon>     partially solves the problem.

dillon>     We also have a similar issue with fork().  Process A fork()'s and
dillon>     adjusts the resource limits for the child process downward or upward.

At that point, we have to reserve a certain size of an address region
either below the max of RLIMIT_STACK or above the max of RLIMIT_DATA.

As the reserved address space is likely to go unused in many cases,
its size should be kept small. While the data handled by a process can
grow to gigabytes, the stack consumed by a process reaches only a few
tens of megabytes (unless you are trying to solve a problem that
cannot be solved by a Turing machine :). Hence our option should be to
reserve a region down to 3GB - MAXSSIZ (which is the lowest address
the stack can reach under the largest RLIMIT_STACK) and possibly an
additional safety zone below the stack.

Then a user process vm space looks something like this:

|   Process Stack    |
+--------------------+ down to 3GB - max of RLIMIT_STACK
|Reserved for Process|
|       Stack        |
+--------------------+ 3GB - max of RLIMIT_STACK
|Reserved for growth |
|  of RLIMIT_STACK   |
+--------------------+ 3GB - MAXSSIZ
|  Safety zone for   |
|  buffer overflow   |
+--------------------+ 3GB - (MAXSSIZ + SAFETY_ZONE_SIZE)
|  Mmap(2)ed space   |
|  (mmap(2), dynamic | This may be fragmented.
|   linker, shared   |
|    objects, etc)   |
+--------------------+ down to end of bss + max of RLIMIT_DATA
|    Mmap(2) Heap    |
+--------------------+ end of bss + max of RLIMIT_DATA
|Reserved for Malloc |
+--------------------+ up to end of bss + max of RLIMIT_DATA (break)
|   Malloc(3) Heap   |

It should be enough for SAFETY_ZONE_SIZE to be about 16-32MB.

Since MAXSSIZ + SAFETY_ZONE_SIZE is much smaller than MAXDSIZ,
reserving the stack space and the safety zone is less likely than the
current layout to be an obstacle to reserving a large mmap(2) space.
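
The boundaries in the diagram above can be worked out as plain
arithmetic. The following is only a sketch: the 3GB stack top, the
64MB MAXSSIZ and the 32MB safety zone are assumed values chosen for
illustration, and the macro and function names are hypothetical, not
taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; none of these numbers are fixed by the proposal. */
#define STACK_TOP        UINT64_C(0xC0000000)  /* assumed 3GB top of the user stack */
#define MAXSSIZ_BYTES    (UINT64_C(64) << 20)  /* assumed MAXSSIZ: 64MB */
#define SAFETY_ZONE_SIZE (UINT64_C(32) << 20)  /* upper end of the 16-32MB suggestion */

/* Lowest address the stack may ever grow down to: 3GB - MAXSSIZ. */
static uint64_t
stack_reserve_floor(void)
{
	return (STACK_TOP - MAXSSIZ_BYTES);
}

/* First address above the mmap(2) region: 3GB - (MAXSSIZ + SAFETY_ZONE_SIZE). */
static uint64_t
mmap_ceiling(void)
{
	/* The reserved regions must fit below the stack top. */
	assert(MAXSSIZ_BYTES + SAFETY_ZONE_SIZE < STACK_TOP);
	return (STACK_TOP - (MAXSSIZ_BYTES + SAFETY_ZONE_SIZE));
}
```

With these assumed numbers the stack reservation ends at 0xBC000000
and mmap(2)ed space would start just below 0xBA000000, so only 96MB of
address space is set aside for the stack and the safety zone.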

-- 
Seigo Tanimura <tanimura@r.dl.itc.u-tokyo.ac.jp> <tanimura@FreeBSD.org>

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Mon Apr 23 11:29: 6 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id A585837B422
	for <freebsd-arch@FreeBSD.org>; Mon, 23 Apr 2001 11:28:59 -0700 (PDT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.11.3/8.11.3) with SMTP id f3NITMf02680
	for <freebsd-arch@FreeBSD.org>; Mon, 23 Apr 2001 14:29:23 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Mon, 23 Apr 2001 14:29:22 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.org>
X-Sender: robert@fledge.watson.org
To: freebsd-arch@FreeBSD.org
Subject: jailNG
Message-ID: <Pine.NEB.3.96L.1010423141823.91472L-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


This weekend I was spending some time tweaking the jail(8) code to improve
its SMPng-happiness as well as manageability.  Unfortunately, I ended up
rewriting it in the process :-).  I changed the model somewhat so that
jails are now persistently configured, joined, et al, and broke out the
chroot() from the creation/joining process, as with increased namespaces
(such as System V IPC) creating a nice clean failure was increasingly
difficult.  Aspects of individual jails may now be managed using sysctls,
which appears to work reasonably well.  Clearly there's a lot of work left
to do, but I'd appreciate comments if people are interested:

  http://www.watson.org/~robert/jailng/

Simple example:

dev# ./jailctl 
usage:
  jailctl create [jailname]
  jailctl destroy [jailname]
  jailctl join [jailname] [-c chrootpath] [path] [cmd] [args...]
dev# ./jailctl create test
dev# sysctl -a | grep jail
jail.instance.test.sysvipc_permitted: 0
jail.instance.test.set_hostname_permitted: 1
jail.instance.test.socket_ipv4_permitted: 1
jail.instance.test.socket_unix_permitted: 1
jail.instance.test.socket_route_permitted: 1
jail.instance.test.socket_other_permitted: 0
jail.instance.test.ipv4addr: 0
dev# ./jailctl join test -c /tmp /bin/sh
# ps ax
  PID  TT  STAT      TIME COMMAND
  907  d0  DWJ    0:00.02 /bin/sh
  908  d0  RW+J   0:00.00 ps ax
# exit
dev# ./jailctl destroy test
dev# 

I also have a jailinit(8) in the works which would allow improved
startup/shutdown in the style of init(8) (sans the whole sigchild thing).
Another feature I'd like to add is a jail signal call that allows a signal
to be delivered to all processes inside a jail from outside, allowing an
easier forcible shutdown.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services




From owner-freebsd-arch  Mon Apr 23 13:25:22 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from cicero1.cybercity.dk (cicero1.cybercity.dk [212.242.40.4])
	by hub.freebsd.org (Postfix) with ESMTP id DF2B337B628
	for <freebsd-arch@FreeBSD.ORG>; Mon, 23 Apr 2001 13:25:14 -0700 (PDT)
	(envelope-from hroi@asdf.dk)
Received: from usr00.cybercity.dk (usr00.cybercity.dk [212.242.40.34])
	by cicero1.cybercity.dk (Postfix) with ESMTP id 62AF315FC93
	for <freebsd-arch@FreeBSD.ORG>; Mon, 23 Apr 2001 22:25:12 +0200 (CEST)
Received: from asdf.dk (port18.ds1-noe.adsl.cybercity.dk [212.242.52.19])
	by usr00.cybercity.dk (8.9.3/8.9.3) with ESMTP id WAA61826
	for <freebsd-arch@FreeBSD.ORG>; Mon, 23 Apr 2001 22:25:45 +0200 (CEST)
	(envelope-from hroi@asdf.dk)
Message-ID: <3AE48FFB.69A6142E@asdf.dk>
Date: Mon, 23 Apr 2001 22:26:35 +0200
From: Hroi Sigurdsson <hroi@asdf.dk>
Organization: Expert Knob Twiddlers
X-Mailer: Mozilla 4.76 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: jailNG
References: <Pine.NEB.3.96L.1010423141823.91472L-100000@fledge.watson.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Robert Watson wrote:

>   http://www.watson.org/~robert/jailng/

Very nice! What about the possibility of setting a non-overridable
"nice" value on jails or maybe rlimit?

-- 
Hroi Sigurdsson



From owner-freebsd-arch  Mon Apr 23 16:44:35 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from CPE-61-9-164-106.vic.bigpond.net.au (CPE-61-9-138-241.vic.bigpond.net.au [61.9.138.241])
	by hub.freebsd.org (Postfix) with ESMTP id 20C3437B422
	for <arch@freebsd.org>; Mon, 23 Apr 2001 16:44:32 -0700 (PDT)
	(envelope-from darrenr@reed.wattle.id.au)
Received: (from root@localhost)
	by CPE-61-9-164-106.vic.bigpond.net.au (8.11.0/8.11.0) id f3NNiU624919
	for <arch@freebsd.org>; Tue, 24 Apr 2001 09:44:30 +1000 (EST)
From: Darren Reed <darrenr@reed.wattle.id.au>
Message-Id: <200104232344.JAA10103@avalon.reed.wattle.id.au>
Subject: User-defined bit in sysctl flags ?
To: arch@freebsd.org
Date: Tue, 24 Apr 2001 09:44:05 +1000 (EST)
X-Mailer: ELM [version 2.4ME+ PL37 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


What do people think about having a range of bits in oid_kind that are
not used by FreeBSD but are only to be used by ``private'' sysctl handlers?

e.g.

#define CTLFLAG_PRIVATE 0x000ffff0

The idea is so you can do this:

#define SYSCTL_IPF(parent, nbr, name, access, ptr, val, descr) \
        SYSCTL_OID(parent, nbr, name, CTLTYPE_INT|access, \
                   ptr, val, sysctl_ipf_int, "I", descr);
SYSCTL_IPF(_net_inet_ipf, OID_AUTO, fr_tcpidletimeout, CTLFLAG_RW|CTL_PRIV,
           &fr_tcpidletimeout, 0, "");

and have CTL_PRIV be a bit which sysctl_ipf_int understands and not
have to worry about the value of CTL_PRIV ever being afflicted with
double-use by a FreeBSD flag because CTL_PRIV is part of CTLFLAG_PRIVATE.
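
A minimal sketch of how a private handler might test such a bit. Only
the CTLFLAG_PRIVATE value comes from the proposal above; the CTL_PRIV
value, the two SYS_* masks standing in for system-owned bits, and the
helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Range reserved for private handlers (value from the proposal). */
#define CTLFLAG_PRIVATE 0x000ffff0u

/* Hypothetical private bit; any single bit inside CTLFLAG_PRIVATE would do. */
#define CTL_PRIV        0x00000010u

/* Assumed system-owned bits: low type nibble and high access bits. */
#define SYS_TYPE_MASK   0x0000000fu
#define SYS_ACCESS_MASK 0xc0000000u

/* Nonzero if 'flag' is non-empty and lies entirely inside the private range. */
static int
ctl_is_private(uint32_t flag)
{
	return (flag != 0 && (flag & ~CTLFLAG_PRIVATE) == 0);
}

/* Extract only the private bits from an oid_kind-style word. */
static uint32_t
ctl_private_bits(uint32_t kind)
{
	return (kind & CTLFLAG_PRIVATE);
}
```

A handler such as sysctl_ipf_int could then call ctl_private_bits() on
the kind word and compare the result against CTL_PRIV without caring
which type or access bits are also set.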

Any objections to committing it to -current in the next week or so ?

Darren



From owner-freebsd-arch  Mon Apr 23 17: 0:36 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id 9E5D637B440
	for <freebsd-arch@FreeBSD.ORG>; Mon, 23 Apr 2001 17:00:33 -0700 (PDT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.11.3/8.11.3) with SMTP id f3O00lf06647;
	Mon, 23 Apr 2001 20:00:47 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Mon, 23 Apr 2001 20:00:47 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.ORG>
X-Sender: robert@fledge.watson.org
To: Hroi Sigurdsson <hroi@asdf.dk>
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: jailNG
In-Reply-To: <3AE48FFB.69A6142E@asdf.dk>
Message-ID: <Pine.NEB.3.96L.1010423195108.99299B-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On Mon, 23 Apr 2001, Hroi Sigurdsson wrote:

> Robert Watson wrote:
> 
> >   http://www.watson.org/~robert/jailng/
> 
> Very nice! What about the possibility of setting a non-overridable
> "nice" value on jails or maybe rlimit? 

One issue that does need to be addressed in the new code is a problem
inherited from the old code: a number of services are addressed on the
global scope rather than the jail scope, including resource
limits/accounting.

One challenge in the jail implementation is a way to do this such that the
jail code remains (relatively) cleanly abstracted from the remainder of
the system.  This is generally true of a number of namespace-based
services, including System V IPC.  I've toyed with a number of ideas,
including a p->p_namespace, but haven't reached any firm conclusions yet,
especially regarding situations where multiple issues (not just jail()) 
might be associated with namespace management.  In the mean time, I'll
continue my general cleanup of the authorization code. 

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services




From owner-freebsd-arch  Tue Apr 24 11:54:56 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP id 6D46A37B423
	for <arch@freebsd.org>; Tue, 24 Apr 2001 11:54:54 -0700 (PDT)
	(envelope-from tlambert@usr08.primenet.com)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id LAA28456;
	Tue, 24 Apr 2001 11:54:49 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp05.primenet.com, id smtpdAAAY9aOI3; Tue Apr 24 11:54:43 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id LAA02463;
	Tue, 24 Apr 2001 11:55:18 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200104241855.LAA02463@usr08.primenet.com>
Subject: vm/swap_pager.c swap_pager_swap_init()
To: arch@freebsd.org
Date: Tue, 24 Apr 2001 18:55:14 +0000 (GMT)
Cc: terry@lambert.org
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

It seems to me that the pages calculation in this function is
wrong:

	/*
	 * Initialize our zone.  Right now I'm just guessing on the number
	 * we need based on the number of pages in the system.  Each swblock
	 * can hold 16 pages, so this is probably overkill.
	 */
	
	n = cnt.v_page_count * 2;

In particular, for a 4G system, it seems that this should be more
bounded, e.g.:

	/*
	 * Provide backing store for only 2*physical memory limit.
	 * This approximately halves the amount of memory otherwise
	 * required in a 4G system, relative to the previous 'n'
	 * calculation.  It could probably be reduced by half again.
	 */
	n = cnt.v_page_count * 2;
	n = min(n, 128*1024);	/* (4G / PAGE_SIZE)  / 16 * 2 */

I realize that this changes for the Alpha and IA64, and should be
more general, but I haven't found the address space limitation
defined anywhere.
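
The arithmetic behind the 128*1024 bound can be checked with a small
sketch. The 4096-byte page size is an assumption (i386), and the
function names are hypothetical; the factor of 16 pages per swblock
and the doubling come from the comments quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED UINT64_C(4096)  /* assumed i386 page size */
#define PAGES_PER_SWBLOCK UINT64_C(16)    /* "Each swblock can hold 16 pages" */

/* Upper bound on n: (address space / PAGE_SIZE) / 16 * 2. */
static uint64_t
swblock_cap(uint64_t address_space_bytes)
{
	uint64_t pages = address_space_bytes / PAGE_SIZE_ASSUMED;

	return (pages / PAGES_PER_SWBLOCK * 2);
}

/* The bounded calculation: n = min(page_count * 2, cap). */
static uint64_t
swblock_count(uint64_t page_count, uint64_t cap)
{
	uint64_t n = page_count * 2;

	return (n < cap ? n : cap);
}
```

For a 4G address space swblock_cap() gives 131072, which is the
128*1024 used in the proposed min(); passing a different address space
size is one way the bound could be generalized for other platforms.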

Comments?


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.



From owner-freebsd-arch  Wed Apr 25  8:38:27 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from ringworld.nanolink.com (ringworld.nanolink.com [195.24.48.13])
	by hub.freebsd.org (Postfix) with SMTP id B0C8537B422
	for <arch@FreeBSD.org>; Wed, 25 Apr 2001 08:38:24 -0700 (PDT)
	(envelope-from roam@orbitel.bg)
Received: (qmail 56829 invoked by uid 1000); 25 Apr 2001 15:36:40 -0000
Date: Wed, 25 Apr 2001 18:36:40 +0300
From: Peter Pentchev <roam@orbitel.bg>
To: arch@FreeBSD.org
Subject: gid_t vs. plain int
Message-ID: <20010425183640.C54687@ringworld.oblivion.bg>
Mail-Followup-To: arch@FreeBSD.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Hi,

OK.  I've (kinda) had enough.

Is there a reason that struct group in <group.h> does not define 'gr_gid'
as a gid_t value, but as a plain int?  This makes all kinds of things
go berserk with gcc -Wall -W, and causes dozens of (totally unneeded)
casts.
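
To illustrate the kind of cast meant here, a small sketch. The two
structures are hypothetical stand-ins, not the real declaration from
the header, and the helper names are made up for the example.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical stand-ins for the two possible declarations. */
struct group_int  { int   gr_gid; };  /* gr_gid as a plain int */
struct group_gidt { gid_t gr_gid; };  /* gr_gid as gid_t, as proposed */

/*
 * With an int field, comparing against a gid_t mixes signed and
 * unsigned operands where gid_t is unsigned, so a cast is needed to
 * keep a warning-enabled build quiet.
 */
static int
in_group_int(const struct group_int *g, gid_t gid)
{
	return ((gid_t)g->gr_gid == gid);
}

/* With a gid_t field the comparison needs no cast at all. */
static int
in_group_gidt(const struct group_gidt *g, gid_t gid)
{
	return (g->gr_gid == gid);
}
```

Both helpers return the same answer; the difference is only that the
second one compiles cleanly without the cast.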

Is there some standard that says pw_gid is gid_t, but gr_gid is int?
If not, would anyone be interested in patches (yes, I'm prepared to sweep
the whole source tree), making gr_gid a gid_t?

G'luck,
Peter

-- 
This sentence no verb.



From owner-freebsd-arch  Wed Apr 25  8:46:29 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from ringworld.nanolink.com (ringworld.nanolink.com [195.24.48.13])
	by hub.freebsd.org (Postfix) with SMTP id 6FAD537B424
	for <arch@FreeBSD.org>; Wed, 25 Apr 2001 08:46:26 -0700 (PDT)
	(envelope-from roam@orbitel.bg)
Received: (qmail 57036 invoked by uid 1000); 25 Apr 2001 15:44:43 -0000
Date: Wed, 25 Apr 2001 18:44:43 +0300
From: Peter Pentchev <roam@orbitel.bg>
To: arch@FreeBSD.org
Subject: Re: gid_t vs. plain int
Message-ID: <20010425184443.D54687@ringworld.oblivion.bg>
Mail-Followup-To: arch@FreeBSD.org
References: <20010425183640.C54687@ringworld.oblivion.bg>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010425183640.C54687@ringworld.oblivion.bg>; from roam@orbitel.bg on Wed, Apr 25, 2001 at 06:36:40PM +0300
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, Apr 25, 2001 at 06:36:40PM +0300, Peter Pentchev wrote:
> Hi,
> 
> OK.  I've (kinda) had enough.
> 
> Is there a reason that struct group in <group.h> does not define 'gr_gid'

of course that should read <grp.h>, not <group.h>.

> as a gid_t value, but as a plain int?  This makes all kinds of things
> go berserk with gcc -Wall -W, and causes dozens of (totally unneeded)
> casts.
> 
> Is there some standard that says pw_gid is gid_t, but gr_gid is int?
> If not, would anyone be interested in patches (yes, I'm prepared to sweep
> the whole source tree), making gr_gid a gid_t?

G'luck,
Peter

-- 
This sentence contains exactly threee erors.



From owner-freebsd-arch  Wed Apr 25  9:53:52 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id DF41A37B423
	for <arch@FreeBSD.ORG>; Wed, 25 Apr 2001 09:53:49 -0700 (PDT)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f3PGrl824270;
	Wed, 25 Apr 2001 09:53:47 -0700 (PDT)
Date: Wed, 25 Apr 2001 09:53:47 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: Peter Pentchev <roam@orbitel.bg>
Cc: arch@FreeBSD.ORG
Subject: Re: gid_t vs. plain int
Message-ID: <20010425095347.I1790@fw.wintelcom.net>
References: <20010425183640.C54687@ringworld.oblivion.bg> <20010425184443.D54687@ringworld.oblivion.bg>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010425184443.D54687@ringworld.oblivion.bg>; from roam@orbitel.bg on Wed, Apr 25, 2001 at 06:44:43PM +0300
X-all-your-base: are belong to us.
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Peter Pentchev <roam@orbitel.bg> [010425 08:46] wrote:
> On Wed, Apr 25, 2001 at 06:36:40PM +0300, Peter Pentchev wrote:
> > Hi,
> > 
> > OK.  I've (kinda) had enough.
> > 
> > Is there a reason that struct group in <group.h> does not define 'gr_gid'
> 
> of course that should read <grp.h>, not <group.h>.
> 
> > as a gid_t value, but as a plain int?  This makes all kinds of things
> > go berserk with gcc -Wall -W, and causes dozens of (totally unneeded)
> > casts.
> > 
> > Is there some standard that says pw_gid is gid_t, but gr_gid is int?
> > If not, would anyone be interested in patches (yes, I'm prepared to sweep
> > the whole source tree), making gr_gid a gid_t?

It looks like a worthy task, I would ask Bruce and Wollman about it
before taking it on if it looks like a lot of work just to make sure
it's the right thing.

-- 
-Alfred Perlstein - [alfred@freebsd.org]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/



From owner-freebsd-arch  Wed Apr 25 10:48:59 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from magellan.palisadesys.com (magellan.palisadesys.com [192.188.162.211])
	by hub.freebsd.org (Postfix) with ESMTP id 6636937B423
	for <arch@FreeBSD.ORG>; Wed, 25 Apr 2001 10:48:56 -0700 (PDT)
	(envelope-from ghelmer@palisadesys.com)
Received: from CAPELLA (capella.palisadesys.com [192.188.162.112])
	(authenticated (0 bits))
	by magellan.palisadesys.com (8.11.2/8.11.2) with ESMTP id f3PHmnZ28561
	(using TLSv1/SSLv3 with cipher RC4-MD5 (128 bits) verified NO);
	Wed, 25 Apr 2001 12:48:49 -0500
From: "Guy Helmer" <ghelmer@palisadesys.com>
To: "Alfred Perlstein" <bright@wintelcom.net>,
	"Peter Pentchev" <roam@orbitel.bg>
Cc: <arch@FreeBSD.ORG>
Subject: RE: gid_t vs. plain int
Date: Wed, 25 Apr 2001 12:49:26 -0500
Message-ID: <IIEMKPDDELBAEMJFGFHIOEDICAAA.ghelmer@palisadesys.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
In-Reply-To: <20010425095347.I1790@fw.wintelcom.net>
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
Importance: Normal
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wednesday, April 25, 2001 11:54 AM Alfred Perlstein wrote:
> * Peter Pentchev <roam@orbitel.bg> [010425 08:46] wrote:
> > On Wed, Apr 25, 2001 at 06:36:40PM +0300, Peter Pentchev wrote:
> > > Hi,
> > >
> > > OK.  I've (kinda) had enough.
> > >
> > > Is there a reason that struct group in <group.h> does not
> define 'gr_gid'
> >
> > of course that should read <grp.h>, not <group.h>.
> >
> > > as a gid_t value, but as a plain int?  This makes all kinds of things
> > > go berserk with gcc -Wall -W, and causes dozens of (totally unneeded)
> > > casts.
> > >
> > > Is there some standard that says pw_gid is gid_t, but gr_gid is int?
> > > If not, would anyone be interested in patches (yes, I'm
> prepared to sweep
> > > the whole source tree), making gr_gid a gid_t?
>
> It looks like a worthy task, I would ask Bruce and Wollman about it
> before taking it on if it looks like a lot of work just to make sure
> it's the right thing.

PR 22210 addresses this issue and a fix, along with comments from Garrett
Wollman about his anticipated fix.  However, the PR is over six months old
:-)

Guy

Guy Helmer, Ph.D.
http://www.palisadesys.com/~ghelmer/
Sr. Software Engineer, Palisade Systems
ghelmer@palisadesys.com




From owner-freebsd-arch  Wed Apr 25 11: 0:28 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29])
	by hub.freebsd.org (Postfix) with SMTP id 7C88B37B424
	for <Arch@freebsd.org>; Wed, 25 Apr 2001 11:00:05 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: (qmail 6093 invoked by uid 666); 25 Apr 2001 18:03:09 -0000
Received: from i181-032.nv.iinet.net.au (HELO elischer.org) (203.59.181.32)
  by mail.m.iinet.net.au with SMTP; 25 Apr 2001 18:03:09 -0000
Message-ID: <3AE71067.FF4BD029@elischer.org>
Date: Wed, 25 Apr 2001 10:59:03 -0700
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Arch@freebsd.org, alfred@freebsd.org,
	Robert Watson <rwatson@FreeBSD.ORG>,
	Daniel Eischen <eischen@vigrid.com>
Subject: KSE threading support (first parts)
Content-Type: multipart/mixed;
 boundary="------------56834A9EA7789B526697FC9C"
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

This is a multi-part message in MIME format.
--------------56834A9EA7789B526697FC9C
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit

After discussing this with Jason Evans before he took his new position, 
and having looked at his patches from December and my similar patches from 
January, here is a 'merged' patch.

It breaks the proc structure into 4 parts.

proc... owns all 'total process' resources. (e.g. address space, 
        limits, files)
kseg... KSE 'group'. Anything to do with working out the quanta to
        be given to the threads (KSEs). A scheduling abstraction.
kse.... Actual schedulable entity for a processor
        (if the KSEG has a quantum for it)
ksec... Where a thread stores its context when it is blocked
        so that the kse can return to either the user, or another 
        unblocked kse to continue using its quanta.

This compiles cleanly and SHOULD run (it did run in an 
earlier incarnation). It is by no means final, but rather
designed to give us a starting point in discussions.

In this view, KSEGs are on the run queue and when they get some 
quanta the KSEs hanging off them are run.
If 2 KSEs are running, the KSEG's quanta are exhausted at twice
the rate. 
Each KSE has a very strong affinity for one processor
and KSECs have a weak affinity for a KSE. If a KSE runs out
of work but has time, it will 'poach' a KSEC from another KSE in the
same KSEG list.
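
The scheduling idea described above can be sketched roughly as
follows. This is not the code in the patch: the structures are
deliberately tiny, and every field and function name here is
hypothetical. It assumes a KSE prefers its own KSECs, poaches from a
sibling in the same KSEG when it has none, and that each KSE that
finds work charges one quantum to the group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified versions of the structures. */
struct ksec {
	struct ksec *next;      /* next runnable context on the same KSE */
	int id;
};

struct kse {
	struct kse *sibling;    /* next KSE in the same KSEG */
	struct ksec *runnable;  /* runnable KSECs with affinity for this KSE */
};

struct kseg {
	struct kse *kses;       /* all KSEs in this group */
	int quanta;             /* remaining quanta shared by the group */
};

/* Remove and return the first runnable KSEC of 'ke', or NULL. */
static struct ksec *
kse_pop(struct kse *ke)
{
	struct ksec *kc = ke->runnable;

	if (kc != NULL) {
		ke->runnable = kc->next;
		kc->next = NULL;
	}
	return (kc);
}

/* Pick work for 'ke': its own KSEC first, else poach one from a sibling. */
static struct ksec *
kse_choose(struct kseg *kg, struct kse *ke)
{
	struct ksec *kc = kse_pop(ke);
	struct kse *other;

	for (other = kg->kses; kc == NULL && other != NULL;
	    other = other->sibling)
		if (other != ke)
			kc = kse_pop(other);
	return (kc);
}

/*
 * One tick: each KSE that finds a KSEC takes it off its queue and
 * charges one quantum to the group.  Returns how many KSEs ran.
 */
static int
kseg_tick(struct kseg *kg)
{
	struct kse *ke;
	int ran = 0;

	for (ke = kg->kses; ke != NULL && kg->quanta > 0; ke = ke->sibling) {
		if (kse_choose(kg, ke) != NULL) {
			kg->quanta--;
			ran++;
		}
	}
	return (ran);
}
```

With two busy KSEs in a group, kseg_tick() lowers the group's quanta
by two per tick, which is the "twice the rate" behaviour described
above.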

In this patch the linkages are not set up at all.
All that is done is that the structures are
defined and used instead of a monolithic 'proc' struct.
The new structures are 'included' in the proc structure
to maintain compatibility and to allow code to be changed slowly.

What really needs to be done is for everyone who is interested to go over 
the rather arbitrary allocation of fields to structures that
I did and suggest changes.

Also I've punted on most things to do with signals as we haven't
really discussed how we want signals to be handled in a KSE world.
(Can each KSEG or KSE get individual signals? Do we need to 
define a special 'signal' KSE? If so, is that all it does?)

What happens to the 'u-area'?

How do we define a "cur-kse" similar to curproc?
(Do we need one?)
Presently the processor state is stored all over the place 
when a process is suspended.
This needs to be brought together so it can be put into the KSEC.
Who understands that stuff?

Some of the next steps would be:
1/ figure out what we want for signals etc..
2/ get the contexts actually stored in the KSEC structure
   when a process is suspended. (instead of some strange pcb in funny memory
   near the u area)
3/ Set up the linkages between these structures, and
4/ start using 'kse' instead of 'proc' in a bunch of places
and using the linkages to find the appropriate other
structures when needed.
5/ Add code to make new KSEs so that the 1:1 Mapping is no longer
true.
6/ Add syscalls to start making KSEs other than the one that 
is built into the process.
7/ start making upcalls


-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v
--------------56834A9EA7789B526697FC9C
Content-Type: text/plain; charset=iso-8859-2;
 name="proc.4-26.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="proc.4-26.diff"

Index: kern/kern_fork.c
===================================================================
RCS file: /unused/cvs/freebsd/src/sys/kern/kern_fork.c,v
retrieving revision 1.110
diff -u -r1.110 kern_fork.c
--- kern/kern_fork.c	2001/03/28 11:52:53	1.110
+++ kern/kern_fork.c	2001/04/25 17:11:22
@@ -390,6 +390,24 @@
 	    (unsigned) ((caddr_t)&p2->p_endcopy - (caddr_t)&p2->p_startcopy));
 	PROC_UNLOCK(p1);
 
+	bzero(&p2->p_kse.ke_startzero,
+	    (unsigned) ((caddr_t)&p2->p_kse.ke_endzero
+			- (caddr_t)&p2->p_kse.ke_startzero));
+	bcopy(&p1->p_kse.ke_startcopy, &p2->p_kse.ke_startcopy,
+	    (unsigned) ((caddr_t)&p2->p_kse.ke_endcopy
+			- (caddr_t)&p2->p_kse.ke_startcopy));
+
+	bzero(&p2->p_ksec.kc_startzero,
+	    (unsigned) ((caddr_t)&p2->p_ksec.kc_endzero
+			- (caddr_t)&p2->p_ksec.kc_startzero));
+	bcopy(&p1->p_ksec.kc_startcopy, &p2->p_ksec.kc_startcopy,
+	    (unsigned) ((caddr_t)&p2->p_ksec.kc_endcopy
+			- (caddr_t)&p2->p_ksec.kc_startcopy));
+
+	bcopy(&p1->p_kseg.kg_startcopy, &p2->p_kseg.kg_startcopy,
+	    (unsigned) ((caddr_t)&p2->p_kseg.kg_endcopy
+			- (caddr_t)&p2->p_kseg.kg_startcopy));
+
 	mtx_init(&p2->p_mtx, "process lock", MTX_DEF);
 	PROC_LOCK(p2);
 
Index: sys/proc.h
===================================================================
RCS file: /unused/cvs/freebsd/src/sys/sys/proc.h,v
retrieving revision 1.160
diff -u -r1.160 proc.h
--- sys/proc.h	2001/04/20 22:34:48	1.160
+++ sys/proc.h	2001/04/25 17:20:51
@@ -147,17 +147,200 @@
  * either lock is sufficient for read access, but both locks must be held
  * for write access.
  */
+
 struct ithd;
 struct nlminfo;
+/*
+ * Here we define the four structures used for process information.
+ * The first is the ksec. It stands for "Kernel Schedulable Entity Context".
+ * This structure contains all the information as to where a thread of 
+ * execution was when it was suspended, why it was suspended, and anything else
+ * that will be needed to restart it when it is rescheduled. Always
+ * associated with a KSE, but can be reassigned to an equivalent KSE for
+ * load balancing.
+ */
+struct ksec;
+
+/* 
+ * The second structure is the Kernel Schedulable Entity. (KSE)
+ * As long as this is scheduled, it will continue to run any KSECs that
+ * are assigned to it until either it runs out of KSECs or CPU.
+ * It runs on one CPU and is assigned a quantum of time. When a KSEC is
+ * blocked, the KSE continues to run and will search for another KSEC
+ * in a runnable state amongst those it has. It may decide to return to user
+ * mode with a new 'empty' KSEC if there are no runnable KSECs.
+ * KSECs are associated with a KSE for cache reasons, but a scheduled KSE with
+ * no runnable KSECs will try to take a KSEC from a sibling KSE before
+ * surrendering its quantum.
+ */
+struct kse;
+
+/*
+ * The KSEG is allocated resources across a number of CPUs
+ * (including a number of CPUxQUANTA). It parcels these QUANTA up among
+ * its KSEs, each of which should be running on a different CPU.
+ * Priority and total available scheduled quanta are properties of a KSEG.
+ * Multiple KSEGs in a single process compete against each other
+ * for total quanta in the same way that a forked child competes against
+ * its parent process.
+ */
+struct kseg;
+
+/*
+ * A process is the owner of all system resources allocated to a task.
+ * All KSEGs under one process see, and have the same access to, these
+ * resources (e.g. files, memory, sockets, permissions). A process may 
+ * compete for CPU cycles on the same basis as a forked process cluster
+ * by spawning several KSEGs. 
+ */
+struct proc;
+
+/***************
+ * In pictures:
+ With a single run queue used by all processors:
+
+ RUNQ: --->KSEG---KSEG--...             SLEEPQ:[]---KSEC---KSEC---KSEC
+	   |                                   []---KSEC
+	   KSE---KSEC--KSEC--KSEC              []
+	   |                                   []---KSEC---KSEC
+	   KSE--KSEC--KSEC
+
+  (processors run KSEs from the head KSEG until they are exhausted or
+  the KSEG exhausts its quantum) 
+
+With PER-CPU run queues:
+it may be easier to put the KSEs on the run queues directly
+They would be given priorities calculated from the KSEG.
+
+ *
+ *****************/
+
+/*
+ * Kernel runnable context. This is what is put to sleep and reactivated.
+ * (Kernel Schedulable Entity Context)
+ * The first KSE available in the correct group will run this context.
+ * If several are available, use the one on the same CPU as last time.
+ */
+struct	ksec {
+	/*** New fields for KSE linkage ***/
+	/* While it is possible to find the proc via the kse->kseg->proc
+	 * it is directly held here for efficiency  (etc.)
+	 */
+	struct proc	*kc_proc;	/* Associated process. */
+	struct kseg	*kc_kseg;	/* Associated KSEG. */
+	struct kse	*kc_kse;	/* Associated KSE. */
+
+	TAILQ_ENTRY(ksec) kc_ksegq;	/* All ksecs in this kseg */
+	TAILQ_ENTRY(ksec) kc_slpqk;	/* (j) Sleep/run queue. */
+
+	/* the fields below will mutate into those above */
+	TAILQ_ENTRY(proc) kc_procq;	/* (j) Run/mutex queue. */
+	TAILQ_ENTRY(proc) kc_slpq;	/* (j) Sleep queue. */
+		/* The following fields are all zeroed upon creation in fork. */
+#define	kc_startzero kc_dupfd
+	int	kc_flag;		/* (c) P_* flags. */
+	int	kc_sflag;		/* (j) PS_* flags. */
+	int	kc_stat;		/* (j) S* process status. */
+	int	kc_dupfd;		/* (c) ret value from fdopen. XXX */
+	void	*kc_wchan;		/* (j) Sleep address. */
+	const char *kc_wmesg;		/* (j) Reason for sleep. */
+	u_char	kc_lastcpu;		/* (j) Last cpu we were on. */
+	short	kc_locks;		/* (*) DEBUG: lockmgr count of locks */
+	u_int	kc_stops;		/* (c) Procfs event bitmask. */
+	u_int	kc_stype;		/* (c) Procfs stop event type. */
+	char	kc_step;		/* (c) Procfs stop *once* flag. */
+	u_char	kc_pfsflags;		/* (c) Procfs flags. */
+	struct	klist kc_klist;		/* (c) Knotes attached to this proc. */
+	struct mtx *kc_blocked;		/* (j) Mutex process is blocked on. */
+	const char *kc_mtxname;		/* (j) Name of mutex blocked on. */
+	LIST_HEAD(, mtx) kc_contested;	/* (j) Contested locks. */
+		/* End area that is zeroed on creation. */
+		/* The following fields are all copied upon creation in fork. */
+	struct	lock_list_entry *kc_sleeplocks; /* (k) Held sleep locks. */
+	register_t kc_retval[2];	/* (k) Syscall aux returns. */
+#define	kc_endzero kc_slpcallout
+#define	kc_startcopy kc_endzero
+	struct	callout kc_slpcallout;/* (h) Callout for sleep. */
+	struct	mdproc kc_md;	/* (k) Any machine-dependent fields. */
+	/* eventually  struct mdksec.... */
+		/* End area that is copied on creation. */
+#define	kc_endcopy kc_addr
+	struct	user *kc_addr;	/* (k) Kernel virtual addr of u-area (CPU). */
+	struct	pasleep kc_asleep;	/* (k) Used by asleep()/await(). */
+};
+
+/*
+ * The schedulable entity that can be given a context to run.
+ * A process may have several of these. Probably one per processor
+ * but possibly a few more. In this universe they are grouped
+ * with a KSEG that contains the priority and niceness
+ * for the group.
+ */
+struct	kse {
+	struct proc	*ke_proc;	/* Associated process. */
+	struct kseg	*ke_kseg;	/* Associated KSEG. */
+	TAILQ_ENTRY(kse) ke_kseq;	/* Queue of KSEs in ke_kseg. */
+	struct ksec	*ke_ksec;	/* Associated KSEC, if running. */
+	TAILQ_HEAD(ke_ksec_hd, ksec);	/* Runnable KSECs waiting on this KSE */
+	struct	pstats *ke_stats;	/* (bk) Accounting/statistics (CPU). */
+/* The following fields are all zeroed upon creation in fork. */
+#define ke_startzero ke_estcpu
+	int	ke_flag;	/* (c) P_* flags. */
+	int	ke_sflag;	/* (j) PS_* flags. */
+	int	ke_stat;	/* (j) S* process status. */
+	u_int	ke_estcpu;	/* (j) Time averaged value of ke_cpticks. */
+	int	ke_cpticks;	/* (j) Ticks of cpu time. */
+	fixpt_t	ke_pctcpu;	/* (j) %cpu during p_swtime. */
+	u_int64_t ke_uu;	/* (j) Previous user time in microsec. */
+	u_int64_t ke_su;	/* (j) Previous system time in microsec. */
+	u_int64_t ke_iu;	/* (j) Previous interrupt time in microsec. */
+	u_int64_t ke_uticks;	/* (j) Statclock hits in user mode. */
+	u_int64_t ke_sticks;	/* (j) Statclock hits in system mode. */
+	u_int64_t ke_iticks;	/* (j) Statclock hits processing intr. */
+	u_int	ke_slptime;	/* (j) Time since last blocked. */
+	u_char	ke_oncpu;	/* (j) Which cpu we are on. */
+	char	ke_rqindex;	/* (j) Run queue index. */
+	int	ke_intr_nesting_level; /* (n) Interrupt recursion. */
+/* End area that is zeroed on creation. */
+/* The following fields are all copied upon creation in fork. */
+#define ke_endzero ke_priority
+#define ke_startcopy ke_endzero
+	u_char	ke_priority;	/* (j) Process priority. */
+	u_char	ke_usrpri; /* (j) User priority based on p_cpu and p_nice. */
+/* End area that is copied on creation. */
+#define ke_endcopy ke_ithd
+	struct	ithd *ke_ithd;	/* (b) For interrupt threads only. */
+};
 
+/*
+ * Kernel-scheduled entity group (KSEG).  The scheduler considers each KSEG to
+ * be an indivisible unit from a time-sharing perspective, though each KSEG may
+ * contain multiple KSEs.
+ */
+struct	kseg {
+	struct proc	*kg_proc;	/* Process that contains this KSEG. */
+	TAILQ_ENTRY(kseg) kg_ksegq;	/* Queue of KSEGs in kg_proc. */
+	TAILQ_HEAD(kg_kse_hd, kse);	/* Queue of KSEs in this KSEG. */
+	TAILQ_HEAD(kg_ksec_hd, ksec);	/* Queue of KSECs in this KSEG. */
+/* The following fields are all copied upon creation in fork. */
+#define	kg_startcopy kg_itcallout
+	struct	callout kg_itcallout;	/* (h) Interval timer callout. */
+	struct	priority kg_pri;	/* (j) Process priority. */
+	char	kg_nice;		/* (j?/k?) Process "nice" value. */
+	struct	rtprio kg_rtprio;	/* (j) Realtime priority. */
+/* End area that is copied on creation. */
+#define	kg_endcopy kg_dummy
+	int kg_dummy;
+};
+
 struct	proc {
-	TAILQ_ENTRY(proc) p_procq;	/* (j) Run/mutex queue. */
-	TAILQ_ENTRY(proc) p_slpq;	/* (j) Sleep queue. */
 	LIST_ENTRY(proc) p_list;	/* (d) List of all processes. */
 
 	/* substructures: */
+	TAILQ_HEAD(p_ksegq, kseg);	/* Queue of KSEGs. */
 	struct	pcred *p_cred;		/* (c + k) Process owner's identity. */
 	struct	filedesc *p_fd;		/* (b) Ptr to open files structure. */
+	/* accumulated stats for all owned KSEs? */
 	struct	pstats *p_stats;	/* (b) Accounting/statistics (CPU). */
 	struct	plimit *p_limit;	/* (m) Process limits. */
 	struct	vm_object *p_upages_obj;/* (a) Upages object. */
@@ -168,7 +351,61 @@
 
 #define	p_ucred		p_cred->pc_ucred
 #define	p_rlimit	p_limit->pl_rlimit
-
+/*
+ * Compatibility defines for while we are using a
+ * single one in the proc struct during development.
+ */
+	struct kseg p_kseg;
+#define	p_itcallout		p_kseg.kg_itcallout
+#define	p_pri			p_kseg.kg_pri
+#define	p_nice			p_kseg.kg_nice
+#define	p_rtprio		p_kseg.kg_rtprio
+
+	struct kse p_kse;
+#define	p_stats			p_kse.ke_stats
+#define	p_estcpu		p_kse.ke_estcpu
+#define	p_cpticks		p_kse.ke_cpticks
+#define	p_pctcpu		p_kse.ke_pctcpu
+#define	p_uu			p_kse.ke_uu
+#define	p_su			p_kse.ke_su
+#define	p_iu			p_kse.ke_iu
+#define	p_uticks		p_kse.ke_uticks
+#define	p_sticks		p_kse.ke_sticks
+#define	p_iticks		p_kse.ke_iticks
+#define	p_slptime		p_kse.ke_slptime
+#define	p_oncpu			p_kse.ke_oncpu
+#define	p_rqindex		p_kse.ke_rqindex
+#define	p_usrpri		p_kse.ke_usrpri
+#define	p_ithd			p_kse.ke_ithd
+#define p_intr_nesting_level	p_kse.ke_intr_nesting_level
+
+	struct ksec p_ksec;
+#define	p_procq			p_ksec.kc_procq
+#define	p_slpq			p_ksec.kc_slpq
+#define	p_dupfd			p_ksec.kc_dupfd
+#define	p_wchan			p_ksec.kc_wchan
+#define	p_wmesg			p_ksec.kc_wmesg
+#define	p_lastcpu		p_ksec.kc_lastcpu
+#define	p_locks			p_ksec.kc_locks
+#define	p_stops			p_ksec.kc_stops
+#define	p_stype			p_ksec.kc_stype
+#define	p_retval		p_ksec.kc_retval
+#define	p_step			p_ksec.kc_step
+#define	p_pfsflags		p_ksec.kc_pfsflags
+#define	p_klist			p_ksec.kc_klist
+#define	p_blocked		p_ksec.kc_blocked
+#define	p_mtxname		p_ksec.kc_mtxname
+#define	p_contested		p_ksec.kc_contested
+#define	p_sleeplocks		p_ksec.kc_sleeplocks
+#define	p_slpcallout		p_ksec.kc_slpcallout
+#define	p_md			p_ksec.kc_md
+#define	p_asleep		p_ksec.kc_asleep
+
+
+	/*
+	 * The following don't make too much sense..
+	 * See the kc_ or ke_ versions of the same flags
+	 */
 	int	p_flag;			/* (c) P_* flags. */
 	int	p_sflag;		/* (j) PS_* flags. */
 	int	p_stat;			/* (j) S* process status. */
@@ -183,80 +420,47 @@
 /* The following fields are all zeroed upon creation in fork. */
 #define	p_startzero	p_oppid
 
-	pid_t	p_oppid;	 /* (c + e) Save parent pid during ptrace. XXX */
-	int	p_dupfd;	 /* (c) Sideways ret value from fdopen. XXX */
+	pid_t	p_oppid;	 	/* (c + e) Save ppid in ptrace. XXX */
 	struct	vmspace *p_vmspace;	/* (b) Address space. */
 
 	/* scheduling */
-	u_int	p_estcpu;	 /* (j) Time averaged value of p_cpticks. */
-	int	p_cpticks;	 /* (j) Ticks of cpu time. */
-	fixpt_t	p_pctcpu;	 /* (j) %cpu during p_swtime. */
-	struct	callout p_slpcallout;	/* (h) Callout for sleep. */
-	void	*p_wchan;	 /* (j) Sleep address. */
-	const char *p_wmesg;	 /* (j) Reason for sleep. */
-	u_int	p_swtime;	 /* (j) Time swapped in or out. */
-	u_int	p_slptime;	 /* (j) Time since last blocked. */
+	u_int	p_swtime;	 	/* (j) Time swapped in or out. */
 
-	struct	callout p_itcallout;	/* (h) Interval timer callout. */
 	struct	itimerval p_realtimer;	/* (h?/k?) Alarm timer. */
-	u_int64_t p_runtime;	/* (j) Real time in microsec. */
-	u_int64_t p_uu;		/* (j) Previous user time in microsec. */
-	u_int64_t p_su;		/* (j) Previous system time in microsec. */
-	u_int64_t p_iu;		/* (j) Previous interrupt time in microsec. */
-	u_int64_t p_uticks;	/* (j) Statclock hits in user mode. */
-	u_int64_t p_sticks;	/* (j) Statclock hits in system mode. */
-	u_int64_t p_iticks;	/* (j) Statclock hits processing intr. */
+	u_int64_t p_runtime;		/* (j) Real time in microsec. */
 
 	int	p_traceflag;		/* (j?) Kernel trace points. */
 	struct	vnode *p_tracep;	/* (j?) Trace to vnode. */
 
-	sigset_t p_siglist;	/* (c) Signals arrived but not delivered. */
+	sigset_t p_siglist;		/* (c) Sigs arrived, not delivered. */
 
 	struct	vnode *p_textvp;	/* (b) Vnode of executable. */
 
 	struct	mtx p_mtx;		/* (k) Lock for this struct. */
 	u_int	p_spinlocks;		/* (k) Count of held spin locks. */
-	char	p_lock;		/* (c) Process lock (prevent swap) count. */
-	u_char	p_oncpu;		/* (j) Which cpu we are on. */
-	u_char	p_lastcpu;		/* (j) Last cpu we were on. */
-	char	p_rqindex;		/* (j) Run queue index. */
-
-	short	p_locks;	/* (*) DEBUG: lockmgr count of held locks */
-	u_int	p_stops;		/* (c) Procfs event bitmask. */
-	u_int	p_stype;		/* (c) Procfs stop event type. */
-	char	p_step;			/* (c) Procfs stop *once* flag. */
-	u_char	p_pfsflags;		/* (c) Procfs flags. */
-	char	p_pad3[2];		/* Alignment. */
-	register_t p_retval[2];		/* (k) Syscall aux returns. */
+	char	p_lock;			/* (c) Process (prevent swap) count. */
+	char	p_pad3[3];		/* Alignment. */
 	struct	sigiolst p_sigiolst;	/* (c) List of sigio sources. */
 	int	p_sigparent;		/* (c) Signal to parent on exit. */
-	sigset_t p_oldsigmask;	/* (c) Saved mask from before sigpause. */
+	sigset_t p_oldsigmask;		/* (c) Saved mask from pre sigpause. */
 	int	p_sig;			/* (n) For core dump/debugger XXX. */
 	u_long	p_code;			/* (n) For core dump/debugger XXX. */
-	struct	klist p_klist;	/* (c) Knotes attached to this process. */
-	struct	lock_list_entry *p_sleeplocks; /* (k) Held sleep locks. */
-	struct	mtx *p_blocked;		/* (j) Mutex process is blocked on. */
-	const char *p_mtxname;		/* (j) Name of mutex blocked on. */
-	LIST_HEAD(, mtx) p_contested;	/* (j) Contested locks. */
 
 	struct nlminfo	*p_nlminfo;	/* (?) only used by/for lockd */
 	void	*p_aioinfo;	/* (c) ASYNC I/O info. */
-	struct	ithd *p_ithd;	/* (b) For interrupt threads only. */
-	int	p_intr_nesting_level;	/* (k) Interrupt recursion. */
 
 /* End area that is zeroed on creation. */
-#define	p_endzero	p_startcopy
-
 /* The following fields are all copied upon creation in fork. */
 #define	p_startcopy	p_sigmask
+#define	p_endzero	p_startcopy
 
+	/* We haven't defined how KSEs do signals yet */
 	sigset_t p_sigmask;	/* (c) Current signal mask. */
 	stack_t	p_sigstk;	/* (c) Stack pointer and on-stack flag. */
 
 	int	p_magic;	/* (b) Magic number. */
-	struct	priority p_pri;	/* (j) Process priority. */
-	char	p_nice;		/* (j?/k?) Process "nice" value. */
 	char	p_comm[MAXCOMLEN + 1];	/* (b) Process name. */
+	int	p_kse_enabled;	/* (b) 0, unless using KSEs this proc. */
 
 	struct 	pgrp *p_pgrp;	/* (e?/c?) Pointer to process group. */
 	struct 	sysentvec *p_sysent; /* (b) System call dispatch information. */
@@ -266,7 +470,6 @@
 #define	p_endcopy	p_addr
 
 	struct	user *p_addr;	/* (k) Kernel virtual addr of u-area (CPU). */
-	struct	mdproc p_md;	/* (k) Any machine-dependent fields. */
 
 	u_short	p_xstat;	/* (c) Exit status for wait; also stop sig. */
 	u_short	p_acflag;	/* (c) Accounting flags. */
@@ -274,7 +477,6 @@
 
 	struct proc *p_peers;	/* (c) */
 	struct proc *p_leader;	/* (c) */
-	struct	pasleep p_asleep;	/* (k) Used by asleep()/await(). */
 	void	*p_emuldata;	/* (c) Emulator state data. */
 };
 
@@ -293,9 +495,10 @@
 #define	SMTX	7		/* Blocked on a mutex. */
 
 /* These flags are kept in p_flag. */
+/* In a KSE world some go to a KSEC or a KSE (*)*/
 #define	P_ADVLOCK	0x00001	/* Process may hold a POSIX advisory lock. */
 #define	P_CONTROLT	0x00002	/* Has a controlling terminal. */
-#define	P_KTHREAD	0x00004 /* Kernel thread. */
+#define	P_KTHREAD	0x00004 /* Kernel thread. (*)*/
 #define	P_NOLOAD	0x00008	/* Ignore during load avg calculations. */
 #define	P_PPWAIT	0x00010	/* Parent is waiting for child to exec/exit. */
 #define	P_SELECT	0x00040	/* Selecting; wakeup/waiting danger. */
@@ -305,6 +508,7 @@
 #define	P_WAITED	0x01000	/* Debugging process has waited for child. */
 #define	P_WEXIT		0x02000	/* Working on exiting. */
 #define	P_EXEC		0x04000	/* Process called exec. */
+#define	P_KSES		0x08000	/* Process is using KSEs. */
 
 /* Should be moved to machine-dependent areas. */
 

--------------56834A9EA7789B526697FC9C--


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Apr 25 11: 9:47 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP
	id D2C3937B422; Wed, 25 Apr 2001 11:09:41 -0700 (PDT)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f3PI9ev26785;
	Wed, 25 Apr 2001 11:09:40 -0700 (PDT)
Date: Wed, 25 Apr 2001 11:09:40 -0700
From: Alfred Perlstein <alfred@FreeBSD.ORG>
To: Julian Elischer <julian@elischer.org>
Cc: Arch@FreeBSD.ORG, Robert Watson <rwatson@FreeBSD.ORG>,
	Daniel Eischen <eischen@vigrid.com>
Subject: Re: KSE threading support (first parts)
Message-ID: <20010425110940.L1790@fw.wintelcom.net>
References: <3AE71067.FF4BD029@elischer.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3AE71067.FF4BD029@elischer.org>; from julian@elischer.org on Wed, Apr 25, 2001 at 10:59:03AM -0700
X-all-your-base: are belong to us.
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Julian Elischer <julian@elischer.org> [010425 11:00] wrote:
> After discussing this with Jason Evans before he took his new position, 
> and having looked at his patches from December, and My similar patches from 
> january, here is a 'merged' patch.
> 
> It breaks the proc structure into 4 parts.
> 
> proc... owns all 'total process' resources. (e.g. address space, 
>         limits, files)
> kseg... KSE 'group'. Anything to do with working out the quanta to
>         be given to the threads (KSEs). A scheduling abstraction.
> kse.... Actual schedulable entity for a processor
>         (if the KSEG has a quantum for it)
> ksec... Where a thread stores its context when it is blocked
>         so that the kse can return to either the user, or another 
>         unblocked kse to continue using its quanta.
> 
> This compiles cleanly and SHOULD run (it did run in an 
> earlier incarnation). It is by no means final, but rather
> designed to give us a starting point in discussions.
> 
> In this view, KSEGs are on the run queue and when they get some 
> quanta the KSEs hanging off them are run.
> If 2 KSEs are running, the KSEG's quanta are exhausted at twice
> the rate. 
> Each KSE has a very strong affinity for one processor
> and KSECs have a weak affinity for a KSE. If a KSE runs out
> of work but has time, it will 'poach' a KSEC from another KSE in the
> same KSEG list.
> 
> In this patch the linkages are not set up at all.
> All that is done is that the structures are
> defined and used instead of a monolithic 'proc' struct.
> The new structures are 'included' in the  proc structure
> to maintain compatibility and to allow code to be changed slowly.
> 
> What really needs to be done is for everyone who is interested to go over 
> rather arbitrary allocation of fields to structures that
> I did and make suggested changes.
> 
> Also I've punted on most things to do with signals as we haven't
> really discussed how we want signals to be handled in a KSE world..
> (can each KSEG or KSE get individual signals? do we need to 
> define a special 'signal' KSE? If so is that all it does?)
> 
> What happens to the 'u-area'?

It makes sense that it stays, except for struct pcb.  Honestly,
swapping out the pcbs could be left as something to re-optimize
later; they can take a significant amount of space, but nowadays
it's not that big of a deal.

> how do we define a "cur-kse" similar to curproc?
> (do we need one?)

yes.

> presently the processor state is stored all over the place 
> when a process is suspended..
> This needs to be brought together so it can be put into the KSEC.
> Who understands that stuff?

That's your job.  Refer to Jason Evans if he's available.

You should also ask John Baldwin about proc locking, as this
stuff is definitely going to require locking in order to function
properly.

> Some of the next steps would be:
> 1/ figure out what we want for signals etc..

Afaik Solaris tried many different ways to propagate signals across
their lwps; afaik they found the task so complex and so hard to get
right that the latest implementation makes one lwp the signal target.

Most likely, then, signals would still be in struct proc or the
initial kse.

> 2/ get the contexts actually stored in the KSEC structure
>    when a process is suspended. (instead of some strange pcb in funny memory
>    near the u area)

huh?

> 3/ Set up the linkages between these structures, and
> 4/ start using 'kse' instead of 'proc' in a bunch of places
> and using the linkages to find the appropriate other
> structures when needed.
> 5/ Add code to make new KSEs so that the 1:1 Mapping is no longer
> true.
> 6/ Add syscalls to start making KSEs other than the one that 
> is built into the process.
> 7/ start making upcalls
> 

ok, when are you going to have these done? :)

One other question, have you looked at the recent lwp/kse support added
to NetBSD?  Is there anything to learn/avoid?

-Alfred




From owner-freebsd-arch  Wed Apr 25 11:37:39 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id DD85137B422; Wed, 25 Apr 2001 11:37:33 -0700 (PDT)
	(envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id OAA24589;
	Wed, 25 Apr 2001 14:36:53 -0400 (EDT)
Date: Wed, 25 Apr 2001 14:36:53 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
To: Julian Elischer <julian@elischer.org>
Cc: Arch@freebsd.org, alfred@freebsd.org,
	Robert Watson <rwatson@freebsd.org>
Subject: Re: KSE threading support (first parts)
In-Reply-To: <3AE71067.FF4BD029@elischer.org>
Message-ID: <Pine.SUN.3.91.1010425141253.20558A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 25 Apr 2001, Julian Elischer wrote:
> In this view, KSEGs are on the run queue and when they get some 
> quanta the KSEs hanging off them are run.
> If 2 KSEs are running, the KSEG's quanta are exhausted at twice
> the rate. 

Don't we eventually want per-CPU run queues?  How do multiple
KSEs hanging off a KSEG get scheduled then, if the quanta are in
the KSEG?  Round robin?

> Each KSE has a very strong affinity for one processor
> and KSECs have a weak affinity for a KSE. If a KSE runs out
> of work but has time, it will 'poach' a KSEC from another KSE in the
> same KSEG list.

Again, if KSEs can have a strong affinity for 1 processor and there
can be multiple KSEs hanging off a KSEG, then how do you schedule
these KSEs when we have per-CPU run queues?  It makes scheduling
these KSEs more difficult than it needs to be.

I still don't see the need to have multiple KSEs within a KSEG ;-)
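
For concreteness, the 'poaching' behaviour being debated could be
sketched roughly as below.  This is only an illustration under
assumptions: the field names kc_runq, ke_runnable and kg_kses and all
of the helper functions are hypothetical (the patch leaves its TAILQ
heads unnamed and sets up no linkages yet), and locking is ignored
entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct kse;
struct kseg;

/* Saved thread context; only the linkage fields are modelled here. */
struct ksec {
	int		 kc_id;		/* Illustrative identifier. */
	struct kse	*kc_kse;	/* KSE this context is queued on. */
	TAILQ_ENTRY(ksec) kc_runq;	/* Hypothetical per-KSE run queue. */
};
TAILQ_HEAD(ksec_list, ksec);

struct kse {
	struct kseg	*ke_kseg;	/* Owning group. */
	struct ksec_list ke_runnable;	/* Runnable KSECs on this KSE. */
	TAILQ_ENTRY(kse) ke_kseq;	/* Siblings within the KSEG. */
};
TAILQ_HEAD(kse_list, kse);

struct kseg {
	struct kse_list	 kg_kses;	/* All KSEs in this group. */
};

void
kseg_init(struct kseg *kg)
{
	TAILQ_INIT(&kg->kg_kses);
}

void
kse_attach(struct kseg *kg, struct kse *ke)
{
	ke->ke_kseg = kg;
	TAILQ_INIT(&ke->ke_runnable);
	TAILQ_INSERT_TAIL(&kg->kg_kses, ke, ke_kseq);
}

void
kse_enqueue(struct kse *ke, struct ksec *kc)
{
	kc->kc_kse = ke;
	TAILQ_INSERT_TAIL(&ke->ke_runnable, kc, kc_runq);
}

/*
 * Pick the next context for 'ke': its own queue first (weak affinity),
 * otherwise poach the first runnable KSEC from a sibling KSE in the
 * same KSEG.  Returns NULL if the whole group has nothing runnable.
 */
struct ksec *
kse_choose(struct kse *ke)
{
	struct kse *owner = ke;
	struct ksec *kc;

	assert(ke != NULL && ke->ke_kseg != NULL);
	kc = TAILQ_FIRST(&ke->ke_runnable);
	if (kc == NULL) {
		TAILQ_FOREACH(owner, &ke->ke_kseg->kg_kses, ke_kseq) {
			if (owner != ke &&
			    (kc = TAILQ_FIRST(&owner->ke_runnable)) != NULL)
				break;
		}
	}
	if (kc == NULL)
		return (NULL);
	TAILQ_REMOVE(&owner->ke_runnable, kc, kc_runq);
	kc->kc_kse = ke;	/* The context now belongs to the poacher. */
	return (kc);
}
```

With per-CPU run queues the sibling walk in kse_choose() is the part
that would have to cross queues, which is where the scheduling
question above bites.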

> In this patch the linkages are not set up at all.
> All that is done is that the structures are
> defined and used instead of a monolithic 'proc' struct.
> The new structures are 'included' in the  proc structure
> to maintain compatibility and to allow code to be changed slowly.
> 
> What really needs to be done is for everyone who is interested to go over 
> rather arbitrary allocation of fields to structures that
> I did and make suggested changes.
> 
> Also I've punted on most things to do with signals as we haven't
> really discussed how we want signals to be handled in a KSE world..
> (can each KSEG or KSE get individual signals? do we need to 
> define a special 'signal' KSE? If so is that all it does?)

Signals should be sent (via an upcall) to the first available
KSE to return to userland (return from syscall, after preemption,
etc.).  The userland thread scheduler will pick a thread to
receive the signal.  If the thread is running or in one
of the scheduling queues for the current KSEG, it will
be able to handle it without any other assist from the kernel.
If the thread is running or in one of the scheduling queues for
another KSEG, it will mark the signal pending in the target
thread and "signal" the appropriate KSEG with help from the
kernel (one of the new user<->kernel interfaces or syscalls).

(We may have to replace "KSEG" in the above with "KSE")

It might be nice to have a general way of sending messages
between KSEGs (KSEs?).
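
As a rough sketch of the decision the userland scheduler would make
(nothing here is an existing interface; struct uthread, ut_kseg,
ut_pending and dispatch_signal() are names made up for illustration,
and the actual "signal the other KSEG" step is left to whatever new
kernel interface gets defined):

```c
#include <assert.h>
#include <stddef.h>

/* What the userland scheduler must do after choosing a target thread. */
enum dispatch_action {
	DISPATCH_LOCAL,		/* Handle it here, no kernel assist. */
	DISPATCH_FORWARD	/* Mark pending, then poke the other KSEG. */
};

/* Hypothetical userland thread descriptor. */
struct uthread {
	int		ut_kseg;	/* KSEG the thread runs or is queued in. */
	unsigned int	ut_pending;	/* Bitmask of pending signals 1..32. */
};

/*
 * Called from the upcall on the first KSE to return to userland.
 * 'cur_kseg' identifies the KSEG that received the upcall.
 */
enum dispatch_action
dispatch_signal(struct uthread *target, int cur_kseg, int sig)
{
	assert(target != NULL);
	assert(sig >= 1 && sig <= 32);

	if (target->ut_kseg == cur_kseg)
		return (DISPATCH_LOCAL);

	/* Thread lives elsewhere: record the signal for it to find. */
	target->ut_pending |= 1u << (sig - 1);
	return (DISPATCH_FORWARD);
}
```

On DISPATCH_FORWARD the caller would then use the (yet to be designed)
user<->kernel interface to notify the KSEG named by ut_kseg.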

> What happens to the 'u-area'?
> 
> how do we define a "cur-kse" similar to curproc?
> (do we need one?)
> presently the processor state is stored all over the place 
> when a process is suspended..
> This needs to be brought together so it can be put into the KSEC.
> Who understands that stuff?
> 
> Some of the next steps would be:
> 1/ figure out what we want for signals etc..

Ask me for help in this area.  I know what the userland scheduler
has to do when dispatching signals to threads.

> 2/ get the contexts actually stored in the KSEC structure
>    when a process is suspended. (instead of some strange pcb in funny memory
>    near the u area)
> 3/ Set up the linkages between these structures, and
> 4/ start using 'kse' instead of 'proc' in a bunch of places
> and using the linkages to find the appropriate other
> structures when needed.
> 5/ Add code to make new KSEs so that the 1:1 Mapping is no longer
> true.
> 6/ Add syscalls to start making KSEs other than the one that 
> is built into the process.
> 7/ start making upcalls

Can't we start with 7 ;-)

-- 
Dan Eischen



From owner-freebsd-arch  Wed Apr 25 13: 5: 6 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from gw.nectar.com (gw.nectar.com [208.42.49.153])
	by hub.freebsd.org (Postfix) with ESMTP id 59D8837B422
	for <arch@FreeBSD.org>; Wed, 25 Apr 2001 13:05:03 -0700 (PDT)
	(envelope-from nectar@nectar.com)
Received: from hamlet.nectar.com (hamlet.nectar.com [10.0.1.102])
	by gw.nectar.com (Postfix) with ESMTP
	id B1311194C7; Wed, 25 Apr 2001 15:05:02 -0500 (CDT)
Received: (from nectar@localhost)
	by hamlet.nectar.com (8.11.3/8.9.3) id f3PK52F02351;
	Wed, 25 Apr 2001 15:05:02 -0500 (CDT)
	(envelope-from nectar@spawn.nectar.com)
Date: Wed, 25 Apr 2001 15:05:02 -0500
From: "Jacques A. Vidrine" <n@nectar.com>
To: Peter Pentchev <roam@orbitel.bg>
Cc: arch@FreeBSD.org
Subject: Re: gid_t vs. plain int
Message-ID: <20010425150502.B2200@hamlet.nectar.com>
References: <20010425183640.C54687@ringworld.oblivion.bg>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010425183640.C54687@ringworld.oblivion.bg>; from roam@orbitel.bg on Wed, Apr 25, 2001 at 06:36:40PM +0300
X-Url: http://www.nectar.com/
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, Apr 25, 2001 at 06:36:40PM +0300, Peter Pentchev wrote:
> Hi,
> 
> OK.  I've (kinda) had enough.
> 
> Is there a reason that struct group in <grp.h> does not define 'gr_gid'
> as a gid_t value, but as a plain int?  This makes all kinds of things
> go berserk with gcc -Wall -W, and causes dozens of (totally unneeded)
> casts.
> 
> Is there some standard that says pw_gid is gid_t, but gr_gid is int?
> If not, would anyone be interested in patches (yes, I'm prepared to sweep
> the whole source tree), making gr_gid a gid_t?

ISO/IEC  9945-1: 1996  (POSIX  1003.1)  says that  a  group  structure
`includes the members':

   char *    gr_name   The name of the group
   gid_t     gr_gid    The numerical group ID
   char **   gr_mem    A null-terminated vector of pointers to the
                       individual member names

Also, the  getgr* functions  which take a  group number  argument have
prototypes with `gid_t'.

I say go ahead and clean it up.
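
A small sketch of what callers look like once gr_gid has the type POSIX
lists for it.  getgrgid(3) and struct group are the standard interfaces;
group_has_id() and lookup_group_id() are hypothetical helpers written
only for this illustration, and the "no cast needed" remark assumes a
<grp.h> that declares gr_gid as gid_t.

```c
#include <sys/types.h>

#include <assert.h>
#include <grp.h>
#include <stddef.h>
#include <unistd.h>

/*
 * With gr_gid declared as gid_t both operands have the same type, so
 * this comparison needs no cast and draws no signed/unsigned warning.
 */
int
group_has_id(const struct group *gr, gid_t gid)
{
	assert(gr != NULL);
	return (gr->gr_gid == gid);
}

/*
 * Look up 'gid' in the group database.  Returns 1 and stores the ID
 * reported by the entry in '*out' if the group exists, 0 otherwise.
 */
int
lookup_group_id(gid_t gid, gid_t *out)
{
	struct group *gr;

	assert(out != NULL);
	gr = getgrgid(gid);
	if (gr == NULL)
		return (0);
	*out = gr->gr_gid;
	return (1);
}
```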

Cheers,
-- 
Jacques Vidrine / n@nectar.com / jvidrine@verio.net / nectar@FreeBSD.org



From owner-freebsd-arch  Wed Apr 25 13:26: 7 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id E706F37B424
	for <freebsd-arch@FreeBSD.org>; Wed, 25 Apr 2001 13:26:02 -0700 (PDT)
	(envelope-from arr@watson.org)
Received: from localhost (arr@localhost)
	by fledge.watson.org (8.11.3/8.11.3) with SMTP id f3PKQWu41141
	for <freebsd-arch@FreeBSD.org>; Wed, 25 Apr 2001 16:26:33 -0400 (EDT)
	(envelope-from arr@watson.org)
Date: Wed, 25 Apr 2001 16:26:32 -0400 (EDT)
From: "Andrew R. Reiter" <arr@watson.org>
To: freebsd-arch@FreeBSD.org
Subject: libevent & fbsd
Message-ID: <Pine.NEB.3.96L.1010425161957.40963B-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Hey,

I'm doing some -audit work and I'm about to start writing some patches to
change a lot of the select(2) calls to use k{queue,event}().  Niels Provos
has started writing (or, afaik, has semi-finished a release of) his
libevent code.  Basically it's a general interface to multiple types of
event handling code (select(), poll(), kqueue/event) on file descriptors.
I am interested in using something like this, possibly, instead of hacking
through some of the code and moving kqueue/event into them.

I am wondering if anyone has spoken to Niels about perhaps getting it into
our tree?  or if anyone has any thoughts on this at all? :-)

The url for the code is:

http://www.monkey.org/~provos/libevent/

Thanks,

Andrew

*-------------.................................................
| Andrew R. Reiter 
| arr@fledge.watson.org
| "It requires a very unusual mind
|   to undertake the analysis of the obvious" -- A.N. Whitehead




From owner-freebsd-arch  Wed Apr 25 13:49: 9 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 0CAFA37B422
	for <freebsd-arch@FreeBSD.ORG>; Wed, 25 Apr 2001 13:49:07 -0700 (PDT)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f3PKn2301895;
	Wed, 25 Apr 2001 13:49:02 -0700 (PDT)
Date: Wed, 25 Apr 2001 13:49:02 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: "Andrew R. Reiter" <arr@watson.org>
Cc: freebsd-arch@FreeBSD.ORG, provos@openbsd.org
Subject: Re: libevent & fbsd
Message-ID: <20010425134902.S1790@fw.wintelcom.net>
References: <Pine.NEB.3.96L.1010425161957.40963B-100000@fledge.watson.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.NEB.3.96L.1010425161957.40963B-100000@fledge.watson.org>; from arr@watson.org on Wed, Apr 25, 2001 at 04:26:32PM -0400
X-all-your-base: are belong to us.
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

cc'd Niels Provos.

* Andrew R. Reiter <arr@watson.org> [010425 13:26] wrote:
> Hey,
> 
> I'm doing some -audit work and I'm about to start writing some patches to
> change a lot of the select(2) calls to use k{queue,event}().  Niels Provos
> has started writing (or, afaik, has semi-finished a release of) his
> libevent code.  Basically it's a general interface to multiple types of
> event handling code (select(), poll(), kqueue/event) on file descriptors.
> I am interested in using something like this, possibly, instead of hacking
> through some of the code and moving kqueue/event into them.
> 
> I am wondering if anyone has spoken to Niels about perhaps getting it into
> our tree?  or if anyone has any thoughts on this at all? :-)
> 
> The url for the code is:
> 
> http://www.monkey.org/~provos/libevent/

Niels, please excuse me if I'm jumping to conclusions here, and I
realize that the library seems to be in very early beta form (ver
0.3), however, it looks like libevent's model is not complex enough
to support efficient use of kqueue.  This is because EV_ONESHOT is
always OR'd into the event flags, especially when EV_READ is called
for.

What you really want to do is provide a way to keep a generic list
of "constantly polled fds" within your library.  The idea is that if
you have an application (take IRCd for instance) with several
thousand clients, it'd be much more optimal to register the read
event once (without EV_ONESHOT) and then have the application
call back when it's no longer interested in the read events.

The same could be said for EV_WRITE events, for streaming applications
you don't want EV_ONESHOT because as soon as the event fires you're
most likely going to blast the pipe full then request notification
when more space is available.  The only time you'd want the event
cleared is when you're out of data and the socket isn't full.
Perhaps for EV_WRITE/EV_READ a hints based mechanism could be used
to specify whether interest will most likely remain for the event
asked for.
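
To put a number on the difference, here is a toy model that only counts
how many registration changes a backend would have to submit to the
kernel.  It makes no kqueue or libevent calls; enum interest_hint,
struct fd_interest and every function below are hypothetical names for
the hints idea, not part of either API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hint an application could pass when adding an event. */
enum interest_hint { HINT_ONESHOT, HINT_PERSIST };

/* Model of one registered interest in an fd. */
struct fd_interest {
	int		   fd;
	enum interest_hint hint;
	int		   armed;	/* 1 while events would be reported. */
	unsigned long	   changes;	/* Registration changes submitted. */
};

void
interest_add(struct fd_interest *it, int fd, enum interest_hint hint)
{
	assert(it != NULL);
	it->fd = fd;
	it->hint = hint;
	it->armed = 1;
	it->changes = 1;	/* The initial add. */
}

/* Deliver one event; a one-shot interest disarms itself afterwards. */
int
interest_fire(struct fd_interest *it)
{
	if (!it->armed)
		return (0);
	if (it->hint == HINT_ONESHOT)
		it->armed = 0;
	return (1);
}

/* Re-register after a one-shot fired; free if still armed. */
void
interest_rearm(struct fd_interest *it)
{
	if (!it->armed) {
		it->armed = 1;
		it->changes++;
	}
}

/* Application is no longer interested; costs a change only if armed. */
void
interest_del(struct fd_interest *it)
{
	if (it->armed) {
		it->armed = 0;
		it->changes++;
	}
}

/* Registration changes needed to receive 'n' events and then stop. */
unsigned long
changes_for_events(enum interest_hint hint, unsigned long n)
{
	struct fd_interest it;
	unsigned long i;

	interest_add(&it, 0, hint);
	for (i = 0; i < n; i++) {
		interest_rearm(&it);
		interest_fire(&it);
	}
	interest_del(&it);
	return (it.changes);
}
```

In this model a one-shot interest costs one change per event received,
while a persistent one costs two in total (the add and the delete), no
matter how many events arrive.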

-- 
-Alfred Perlstein - [alfred@freebsd.org]
Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Wed Apr 25 14:14:29 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from citi.umich.edu (citi.umich.edu [141.211.92.141])
	by hub.freebsd.org (Postfix) with ESMTP id 84C3437B43F
	for <freebsd-arch@FreeBSD.ORG>; Wed, 25 Apr 2001 14:14:27 -0700 (PDT)
	(envelope-from provos@citi.umich.edu)
Received: from citi.umich.edu (ssh-mapper.citi.umich.edu [141.211.92.147])
	by citi.umich.edu (Postfix) with ESMTP
	id 05BC4207C1; Wed, 25 Apr 2001 17:14:27 -0400 (EDT)
Subject: Re: libevent & fbsd 
From: Niels Provos <provos@citi.umich.edu>
In-Reply-To: Alfred Perlstein, Wed, 25 Apr 2001 13:49:02 PDT
To: Alfred Perlstein <bright@wintelcom.net>
Cc: "Andrew R. Reiter" <arr@watson.org>, freebsd-arch@FreeBSD.ORG
Date: Wed, 25 Apr 2001 17:14:26 -0400
Message-Id: <20010425211427.05BC4207C1@citi.umich.edu>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <20010425134902.S1790@fw.wintelcom.net>, Alfred Perlstein writes:
>What you really want to do is provide a way to keep a generic list
>of "constantly polled fd" within your library.  The idea is for
>instance you have an application (take IRCd for instance) where
>you have several thousand clients, it'd be much more optimal to
>register the read event once (~EV_ONESHOT) then have the application
>call back when it's no longer interested in the read events.
I am aware of this problem.  My goal was to create a very easy to use
and intuitive API that would abstract away complexities that people
are experiencing with asynchronous I/O.  If you have an idea to extend
the API in a simple way that would make better use of the capabilities
of kqueue-like systems, please let me know.

I know of people who use libevent in a commercial environment, and
they are very happy with it.

Greetings,
 Niels.



From owner-freebsd-arch  Wed Apr 25 15:25:58 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by hub.freebsd.org (Postfix) with ESMTP id 55C2437B423
	for <arch@FreeBSD.ORG>; Wed, 25 Apr 2001 15:25:55 -0700 (PDT)
	(envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id IAA20732;
	Thu, 26 Apr 2001 08:25:49 +1000
Date: Thu, 26 Apr 2001 08:24:48 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-Sender: bde@besplex.bde.org
To: Peter Pentchev <roam@orbitel.bg>
Cc: arch@FreeBSD.ORG
Subject: Re: gid_t vs. plain int
In-Reply-To: <20010425183640.C54687@ringworld.oblivion.bg>
Message-ID: <Pine.BSF.4.21.0104260813260.27478-100000@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 25 Apr 2001, Peter Pentchev wrote:

> Is there a reason that struct group in <group.h> does not define 'gr_gid'
> as a gid_t value, but as a plain int?  This makes all kinds of things

Historical reasons, and because wollman still hasn't committed his header
cleanups which fix this and many other related problems.

> go berserk with gcc -Wall -W, and causes dozens of (totally unneeded)
> casts.

The casts might be needed to support K&R compilers on systems with
sizeof(gid_t) < sizeof(int), but mostly make things worse by hiding
bugs.

> Is there some standard that says pw_gid is gid_t, but gr_gid is int?

POSIX.1-1990 says that both are gid_t.

BTW, the kernel still uses int for gids in many places, e.g.,
kern/syscalls.master says that chown(2) takes an "int gid" arg.  This
depends on various type puns to work.  Similarly for many other syscall
args.

Bruce




From owner-freebsd-arch  Wed Apr 25 19:51:30 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193])
	by hub.freebsd.org (Postfix) with ESMTP id D289537B422
	for <arch@freebsd.org>; Wed, 25 Apr 2001 19:51:25 -0700 (PDT)
	(envelope-from wollman@khavrinen.lcs.mit.edu)
Received: (from wollman@localhost)
	by khavrinen.lcs.mit.edu (8.9.3/8.9.3) id WAA15799;
	Wed, 25 Apr 2001 22:51:10 -0400 (EDT)
	(envelope-from wollman)
Date: Wed, 25 Apr 2001 22:51:10 -0400 (EDT)
From: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Message-Id: <200104260251.WAA15799@khavrinen.lcs.mit.edu>
To: bde@zeta.org.au
Subject: Re: gid_t vs. plain int
X-Newsgroups: mit.lcs.mail.freebsd-arch
In-Reply-To: <Pine.BSF.4.21.0104260813260.27478-100000@besplex.bde.org>
References: <20010425183640.C54687@ringworld.oblivion.bg>
Organization: MIT Laboratory for Computer Science
Cc: arch@freebsd.org
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Bruce writes:

>BTW, the kernel still uses int for gids in many places, e.g.,
>kern/syscalls.master says that chown(2) takes an "int gid" arg.  This
>depends on various type puns to work.

Of course it has to do that in order to allow for the possibility that
gid_t might still be a short.  Of course, this precludes gid_t from
being something-longer-than-int, but much larger parts of the ABI would
have to change at the same time in that case.

-GAWollman

-- 
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
wollman@lcs.mit.edu  | O Siem / The fires of freedom 
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA|                     - Susan Aglukark and Chad Irschick



From owner-freebsd-arch  Wed Apr 25 21:27:52 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by hub.freebsd.org (Postfix) with ESMTP id 8002B37B423
	for <arch@freebsd.org>; Wed, 25 Apr 2001 21:27:49 -0700 (PDT)
	(envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id OAA31938;
	Thu, 26 Apr 2001 14:27:42 +1000
Date: Thu, 26 Apr 2001 14:26:23 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-Sender: bde@besplex.bde.org
To: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Cc: arch@freebsd.org
Subject: Re: gid_t vs. plain int
In-Reply-To: <200104260251.WAA15799@khavrinen.lcs.mit.edu>
Message-ID: <Pine.BSF.4.21.0104261354440.29302-100000@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 25 Apr 2001, Garrett Wollman wrote:

> Bruce writes:
> 
> >BTW, the kernel still uses int for gids in many places, e.g.,
> >kern/syscalls.master says that chown(2) takes an "int gid" arg.  This
> >depends on various type puns to work.
> 
> Of course it has to do that in order to allow for the possibility that
> gid_t might still be a short.

No.  If gid_t were short, then type puns are neither necessary nor
sufficient for handling it properly.  Lying about the arg types in
syscalls.master just makes it harder for trap.c:syscall() to convert
the args.  syscall() repacks the args into the syscall args structs
declared in <sys/sysproto.h>.  It "just happens" that the repacking
can be implemented using a simple copyin() on i386's and alphas.

> Of course, this precludes gid_t from
> being something-longer-than-int, but much larger parts of the ABI would
> have to change at the same time in that case.

The syscall args structs handle all cases that are likely to happen
in practice, including short args on little-endian machines (minor
adjustments are required for short args on big-endian machines).
Short args already occur in practice for at least lchmod() (because
NetBSD debogotified syscalls.master before FreeBSD obtained lchmod()
from NetBSD).
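
A rough illustration of the slot-per-argument idea follows.  It is a
sketch under stated assumptions, not the generated <sys/sysproto.h>:
`slot_t` stands in for register_t, and `struct demo_lchmod_args`,
`demo_repack()` and `demo_mode()` are hypothetical names.  Each argument
occupies one full slot, so the raw argument words can be copied over the
struct in one go.  On a little-endian machine the short member overlays
the low-order bytes of its slot and can be read directly; on a
big-endian machine it would overlay the high-order bytes, which is
presumably the kind of adjustment mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef intptr_t slot_t;   /* assumption: stands in for register_t */

/* Hypothetical args struct: every argument occupies one full slot. */
struct demo_lchmod_args {
	union { slot_t path_slot; const char *path; };
	union { slot_t mode_slot; uint16_t mode; };   /* a genuinely short arg */
};

/* Models the "simple copyin()": raw slots are copied over the struct. */
void
demo_repack(struct demo_lchmod_args *uap, const slot_t raw[2])
{
	assert(sizeof(*uap) == 2 * sizeof(slot_t));
	memcpy(uap, raw, sizeof(*uap));
}

/* Runtime byte-order probe, used only to decide which read is valid. */
int
demo_little_endian(void)
{
	const uint16_t probe = 1;

	return (*(const unsigned char *)&probe == 1);
}

/* Byte-order-independent read: take the whole slot, then narrow it. */
uint16_t
demo_mode(const struct demo_lchmod_args *uap)
{
	return ((uint16_t)uap->mode_slot);
}
```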

Bruce




From owner-freebsd-arch  Thu Apr 26 10:15:42 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29])
	by hub.freebsd.org (Postfix) with SMTP id 365B537B71B
	for <Arch@FreeBSD.ORG>; Thu, 26 Apr 2001 10:15:33 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: (qmail 13034 invoked by uid 666); 26 Apr 2001 17:18:42 -0000
Received: from i179-136.nv.iinet.net.au (HELO elischer.org) (203.59.179.136)
  by mail.m.iinet.net.au with SMTP; 26 Apr 2001 17:18:42 -0000
Message-ID: <3AE85776.92D6BD90@elischer.org>
Date: Thu, 26 Apr 2001 10:14:30 -0700
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Alfred Perlstein <alfred@FreeBSD.ORG>
Cc: Arch@FreeBSD.ORG, Robert Watson <rwatson@FreeBSD.ORG>,
	Daniel Eischen <eischen@vigrid.com>, John Baldwin <jhb@FreeBSD.org>
Subject: Re: KSE threading support (first parts)
References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net>
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Alfred Perlstein wrote:
> 
> 
> > Also I've punted on most things to do with signals as we haven't
> > really discussed how we want signals to be handled in a KSE world..
> > (can each KSEG or KSE get individual signals? do we need to
> > define a special 'signal' KSE? If so is that all it does?)
> >
> > What happens to the 'u-area'?
> 
> It makes sense that it stays except for struct pcb.  Honestly
> swapping out the pcbs could be left as something to re-optimize
> later, they can take a significant amount of space, but nowadays
> it's not that big of a deal.

So how much work is it to move the pcbs into the proc struct
(and thus into the ksec struct)?
Does anyone see any reason that that would not work?
Is there anything special about having it in the u area (other
than swapping)?

> 
> > how do we define a "cur-kse" similar to curproc?
> > (do we need one?)
> 
> yes.

I will look at seeing if I can do this...

> 
> > presently the processor state is stored all over the place
> > when a process is suspended..
> > This needs to be brought together so it can be put into the KSEC.
> > Who understands that stuff?
> 
> That's your job.  Refer to Jason Evans if he's available.

gee thanks..
I don't really have a grip on all the ways that traps etc
can need to save context.. 
I REALLY don't get the floating point context stuff.
Some state is stored on the user stack, some on the kernel stack 
and some in the pcb (and maybe some in the proc struct.)

To complicate things a little:
Some things such as segment registers may be "per KSE"
where normal registers are "per KSEC". 

> 
> You should also ask John Baldwin about proc locking as this
> stuff is definitely going to require locking in order to function
> properly.
> 
> > Some of the next steps would be:
> > 1/ figure out what we want for signals etc..
> 
> Afaik Solaris tried many different ways to propagate signals across
> their lwps, afaik they found the task so complex and so hard to get
> right that the latest implementation makes one lwp the signal target.
> 
> Most likely then signals would be still be in struct proc or the
> initial kse.

I was thinking about this..
I think that signals should be delivered to the UTS
and it should be up to the UTS to decide what to do about it..
In that case they would be delivered to the first available
kernel->user boundary crossing for that process.
> 
> > 2/ get the contexts actually stored in the KSEC structure
> >    when a process is suspended. (instead of some strange pcb in funny memory
> >    near the u area)
> 
> huh?

I mean that I get a headache when looking at where all the
registers, segment registers etc. are all stored as it looks as if 
it's rather mixed up.. It'd be nice if it were all in one place,
and the KSEC is where that should be.

> 
> > 3/ Set up the linkages between these structures, and
> > 4/ start using 'kse' instead of 'proc' in a bunch of places
> > and using the linkages to find the appropriate other
> > structures when needed.
> > 5/ Add code to make new KSEs so that the 1:1 Mapping is no longer
> > true.
> > 6/ Add syscalls to start making KSEs other than the one that
> > is built into the process.
> > 7/ start making upcalls
> >
> 
> ok, when are you going to have these done? :)
> 
> One other question, have you looked at the recent lwp/kse support added
> to NetBSD?  Is there anything to learn/avoid?

I've had only a small look so far.
Sorting the wheat from the chaff is a hard task and of course it requires
understanding a lot that I'm not too solid on (e.g. UVM).

> 
> -Alfred

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v



From owner-freebsd-arch  Thu Apr 26 11:25:18 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29])
	by hub.freebsd.org (Postfix) with SMTP id 1097D37B424
	for <Arch@freebsd.org>; Thu, 26 Apr 2001 11:25:06 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: (qmail 13168 invoked by uid 666); 26 Apr 2001 18:28:13 -0000
Received: from i179-136.nv.iinet.net.au (HELO elischer.org) (203.59.179.136)
  by mail.m.iinet.net.au with SMTP; 26 Apr 2001 18:28:13 -0000
Message-ID: <3AE867C2.3B657214@elischer.org>
Date: Thu, 26 Apr 2001 11:24:02 -0700
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Daniel Eischen <eischen@vigrid.com>
Cc: Arch@freebsd.org, alfred@freebsd.org,
	Robert Watson <rwatson@freebsd.org>
Subject: Re: KSE threading support (first parts)
References: <Pine.SUN.3.91.1010425141253.20558A-100000@pcnet1.pcnet.com>
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Daniel Eischen wrote:
> 
> On Wed, 25 Apr 2001, Julian Elischer wrote:
> > In this view, KSEGs are on the run queue and when they get some
> > quanta the KSEs hanging off them are run.
> > If 2 KSEs are running, the KSEG's quanta are exhausted at twice
> > the rate.
> 
> Don't we eventually want per-CPU run queues?  Then how do
> multiple KSEs hanging off a KSEG get scheduled then if the quanta
> are in the KSEG?  Round robin?

Nominally yes we do, but I must admit that I see a single queue
with multiple processors reading off it as actually being easier
to use and implement, and I think it may even have some better
use characteristics. I have nightmares thinking about doing it with
multiple (per processor) run queues. Allocating quanta and priority
between KSEs on different queues is much more tricky.

> 
> > Each KSE has a very strong affinity for one processor
> > and KSECs have a weak affinity for a KSE. If a KSE runs out
> > of work but has time, it will 'poach' a KSEC from another KSE in the
> > same KSEG list.
> 
> Again, if KSEs can have a strong affinity for 1 processor and there
> can be multiple KSEs hanging off a KSEG, then how do you schedule
> these KSEs when we have per-CPU run queues?  It makes scheduling
> these KSEs more difficult than it needs to be.

In that case the KSEs are put on the run queues when there is at 
least one KSEC ready to run for each of them in the KSEG. If you 
have 3 KSECs ready to run and 4 processors, then you put three KSEs 
on run queues. If you have 6 KSECs ready to run then you put 4 on 
the run queues. The first 2 to complete or block will do a second KSEC
if there is still some of the quantum left. The priority they are
scheduled with is taken from the KSEG (probably).
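
The counting rule in that paragraph can be written down directly.  This
is only a sketch of the arithmetic; `kses_to_enqueue()` and
`ksecs_left_waiting()` are hypothetical helper names, not functions from
the patch.

```c
#include <assert.h>

/* KSEs to place on run queues: one per ready KSEC, capped at ncpus. */
int
kses_to_enqueue(int ready_ksecs, int ncpus)
{
	assert(ready_ksecs >= 0 && ncpus > 0);
	return (ready_ksecs < ncpus ? ready_ksecs : ncpus);
}

/* KSECs that must wait for a KSE to complete or block before running. */
int
ksecs_left_waiting(int ready_ksecs, int ncpus)
{
	return (ready_ksecs - kses_to_enqueue(ready_ksecs, ncpus));
}
```

With 3 ready KSECs and 4 processors this gives 3 KSEs and nothing
waiting; with 6 ready KSECs it gives 4 KSEs and 2 KSECs waiting for the
first KSEs that complete or block.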

> 
> I still don't see the need to have multiple KSEs within a KSEG ;-)

KSEs in the same KSEG are using the same pool of quanta to 
complete KSECs.
SOMETHING has to hold that information. A KSE is a virtual 
processor where a KSEG is a virtual multiprocessor.

You allocate quanta to the KSEG. KSEs use these quanta. 
The KSEG is competing (almost) fairly with other processes 
in the system. If you want "system" thread scheduling, you 
can create more KSEGs. They compete against other processes and 
against each other for slices of the 'real' machine.

> 
> > In this patch the linkages are not set up at all.
> > All that is done is that the structures are
> > defined and used instead of a monolithic 'proc' struct.
> > The new structures are 'included' in the  proc structure
> > to maintain compatibility and to allow code to be changed slowly.
> >
> > What really needs to be done is for everyone who is interested to go over
> > the rather arbitrary allocation of fields to structures that
> > I did and make suggested changes.
> >
> > Also I've punted on most things to do with signals as we haven't
> > really discussed how we want signals to be handled in a KSE world..
> > (can each KSEG or KSE get individual signals? do we need to
> > define a special 'signal' KSE? If so is that all it does?)
> 
> Signals should be sent (via an upcall) to the first available
> KSE to return to userland (return from syscall, after preemption,
> etc.).  The userland thread scheduler will pick a thread to
> receive the signal.  If the thread is running or in one
> of the scheduling queues for the current KSEG, it will
> be able to handle it without any other assist from the kernel.
> If the thread is running or in one of the scheduling queues for
> another KSEG, it will mark the signal pending in the target
> thread and "signal" the appropriate KSEG with help from the
> kernel (one of the new user<->kernel interfaces or syscalls).

OK so 'signals' and everything to do with them are "Per process".
I may edit the patch to indicate this. This does imply a mutex 
under SMP so that if two processors return their KSEs to userland
at the same time, they don't deliver the same signal twice.
Can two KSEs (on different processors) deliver
DIFFERENT signals to userland at the same time?
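
The double-delivery concern can be illustrated with a toy model.  Note
that this sketch swaps in a C11 atomic bitmask for the mutex mentioned
above, and that `struct proc_signals`, `sig_post()` and `sig_claim()` are
hypothetical names rather than anything in the patch.  Whichever caller
clears a pending bit first is the only one told to deliver that signal;
different signals live in different bits, so in this model they can be
claimed independently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Per-process pending set; bit (sig - 1) set means sig is pending. */
struct proc_signals {
	atomic_uint pending;
};

/* Mark a signal pending for the process (signals 1..32 in this model). */
void
sig_post(struct proc_signals *ps, int sig)
{
	assert(sig >= 1 && sig <= 32);
	atomic_fetch_or(&ps->pending, 1u << (sig - 1));
}

/*
 * Called by a KSE on its way back to userland.  Atomically clears the
 * bit and returns true only for the one caller that saw it set.
 */
bool
sig_claim(struct proc_signals *ps, int sig)
{
	unsigned int bit, old;

	assert(sig >= 1 && sig <= 32);
	bit = 1u << (sig - 1);
	old = atomic_fetch_and(&ps->pending, ~bit);
	return ((old & bit) != 0);
}
```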


> 
> (We may have to replace "KSEG" in the above with "KSE")

yes, you are correct.. it should read: (I think)
> Signals should be sent (via an upcall) to the first available
> KSE to return to userland (return from syscall, after preemption,
> etc.).  The userland thread scheduler will pick a thread to
> receive the signal.  If the thread is running or in one
> of the scheduling queues for the current KSEG, it will
> be able to handle it without any other assist from the kernel.

Is this what you mean?

This is tricky... when a KSE returns to userland it is running 
NO threads. All threaded syscalls return to userland in the 'suspended'
state, so that the UTS can decide what to run. All syscalls return via
an upcall to the UTS (actually the original newkse() call returns
infinitely many times.. that is how the upcall is achieved). The return
values, error returns and data movements have been made to the appropriate
memory locations.. It's as if the thread did a 'yield()' immediately
after returning from a normal syscall..
So we can be sure that THIS KSE isn't running the interrupt thread. 
If the thread is however being run on a different KSE (regardless of 
whether in this KSEG or not) then the signal must be noted so that 
the thread can see it at some future time. If it's not running but in 
another KSEG then it's treated as if running, (the signal noted) and the 
UTS will make it runnable at the next opportunity that that KSEG
is runnable. (If we ran the thread on this KSE regardless of the fact
that it's from another KSEG, then it will be running with a priority
other than what the programmer assigned it. (maybe he wants lower 
priority signal handling)).

> If the thread is running or in one of the scheduling queues for
> another KSE, it will mark the signal pending in the target
> thread and "signal" the appropriate KSEG with help from the
> kernel (one of the new user<->kernel interfaces or syscalls).

If a KSEG is not running because it had no work, then
yes, you need to wake up one of its KSEs to handle the signal.


> 
> It might be nice to have a general way of sending messages
> between KSEGs (KSEs?).

Userland-to-kernel? or userland-to-userland?
"kind of like a signal?" :-)


> 
> > What happens to the 'u-area'?
> >
> > how do we define a "cur-kse" similar to curproc?
> > (do we need one?)
> > presently the processor state is stored all over the place
> > when a process is suspended..
> > This needs to be brought together so it can be put into the KSEC.
> > Who understands that stuff?
> >
> > Some of the next steps would be:
> > 1/ figure out what we want for signals etc..
> 
> Ask me for help in this area.  I know what the userland scheduler
> has to do when dispatching signals to threads.


> 
> > 2/ get the contexts actually stored in the KSEC structure
> >    when a process is suspended. (instead of some strange pcb in funny memory
> >    near the u area)
> > 3/ Set up the linkages between these structures, and
> > 4/ start using 'kse' instead of 'proc' in a bunch of places
> > and using the linkages to find the appropriate other
> > structures when needed.
> > 5/ Add code to make new KSEs so that the 1:1 Mapping is no longer
> > true.
> > 6/ Add syscalls to start making KSEs other than the one that
> > is built into the process.
> > 7/ start making upcalls
> 
> Can't we start with 7 ;-)

well, really, there are 4 new syscalls and the upcall is the multiple
return of one of them.

kse_id ksecreate(struct retblock *rblk, boolean newkseg);
kseyield(timeout); /* never returns.. comes back as upcall when awakened */
ksewakeup(kse_id sleeper);
ksefinish(); /* just never returns (unless we are last kse in which case,
upcalls) */


upcalls return with certain information in the retblock..
1/ why the KSE upcalled: a bitmap of reasons (there may have been more
than one reason, e.g. 3 returned syscalls and 2 signals and a wakeup).

2/ head of linked list of completed syscall status blocks.
(these should be allocated in the thread control blocks that the UTS uses
and will include room for a pointer that the kernel ignores but which the 
UTS can use to find the start of that thread control block. Also enough 
room for the kernel to store enough thread run state that 
the thread can be made to look as if it has just done a 'yield()' (so it 
can be restarted in the same way that other threads can be restarted).)
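
One possible shape for such a block, purely as a sketch: only the name
`struct retblock` comes from the prototype above, while every field
name, the `upcall_reason` bits and the two helper functions are
hypothetical, and the saved thread run state is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reason bits; more than one may be set per upcall. */
enum upcall_reason {
	UPCALL_SYSCALL_DONE = 1 << 0,
	UPCALL_SIGNAL       = 1 << 1,
	UPCALL_WAKEUP       = 1 << 2
};

/* One completed syscall, living inside a UTS thread control block. */
struct syscall_status {
	struct syscall_status *next;
	void *uts_cookie;   /* ignored by the kernel; UTS finds its TCB */
	long retval;
	int error;
	/* saved thread run state would go here (omitted in this sketch) */
};

struct retblock {
	unsigned int reasons;              /* bitmap of upcall_reason */
	struct syscall_status *completed;  /* head of the completed list */
};

/* Was this reason among the causes of the upcall? */
int
retblock_has(const struct retblock *rb, enum upcall_reason why)
{
	return ((rb->reasons & (unsigned int)why) != 0);
}

/* Walk the completed list the way a UTS might on each upcall. */
size_t
retblock_completed_count(const struct retblock *rb)
{
	const struct syscall_status *ss;
	size_t n = 0;

	assert(rb != NULL);
	for (ss = rb->completed; ss != NULL; ss = ss->next)
		n++;
	return (n);
}
```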

> 
> --
> Dan Eischen

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v



From owner-freebsd-arch  Thu Apr 26 11:50:43 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from meow.osd.bsdi.com (meow.osd.bsdi.com [204.216.28.88])
	by hub.freebsd.org (Postfix) with ESMTP
	id A509737B422; Thu, 26 Apr 2001 11:50:34 -0700 (PDT)
	(envelope-from jhb@FreeBSD.org)
Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241])
	by meow.osd.bsdi.com (8.11.2/8.11.2) with ESMTP id f3QInuG50061;
	Thu, 26 Apr 2001 11:49:56 -0700 (PDT)
	(envelope-from jhb@FreeBSD.org)
Message-ID: <XFMail.010426114914.jhb@FreeBSD.org>
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <3AE85776.92D6BD90@elischer.org>
Date: Thu, 26 Apr 2001 11:49:14 -0700 (PDT)
From: John Baldwin <jhb@FreeBSD.org>
To: Julian Elischer <julian@elischer.org>
Subject: Re: KSE threading support (first parts)
Cc: Daniel Eischen <eischen@vigrid.com>,
	Robert Watson <rwatson@FreeBSD.org>, Arch@FreeBSD.org,
	Alfred Perlstein <alfred@FreeBSD.org>, jasone@FreeBSD.org
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On 26-Apr-01 Julian Elischer wrote:
>> > how do we define a "cur-kse" similar to curproc?
>> > (do we need one?)
>> 
>> yes.
> 
> I will look at seeing if I can do this...

Trivially.  Just use a per-cpu variable 'curkse' and do the equivalent of
's/curproc/PCPU_GET(curkse)/' as needed.  Some other tweaks will be needed in
some asm files as well, but that is easy.
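
As a userland illustration of the per-cpu idea (a toy model, not kernel
code): `MODEL_PCPU_GET`, `MODEL_PCPU_SET`, `struct pcpu_model` and
`model_cpuid` are hypothetical stand-ins, and the "current CPU" is just
a variable here, whereas a real kernel learns it from the hardware.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NCPU 4

struct kse {
	int id;
};

/* One slot of per-CPU data; a real kernel keeps much more here. */
struct pcpu_model {
	struct kse *curkse;
};

static struct pcpu_model pcpu_model[MODEL_NCPU];
static int model_cpuid;   /* which CPU this code pretends to run on */

/* Hypothetical stand-ins for per-CPU accessor macros. */
#define MODEL_PCPU_GET(member)      (pcpu_model[model_cpuid].member)
#define MODEL_PCPU_SET(member, val) (pcpu_model[model_cpuid].member = (val))

/* Pretend to be running on the given CPU. */
void
model_set_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NCPU);
	model_cpuid = cpu;
}

/* Record the KSE now running on the current CPU. */
void
model_run(struct kse *ke)
{
	MODEL_PCPU_SET(curkse, ke);
}

/* What a 'curkse' lookup would return on the current CPU. */
struct kse *
model_curkse(void)
{
	return (MODEL_PCPU_GET(curkse));
}
```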

>> > presently the processor state is stored all over the place
>> > when a process is suspended..
>> > This needs to be brought together so it can be put into the KSEC.
>> > Who understands that stuff?
>> 
>> That's your job.  Refer to Jason Evans if he's available.
> 
> gee thanks..
> I don't really have a grip on all the ways that traps etc
> can need to save context.. 
> I REALLY don't get the floating point context stuff.
> Some state is stored on the user stack, some on the kernel stack 
> and some in the pcb (and maybe some in the proc struct.)

The pcb is used to save state while a thread is switched out.  When a
trap/exception/interrupt occurs the state is saved in a stack frame in the
kernel.  The FP state is a little tricky because we don't want to save it and
restore it at every context switch, so we use a type of lazy switching where we
only save it if we are using it and only restore it if we are using it, but a
bit more complicated.  All of this should be per-thread instead of per-process
and won't be that hard.  Hardly any of this needs changing.
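
A much simplified model of that lazy scheme, leaving out the "bit more
complicated" parts: state is saved at switch time only if the outgoing
thread actually touched the FP unit, and restored only on a thread's
first FP use after being switched in.  All names here (`struct fp_cpu`,
`fp_switch()`, `fp_load()`, `fp_store()`) are hypothetical, and a single
double stands in for the whole FP register file.

```c
#include <assert.h>
#include <stddef.h>

struct fp_thread {
	double saved_fp;             /* stands in for the saved FP state */
};

struct fp_cpu {
	double fp_reg;               /* stands in for the live FP registers */
	struct fp_thread *fp_owner;  /* thread whose state is live, or NULL */
	struct fp_thread *current;   /* thread now running on this CPU */
	int saves, restores;         /* counters, for illustration only */
};

/* Context switch: save FP state only if the outgoing thread used it. */
void
fp_switch(struct fp_cpu *cpu, struct fp_thread *next)
{
	if (cpu->fp_owner != NULL) {
		cpu->fp_owner->saved_fp = cpu->fp_reg;
		cpu->fp_owner = NULL;
		cpu->saves++;
	}
	cpu->current = next;
}

/* First FP use after a switch (models the trap): restore on demand. */
static void
fp_claim(struct fp_cpu *cpu)
{
	assert(cpu->current != NULL);
	if (cpu->fp_owner != cpu->current) {
		cpu->fp_reg = cpu->current->saved_fp;
		cpu->fp_owner = cpu->current;
		cpu->restores++;
	}
}

void
fp_store(struct fp_cpu *cpu, double v)
{
	fp_claim(cpu);
	cpu->fp_reg = v;
}

double
fp_load(struct fp_cpu *cpu)
{
	fp_claim(cpu);
	return (cpu->fp_reg);
}
```

A thread that never touches floating point costs neither a save nor a
restore in this model, which is the point of doing it lazily.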

> To complicate things a little:
> Some things such as segment registers may be "per KSE"
> where normal registers are "per KSEC". 

Stick all the registers in the same place.  It doesn't hurt to duplicate the 4
seg regs in a couple of places, and the minuscule gain is hardly worth the
extra effort involved.  State like this really should be per-thread.

>> You should also ask John Baldwin about proc locking as this
>> stuff is definitely going to require locking in order to function
>> properly.

At first, I think what Jason was planning on doing was just letting the lock for
the process lock all the kse's, kseg's, ksec's, etc. associated with a proc as
well as the proc itself.  I wouldn't worry too much about this at first.

>> > Some of the next steps would be:
>> > 1/ figure out what we want for signals etc..
>> 
>> Afaik Solaris tried many different ways to propagate signals across
>> their lwps, afaik they found the task so complex and so hard to get
>> right that the latest implementation makes one lwp the signal target.
>> 
>> Most likely then signals would be still be in struct proc or the
>> initial kse.
> 
> I was thinking about this..
> I think that signals should be delivered to the UTS
> and it should be up to the UTS to decide what to do about it..
> In that case they would be delivered to the first available
> kernel->user boundary crossing for that process.

Userland is not available to create signal stacks, etc.  You can make signals
still be a process property and the first kse (or ksec or whatever one is a
runnable thread/context) that returns to userland from interrupt, etc. will
execute ast() on the way out and post any signals.  If you leave signals as
being per-process I see there being hardly any changes needed in any of the
signal handling code.

>> > 2/ get the contexts actually stored in the KSEC structure
>> >    when a process is suspended. (instead of some strange pcb in funny
>> >    memory
>> >    near the u area)
>> 
>> huh?
> 
> I mean that I get a headache when looking at where all the
> registers, segment registers etc. are all stored as it looks as if 
> it's rather mixed up.. It'd be nice if it were all in one place,
> and the KSEC is where that should be.

The pcb should be per-thread, yes.

>> > 3/ Set up the linkages between these structures, and
>> > 4/ start using 'kse' instead of 'proc' in a bunch of places
>> > and using the linkages to find the appropriate other
>> > structures when needed.
>> > 5/ Add code to make new KSEs so that the 1:1 Mapping is no longer
>> > true.
>> > 6/ Add syscalls to start making KSEs other than the one that
>> > is built into the process.
>> > 7/ start making upcalls
>> >
>> 
>> ok, when are you going to have these done? :)
>> 
>> One other question, have you looked at the recent lwp/kse support added
>> to NetBSD?  Is there anything to learn/avoid?
> 
> I've had only a small look so far 
> sorting the wheat from the chaff is a hard task and of course it requires
> understanding a lot that I'm not too solid on. (e.g. UVM).

My only concern at this point in time is that I think 5.0 is fragile enough as
it is.  I'd rather that KSE not come in until 6.0-CURRENT so that 5.x has a
fighting chance of being stable, but that is just my opinion.

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/



From owner-freebsd-arch  Thu Apr 26 12: 6:38 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from dragon.nuxi.com (trang.nuxi.com [209.152.133.57])
	by hub.freebsd.org (Postfix) with ESMTP
	id A6CE737B422; Thu, 26 Apr 2001 12:06:36 -0700 (PDT)
	(envelope-from obrien@NUXI.com)
Received: (from obrien@localhost)
	by dragon.nuxi.com (8.11.3/8.11.1) id f3QJ6UJ92992;
	Thu, 26 Apr 2001 12:06:30 -0700 (PDT)
	(envelope-from obrien)
Date: Thu, 26 Apr 2001 12:06:30 -0700
From: "David O'Brien" <obrien@FreeBSD.ORG>
To: Julian Elischer <julian@elischer.org>
Cc: Arch@FreeBSD.ORG, Daniel Eischen <eischen@vigrid.com>
Subject: Re: KSE threading support (first parts)
Message-ID: <20010426120630.A92915@dragon.nuxi.com>
Reply-To: obrien@FreeBSD.ORG
References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> <3AE85776.92D6BD90@elischer.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3AE85776.92D6BD90@elischer.org>; from julian@elischer.org on Thu, Apr 26, 2001 at 10:14:30AM -0700
X-Operating-System: FreeBSD 5.0-CURRENT
Organization: The NUXI BSD group
X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3  90 76 5D 69 58 D9 98 7A
X-Pgp-Rsa-Keyid: 1024/34F9F9D5
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Uh people.  

We really, really NEED to agree on the design here.  Jason's paper
(http://people.freebsd.org/~jasone/refs/freebsd_kse/freebsd_kse.html)
explains all this.

Before any more work is done on KSE's I really feel people should either
agree fully with the paper, or debate its contents first.

I really doubt a single person will develop KSE, so it is imperative
there is a common sheet of music.

-- 
-- David  (obrien@FreeBSD.org)



From owner-freebsd-arch  Thu Apr 26 13:11: 8 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id CED4037B422; Thu, 26 Apr 2001 13:11:02 -0700 (PDT)
	(envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id QAA14182;
	Thu, 26 Apr 2001 16:10:19 -0400 (EDT)
Date: Thu, 26 Apr 2001 16:10:18 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
To: Julian Elischer <julian@elischer.org>
Cc: Arch@freebsd.org, alfred@freebsd.org,
	Robert Watson <rwatson@freebsd.org>
Subject: Re: KSE threading support (first parts)
In-Reply-To: <3AE867C2.3B657214@elischer.org>
Message-ID: <Pine.SUN.3.91.1010426145508.194A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 26 Apr 2001, Julian Elischer wrote:
> Daniel Eischen wrote:
> > I still don't see the need to have multiple KSEs within a KSEG ;-)
> 
> KSEs in the same KSEG are using the same pool of quanta to 
> complete KSECs.
> SOMETHING has to hold that information. A KSE is a virtual 
> processor where a KSEG is a virtual multiprocessor.
> 
> You allocate quanta to the KSEG. KSEs use these quanta. 
> The KSEG is competing (almost) fairly with other processes 
> in the system. If you want "system" thread scheduling, you 
> can create more KSEGs. They compete against other processes and 
> against each other for slices of the 'real' machine.

Right, like I've said before, it's easier just to combine the KSEG and KSE
into one entity and forget about fair scheduling.  Limit the number of
"combined KSE/KSEGs" to the number of processors (scope system threads
still get their own "combined KSE/KSEG" to satisfy POSIX).  The common
case is a single processor system anyways, and in that case it doesn't
make sense to have more than 1 KSE in a KSEG.

In a multiprocessor system, if you have an extra quantum, who really
cares?  If someone really does care, then it can be a kernel tunable
or resource limit.

I just don't see any benefit for the added complexity.  It certainly
is easier for the UTS with a combined KSE/KSEG.

> > Signals should be sent (via an upcall) to the first available
> > KSE to return to userland (return from syscall, after preemption,
> > etc.).  The userland thread scheduler will pick a thread to
> > receive the signal.  If the thread is running or in one
> > of the scheduling queues for the current KSEG, it will
> > be able to handle it without any other assist from the kernel.
> > If the thread is running or in one of the scheduling queues for
> > another KSEG, it will mark the signal pending in the target
> > thread and "signal" the appropriate KSEG with help from the
> > kernel (one of the new user<->kernel interfaces or syscalls).
> 
> OK so 'signals' and everything to do with them are "Per process".
> I may edit the patch to indicate this. This does indicate a mutex 
> with SMP so that if two processors return their KSEs to userland
> at the same time, they don't deliver the same signal twice.
> Can two KSEs (KSEs are on different processors) deliver
> DIFFERENT signals to userland at the same time?

I suppose they could, as long as they are delivered via an
upcall (on the special stack used for upcalls, and the running
thread marked as preempted).  The UTS will have to use some
locking mechanisms, but it has to do that normally anyways.
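
To make the "don't deliver the same signal twice" point concrete, here
is a minimal sketch of how a pending signal could be claimed so that
only one KSE hands it to userland.  It is an illustration only:
struct proc_sigs, sig_post() and sig_claim() are made-up names, not
anything in the patch, and it assumes C11 atomics where the kernel
would more likely use a mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-process pending-signal mask (one bit per signal). */
struct proc_sigs {
	atomic_uint pending;
};

/* Kernel side of the sketch: mark signal `sig` pending for the process. */
void
sig_post(struct proc_sigs *ps, int sig)
{
	assert(sig > 0 && sig < 32);
	atomic_fetch_or(&ps->pending, 1u << sig);
}

/*
 * Called by a KSE as it heads back to userland.  Returns true only for
 * the single caller that actually cleared the bit, so two KSEs racing
 * on different processors cannot both deliver the same signal.
 */
bool
sig_claim(struct proc_sigs *ps, int sig)
{
	unsigned int bit = 1u << sig;
	unsigned int old = atomic_fetch_and(&ps->pending, ~bit);

	return (old & bit) != 0;
}
```

Two different signals use different bits, so two KSEs can each claim
one at the same time without getting in each other's way.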

> > 
> > (We may have to replace "KSEG" in the above with "KSE")
> 
> yes, you are correct.. it should read: (I think)
> > Signals should be sent (via an upcall) to the first available
> > KSE to return to userland (return from syscall, after preemption,
> > etc.).  The userland thread scheduler will pick a thread to
> > receive the signal.  If the thread is running or in one
> > of the scheduling queues for the current KSEG, it will
> > be able to handle it without any other assist from the kernel.
> 
> Is this what you mean?

For the most part.  But if the target thread is running in one
of the other KSEs for that KSEG, then it will still require an
assist from the kernel.

> This is tricky... when a KSE returns to userland it is running 
> NO threads. All threaded syscalls return to userland in the 'suspended'
> state, so that the UTS can decide what to run.

This is only when syscalls block or when you need to notify the KSE of
special events (signals, interruptions from other KSEs, etc).
Normally, a syscall that doesn't block just returns without
any upcall.

> All syscalls return via
> an upcall to the UTS (actually the original newkse() call returns
> infinitly many times.. that is how the upcall is achieved). The return
> values, error returns and data movements have been made to the appropriate
> memory locations.. It's as if the thread did a 'yield()' immediatly
> after returning from a normal syscall..
> So we can be sure that THIS KSE isn't running the interrupt thread. 

Running threads can cause synchronous signals, so it's quite possible
the running thread generated the signal.  The KSE in which the thread
was running would then get the notification.

I don't see any problem with this as long as interrupted thread
contexts are available to the UTS.

> If the thread is however being run on a different KSE (regardless of 
> whether in this KSEG or not) then the signal must be noted so that 
> the thread can see it at some future time. If it's not running but in 
> another KSEG then it's treated as if running, (the signal noted) and the 
> UTS will make it runnable at the next opportunity that that KSEG
> is runnable. (If we ran the thread on this KSE regardless of the fact
> that it's from another KSEG, then it will be running with a priority
> other than what the programmer assigned it. (maybe he wants lower 
> priority signal handling)).

1996 POSIX spec says that signals should be delivered "as soon as
possible".  This leaves some leeway (I'll have to see if Austin
changes any of this), but my approach in the current threads library
is to deliver the signal right away unless the thread is in a critical
region (in which case the signal is delivered when it exits the
critical region).
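
Roughly, the "deliver right away unless in a critical region" policy
looks like the sketch below.  This is not the code in the current
threads library; struct uthread and the uts_* functions are
hypothetical names, and setting a bit in `delivered` stands in for
actually running the handler.

```c
#include <assert.h>

/* Hypothetical UTS thread record; field names are illustrative only. */
struct uthread {
	int      crit_depth;	/* > 0 while inside a critical region */
	unsigned deferred;	/* signals noted while in the region */
	unsigned delivered;	/* signals whose handlers have been run */
};

void
uts_crit_enter(struct uthread *t)
{
	t->crit_depth++;
}

/* Deliver now if possible, otherwise note the signal for later. */
void
uts_signal(struct uthread *t, int sig)
{
	assert(sig > 0 && sig < 32);
	if (t->crit_depth > 0)
		t->deferred |= 1u << sig;
	else
		t->delivered |= 1u << sig;	/* stands in for the handler */
}

/* Leaving the outermost critical region flushes anything deferred. */
void
uts_crit_exit(struct uthread *t)
{
	assert(t->crit_depth > 0);
	if (--t->crit_depth == 0) {
		t->delivered |= t->deferred;
		t->deferred = 0;
	}
}
```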

> > If the thread is running or in one of the scheduling queues for
> > another KSE, it will mark the signal pending in the target
> > thread and "signal" the appropriate KSEG with help from the
> > kernel (one of the new user<->kernel interfaces or syscalls).
> 
> If a KSEG is not running because it had no work, then
> yes, you need to wake up one of its KSEs to handle the signal.

Yeah, but I was thinking more along the lines of interrupting a
currently running KSE.

> > It might be nice to have a general way of sending messages
> > between KSEGs (KSEs?).
> 
> Userland-to-kernel? or userland-to-userland?
> "kind of like a signal?" :-)

Userland to userland with an assist from the kernel.  KSE A wants
to interrupt KSE B and send B an upcall message of some sort.
The UTS knows what the message format is, but the kernel doesn't
need to know other than possibly its message type and size.
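
Something along these lines is what I have in mind for the message
itself.  It is only a sketch: struct kse_msg, KSE_MSG_MAX and
kse_msg_pack() are invented for illustration, and no such interface
exists yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KSE_MSG_MAX 64	/* hypothetical payload limit */

/* The kernel would look only at type and size; data is opaque to it. */
struct kse_msg {
	int		type;
	size_t		size;
	unsigned char	data[KSE_MSG_MAX];
};

/* Build a message; returns 0 on success, -1 if the payload is too big. */
int
kse_msg_pack(struct kse_msg *m, int type, const void *payload, size_t len)
{
	assert(m != NULL && payload != NULL);
	if (len > KSE_MSG_MAX)
		return -1;
	m->type = type;
	m->size = len;
	memcpy(m->data, payload, len);
	return 0;
}
```

Only the UTS on the receiving KSE interprets `data`; the kernel just
copies `size` bytes along with the upcall.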

-- 
Dan Eischen



From owner-freebsd-arch  Thu Apr 26 17:15:12 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67])
	by hub.freebsd.org (Postfix) with ESMTP
	id 99C2137B422; Thu, 26 Apr 2001 17:15:10 -0700 (PDT)
	(envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.2/8.11.2) id f3R0FAi62512;
	Thu, 26 Apr 2001 17:15:10 -0700 (PDT)
	(envelope-from dillon)
Date: Thu, 26 Apr 2001 17:15:10 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200104270015.f3R0FAi62512@earth.backplane.com>
To: "David O'Brien" <obrien@FreeBSD.ORG>
Cc: Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG,
	Daniel Eischen <eischen@vigrid.com>
Subject: Re: KSE threading support (first parts)
References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> <3AE85776.92D6BD90@elischer.org> <20010426120630.A92915@dragon.nuxi.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


:Uh people.  
:
:We really, really NEED to agree on the design here.  Jason's paper
:(http://people.freebsd.org/~jasone/refs/freebsd_kse/freebsd_kse.html) is
:explains all this.
:
:Before any more work is done on KSE's I really feel people should either
:agree fully with the paper, or debate its contents first.
:
:I really doubt a single person will develop KSE, so it is imperative
:there is a common sheet of music.
:
:-- 
:-- David  (obrien@FreeBSD.org)

    I've read it.  I was under the impression from prior discussions that
    KSEs belonging to the same process had to be serialized... that you
    couldn't run them concurrently with each other.  I can't imagine how 
    we could possibly run KSEs belonging to the same process concurrently
    anyway.  I think I prefer the original rfork()/KSE model.

							-Matt




From owner-freebsd-arch  Thu Apr 26 22:10:36 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29])
	by hub.freebsd.org (Postfix) with SMTP id 512B037B423
	for <Arch@FreeBSD.ORG>; Thu, 26 Apr 2001 22:10:33 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: (qmail 16501 invoked by uid 666); 27 Apr 2001 05:13:43 -0000
Received: from i177-040.nv.iinet.net.au (HELO elischer.org) (203.59.177.40)
  by mail.m.iinet.net.au with SMTP; 27 Apr 2001 05:13:43 -0000
Message-ID: <3AE8FF0A.AFAF3AE1@elischer.org>
Date: Thu, 26 Apr 2001 22:09:30 -0700
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: obrien@FreeBSD.ORG
Cc: Arch@FreeBSD.ORG, Daniel Eischen <eischen@vigrid.com>
Subject: Re: KSE threading support (first parts)
References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> <3AE85776.92D6BD90@elischer.org> <20010426120630.A92915@dragon.nuxi.com>
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

David O'Brien wrote:
> 
> Uh people.
> 
> We really, really NEED to agree on the design here.  Jason's paper
> (http://people.freebsd.org/~jasone/refs/freebsd_kse/freebsd_kse.html) is
> explains all this.
> 
> Before any more work is done on KSE's I really feel people should either
> agree fully with the paper, or debate its contents first.
> 
> I really doubt a single person will develop KSE, so it is imperative
> there is a common sheet of music.


I helped develop that paper.
We are not planning on ignoring it, just clarifying it and maybe producing
a new version of it.  It doesn't cover all details (e.g. how signals
are handled) in great depth.


> 
> --
> -- David  (obrien@FreeBSD.org)

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v



From owner-freebsd-arch  Thu Apr 26 22:37:51 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29])
	by hub.freebsd.org (Postfix) with SMTP id 7E30837B42C
	for <Arch@freebsd.org>; Thu, 26 Apr 2001 22:37:43 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: (qmail 16686 invoked by uid 666); 27 Apr 2001 05:40:53 -0000
Received: from i177-040.nv.iinet.net.au (HELO elischer.org) (203.59.177.40)
  by mail.m.iinet.net.au with SMTP; 27 Apr 2001 05:40:53 -0000
Message-ID: <3AE90567.CA50293E@elischer.org>
Date: Thu, 26 Apr 2001 22:36:39 -0700
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Daniel Eischen <eischen@vigrid.com>
Cc: Arch@freebsd.org, alfred@freebsd.org,
	Robert Watson <rwatson@freebsd.org>
Subject: Re: KSE threading support (first parts)
References: <Pine.SUN.3.91.1010426145508.194A-100000@pcnet1.pcnet.com>
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Daniel Eischen wrote:
> 
> On Thu, 26 Apr 2001, Julian Elischer wrote:
> > Daniel Eischen wrote:
> > > I still don't see the need to have multiple KSEs within a KSEG ;-)
> >
> > KSEs in the same KSEG are using the same pool of quanta to
> > complete KSECs.
> > SOMETHING has to hold that information. A KSE is a virtual
> > processor where a KSEG is a virtual multiprocessor.
> >
> > You allocate quanta to the KSEG. KSEs use these quanta.
> > The KSEG is competing (almost) fairly with other processes
> > in the system. If you want "system" thread scheduling, you
> > can create more KSEGs. They compete against other processes and
> > against each other for slices of the 'real' machine.
> 
> Right, like I've said before, it's easier just to combine the KSEG and KSE
> into one entity and forget about fair scheduling.  Limit the number of
> "combined KSE/KSEGs" to the number of processors (scope system threads
> still get their own "combined KSE/KSEG" to satisfy POSIX).  The common
> case is a single processor system anyways, and in that case it doesn't
> make sense to have more than 1 KSE in a KSEG.

By grouping KSEs into an equivalence group, I give the system enough
information to allow it to schedule resuming KSECs on an 'equivalent'
but idle KSE, i.e. a syscall may be initiated on one KSE and completed on
another as long as they are in the same KSEG.
The UTS doesn't really have to think too much about this except to
know that the parallelism in a KSEG is equal to the lesser of
the number of KSEs and the number of processors.  (Having more KSEs
than processors is completely pointless, and I happen to think it
should not be allowed.)
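
As a trivial sketch of that rule (the function names are made up for
illustration, and the "not allowed" check is just my preferred policy,
not something that is implemented):

```c
#include <assert.h>

/* Effective parallelism of one KSEG: the lesser of its KSEs and the CPUs. */
int
kseg_parallelism(int nkse, int ncpu)
{
	assert(nkse >= 0 && ncpu > 0);
	return nkse < ncpu ? nkse : ncpu;
}

/* Hypothetical policy check: refuse to create more KSEs than processors. */
int
kseg_can_add_kse(int nkse, int ncpu)
{
	return nkse < ncpu;
}
```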

I tried to get away from having a KSEG but it ended up getting 
more complicated again. 

Most threaded applications would have one KSEG and N KSEs 
(N==num-processors)
or ONE KSEG and one KSE (which is still useful as you have N KSECs).

Few apps would have more than one KSEG and hardly any would have 
more than 2.


> 
> In a multiprocessor system, if you have an extra quantum, who really
> cares?  If someone really does care, then it can be a kernel tunable
> or resource limit.
> 
> I just don't see any benefit for the added complexity.  It certainly
> is easier for the UTS with a combined KSE/KSEG.

But I think it will be simpler WITH it.
Treat the KSEs in a KSEG as interchangeable.  Threads that go into
the system on one may be reported to have completed their syscall
on another.
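
A rough sketch of what "interchangeable" means for reporting a
completed syscall follows.  The structures and kseg_pick_kse() are
hypothetical and much simpler than anything the kernel would really
carry; the point is only that any idle KSE in the group will do.

```c
#include <assert.h>
#include <stddef.h>

#define KSEG_MAX_KSE 8	/* hypothetical limit for the sketch */

struct kse {
	int id;
	int idle;	/* nonzero when not running anything */
};

struct kseg {
	struct kse kses[KSEG_MAX_KSE];
	int nkse;
};

/*
 * Pick a KSE on which to report a completed syscall.  Prefer the KSE the
 * syscall started on, but any idle KSE in the same KSEG is equivalent.
 * Returns NULL when every KSE in the group is busy.
 */
struct kse *
kseg_pick_kse(struct kseg *kg, int started_on)
{
	assert(kg->nkse <= KSEG_MAX_KSE);
	assert(started_on >= 0 && started_on < kg->nkse);
	if (kg->kses[started_on].idle)
		return &kg->kses[started_on];
	for (int i = 0; i < kg->nkse; i++)
		if (kg->kses[i].idle)
			return &kg->kses[i];
	return NULL;
}
```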


> 
> > > Signals should be sent (via an upcall) to the first available
> > > KSE to return to userland (return from syscall, after preemption,
> > > etc.).  The userland thread scheduler will pick a thread to
> > > receive the signal.  If the thread is running or in one
> > > of the scheduling queues for the current KSEG, it will
> > > be able to handle it without any other assist from the kernel.
> > > If the thread is running or in one of the scheduling queues for
> > > another KSEG, it will mark the signal pending in the target
> > > thread and "signal" the appropriate KSEG with help from the
> > > kernel (one of the new user<->kernel interfaces or syscalls).
> >
> > OK so 'signals' and everything to do with them are "Per process".
> > I may edit the patch to indicate this. This does indicate a mutex
> > with SMP so that if two processors return their KSEs to userland
> > at the same time, they don't deliver the same signal twice.
> > Can two KSEs (KSEs are on different processors) deliver
> > DIFFERENT signals to userland at the same time?
> 
> I suppose they could, as long as they are delivered via an
> upcall (on the special stack used for upcalls, and the running
> thread marked as preempted).  The UTS will have to use some
> locking mechanisms, but it has to do that normally anyways.
> 
> > >
> > > (We may have to replace "KSEG" in the above with "KSE")
> >
> > yes, you are correct.. it should read: (I think)
> > > Signals should be sent (via an upcall) to the first available
> > > KSE to return to userland (return from syscall, after preemption,
> > > etc.).  The userland thread scheduler will pick a thread to
> > > receive the signal.  If the thread is running or in one
> > > of the scheduling queues for the current KSEG, it will
> > > be able to handle it without any other assist from the kernel.
> >
> > Is this what you mean?
> 
> For the most part.  But if the target thread is running in one
> of the other KSEs for that KSEG, then it will still require an
> assist from the kernel.

Why? Surely it is to be considered to be processing another signal.
Note the signal, and it should pick it up when it has completed the one
it's doing.

> 
> > This is tricky... when a KSE returns to userland it is running
> > NO threads. All threaded syscalls return to userland in the 'suspended'
> > state, so that the UTS can decide what to run.
> 
> This is only when syscalls or when you need to notify the KSE of
> special events (signals, interruptions from other KSEs, etc).
> Normally, a syscall that doesn't block just returns without
> any upcall.

Well, I don't know that this is true.  The thread could starve other threads
by doing only non-blocking syscalls.  I had thought that all syscalls would
return via upcalls to allow the UTS to decide whether to pre-empt them.

> 
> > All syscalls return via
> > an upcall to the UTS (actually the original newkse() call returns
> > infinitly many times.. that is how the upcall is achieved). The return
> > values, error returns and data movements have been made to the appropriate
> > memory locations.. It's as if the thread did a 'yield()' immediatly
> > after returning from a normal syscall..
> > So we can be sure that THIS KSE isn't running the interrupt thread.
> 
> Running threads can cause synchronous signals, so it's quite possible
> the running thread generated the signal.  The KSE in which the thread
> was running would then get the notification.

The KSE would, but when the upcall is made, the previously running thread
is made to look as if it had yielded.  The KSE is running but the thread
is 'suspended'.

> 
> I don't see any problem with this as long as interrupted thread
> contexts are available to the UTS.


> 
> > If the thread is however being run on a different KSE (regardless of
> > whether in this KSEG or not) then the signal must be noted so that
> > the thread can see it at some future time. If it's not running but in
> > another KSEG then it's treated as if running, (the signal noted) and the
> > UTS will make it runnable at the next opportunity that that KSEG
> > is runnable. (If we ran the thread on this KSE regardless of the fact
> > that it's from another KSEG, then it will be running with a priority
> > other than what the programmer assigned it. (maybe he wants lower
> > priority signal handling)).
> 
> 1996 POSIX spec says that signals should be delivered "as soon as
> possible".  This leaves some leeway (I'll have to see if Austin
> changes any of this), but my approach in the current threads library
> is to deliver the signal right away unless the thread is in a critical
> region (in which case the signal is delivered when it exits the
> critical region).
> 
> > > If the thread is running or in one of the scheduling queues for
> > > another KSE, it will mark the signal pending in the target
> > > thread and "signal" the appropriate KSEG with help from the
> > > kernel (one of the new user<->kernel interfaces or syscalls).
> >
> > If a KSEG is not running because it had no work, then
> > yes, you need to wake up one of its KSEs to handle the signal.
> 
> Yeah, but I was thinking more along the lines of interrupting a
> currently running KSE.

If you are returning from the kernel, all threads are effectively
suspended and equivalent.  The KSE is always 'idle' when returning
to the UTS.  The UTS then decides what to run next (usually the
thread that was just active, but not always).

> 
> > > It might be nice to have a general way of sending messages
> > > between KSEGs (KSEs?).
> >
> > Userland-to-kernel? or userland-to-userland?
> > "kind of like a signal?" :-)
> 
> Userland to userland with an assist from the kernel.  KSE A wants
> to interrupt KSE B and send B an upcall message of some sort.
> The UTS knows what the message format is, but the kernel doesn't
> need to know other than possibly its message type and size.


Fair enough.

> 
> --
> Dan Eischen

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v



From owner-freebsd-arch  Fri Apr 27  4:55:26 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id AD4FE37B422; Fri, 27 Apr 2001 04:55:20 -0700 (PDT)
	(envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id HAA26845;
	Fri, 27 Apr 2001 07:54:42 -0400 (EDT)
Date: Fri, 27 Apr 2001 07:54:42 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
To: Julian Elischer <julian@elischer.org>
Cc: Arch@freebsd.org, alfred@freebsd.org,
	Robert Watson <rwatson@freebsd.org>
Subject: Re: KSE threading support (first parts)
In-Reply-To: <3AE90567.CA50293E@elischer.org>
Message-ID: <Pine.SUN.3.91.1010427072216.22918A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, 26 Apr 2001, Julian Elischer wrote:
> Daniel Eischen wrote:
> > Right, like I've said before, it's easier just to combine the KSEG and KSE
> > into one entity and forget about fair scheduling.  Limit the number of
> > "combined KSE/KSEGs" to the number of processors (scope system threads
> > still get their own "combined KSE/KSEG" to satisfy POSIX).  The common
> > case is a single processor system anyways, and in that case it doesn't
> > make sense to have more than 1 KSE in a KSEG.
> 
> By grouping KSEs ito an equivalence group, I give the system enough 
> information to allow it to schedule resuming KSECs on an 'equivalent'
> but idle KSE. i.e. a syscall may be initiated on one kse and completed on
> another as long as they are in the same KSEG.

I think we need (in the UTS) to have curthread->curkse.  Every time
the thread changes KSEs, the UTS has to update this.  We'll
also have curkse that will have curkse->curthread.  The curkse
is the thing pointed at by the LDT (%fs).
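
In rough C, the bookkeeping would be something like the sketch below.
The struct layouts and uts_bind() are made up for illustration; the
real structures would carry far more state, and how curkse is reached
through %fs is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

struct uthread;

struct kse {
	struct uthread *curthread;	/* thread this KSE is running, or NULL */
};

struct uthread {
	struct kse *curkse;		/* KSE this thread is running on, or NULL */
};

/* Run thread t on KSE k, keeping both back-pointers consistent. */
void
uts_bind(struct kse *k, struct uthread *t)
{
	assert(k != NULL && t != NULL);
	if (k->curthread != NULL)
		k->curthread->curkse = NULL;	/* old thread leaves k */
	if (t->curkse != NULL)
		t->curkse->curthread = NULL;	/* t leaves its old KSE */
	k->curthread = t;
	t->curkse = k;
}
```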

> The UTS doesn't really have to think to much about this except to 
> know that the paralelism in a KSEG is equal to the lesser of
> the number of KSEs and the number of processors. (having more KSEs 
> than processors is completely pointless, and I happen to think it 
> should not be allowed.)

Right, I thought we all agreed that more KSEs than CPUs is pointless.

The problem is how does the UTS schedule threads to a set of KSEs
within the same KSEG.  There will be one scheduling queue for the
main process (in the UTS) and this is where we'll potentially
have multiple KSEs available.  Scope system threads get their
own KSEG/KSE, and the remaining threads run in the main process.

If KSEs have processor affinity, then the UTS should try to
keep threads running on the same KSE and still load balance
to ensure each thread gets its fair share of processor time.
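
One possible shape for that trade-off, purely as a sketch: stay on the
last KSE unless it has become noticeably busier than the least loaded
one.  The per-KSE load counts, the IMBALANCE threshold and
uts_choose_kse() are all assumptions for illustration, not a worked-out
design.

```c
#include <assert.h>

#define NKSE_MAX  8	/* hypothetical limit for the sketch */
#define IMBALANCE 2	/* hypothetical: tolerated extra runnable threads */

/*
 * load[i] is the number of runnable threads recently run on KSE i.
 * Keep the thread on the KSE it last ran on unless that KSE is more
 * than IMBALANCE threads busier than the least loaded one.
 */
int
uts_choose_kse(const int load[], int nkse, int last)
{
	int min = 0;

	assert(nkse > 0 && nkse <= NKSE_MAX);
	assert(last >= 0 && last < nkse);
	for (int i = 1; i < nkse; i++)
		if (load[i] < load[min])
			min = i;
	return (load[last] - load[min] > IMBALANCE) ? min : last;
}
```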

> I tried to get away from having a KSEG but it ended up getting 
> more complicated again. 
> 
> Most threaded applications would have one KSEG and N KSEs 
> (N==num-processors)
> or ONE KSEG and one KSE (which is still useful as you have N KSECs).
> 
> Few apps would have more than one KSEG and hardly any would have 
> more than 2.

Scope system threads get their own KSEG.  I'm sure there are a lot
of applications out there that create more than 1 or 2 scope
system threads.

> > In a multiprocessor system, if you have an extra quantum, who really
> > cares?  If someone really does care, then it can be a kernel tunable
> > or resource limit.
> > 
> > I just don't see any benefit for the added complexity.  It certainly
> > is easier for the UTS with a combined KSE/KSEG.
> 
> but I think it will be simpler WITH it..
> Treat the KSEs in a KSEG as interchangable. threads that go into 
> the system on one may be reported to have completed their syscall 
> on another.

But I don't think it _is_ easier.

Whatever.  I can't be too concerned with this right now.  It's more
important that we get _something_ that's better than the current
threads library and the NxN Linuxthreads model.  It can always be
improved later.

> > For the most part.  But if the target thread is running in one
> > of the other KSEs for that KSEG, then it will still require an
> > assist from the kernel.
> 
> Why? Surely it is to be considered to be processing another signal.
> note teh signal, and it should pick it up when it's completed the one 
> its doing..

You've got 2 KSEs (A and B) running.  Another process sends a SIGUSR1 to 
this process.  The next KSE (A) to cross the kernel->user boundary gets an
upcall to notify it that it got SIGUSR1.  The UTS decides that it
should be handled by the thread running on the other KSE (B).
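
Sketched in C, the decision the UTS makes in that upcall is roughly the
following.  All names here (uts_route_signal(), the enum values, the
struct fields) are hypothetical; the "kernel assist" itself would be
one of the new user<->kernel interfaces and is not shown.

```c
#include <assert.h>
#include <stddef.h>

struct kse {
	int id;
};

struct uthread {
	struct kse *curkse;	/* KSE the thread is running on, or NULL */
	unsigned    pending;	/* signals noted for this thread */
};

enum route {
	ROUTE_LOCAL,		/* handle on the KSE that got the upcall */
	ROUTE_KERNEL_ASSIST	/* ask the kernel to interrupt the other KSE */
};

/*
 * The UTS, running in the upcall on KSE `self`, has chosen `target` to
 * receive `sig`.  A thread that is not running, or is running here, can
 * be handled locally; one running on another KSE needs a kernel assist.
 */
enum route
uts_route_signal(struct kse *self, struct uthread *target, int sig)
{
	assert(sig > 0 && sig < 32);
	target->pending |= 1u << sig;
	if (target->curkse == NULL || target->curkse == self)
		return ROUTE_LOCAL;
	return ROUTE_KERNEL_ASSIST;
}
```

In the A/B case above, `self` is A and the target's curkse is B, so the
result is ROUTE_KERNEL_ASSIST.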

> > > This is tricky... when a KSE returns to userland it is running
> > > NO threads. All threaded syscalls return to userland in the 'suspended'
> > > state, so that the UTS can decide what to run.
> > 
> > This is only when syscalls or when you need to notify the KSE of
> > special events (signals, interruptions from other KSEs, etc).
> > Normally, a syscall that doesn't block just returns without
> > any upcall.
> 
> well, I don't know that this is true..  the thread could starve other threads
> by doing only non-blocking syscalls. Ihad thought that all syscalls would return
> via upcalls to allow the UTS to decide whether to pre-empt them.

We don't need fine-grained resolution like that.  And I wouldn't
want to run the UTS scheduler every time that a system call was
made - yech!  At a minimum, you'd have to have an extra context
switch for each syscall.  You only need to make an upcall when the
KSE is preempted to run another process (or KSE).  This limits the
UTS thread quantum to a multiple of the kernel's quantum.
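
As a sketch of the "multiple of the kernel's quantum" idea: the UTS
counts preemption upcalls and only switches threads after a fixed
number of them.  UTS_QUANTUM_MULT and the function names are
hypothetical, and the value 4 is arbitrary.

```c
#include <assert.h>

#define UTS_QUANTUM_MULT 4	/* hypothetical: kernel quanta per thread slice */

struct uts_slice {
	int remaining;	/* kernel quanta left for the current thread */
};

void
uts_slice_start(struct uts_slice *s)
{
	s->remaining = UTS_QUANTUM_MULT;
}

/*
 * Called from the upcall made after the KSE was preempted.  Returns 1
 * when the current thread has used up its slice and the UTS should pick
 * another thread, 0 when it should simply be resumed.
 */
int
uts_on_preempt(struct uts_slice *s)
{
	assert(s->remaining > 0);
	if (--s->remaining > 0)
		return 0;
	uts_slice_start(s);	/* fresh slice for whichever thread runs next */
	return 1;
}
```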

We could also continue to use a setitimer type of mechanism with
a fixed interval.  Scope system threads don't need to be notified
of timing signals since there are no other threads to be run.  It's
only the KSEs within the main process that need some sort of timing
signal.

> > > All syscalls return via
> > > an upcall to the UTS (actually the original newkse() call returns
> > > infinitly many times.. that is how the upcall is achieved). The return
> > > values, error returns and data movements have been made to the appropriate
> > > memory locations.. It's as if the thread did a 'yield()' immediatly
> > > after returning from a normal syscall..
> > > So we can be sure that THIS KSE isn't running the interrupt thread.
> > 
> > Running threads can cause synchronous signals, so it's quite possible
> > the running thread generated the signal.  The KSE in which the thread
> > was running would then get the notification.
> 
> the KSE would, but when the upcall is made, the previous running thread 
> is made to lookas if it had yielded.. the KSE is running but the thread
> is 'suspended'.

OK, I just wanted to make sure we were on the same page.  To me, 
'suspended' is preempted.

> > > If a KSEG is not running because it had no work, then
> > > yes, you need to wake up one of its KSEs to handle the signal.
> > 
> > Yeah, but I was thinking more along the lines of interrupting a
> > currently running KSE.
> 
> If you are returning from the kernel, all threads are effectively 
> suspeended and equivalent. The KSE is always 'idle' when returning 
> to the UTS. The UTS then decides what to run next.. (usually the 
> thread that was just active, but not always.)

KSEs have upcalls, not KSEGs.  If you return from the kernel in one
KSE, that does not mean that threads are suspended and not running
in one of the other KSEs within that KSEG (and you can certainly
have scope system threads in their own KSEG/KSE pair that are
running also).  I'm thinking multiprocessor.

-- 
Dan Eischen



From owner-freebsd-arch  Fri Apr 27  9:10:36 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from ns.yogotech.com (ns.yogotech.com [206.127.123.66])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7F99937B422; Fri, 27 Apr 2001 09:10:28 -0700 (PDT)
	(envelope-from nate@yogotech.com)
Received: from nomad.yogotech.com (nomad.yogotech.com [206.127.123.131])
	by ns.yogotech.com (8.9.3/8.9.3) with ESMTP id KAA20944;
	Fri, 27 Apr 2001 10:10:20 -0600 (MDT)
	(envelope-from nate@nomad.yogotech.com)
Received: (from nate@localhost)
	by nomad.yogotech.com (8.8.8/8.8.8) id KAA18653;
	Fri, 27 Apr 2001 10:10:14 -0600 (MDT)
	(envelope-from nate)
From: Nate Williams <nate@yogotech.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15081.39397.944224.776391@nomad.yogotech.com>
Date: Fri, 27 Apr 2001 10:10:13 -0600 (MDT)
To: Matt Dillon <dillon@earth.backplane.com>
Cc: "David O'Brien" <obrien@FreeBSD.ORG>,
	Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG,
	Daniel Eischen <eischen@vigrid.com>
Subject: Re: KSE threading support (first parts)
In-Reply-To: <200104270015.f3R0FAi62512@earth.backplane.com>
References: <3AE71067.FF4BD029@elischer.org>
	<20010425110940.L1790@fw.wintelcom.net>
	<3AE85776.92D6BD90@elischer.org>
	<20010426120630.A92915@dragon.nuxi.com>
	<200104270015.f3R0FAi62512@earth.backplane.com>
X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid
Reply-To: nate@yogotech.com (Nate Williams)
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> :Uh people.  
> :
> :We really, really NEED to agree on the design here.  Jason's paper
> :(http://people.freebsd.org/~jasone/refs/freebsd_kse/freebsd_kse.html) is
> :explains all this.
> :
> :Before any more work is done on KSE's I really feel people should either
> :agree fully with the paper, or debate its contents first.
> :
> :I really doubt a single person will develop KSE, so it is imperative
> :there is a common sheet of music.
> :
> :-- 
> :-- David  (obrien@FreeBSD.org)
> 
>     I've read it.  I was under the impression from prior discussions that
>     KSEs belonging to the same process had to be serialized... that you
>     couldn't run them concurrently with each other.

What's the point of SMP then?  This would give us essentially a
'single-threaded' process, since only one thread/process can be running
at any one point in time.  Arguably, this is still better than the
current situation where if a thread blocks, the entire process blocks,
but if we've got an idle CPU, why not allow another thread to run in a
second KSE on the idle processor?

>     I can't imagine how 
>     we could possibly run KSEs belonging to the same process concurrently
>     anyway.

Think 'multi-threaded' applications.  It's trivial to design a program
where multiple threads are independent of one another.


Nate



From owner-freebsd-arch  Fri Apr 27 10: 2:16 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67])
	by hub.freebsd.org (Postfix) with ESMTP
	id E88FA37B422; Fri, 27 Apr 2001 10:02:13 -0700 (PDT)
	(envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.2/8.11.2) id f3RH1Tk05185;
	Fri, 27 Apr 2001 10:01:29 -0700 (PDT)
	(envelope-from dillon)
Date: Fri, 27 Apr 2001 10:01:29 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200104271701.f3RH1Tk05185@earth.backplane.com>
To: Nate Williams <nate@yogotech.com>
Cc: "David O'Brien" <obrien@FreeBSD.ORG>,
	Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG,
	Daniel Eischen <eischen@vigrid.com>
Subject: Re: KSE threading support (first parts)
References: <3AE71067.FF4BD029@elischer.org>
	<20010425110940.L1790@fw.wintelcom.net>
	<3AE85776.92D6BD90@elischer.org>
	<20010426120630.A92915@dragon.nuxi.com>
	<200104270015.f3R0FAi62512@earth.backplane.com> <15081.39397.944224.776391@nomad.yogotech.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


:...
:> :
:> :Before any more work is done on KSE's I really feel people should either
:> :agree fully with the paper, or debate its contents first.
:> :
:> :I really doubt a single person will develop KSE, so it is imperative
:> :there is a common sheet of music.
:> :
:> :-- 
:> :-- David  (obrien@FreeBSD.org)
:> 
:>     I've read it.  I was under the impression from prior discussions that
:>     KSEs belonging to the same process had to be serialized... that you
:>     couldn't run them concurrently with each other.
:
:What's the point of SMP then?  This would give us essentially a
:'single-threaded' process, since only one thread/process can be running
:at any one point in time.  Arguably, this is still better than the
:current situation where if a thread blocks, the entire process blocks,
:but if we've got an idle CPU, why not allow another thread to run in a
:second KSE on the idle processor?
:
:>     I can't imagine how 
:>     we could possibly run KSEs belonging to the same process concurrently
:>     anyway.
:
:Think 'multi-threaded' applications.  It's trivial to design a program
:where multiple threads are independent of one another.
:
:Nate

    Try reading my posting again, Nate, carefully.  You missed the whole
    thing.

						-Matt




From owner-freebsd-arch  Fri Apr 27 10: 6: 7 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from ns.yogotech.com (ns.yogotech.com [206.127.123.66])
	by hub.freebsd.org (Postfix) with ESMTP
	id E3E3537B423; Fri, 27 Apr 2001 10:06:00 -0700 (PDT)
	(envelope-from nate@yogotech.com)
Received: from nomad.yogotech.com (nomad.yogotech.com [206.127.123.131])
	by ns.yogotech.com (8.9.3/8.9.3) with ESMTP id LAA21824;
	Fri, 27 Apr 2001 11:05:57 -0600 (MDT)
	(envelope-from nate@nomad.yogotech.com)
Received: (from nate@localhost)
	by nomad.yogotech.com (8.8.8/8.8.8) id LAA18857;
	Fri, 27 Apr 2001 11:05:52 -0600 (MDT)
	(envelope-from nate)
From: Nate Williams <nate@yogotech.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15081.42735.860662.876478@nomad.yogotech.com>
Date: Fri, 27 Apr 2001 11:05:51 -0600 (MDT)
To: Matt Dillon <dillon@earth.backplane.com>
Cc: Nate Williams <nate@yogotech.com>,
	"David O'Brien" <obrien@FreeBSD.ORG>,
	Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG,
	Daniel Eischen <eischen@vigrid.com>
Subject: Re: KSE threading support (first parts)
In-Reply-To: <200104271701.f3RH1Tk05185@earth.backplane.com>
References: <3AE71067.FF4BD029@elischer.org>
	<20010425110940.L1790@fw.wintelcom.net>
	<3AE85776.92D6BD90@elischer.org>
	<20010426120630.A92915@dragon.nuxi.com>
	<200104270015.f3R0FAi62512@earth.backplane.com>
	<15081.39397.944224.776391@nomad.yogotech.com>
	<200104271701.f3RH1Tk05185@earth.backplane.com>
X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid
Reply-To: nate@yogotech.com (Nate Williams)
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> :> :Before any more work is done on KSE's I really feel people should either
> :> :agree fully with the paper, or debate its contents first.
> :> :
> :> :I really doubt a single person will develop KSE, so it is imperative
> :> :there is a common sheet of music.
> :> :
> :> :-- 
> :> :-- David  (obrien@FreeBSD.org)
> :> 
> :>     I've read it.  I was under the impression from prior discussions that
> :>     KSEs belonging to the same process had to be serialized... that you
> :>     couldn't run them concurrently with each other.
> :
> :What's the point of SMP then?  This would give us essentially a
> :'single-threaded' process, since only one thread/process can be running
> :at any one point in time.  Arguably, this is still better than the
> :current situation where if a thread blocks, the entire process blocks,
> :but if we've got an idle CPU, why not allow another thread to run in a
> :second KSE on the idle processor?
> :
> :>     I can't imagine how 
> :>     we could possibly run KSEs belonging to the same process concurrently
> :>     anyway.
> :
> :Think 'multi-threaded' applications.  It's trivial to design a program
> :where multiple threads are independent of one another.
> :
> :Nate
> 
>     Try reading my posting again, Nate, carefully.  You missed the whole
>     thing.

I read it, and this is what I hear you saying in a nutshell.

KSEs belonging to the same process are serialized, and can not be run
concurrently.

What I'm saying:

KSEs belonging to the same process can be run concurrently if we have
multiple processors.

Where did I miss what you were saying?


Nate




From owner-freebsd-arch  Fri Apr 27 10:18:16 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67])
	by hub.freebsd.org (Postfix) with ESMTP
	id D7EAD37B423; Fri, 27 Apr 2001 10:18:12 -0700 (PDT)
	(envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost)
	by earth.backplane.com (8.11.2/8.11.2) id f3RHHGp05457;
	Fri, 27 Apr 2001 10:17:16 -0700 (PDT)
	(envelope-from dillon)
Date: Fri, 27 Apr 2001 10:17:16 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200104271717.f3RHHGp05457@earth.backplane.com>
To: Nate Williams <nate@yogotech.com>
Cc: "David O'Brien" <obrien@FreeBSD.ORG>,
	Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG,
	Daniel Eischen <eischen@vigrid.com>
Subject: Re: KSE threading support (first parts)
References: <3AE71067.FF4BD029@elischer.org>
	<20010425110940.L1790@fw.wintelcom.net>
	<3AE85776.92D6BD90@elischer.org>
	<20010426120630.A92915@dragon.nuxi.com>
	<200104270015.f3R0FAi62512@earth.backplane.com>
	<15081.39397.944224.776391@nomad.yogotech.com>
	<200104271701.f3RH1Tk05185@earth.backplane.com> <15081.42735.860662.876478@nomad.yogotech.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


:
:I read it, and this is what I hear you saying in a nutshell.
:
:KSEs belonging to the same process are serialized, and can not be run
:concurrently.
:
:What I'm saying:
:
:KSEs belonging to the same process can be run concurrently if we have
:multiple processors.
:
:Where did I miss what you were saying?
:
:Nate
    
    You seem to believe that not being able to run KSE's for the same
    process concurrently somehow kills the whole concept of SMP.

    Well, that's complete bullshit.  KSE's are extremely short-running
    affairs in kernel mode, especially when you consider the most likely
    asynchronizing case (a simple blocking situation that will most commonly
    be in a read() or write()).  Serializing them within the context of a
    single process will actually *IMPROVE* SMP performance, not make it worse.

    Running multiple kernel contexts for the same process on different
    cpu's concurrently means that you must now lock every single aspect 
    of the 'current process' concept, and cannot make any assumptions
    whatsoever in regards to accessing elements of the current process.

    Well, that's just plain insane.  You will wind up with so many fragging
    locks and mutexes in the kernel that what performance gain you might
    have thought you could get is now completely blown away by the locking
    overhead.

    This is another aspect of the problem you run into when you start
    trying to preempt a process running in the kernel arbitrarily.  Suddenly
    all the assumptions you were able to make before that resulted in
    optimal code paths now must be thrown out the window and replaced with
    a godawful number of locks to protect kernel contexts from unexpected
    interruptions.  That's insane as well.  You are introducing a 'solution'
    to a problem that doesn't exist and breaking any chance we have of
    getting a reliable kernel in anything less than a few years in the process.

    If we were writing a kernel completely from scratch we could probably
    construct it to allow these things, but trying to do it with the current
    base is impossible -- you will never get something reliable or efficient
    at the end of this road.  Or perhaps I should phrase it:  The only way
    you will get anything close to reliable will be to effectively revert
    the system to the days of the single giant lock, because you will need
    so many fraggin locks to deal with the consequences you might as well
    have a single big giant lock.

						-Matt




From owner-freebsd-arch  Fri Apr 27 12:12: 2 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from ns.yogotech.com (ns.yogotech.com [206.127.123.66])
	by hub.freebsd.org (Postfix) with ESMTP
	id 1F85A37B424; Fri, 27 Apr 2001 12:10:01 -0700 (PDT)
	(envelope-from nate@yogotech.com)
Received: from nomad.yogotech.com (nomad.yogotech.com [206.127.123.131])
	by ns.yogotech.com (8.9.3/8.9.3) with ESMTP id NAA23837;
	Fri, 27 Apr 2001 13:09:48 -0600 (MDT)
	(envelope-from nate@nomad.yogotech.com)
Received: (from nate@localhost)
	by nomad.yogotech.com (8.8.8/8.8.8) id NAA19281;
	Fri, 27 Apr 2001 13:09:46 -0600 (MDT)
	(envelope-from nate)
From: Nate Williams <nate@yogotech.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15081.50170.297579.938254@nomad.yogotech.com>
Date: Fri, 27 Apr 2001 13:09:46 -0600 (MDT)
To: Matt Dillon <dillon@earth.backplane.com>
Cc: Nate Williams <nate@yogotech.com>,
	"David O'Brien" <obrien@FreeBSD.ORG>,
	Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG,
	Daniel Eischen <eischen@vigrid.com>
Subject: Re: KSE threading support (first parts)
In-Reply-To: <200104271717.f3RHHGp05457@earth.backplane.com>
References: <3AE71067.FF4BD029@elischer.org>
	<20010425110940.L1790@fw.wintelcom.net>
	<3AE85776.92D6BD90@elischer.org>
	<20010426120630.A92915@dragon.nuxi.com>
	<200104270015.f3R0FAi62512@earth.backplane.com>
	<15081.39397.944224.776391@nomad.yogotech.com>
	<200104271701.f3RH1Tk05185@earth.backplane.com>
	<15081.42735.860662.876478@nomad.yogotech.com>
	<200104271717.f3RHHGp05457@earth.backplane.com>
X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid
Reply-To: nate@yogotech.com (Nate Williams)
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> :I read it, and this is what I hear you saying in a nutshell.
> :
> :KSEs belonging to the same process are serialized, and can not be run
> :concurrently.
> :
> :What I'm saying:
> :
> :KSEs belonging to the same process can be run concurrently if we have
> :multiple processors.
> :
> :Where did I miss what you were saying?
> :
> :Nate
>     
>     You seem to believe that not being able to run KSE's for the same
>     process concurrently somehow kills the whole concept of SMP.

No, it kills one of the biggest reasons for supporting KSE.  Otherwise,
a single process can only take advantage of a single processor.

>     Well, that's complete bullshit.  KSE's are extremely short-running
>     affairs in kernel mode, especially when you consider the most likely
>     asynchronizing case (a simple blocking situation that will most commonly
>     be in a read() or write()).

Not necessarily.  My experience with developing and running applications
on Solaris says that having multiple KSE's/process is a *huge* win.

>     Serializing them within the context of a single process will
>     actually *IMPROVE* SMP performance, not make it worse.

Why?

>     Running multiple kernel contexts for the same process on different
>     cpu's concurrently means that you must now lock every single aspect 
>     of the 'current process' concept

Which has to be done anyway, since the processor will be running
multiple processes in any case, and a process may migrate to a
different processor depending on process load.

Affinity is a goal, but there's no guarantee that a process will
*always* execute on the same processor.

In essence, you're limiting the design of a threaded program to
serialized processes, which is completely bogus.

>     Well, that's just plain insane.  You will wind up with so many fragging
>     locks and mutexes in the kernel that what performance gain you might
>     have thought you could get is now completely blown away by the locking
>     overhead.

See above.  This has to be done in any case, and is done now.  The
problem is no more difficult with the addition of KSE's, and serializing
them removes one of the single biggest advantages of using KSE's.

Out of curiosity, have you read the KSE papers at all?  They are able
to deal with concurrency without all of the complexity you imply must
exist.

>     This is another aspect of the problem you run into when you start
>     trying to preempt a process running in the kernel arbitrarily.  Suddenly
>     all the assumptions you were able to make before that resulted in
>     optimal code paths now must be thrown out the window and replaced with
>     a godawful number of locks to protect kernel contexts from unexpected
>     interruptions.

*sarcasm on*
Heck, then we should just throw out KSE's, since they are way too
complex and just stick with the current 'BGL' model, right?
*sarcasm off*

It doesn't come for free.  There is no way to have progress without some
additional complexity.  The question we must ask is whether the
complexity we add buys us anything.  I believe it does, as do many other
people.

Certainly Solaris's ability to scale shows that there is something to be
said for having a pre-emptive kernel.

>     That's insane as well.  You are introducing a 'solution' to a
>     problem that doesn't exist

Matt, honestly, there's no reason to change the existing FreeBSD model
at all, if we're running on a single processor.  It's not broken in any
way.

However, the current model does not scale with multiple processors.  One
of the stated goals of the later releases of FreeBSD is to create an OS
that scales better on multiple processors, so the current 'model' is not
adequate.  It's a solution to a new problem, one that *does* exist in
BSD if we accept the fact that we want to run better on multiple
processors.  Hence the KSE model, which is one of many solutions to the
scaling problem, and the one that was judged to be a good solution.

Another 'goal' is the ability to write threaded programs that run
efficiently on both UP and SMP hardware.  KSE's can help with this, but
a 'serialized KSE' model won't allow an I/O-intensive application to
benefit from adding multiple CPU's.

An example of such an application is one that does the following (UDP
packets were used in this example, for streaming...)

1) One thread is in kernel context in select(), waiting for packets,
   which are thrown onto a queue back in userland and the thread returns
   to kernel land.

2) Another thread sorts these packets into two classes, and puts each
   class onto its own queue.
    a) Data packets
    b) Query packets

3) The data queue is read by another thread, which writes the packets
   out to disk.

4) The query packets are processed by another thread, which reads the
   information off the disk (the data may be old, or new, so there is some
   contention between threads 3/4), and sticks it onto the 'send' queue.

5) A final thread reads information from the send queue, and sends it
   out to the requestors as BW is available.

Not only is this example not made up, it's very similar to a project I
completed over 2 years ago.  It's a bit more complicated than this, but
you get the general picture.
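
To show the shape of that pipeline, here is a minimal sketch in Python.
It is not the original application.  All names (run_pipeline, the queue
names, the packet tuples) are made up for illustration, a dict stands in
for the disk, and a plain list of tuples stands in for the UDP socket
the first thread would wait on in select().  CPython's global
interpreter lock serializes bytecode execution, so this shows only the
structure of the five stages, not SMP scaling.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def run_pipeline(packets):
    """Push (kind, key, value) packets through five stages.

    Returns (store, sent): the data written to the stand-in "disk" and
    the replies produced for query packets, in arrival order.
    """
    inbound, data_q, query_q, send_q = (queue.Queue() for _ in range(4))
    store = {}                      # hypothetical stand-in for the disk
    store_lock = threading.Lock()   # threads 3 and 4 contend on this
    sent = []

    def receiver():        # stage 1: stands in for the select() loop
        for pkt in packets:
            inbound.put(pkt)
        inbound.put(STOP)

    def classifier():      # stage 2: split into data and query packets
        while (pkt := inbound.get()) is not STOP:
            (data_q if pkt[0] == "data" else query_q).put(pkt)
        data_q.put(STOP)
        query_q.put(STOP)

    def writer():          # stage 3: write data packets to "disk"
        while (pkt := data_q.get()) is not STOP:
            _, key, value = pkt
            with store_lock:
                store[key] = value

    def query_handler():   # stage 4: read from "disk", queue a reply
        while (pkt := query_q.get()) is not STOP:
            _, key, _ = pkt
            with store_lock:
                value = store.get(key)  # may be old, new, or missing
            send_q.put((key, value))
        send_q.put(STOP)

    def sender():          # stage 5: drain the send queue
        while (reply := send_q.get()) is not STOP:
            sent.append(reply)

    stages = (receiver, classifier, writer, query_handler, sender)
    threads = [threading.Thread(target=stage) for stage in stages]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store, sent
```

Each stage has exactly one thread and talks to its neighbours only
through a queue, so the only shared state is the store.  Whether a query
sees old or new data depends on thread timing, which is the contention
between threads 3 and 4 mentioned above.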

Not only did this application scale well (on Solaris), it also had very
few bottlenecks since we were able to minimize thread contention with
some clever data structures.

In our case, the # of packets sent/received was the biggest bottleneck,
so the limit wasn't one of hardware (in terms of I/O bandwidth), but CPU
processing of the packets.

Adding more CPU's to the mix allowed us to create an application that
ran faster by throwing more CPU at it (if CPU was a bottleneck).  If CPU
wasn't a bottleneck, then the application had no scaling issues on
modern hardware.

>     If we were writing a kernel completely from scratch we could probably
>     construct it to allow these things, but trying to do it with the current
>     base is impossible -- you will never get something reliable or efficient
>     at the end of this road.

I believe that in the end, many parts of the system will be re-written,
or at least revamped to support multi-tasking to some degree.  Even with
serialized KSE's, there's still an issue of pre-emption, since multiple
processes may be accessing the same data structures (on different
CPU's).

>     Or perhaps I should phrase it:  The only way
>     you will get anything close to reliable will be to effectively revert
>     the system to the days of the single giant lock, because you will need
>     so many fraggin locks to deal with the consequences you might as well
>     have a single big giant lock.

I'm not so naive as to suggest that it's going to be simple.  If it were
going to be a simple task, it would have been done already.  However,
just because it's difficult and time consuming doesn't mean it's not
worthwhile.


Nate



From owner-freebsd-arch  Fri Apr 27 12:44: 8 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7B34937B424; Fri, 27 Apr 2001 12:44:04 -0700 (PDT)
	(envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id PAA16949;
	Fri, 27 Apr 2001 15:43:25 -0400 (EDT)
Date: Fri, 27 Apr 2001 15:43:20 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
To: "Daniel C. Sobral" <dcs@newsguy.com>
Cc: Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG,
	alfred@FreeBSD.ORG, Robert Watson <rwatson@FreeBSD.ORG>
Subject: Re: KSE threading support (first parts)
In-Reply-To: <3AE9B93C.E8060911@newsguy.com>
Message-ID: <Pine.SUN.3.91.1010427151727.12501A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, 27 Apr 2001, Daniel C. Sobral wrote:
> Daniel Eischen wrote:
> > 
> > Right, like I've said before, it's easier just to combine the KSEG and KSE
> > into one entity and forget about fair scheduling.  Limit the number of
> > "combined KSE/KSEGs" to the number of processors (scope system threads
> > still get their own "combined KSE/KSEG" to satisfy POSIX).  The common
> > case is a single processor system anyways, and in that case it doesn't
> > make sense to have more than 1 KSE in a KSEG.
> 
> First and foremost, you must preserve the current behavior (because
> that's what I want, that's what we have atm, and it's there in POSIX),
> where a process quantum is a process quantum and that's that, no matter
> how many threads the process has.

You get that if you don't use pthread_setconcurrency() and don't
create any system scope threads.  This means you get 1 KSE in
1 KSEG.

> If you happen to want the "system" scope, which is also in POSIX, then
> each thread has a quantum of its own.
> 
> Process scope: one KSEG, N KSE.

Not quite.  1 KSEG, 1 KSE by default.  You have to set the concurrency
level to get more than 1 KSE.

When you use pthread_setconcurrency() under Solaris, you get an LWP
for each concurrency level that Solaris grants to you.  Each LWP
under Solaris gets its own quantum.  With the proposed implementation
for FreeBSD, pthread_setconcurrency() would give you multiple KSEs
(limited to the number of CPUs) within the same KSEG, but these KSEs
wouldn't give you additional quanta.

I'd rather see us emulate Solaris if possible, but whatever.

> System scope: N KSEG, 1 KSE per KSEG.

Right.  And for Process _and_ System scope: S+1 KSEG, S+1 KSE, where
S=number of system scope threads.

> > In a multiprocessor system, if you have an extra quantum, who really
> > cares?  If someone really does care, then it can be a kernel tunable
> > or resource limit.
> > 
> > I just don't see any benefit for the added complexity.  It certainly
> > is easier for the UTS with a combined KSE/KSEG.
> 
> Yes, but I want to see you do process scope and system scope without it.

Read what I've written again.  You don't need more than 1 KSE within
the _same_ KSEG to do this.  You can have as many process scope
threads as you want with one KSE in the main process' KSEG.  If you
want to additionally create system scope threads, then no problem;
each system scope thread gets its own KSEG/KSE pair. 
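
The counts above can be written down as a small model.  This is only a
sketch of the proposal as described in this message, in Python, with a
hypothetical helper kse_layout() and hypothetical parameter names; it is
not kernel or libc code.  It assumes the main process KSEG always exists
and that the level requested through pthread_setconcurrency() is capped
at the number of CPUs.

```python
def kse_layout(system_scope_threads=0, concurrency=1, ncpus=1):
    """Return (ksegs, kses) for one process under the model sketched above."""
    # The main KSEG holds every process scope thread.  It starts with one
    # KSE; a higher concurrency level adds KSEs, but never more than the
    # number of CPUs (assumption taken from the description above).
    process_kses = max(1, min(concurrency, ncpus))
    # Each system scope thread gets its own KSEG holding exactly one KSE.
    ksegs = 1 + system_scope_threads
    kses = process_kses + system_scope_threads
    return ksegs, kses
```

With the defaults this gives 1 KSEG and 1 KSE no matter how many process
scope threads exist, and with S system scope threads and default
concurrency it gives S+1 KSEGs and S+1 KSEs.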

-- 
Dan Eischen



From owner-freebsd-arch  Fri Apr 27 12:50:55 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id B5EC737B423
	for <Arch@FreeBSD.ORG>; Fri, 27 Apr 2001 12:50:53 -0700 (PDT)
	(envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id PAA18118;
	Fri, 27 Apr 2001 15:50:13 -0400 (EDT)
Date: Fri, 27 Apr 2001 15:50:11 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
To: Nate Williams <nate@yogotech.com>
Cc: Matt Dillon <dillon@earth.backplane.com>,
	Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG
Subject: Re: KSE threading support (first parts)
In-Reply-To: <15081.50170.297579.938254@nomad.yogotech.com>
Message-ID: <Pine.SUN.3.91.1010427154434.12501B-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, 27 Apr 2001, Nate Williams wrote:
> >     Well, that's complete bullshit.  KSE's are extremely short-running
> >     affairs in kernel mode, especially when you consider the most likely
> >     asynchronizing case (a simple blocking situation that will most commonly
> >     be in a read() or write()).
> 
> Not necessarily.  My experience with developing and running applications
> on Solaris says that having multiple KSE's/process is a *huge* win.

You do know that the proposed implementation isn't quite like
Solaris (KSEs don't get their own quantum).  You better holler
if you want it ;-)

-- 
Dan Eischen



From owner-freebsd-arch  Fri Apr 27 12:59:26 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from ns.yogotech.com (ns.yogotech.com [206.127.123.66])
	by hub.freebsd.org (Postfix) with ESMTP id C556737B422
	for <Arch@FreeBSD.ORG>; Fri, 27 Apr 2001 12:59:23 -0700 (PDT)
	(envelope-from nate@yogotech.com)
Received: from nomad.yogotech.com (nomad.yogotech.com [206.127.123.131])
	by ns.yogotech.com (8.9.3/8.9.3) with ESMTP id NAA24634;
	Fri, 27 Apr 2001 13:58:54 -0600 (MDT)
	(envelope-from nate@nomad.yogotech.com)
Received: (from nate@localhost)
	by nomad.yogotech.com (8.8.8/8.8.8) id NAA19479;
	Fri, 27 Apr 2001 13:58:53 -0600 (MDT)
	(envelope-from nate)
From: Nate Williams <nate@yogotech.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15081.53117.150505.145701@nomad.yogotech.com>
Date: Fri, 27 Apr 2001 13:58:53 -0600 (MDT)
To: Daniel Eischen <eischen@vigrid.com>
Cc: Nate Williams <nate@yogotech.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG
Subject: Re: KSE threading support (first parts)
In-Reply-To: <Pine.SUN.3.91.1010427154434.12501B-100000@pcnet1.pcnet.com>
References: <15081.50170.297579.938254@nomad.yogotech.com>
	<Pine.SUN.3.91.1010427154434.12501B-100000@pcnet1.pcnet.com>
X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid
Reply-To: nate@yogotech.com (Nate Williams)
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > >     Well, that's complete bullshit.  KSE's are extremely short-running
> > >     affairs in kernel mode, especially when you consider the most likely
> > >     asynchronizing case (a simple blocking situation that will most commonly
> > >     be in a read() or write()).
> > 
> > Not necessarily.  My experience with developing and running applications
> > on Solaris says that having multiple KSE's/process is a *huge* win.
> 
> You do know that the proposed implementation isn't quite like
> Solaris (KSEs don't get their own quantum).  You better holler
> if you want it ;-)

I'm not sure how much of a difference that makes, but to be honest, I
haven't thought about the consequences of it much. :(


Nate



From owner-freebsd-arch  Fri Apr 27 13: 8:35 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 58ECC37B422
	for <Arch@FreeBSD.ORG>; Fri, 27 Apr 2001 13:08:33 -0700 (PDT)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f3RK8Qd06823;
	Fri, 27 Apr 2001 13:08:26 -0700 (PDT)
Date: Fri, 27 Apr 2001 13:08:26 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: Daniel Eischen <eischen@vigrid.com>
Cc: Nate Williams <nate@yogotech.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG
Subject: Re: KSE threading support (first parts)
Message-ID: <20010427130826.G18676@fw.wintelcom.net>
References: <15081.50170.297579.938254@nomad.yogotech.com> <Pine.SUN.3.91.1010427154434.12501B-100000@pcnet1.pcnet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <Pine.SUN.3.91.1010427154434.12501B-100000@pcnet1.pcnet.com>; from eischen@vigrid.com on Fri, Apr 27, 2001 at 03:50:11PM -0400
X-all-your-base: are belong to us.
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Daniel Eischen <eischen@vigrid.com> [010427 12:50] wrote:
> On Fri, 27 Apr 2001, Nate Williams wrote:
> > >     Well, that's complete bullshit.  KSE's are extremely short-running
> > >     affairs in kernel mode, especially when you consider the most likely
> > >     asynchronizing case (a simple blocking situation that will most commonly
> > >     be in a read() or write()).
> > 
> > Not necessarily.  My experience with developing and running applications
> > on Solaris says that having multiple KSE's/process is a *huge* win.
> 
> You do know that the proposed implementation isn't quite like
> Solaris (KSEs don't get their own quantum).  You better holler
> if you want it ;-)

There are two things on the issue that I'd like to bring up.

   The concepts are cool, however the implementation you guys are
   discussing really hurts my head, not in a bad way, but the
   concepts look quite daunting.  Kudos if you guys get it done
   though!

   Threads need to be usable in a "this application wants to
   utilize _all_ available system resources" way, meaning if you have
   more than one processor, I want to see mysql, apache, whatever
   using it (by default!).  If your model doesn't include this then
   please don't bother continuing, the stability issues versus the
   gain don't work for me at all.  Sorry, correctness is sort of
   out of style nowadays especially since every other OS allows
   this and touts the performance gains of their system.

-- 
-Alfred Perlstein - [alfred@freebsd.org]
Represent yourself, show up at BABUG http://www.babug.org/



From owner-freebsd-arch  Fri Apr 27 13:14:12 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from ns.yogotech.com (ns.yogotech.com [206.127.123.66])
	by hub.freebsd.org (Postfix) with ESMTP id 325A837B423
	for <Arch@FreeBSD.ORG>; Fri, 27 Apr 2001 13:14:00 -0700 (PDT)
	(envelope-from nate@yogotech.com)
Received: from nomad.yogotech.com (nomad.yogotech.com [206.127.123.131])
	by ns.yogotech.com (8.9.3/8.9.3) with ESMTP id OAA24832;
	Fri, 27 Apr 2001 14:10:40 -0600 (MDT)
	(envelope-from nate@nomad.yogotech.com)
Received: (from nate@localhost)
	by nomad.yogotech.com (8.8.8/8.8.8) id OAA19524;
	Fri, 27 Apr 2001 14:10:38 -0600 (MDT)
	(envelope-from nate)
From: Nate Williams <nate@yogotech.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15081.53821.755743.746621@nomad.yogotech.com>
Date: Fri, 27 Apr 2001 14:10:37 -0600 (MDT)
To: Alfred Perlstein <bright@wintelcom.net>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Nate Williams <nate@yogotech.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG
Subject: Re: KSE threading support (first parts)
In-Reply-To: <20010427130826.G18676@fw.wintelcom.net>
References: <15081.50170.297579.938254@nomad.yogotech.com>
	<Pine.SUN.3.91.1010427154434.12501B-100000@pcnet1.pcnet.com>
	<20010427130826.G18676@fw.wintelcom.net>
X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid
Reply-To: nate@yogotech.com (Nate Williams)
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > > >     Well, that's complete bullshit.  KSE's are extremely short-running
> > > >     affairs in kernel mode, especially when you consider the most likely
> > > >     asynchronizing case (a simple blocking situation that will most commonly
> > > >     be in a read() or write()).
> > > 
> > > Not necessarily.  My experience with developing and running applications
> > > on Solaris says that having multiple KSE's/process is a *huge* win.
> > 
> > You do know that the proposed implementation isn't quite like
> > Solaris (KSEs don't get their own quantum).  You better holler
> > if you want it ;-)
> 
> There's two things on the issue that I'd like to bring up.
> 
>    The concepts are cool, however the implementation you guys are
>    discussing really hurts my head, not in a bad way, but conceptually
>    the concepts look quite daunting.  Kudos if you guys get it done
>    though!
> 
>    Being able to have threads used in a "this application wants to
>    utilize _all_ available system resources" meaning if you have
>    more than one processor, I want to see mysql, apache, whatever
>    using it (by default!).  If your model doesn't include this then
>    please don't bother continuing, the stability issues versus the
>    gain don't work for me at all.

Having 'serialized' KSE's (which Matt wants) means that an application
will be *UNABLE* to use all of the system resources, because only one
thread in threaded application (apache, mysql, etc..) is allowed to run
at one time, no matter how many CPU's are there.


Nate

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Fri Apr 27 13:35:19 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id 8602A37B422
	for <Arch@FreeBSD.ORG>; Fri, 27 Apr 2001 13:35:16 -0700 (PDT)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f3RKYXo07478;
	Fri, 27 Apr 2001 13:34:33 -0700 (PDT)
Date: Fri, 27 Apr 2001 13:34:33 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: Nate Williams <nate@yogotech.com>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG
Subject: Re: KSE threading support (first parts)
Message-ID: <20010427133433.H18676@fw.wintelcom.net>
References: <15081.50170.297579.938254@nomad.yogotech.com> <Pine.SUN.3.91.1010427154434.12501B-100000@pcnet1.pcnet.com> <20010427130826.G18676@fw.wintelcom.net> <15081.53821.755743.746621@nomad.yogotech.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <15081.53821.755743.746621@nomad.yogotech.com>; from nate@yogotech.com on Fri, Apr 27, 2001 at 02:10:37PM -0600
X-all-your-base: are belong to us.
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Nate Williams <nate@yogotech.com> [010427 13:14] wrote:
> > > > >     Well, that's complete bullshit.  KSE's are extremely short-running
> > > > >     affairs in kernel mode, especially when you consider the most likely
> > > > >     asynchronizing case (a simple blocking situation that will most commonly
> > > > >     be in a read() or write()).
> > > > 
> > > > Not necessarily.  My experience with developing and running applications
> > > > on Solaris says that having multiple KSE's/process is a *huge* win.
> > > 
> > > You do know that the proposed implementation isn't quite like
> > > Solaris (KSEs don't get their own quantum).  You better holler
> > > if you want it ;-)
> > 
> > There's two things on the issue that I'd like to bring up.
> > 
> >    The concepts are cool, however the implementation you guys are
> >    discussing really hurts my head, not in a bad way, but conceptually
> >    the concepts look quite daunting.  Kudos if you guys get it done
> >    though!
> > 
> >    Being able to have threads used in a "this application wants to
> >    utilize _all_ available system resources" meaning if you have
> >    more than one processor, I want to see mysql, apache, whatever
> >    using it (by default!).  If your model doesn't include this then
> >    please don't bother continuing, the stability issues versus the
> >    gain don't work for me at all.
> 
> Having 'serialized' KSE's (which Matt wants) means that an application
> will be *UNABLE* to use all of the system resources, because only one
> thread in threaded application (apache, mysql, etc..) is allowed to run
> at one time, no matter how many CPU's are there.

It doesn't seem like that's what Daniel is saying, which is that
the default will be like this, but that applications or the startup
code will have the choice.

However, if that's true then we might as well scrap the project; it
just brings the complexity out of userland and into the kernel,
sure we can schedule IO better, but then we might as well cop out
and use aio and some special signal system for handling faults back
into the uts.  It's just a lot simpler to go with rfork threads or
a simpler model than all this complexity just to satisfy Terry's
view of who should get what quantum.  Honestly if you ask anyone
they expect to be able to cheat with threads the same way they
cheat by using multiple processes to gain additional CPU.

-- 
-Alfred Perlstein - [alfred@freebsd.org]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Fri Apr 27 15:56: 0 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp10.phx.gblx.net (smtp10.phx.gblx.net [206.165.6.140])
	by hub.freebsd.org (Postfix) with ESMTP id 4EA9B37B422
	for <arch@freebsd.org>; Fri, 27 Apr 2001 15:55:55 -0700 (PDT)
	(envelope-from tlambert@usr02.primenet.com)
Received: (from daemon@localhost)
	by smtp10.phx.gblx.net (8.9.3/8.9.3) id PAA99626;
	Fri, 27 Apr 2001 15:55:54 -0700
Received: from usr02.primenet.com(206.165.6.202)
 via SMTP by smtp10.phx.gblx.net, id smtpd0vTNEa; Fri Apr 27 15:55:45 2001
Received: (from tlambert@localhost)
	by usr02.primenet.com (8.8.5/8.8.5) id QAA13810;
	Fri, 27 Apr 2001 16:06:01 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200104272306.QAA13810@usr02.primenet.com>
Subject: Re: KSE threading support (first parts)
To: arch@freebsd.org, terry@lambert.org
Date: Fri, 27 Apr 2001 23:06:01 +0000 (GMT)
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Alfred Perlstein wrote:
] It doesn't seem like that's what Daniel is saying, which is that
] the default will be like this, but that applications or the startup
] code will have the choice.
] 
] However, if that's true then we might as well scrap the project; it
] just brings the complexity out of userland and into the kernel,
] sure we can schedule IO better, but then we might as well cop out
] and use aio and some special signal system for handling faults back
] into the uts.  It's just a lot simpler to go with rfork threads or
] a simpler model than all this complexity just to satisfy Terry's
] view of who should get what quantum.  Honestly if you ask anyone
] they expect to be able to cheat with threads the same way they
] cheat by using multiple processes to gain additional CPU.

I personally do not give a flying " " if processes cheat in order
to compete unfairly for quantum; that's an administrative issue,
and I think FreeBSD already has far too many arbitrary limits to
"protect" the user from doing useful work^W^W^Whurting themselves.

My main emphasis has always been:

o	Minimize context switch overhead

o	Reduce the overall scheduler complexity; in particular,
	IMO, it is nearly impossible to get thread group
	affinity to work correctly by hacking up a scheduler,
	without resulting in starvation for other processes,
	so The Scheduler Is Not The Place To Do Affinity.

o	I would like to see SMP scalability


The approach which Julian touts tends to fail to achieve SMP
scaling, unless you explicitly ask for it.  In Julian's model,
there is a KSEG per CPU on which you want to be able to run,
and all KSEs live in the context of a KSEG.  In the Julian and
Archie model, KSEs do _not_ move between KSEGs, without an Act
Of God.

I think this model is wrong.  I would completely discard the
concept of a KSEG, entirely.  In place of that "easy CPU
affinity" model, which is what it is intended to guarantee, I
would create per-CPU run queues.  I would control affinity and
load balancing through the decision to migrate or to not migrate
a particular KSE between CPUs.
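
A minimal sketch, in C, of the per-CPU run queue idea described above.
Every name here (struct kse, struct runq, choose_cpu, enqueue, NCPU and
the slack parameter) is hypothetical and is not taken from any FreeBSD
source.  The policy shown, which keeps a KSE on the CPU it last ran on
unless that queue is more than `slack` entries longer than the shortest
one, is only one possible way to express the migrate-or-not decision.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 4                  /* hypothetical CPU count */

struct kse {                    /* hypothetical schedulable entity */
    int id;
    int last_cpu;               /* CPU this KSE last ran on */
    struct kse *next;
};

struct runq {                   /* hypothetical: one run queue per CPU */
    struct kse *head;
    int length;
};

static struct runq runqs[NCPU];

/*
 * Affinity versus load balancing in one decision: stay on the last CPU
 * unless its queue exceeds the shortest queue by more than `slack`.
 */
static int choose_cpu(const struct kse *k, int slack)
{
    int best = 0;

    assert(k->last_cpu >= 0 && k->last_cpu < NCPU);
    for (int c = 1; c < NCPU; c++)
        if (runqs[c].length < runqs[best].length)
            best = c;
    if (runqs[k->last_cpu].length - runqs[best].length <= slack)
        return k->last_cpu;     /* keep affinity */
    return best;                /* migrate to the least loaded CPU */
}

/* Put a KSE on the chosen queue (pushed at the head for brevity). */
static void enqueue(struct kse *k, int slack)
{
    int cpu = choose_cpu(k, slack);

    k->last_cpu = cpu;
    k->next = runqs[cpu].head;
    runqs[cpu].head = k;
    runqs[cpu].length++;
}
```

With slack set to 1, two KSEs that last ran on CPU 0 stay there, and a
third one is migrated once the imbalance exceeds the slack.  No global
queue or group structure is involved in the decision.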

In other words, the complexity of the model which Jason Evans
arrived at from the Big Threads Design Meeting in Foster City
that about 120 of us attended, is optimal for achieving my
design goals, without going to async call gates.

IMO, going to async call gates could result in as much as an
additional 25% improvement in performance.  Unfortunately, it
would also mean a lot of extra overhead to ensure binary
backward compatibility, since it would put all of the standard
POSIX semantics into the libraries, and the default behaviour
would be completely asynchronous.  Binary compatibility would
mean using a different INT than INT 0x80 for system calls, and
putting backward compatibility cruft into a module to support
any binary which some moron decided to link static because they
thought that linking it static makes it somehow safer from single
file damage failure than using libc.so and ld.so.

My ideal implementation would use async call gates.  In effect,
this would be the same as implementing VMS ASTs in all of FreeBSD.


That all said, the current project, as it was envisioned by Jason
Evans, does not have the limitations which you and Nate fear,
unless you cop out on the implementation, and do what Julian and
Archie wanted with KSEG non-migration of KSEs (and the concomitant
single scheduler run queue for all CPUs), or if you take Matt's
approach, and serialize execution everywhere, not just within a
particular KSEG (Matt's approach prevents the need for a single
process to be able to exist on a run queue as more than one entry
instance, which makes some things easier).

I would call both the Matt approach, and the Julian/Archie approach
"overly conservative"; they are both in excess of 12 years behind
the state of the Art.  But I would also level the same criticism
at using the 6 year old technology of scheduler activations to
avoid going to true async call gates.


In any case, you and Nate are getting upset at shortcuts that
people want to take in implementation, not at the design itself.

Cut it out.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Fri Apr 27 16: 6:12 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id E34A437B422
	for <arch@FreeBSD.ORG>; Fri, 27 Apr 2001 16:06:08 -0700 (PDT)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f3RN67x11637;
	Fri, 27 Apr 2001 16:06:07 -0700 (PDT)
Date: Fri, 27 Apr 2001 16:06:07 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: Terry Lambert <tlambert@primenet.com>
Cc: arch@FreeBSD.ORG, terry@lambert.org
Subject: Re: KSE threading support (first parts)
Message-ID: <20010427160607.M18676@fw.wintelcom.net>
References: <200104272306.QAA13810@usr02.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200104272306.QAA13810@usr02.primenet.com>; from tlambert@primenet.com on Fri, Apr 27, 2001 at 11:06:01PM +0000
X-all-your-base: are belong to us.
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* Terry Lambert <tlambert@primenet.com> [010427 15:56] wrote:
> 
> In other words, the complexity of the model which Jason Evans
> arrived at from the Big Threads Design Meeting in Foster City
> that about 120 of us attended, is optimal for achieving my
> design goals, without going to async call gates.

The way I envision async call gates is something like each syscall
could borrow a spare pcb and do an rfork back into the user
application using the borrowed pcb and allowing the syscall to
proceed as scheduled as another kernel thread, upon return it would
somehow notify the process of completion.

> IMO, going to async call gates could result in as much as an
> additional 25% improvement in performance.  Unfortunately, it
> would also mean a lot of extra overhead to ensure binary
> backward compatibility, since it would put all of the standard
> POSIX semantics into the libraries, and the default behaviour
> would be completely asynchronous.  Binary compatibility would
> mean using a different INT than INT 0x80 for system calls, and
> putting backward compatibility cruft into a module to support
> any binary which some moron decided to link static because they
> thought that linking it static makes it somehow safer from single
> file damage failure than using libc.so and ld.so.
> 
> My ideal implementation would use async call gates.  In effect,
> this would be the same as implementing VMS ASTs in all of FreeBSD.

Actually, why not just have a syscall that turns on the async
behavior?

> That all said, the current project, as it was envisioned by Jason
> Evans, does not have the limitations which you and Nate fear,
> unless you cop out on the implementation, and do what Julian and
> Archie wanted with KSEG non-migration of KSEs (and the concomitant
> single scheduler run queue for all CPUs), or if you take Matt's
> approach, and serialize execution everywhere, not just within a
> particular KSEG (Matt's approach prevents the need for a single
> process to be able to exist on a run queue as more than one entry
> instance, which makes some things easier).
> 
> I would call both the Matt approach, and the Julian/Archie approach
> "overly conservative"; they are both in excess of 12 years behind
> the state of the Art.  But I would also level the same criticism
> at using the 6 year old technology of scheduler activations to
> avoid going to true async call gates.
> 
> 
> In any case, you and Nate are getting upset at shortcuts that
> people want to take in implementation, not at the design itself.
> 
> Cut it out.

Well if we have an implementation where the implementors are
unwilling or incapable (because of time constraints, or getting
hit by a bus, etc) of doing the more optimized version then what's
the point besides getting more IO concurrency?  I don't know, it's
just that if someone has a terrific idea that seems to have astounding
complexity and they don't feel like they want to or can take the
final step with it, then it really should not be considered.

btw, I've read some on scheduler activations; where are some
references on async call gates?

-- 
-Alfred Perlstein - [alfred@freebsd.org]
Represent yourself, show up at BABUG http://www.babug.org/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Fri Apr 27 18:28: 9 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.viasoft.com.cn (unknown [61.153.1.177])
	by hub.freebsd.org (Postfix) with ESMTP
	id 777DC37B422; Fri, 27 Apr 2001 18:28:01 -0700 (PDT)
	(envelope-from bsddiy@163.net)
Received: from xyf ([192.168.1.204])
	by mail.viasoft.com.cn (8.9.3/8.9.3) with ESMTP id JAA02080;
	Sat, 28 Apr 2001 09:24:43 +0800
Message-ID: <001e01c0cf82$ac300460$cc01a8c0@xyf>
From: "David Xu" <bsddiy@163.net>
To: "Matt Dillon" <dillon@earth.backplane.com>,
	"Nate Williams" <nate@yogotech.com>
Cc: "David O'Brien" <obrien@FreeBSD.ORG>,
	"Julian Elischer" <julian@elischer.org>, <Arch@FreeBSD.ORG>,
	"Daniel Eischen" <eischen@vigrid.com>
References: <3AE71067.FF4BD029@elischer.org><20010425110940.L1790@fw.wintelcom.net><3AE85776.92D6BD90@elischer.org><20010426120630.A92915@dragon.nuxi.com><200104270015.f3R0FAi62512@earth.backplane.com><15081.39397.944224.776391@nomad.yogotech.com><200104271701.f3RH1Tk05185@earth.backplane.com> <15081.42735.860662.876478@nomad.yogotech.com> <200104271717.f3RHHGp05457@earth.backplane.com>
Subject: Re: KSE threading support (first parts)
Date: Sat, 28 Apr 2001 09:28:31 +0800
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2615.200
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


----- Original Message -----
From: Matt Dillon <dillon@earth.backplane.com>
To: Nate Williams <nate@yogotech.com>
Cc: David O'Brien <obrien@FreeBSD.ORG>; Julian Elischer
<julian@elischer.org>; <Arch@FreeBSD.ORG>; Daniel Eischen
<eischen@vigrid.com>
Sent: Saturday, April 28, 2001 1:17 AM
Subject: Re: KSE threading support (first parts)


>
>     You seem to believe that not being able to run KSE's for the same
>     process concurrently somehow kills the whole concept of SMP.
>
>     Well, that's complete bullshit.  KSE's are extremely short-running
>     affairs in kernel mode, especially when you consider the most likely
>     asynchronizing case (a simple blocking situation that will most
>     commonly be in a read() or write()).  Serializing them within the
>     context of a single process will actually *IMPROVE* SMP performance,
>     not make it worse.

No, most multi-threaded programs use threads to improve concurrent I/O.
Think about MySQL: why is our version slower than their Linux version?
Because we serialize read()/write() in a multi-threaded program.  This
is our failure.


>     Running multiple kernel contexts for the same process on different
>     cpu's concurrently means that you must now lock every single aspect
>     of the 'current process' concept, and cannot make any assumptions
>     whatsoever in regards to accessing elements of the current process.
>     Well, that's just plain insane.  You will wind up with so many
>     fragging locks and mutexes in the kernel that what performance gain
>     you might have thought you could get is now completely blown away by
>     the locking overhead.
>

Yes, it must be done!  I believe FreeBSD-current has already made the proc
structure and many other resources support concurrent access.  I don't
think many new lock-downs will need to be made.  A KSE is a scheduling
unit, just like a proc on the run queue; I don't see that much difference
in the concept.

>     This is another aspect of the problem you run into when you start
>     trying to preempt a process running in the kernel arbitrarily.
>     Suddenly all the assumptions you were able to make before that
>     resulted in optimal code paths now must be thrown out the window and
>     replaced with a godaweful number of locks to protect kernel contexts
>     from unexpected interruptions.  That's insane as well.  You are
>     introducing a 'solution' to a problem that doesn't exist and breaking
>     any chance we have of getting a reliable kernel in anything less then
>     a few years in the process.
>
>     If we were writing a kernel completely from scratch we could probably
>     construct it to allow these things, but trying to do it with the
>     current base is impossible -- you will never get something reliable or
>     efficient at the end of this road.  Or perhaps I should phrase it:
>     The only way you will get anything close to reliable will be to
>     effectively revert the system to the days of the single giant lock,
>     because you will need so many fraggin locks to deal with the
>     consequences you might as well have a single big giant lock.
>
> -Matt

A well-designed multi-threaded program can cleanly dispatch its internal
tasks to different threads, so it will avoid collisions on its internal
resources.
BGL is a joke and bogus for SMP, don't talk about it.

Regards,
David Xu


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Fri Apr 27 18:37: 7 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.viasoft.com.cn (unknown [61.153.1.177])
	by hub.freebsd.org (Postfix) with ESMTP id 0599537B422
	for <Arch@FreeBSD.ORG>; Fri, 27 Apr 2001 18:37:02 -0700 (PDT)
	(envelope-from bsddiy@163.net)
Received: from xyf ([192.168.1.204])
	by mail.viasoft.com.cn (8.9.3/8.9.3) with ESMTP id JAA02128;
	Sat, 28 Apr 2001 09:34:37 +0800
Message-ID: <002c01c0cf83$f3d594a0$cc01a8c0@xyf>
From: "David Xu" <bsddiy@163.net>
To: "Alfred Perlstein" <bright@wintelcom.net>,
	"Nate Williams" <nate@yogotech.com>
Cc: "Daniel Eischen" <eischen@vigrid.com>,
	"Matt Dillon" <dillon@earth.backplane.com>,
	"Julian Elischer" <julian@elischer.org>, <Arch@FreeBSD.ORG>
References: <15081.50170.297579.938254@nomad.yogotech.com> <Pine.SUN.3.91.1010427154434.12501B-100000@pcnet1.pcnet.com> <20010427130826.G18676@fw.wintelcom.net> <15081.53821.755743.746621@nomad.yogotech.com> <20010427133433.H18676@fw.wintelcom.net>
Subject: Re: KSE threading support (first parts)
Date: Sat, 28 Apr 2001 09:38:29 +0800
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2615.200
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


----- Original Message -----
From: Alfred Perlstein <bright@wintelcom.net>
To: Nate Williams <nate@yogotech.com>
Cc: Daniel Eischen <eischen@vigrid.com>; Matt Dillon
<dillon@earth.backplane.com>; Julian Elischer <julian@elischer.org>;
<Arch@FreeBSD.ORG>
Sent: Saturday, April 28, 2001 4:34 AM
Subject: Re: KSE threading support (first parts)


> * Nate Williams <nate@yogotech.com> [010427 13:14] wrote:
> > > > > >     Well, that's complete bullshit.  KSE's are extremely short-running
> > > > > >     affairs in kernel mode, especially when you consider the most likely
> > > > > >     asynchronizing case (a simple blocking situation that will most commonly
> > > > > >     be in a read() or write()).
> > > > >
> > > > > Not necessarily.  My experience with developing and running applications
> > > > > on Solaris says that having multiple KSE's/process is a *huge* win.
> > > >
> > > > You do know that the proposed implementation isn't quite like
> > > > Solaris (KSEs don't get their own quantum).  You better holler
> > > > if you want it ;-)
> > >
> > > There's two things on the issue that I'd like to bring up.
> > >
> > >    The concepts are cool, however the implementation you guys are
> > >    discussing really hurts my head, not in a bad way, but conceptually
> > >    the concepts look quite daunting.  Kudos if you guys get it done
> > >    though!
> > >
> > >    Being able to have threads used in a "this application wants to
> > >    utilize _all_ available system resources" meaning if you have
> > >    more than one processor, I want to see mysql, apache, whatever
> > >    using it (by default!).  If your model doesn't include this then
> > >    please don't bother continuing, the stability issues versus the
> > >    gain don't work for me at all.
> >
> > Having 'serialized' KSE's (which Matt wants) means that an application
> > will be *UNABLE* to use all of the system resources, because only one
> > thread in threaded application (apache, mysql, etc..) is allowed to run
> > at one time, no matter how many CPU's are there.
>
> It doesn't seem like that's what Daniel is saying, which is that
> the default will be like this, but that applications or the startup
> code will have the choice.
>
> However, if that's true then we might as well scrap the project; it
> just brings the complexity out of userland and into the kernel,
> sure we can schedule IO better, but then we might as well cop out
> and use aio and some special signal system for handling faults back
> into the uts.  It's just a lot simpler to go with rfork threads or
> a simpler model than all this complexity just to satisfy Terry's
> view of who should get what quantum.  Honestly if you ask anyone
> they expect to be able to cheat with threads the same way they
> cheat by using multiple processes to gain additional CPU.
>

So you ignore the NxN and MxN thread model discussion, and follow
the fucking Linux thread model design.  Maybe I cannot expect such
an advanced feature to be in a Free OS.
sigh, where is Jason Evans? we need your help.

David Xu


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Fri Apr 27 18:37:24 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from dragon.nuxi.com (trang.nuxi.com [209.152.133.57])
	by hub.freebsd.org (Postfix) with ESMTP id 1E71D37B422
	for <Arch@FreeBSD.ORG>; Fri, 27 Apr 2001 18:37:23 -0700 (PDT)
	(envelope-from obrien@NUXI.com)
Received: (from obrien@localhost)
	by dragon.nuxi.com (8.11.3/8.11.1) id f3S1as627839;
	Fri, 27 Apr 2001 18:36:54 -0700 (PDT)
	(envelope-from obrien)
Date: Fri, 27 Apr 2001 18:36:53 -0700
From: "David O'Brien" <obrien@FreeBSD.ORG>
To: David Xu <bsddiy@163.net>
Cc: Arch@FreeBSD.ORG
Subject: Re: KSE threading support (first parts)
Message-ID: <20010427183653.A23824@dragon.nuxi.com>
Reply-To: obrien@FreeBSD.ORG
References: <3AE71067.FF4BD029@elischer.org><20010425110940.L1790@fw.wintelcom.net><3AE85776.92D6BD90@elischer.org><20010426120630.A92915@dragon.nuxi.com><200104270015.f3R0FAi62512@earth.backplane.com><15081.39397.944224.776391@nomad.yogotech.com><200104271701.f3RH1Tk05185@earth.backplane.com> <15081.42735.860662.876478@nomad.yogotech.com> <200104271717.f3RHHGp05457@earth.backplane.com> <001e01c0cf82$ac300460$cc01a8c0@xyf>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <001e01c0cf82$ac300460$cc01a8c0@xyf>; from bsddiy@163.net on Sat, Apr 28, 2001 at 09:28:31AM +0800
X-Operating-System: FreeBSD 5.0-CURRENT
Organization: The NUXI BSD group
X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3  90 76 5D 69 58 D9 98 7A
X-Pgp-Rsa-Keyid: 1024/34F9F9D5
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sat, Apr 28, 2001 at 09:28:31AM +0800, David Xu wrote:
> BGL is a joke and bogus for SMP, don't talk about it.

*sigh*.  It is certainly [one of] the slowest implementations.  But it
isn't a total joke.  If your process mix is say userland computationally
bound (say statistical simulations), FreeBSD 4's BGL is fine.
T D.XU DON'T TALK ABSOLUTES K PLZ THNX

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Fri Apr 27 19:21:20 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
	by hub.freebsd.org (Postfix) with ESMTP id E521B37B422
	for <Arch@FreeBSD.ORG>; Fri, 27 Apr 2001 19:21:18 -0700 (PDT)
	(envelope-from bright@fw.wintelcom.net)
Received: (from bright@localhost)
	by fw.wintelcom.net (8.10.0/8.10.0) id f3S2L5q16113;
	Fri, 27 Apr 2001 19:21:05 -0700 (PDT)
Date: Fri, 27 Apr 2001 19:21:05 -0700
From: Alfred Perlstein <bright@wintelcom.net>
To: David Xu <bsddiy@163.net>
Cc: Nate Williams <nate@yogotech.com>,
	Daniel Eischen <eischen@vigrid.com>,
	Matt Dillon <dillon@earth.backplane.com>,
	Julian Elischer <julian@elischer.org>, Arch@FreeBSD.ORG
Subject: Re: KSE threading support (first parts)
Message-ID: <20010427192105.R18676@fw.wintelcom.net>
References: <15081.50170.297579.938254@nomad.yogotech.com> <Pine.SUN.3.91.1010427154434.12501B-100000@pcnet1.pcnet.com> <20010427130826.G18676@fw.wintelcom.net> <15081.53821.755743.746621@nomad.yogotech.com> <20010427133433.H18676@fw.wintelcom.net> <002c01c0cf83$f3d594a0$cc01a8c0@xyf>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <002c01c0cf83$f3d594a0$cc01a8c0@xyf>; from bsddiy@163.net on Sat, Apr 28, 2001 at 09:38:29AM +0800
X-all-your-base: are belong to us.
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* David Xu <bsddiy@163.net> [010427 18:36] wrote:
> 
> so you ignore NxN and MxN thread model discuss,  and follow
> fuck Linux thread model design. maybe I can not expect such
> advance feature will be in  Free OS.
> sigh, where is Jason Evans? we need your help.

David, I'd appreciate you attempting to get a clue before responding
to any more of my email.

-- 
-Alfred Perlstein - [alfred@freebsd.org]
Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Fri Apr 27 20:37:35 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
	by hub.freebsd.org (Postfix) with ESMTP id E75EC37B423
	for <arch@FreeBSD.ORG>; Fri, 27 Apr 2001 20:37:22 -0700 (PDT)
	(envelope-from tlambert@usr08.primenet.com)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.9.3/8.9.3) id UAA16292;
	Fri, 27 Apr 2001 20:29:59 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp02.primenet.com, id smtpdAAApKayZF; Fri Apr 27 20:29:53 2001
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id UAA29358;
	Fri, 27 Apr 2001 20:37:51 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200104280337.UAA29358@usr08.primenet.com>
Subject: Re: KSE threading support (first parts)
To: bright@wintelcom.net (Alfred Perlstein)
Date: Sat, 28 Apr 2001 03:37:50 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert), arch@FreeBSD.ORG,
	terry@lambert.org
In-Reply-To: <20010427160607.M18676@fw.wintelcom.net> from "Alfred Perlstein" at Apr 27, 2001 04:06:07 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> The way I envision async call gates is something like each syscall
> could borrow a spare pcb and do an rfork back into the user
> application using the borrowed pcb and allowing the syscall to
> proceed as scheduled as another kernel thread, upon return it would
> somehow notify the process of completion.

Close.  Effectively, it uses the minimal amount of call context
it can get away with, and points the VM space and other stuff
back to the process control block, which is shared among all
system calls.

Some calls, which will never block, immediately return without
grabbing a context... they just return the status into the
per call status block in user space, as if they had completed
asynchronously.

Other calls, which may block, run to the point where they would
block, and allocate a context then, and return.  If they don't
end up blocking, they return like a non-blocking call.

Calls which will always block, or which may block, and get to
the point where a return would be too complicated, allocate a
context and return.

The context is used by the kernel to continue processing.  It
contains the address of the user space status block, as well as
a copy of the stack of the returning program (think of the one
that continues as a "setjmp", with the one doing the return as
getting a longjmp, where the code it would have run is skipped).

The final part is that the context runs to completion at the
user space boundary; since the call has already returned, it does
not return to user space, instead it stops at the user/kernel
boundary, after copying out the completion status into the
user space status block.

The status block is a simplified version of the aioread/aiowrite
status block.

A program can just use these calls directly.  They can also set a
flag to make the call synchronous (as in an aiowait).  Finally, a
user space threads scheduler can use completion notifications to
make scheduling decisions.
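
As a rough illustration of the three return paths and the per-call
status block described above, here is a minimal C sketch.  Every name
in it (acg_status, acg_context, acg_call, acg_complete, the ACG_*
constants) is hypothetical, and the kernel side is simulated by ordinary
functions; it is not an existing FreeBSD interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-call status block living in user space. */
enum acg_state { ACG_IDLE, ACG_PENDING, ACG_DONE };

struct acg_status {
    enum acg_state state;   /* where the call is in its life cycle */
    long           retval;  /* result, valid once state == ACG_DONE */
    int            error;   /* errno-style code, 0 on success */
};

/* How a given call behaves with respect to blocking (illustrative). */
enum acg_kind { ACG_NEVER_BLOCKS, ACG_MAY_BLOCK, ACG_ALWAYS_BLOCKS };

/* Hypothetical saved context the kernel would continue from. */
struct acg_context {
    struct acg_status *status;  /* address of the user status block */
    long               pending; /* result to deliver at completion */
    int                in_use;
};

/*
 * Issue a call.  Returns 1 if a context had to be allocated (the call
 * is still in flight), 0 if it completed before returning.
 */
static int
acg_call(enum acg_kind kind, int would_block, long result,
    struct acg_status *st, struct acg_context *ctx)
{
    if (kind == ACG_NEVER_BLOCKS ||
        (kind == ACG_MAY_BLOCK && !would_block)) {
        /* Completed inline: fill in status as if it finished async. */
        st->state = ACG_DONE;
        st->retval = result;
        st->error = 0;
        return (0);
    }
    /* Would block: grab a context and return to the caller at once. */
    assert(!ctx->in_use);
    ctx->status = st;
    ctx->pending = result;
    ctx->in_use = 1;
    st->state = ACG_PENDING;
    return (1);
}

/* Kernel side: the context runs to the user/kernel boundary and stops. */
static void
acg_complete(struct acg_context *ctx)
{
    ctx->status->retval = ctx->pending;   /* "copy out" the status */
    ctx->status->error = 0;
    ctx->status->state = ACG_DONE;
    ctx->in_use = 0;
}
```

A caller (or a user space threads scheduler) would poll or be notified
on the state field; a "synchronous" flag would amount to waiting until
state becomes ACG_DONE before returning.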

For SMP, you can state that you have the ability to return into
user space (e.g. similar to vfork/sfork) multiple times.  Each
of these represents a "scheduler reservation", where you reserve
the right to compete for a quantum.

You can also easily implement negaffinity for up to 32 processors
with three 32-bit unsigned ints in the process block: just don't
reserve on a processor where the bit is already set, until you
have reserved on all available processors at least once.


> > My ideal implementation would use async call gates.  In effect,
> > this would be the same as implementing VMS ASTs in all of FreeBSD.
> 
> Actually, why not just have a syscall that turns on the async
> behavior?

Libc will break.  It does not expect to have to reap completed
system call status blocks to report completion status to the user
program.


> > In any case, you and Nate are getting upset at shortcuts that
> > people want to take in implementation, not at the design itself.
> > 
> > Cut it out.
> 
> Well if we have an implementation where the implementators are
> unwilling or incapable (because of time constraints, or getting
> hit by a bus, etc) of doing the more optimized version then what's
> the point besideds getting more IO concurrancy?  I don't know, it
> just that if someone has a terrific idea that seems to have astounding
> complexity and they don't feel like they want to or can take the
> final step with it, then it really should not be considered.

The point of threads was to reduce context switch overhead, and
to increase the useful work that actually gets done in any given
time period, as opposed to spending cycles on system overhead or
spinning waiting for a call to complete when you have other, better
work to do.

Somewhere along the way, it became corrupted into a tool to allow
people without very much clue to write programs one-per-connection,
instead of building finite state automata, and that corruption has
proceeded, until now it's a tool to get SMP scalability.


> btw, I've read some on scheduler activations, where some references
> on async call gates?

You're talking to the originator of the idea.  See the -arch archives.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message


From owner-freebsd-arch  Fri Apr 27 22:43:37 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29])
	by hub.freebsd.org (Postfix) with SMTP id 990D937B422
	for <Arch@FreeBSD.ORG>; Fri, 27 Apr 2001 22:43:33 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: (qmail 20288 invoked by uid 666); 28 Apr 2001 05:46:46 -0000
Received: from i194-025.nv.iinet.net.au (HELO elischer.org) (203.59.194.25)
  by mail.m.iinet.net.au with SMTP; 28 Apr 2001 05:46:46 -0000
Message-ID: <3AEA5845.D3377794@elischer.org>
Date: Fri, 27 Apr 2001 22:42:29 -0700
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Nate Williams <nate@yogotech.com>
Cc: Daniel Eischen <eischen@vigrid.com>,
	Matt Dillon <dillon@earth.backplane.com>, Arch@FreeBSD.ORG
Subject: Re: KSE threading support (first parts)
References: <15081.50170.297579.938254@nomad.yogotech.com>
		<Pine.SUN.3.91.1010427154434.12501B-100000@pcnet1.pcnet.com> <15081.53117.150505.145701@nomad.yogotech.com>
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Nate Williams wrote:
> 
> > > >     Well, that's complete bullshit.  KSE's are extremely short-running
> > > >     affairs in kernel mode, especially when you consider the most likely
> > > >     asynchronizing case (a simple blocking situation that will most commonly
> > > >     be in a read() or write()).
> > >
> > > Not necessarily.  My experience with developing and running applications
> > > on Solaris says that having multiple KSE's/process is a *huge* win.
> >
> > You do know that the proposed implementation isn't quite like
> > Solaris (KSEs don't get their own quantum).  You better holler
> > if you want it ;-)
> 
> I'm not sure how much a difference that makes, but to be honest, I
> haven't thought about the consequences of it much. :(
> 
> Nate

If you implement N LWPs as N KSEGs with a KSE each, they do get
their own quanta, so it can be arranged to do it either way.


-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v



From owner-freebsd-arch  Fri Apr 27 23: 6:54 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29])
	by hub.freebsd.org (Postfix) with SMTP id 35AEA37B423
	for <arch@FreeBSD.ORG>; Fri, 27 Apr 2001 23:06:48 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: (qmail 20337 invoked by uid 666); 28 Apr 2001 06:10:00 -0000
Received: from i194-025.nv.iinet.net.au (HELO elischer.org) (203.59.194.25)
  by mail.m.iinet.net.au with SMTP; 28 Apr 2001 06:10:00 -0000
Message-ID: <3AEA5DB7.C9209955@elischer.org>
Date: Fri, 27 Apr 2001 23:05:43 -0700
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Alfred Perlstein <bright@wintelcom.net>
Cc: Terry Lambert <tlambert@primenet.com>, arch@FreeBSD.ORG,
	terry@lambert.org
Subject: Re: KSE threading support (first parts)
References: <200104272306.QAA13810@usr02.primenet.com> <20010427160607.M18676@fw.wintelcom.net>
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Alfred Perlstein wrote:
> 
> * Terry Lambert <tlambert@primenet.com> [010427 15:56] wrote:
> >
> > In other words, the complexity of the model which Jason Evans
> > arrived at from the Big Threads Design Meeting in Foster City
> > that about 120 of us attended, is optimal for achieving my
> > design goals, without going to ascyn call gates.
> 
> The way I envision async call gates is something like each syscall
> could borrow a spare pcb and do an rfork back into the user
> application using the borrowed pcb and allowing the syscall to
> proceed as scheduled as another kernel thread, upon return it would
> somehow notify the process of completion.


Well, if that's async call gates, then we are doing async call gates.
It describes how a KSE system works.

here's how I see it.

KSE does syscall.
KSE blocks in syscall.
KSE saves state into a KSEC.
KSE returns with an upcall to the UTS.
UTS schedules another thread.
[time passes]
Quantum is exhausted. KSE is pre-empted. Current user stack is munged 
to look like it did a yield() (unless in critical region)
[time passes]
Interrupt signals completion of original IO (interrupt level part).
Associated KSEC is hung off the KSEG as 'runnable' (And KSEG put on 
run queue if not already there).
[time passes]
At next kernel scheduling event where that KSEG is made 'current', 
KSEC is loaded, syscall is completed and results written
back to process space. User stack is munged to look like the process
did a yield() just after returning from syscall.
After all runnable KSECs have been run to this state,
an upcall is made to UTS reporting all threads that completed syscalls
  since last report.
UTS schedules either suspended original thread or pre-empted thread.
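
The sequence above can be read as a small state machine per KSEC.  The
following C sketch simulates it in user space; the structure layouts,
state names and function names are all hypothetical and only mirror the
steps listed, not any actual kernel code.

```c
#include <assert.h>
#include <stddef.h>

enum ksec_state {
    KSEC_RUNNING,    /* executing a syscall on the KSE */
    KSEC_BLOCKED,    /* state saved, waiting for I/O */
    KSEC_RUNNABLE,   /* I/O done, hung off the KSEG */
    KSEC_COMPLETED   /* syscall finished, result written back */
};

struct ksec {
    enum ksec_state state;
    int             thread_id;  /* user thread this context belongs to */
    struct ksec    *next;       /* link on the KSEG's runnable list */
};

struct kseg {
    struct ksec *runnable;      /* KSECs whose I/O has completed */
    int          on_runq;       /* KSEG is on the scheduler run queue */
};

/* KSE blocks in a syscall: save state into the KSEC, upcall to the UTS. */
static void
ksec_block(struct ksec *kc)
{
    assert(kc->state == KSEC_RUNNING);
    kc->state = KSEC_BLOCKED;
}

/* Interrupt-level part: I/O done, hang the KSEC off the KSEG. */
static void
ksec_io_done(struct kseg *kg, struct ksec *kc)
{
    assert(kc->state == KSEC_BLOCKED);
    kc->state = KSEC_RUNNABLE;
    kc->next = kg->runnable;
    kg->runnable = kc;
    kg->on_runq = 1;
}

/*
 * KSEG becomes current: run each runnable KSEC to the user boundary,
 * then report the finished threads in one upcall.  Thread ids are
 * stored in `done`; returns how many were reported.
 */
static size_t
kseg_run(struct kseg *kg, int *done, size_t max)
{
    size_t n = 0;
    struct ksec *kc;

    kg->on_runq = 0;            /* taken off the run queue to run */
    while ((kc = kg->runnable) != NULL && n < max) {
        kg->runnable = kc->next;
        kc->next = NULL;
        kc->state = KSEC_COMPLETED;
        done[n++] = kc->thread_id;
    }
    return (n);
}
```

The array filled in by kseg_run() stands in for the single upcall that
reports all threads that completed syscalls since the last report.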

> 
> > IMO, going to async call gates could result in as much as an
> > additional 25% improvement in performance.  Unfortunately, it
> > would also mean a lot of extra overhead to ensure binary
> > backward compatability, since it would put all of the standard
> > POSIX semantics into the libraries, and the default behaviour
> > would be completely asynchronous.  Binary compatability would
> > mean using a different INT than INT 0x80 for system calls, and
> > putting backward compatability cruft into a module to support
> > any binary which some moron decided to link static because they
> > thought that linking it static makes it somehow safer from single
> > file damage failure than using libc.so and ld.so.
> >
> > My ideal implementation would use async call gates.  In effect,
> > this would be the same as implementing VMS ASTs in all of FreeBSD.
> 
> Actually, why not just have a syscall that turns on the async
> behavior?

Basically I don't see the problem.. that's basically what we are
doing....
 
> 
> > That all said, the current project, as it was envisioned by Jason
> > Evans, does not have the limitations which you and Nate fear,
> > unless you cop out on the implementation, and do what Julian and
> > Archie wanted with KSEG non-migration of KSEs (and the concommitant
> > single scheduler run queue for all CPUs), or if you take Matt's
> > approach, and serialize execution everywhere, not just within a
> > particular KSEG (Matt's approach prevents the need for a single
> > process to be able to exist on a run queue as more than one entry
> > instance, which makes some things easier).
> >
> > I would call both the Matt approach, and the Julian/Archie approach
> > "overly conservative"; they are both in excess of 12 years behind
> > the state of the Art.  But I would also level the same criticism
> > at using the 6 year old technology of scheduler activations to
> > avoid going to true async call gates.

Define an async call gate and show how it differs from what we are
suggesting.  We are suggesting that all blocking syscalls
be made async.  We are also suggesting a reporting mechanism by which
completed syscalls are reported.

> >
> >
> > In any case, you and Nate are getting upset at shortcuts that
> > people want to take in implementation, not at the design itself.
> >
> > Cut it out.
> 
> Well if we have an implementation where the implementators are
> unwilling or incapable (because of time constraints, or getting
> hit by a bus, etc) of doing the more optimized version then what's
> the point besideds getting more IO concurrancy?  I don't know, it
> just that if someone has a terrific idea that seems to have astounding
> complexity and they don't feel like they want to or can take the
> final step with it, then it really should not be considered.

Alfred, if you think this is astoundingly complex, then it shows you
need to read it again.
I think it's very simple.

Blocking syscalls are allowed to return to user space via
an upcall.

Completed (previously blocked) syscalls are reported on scheduler
events and possibly at other opportune times (all
of which would be some sort of kernel boundary crossing event).

the rest of it is housekeeping to allow this to happen safely,
fairly, and concurrently on multiple processors..
 
> 
> btw, I've read some on scheduler activations, where some references
> on async call gates?
> 
> --
> -Alfred Perlstein - [alfred@freebsd.org]
> Represent yourself, show up at BABUG http://www.babug.org/
> 

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v



From owner-freebsd-arch  Sat Apr 28  1:48: 4 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29])
	by hub.freebsd.org (Postfix) with SMTP id A00E537B43C
	for <arch@FreeBSD.ORG>; Sat, 28 Apr 2001 01:47:57 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: (qmail 20651 invoked by uid 666); 28 Apr 2001 08:51:10 -0000
Received: from i179-143.nv.iinet.net.au (HELO elischer.org) (203.59.179.143)
  by mail.m.iinet.net.au with SMTP; 28 Apr 2001 08:51:10 -0000
Message-ID: <3AEA837D.C2AE5E8D@elischer.org>
Date: Sat, 28 Apr 2001 01:46:53 -0700
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Terry Lambert <tlambert@primenet.com>
Cc: Alfred Perlstein <bright@wintelcom.net>, arch@FreeBSD.ORG,
	terry@lambert.org
Subject: Re: KSE threading support (first parts)
References: <200104280337.UAA29358@usr08.primenet.com>
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Terry, what you describe here is so similar to what we are planning
on doing that the differences could be called "implementation details".

The only difference is that your syscall returns to where it came from
when it would have blocked, whereas in the scheme I'm championing, the
particular thread is left suspended, and control returns to the UTS
via an upcall.

Terry Lambert wrote:
> 
> > The way I envision async call gates is something like each syscall
> > could borrow a spare pcb and do an rfork back into the user
> > application using the borrowed pcb and allowing the syscall to
> > proceed as scheduled as another kernel thread, upon return it would
> > somehow notify the process of completion.
> 
> Close.  Effectively, it uses the minimal amount of call context
> it can get away with, and points the VM space and other stuff
> back to the process control block, which is shared among all
> system calls.
> 
> 
> The context is used by the kernel to continue processing.  It
> contains the address of the user space status block, as well as
> a copy of the stack of the returning program (think of the one
> that continues as a "setmp", with the one doing the return as
> getting a longjmp, where the code it would have run is skipped).

In the proposed scheme, the original context is saved in exactly
the same way it would be if the process blocked, but instead of
scheduling a new process to run, the same process continues to run,
in the same KSE, having done a longjmp() to a saved context
that simply returns to the UTS.

The original context is saved into a KSEC structure
and it is hung on the appropriate sleep/wait queue.

The returning new context is very small (a couple of entries on
a small kernel stack).  It doesn't need to know anything about the
syscall that just blocked.  (There may not even have been one.)

When the KSE was created, one of the arguments was the address of a
mailbox used by that KSE to communicate with the UTS. The status of 
the blocked syscall/thread will be available to the UTS via 
that mailbox.


> 
> The final part is that the context runs to completion at the
> user space boundary; since the call has already returned, it does
> not return to user space, instead it stops at the user/kernel
> boundary, after copying out the completion status into the
> user space status block.

ditto. 

> 
> The status block is a simplified version of the aioread/aiowrite
> status block.
> 
> A program can just use these calls directly.  They can also set a
> flag to make the call synchornous (as in an aiowait).  Finally, a
> user space threads scheduler can use completion notifications to
> make scheduling decisions.

In our case the act of creating a KSE enables 'kse mode', in which
syscalls always look, to the thread concerned, as if they completed
normally, but there may have been intervening upcalls to the UTS
between the time the syscall was dispatched and the time it was completed.

libc MAY not need the syscall stubs changed at all.

> 
> For SMP, you can state that you have the ability to return into
> user space (e.g. similar to vfork/sfork) multiple times.  Each
> of these represents a "scheduler reservation", where you reserve
> the right to compete for a quanta.
> 
> You can also easily implement negafinity for up to 32 processors
> with three 32 bit unsigned int's in the process block: just don't
> reserve on a processor where the bit is already set, until you
> have reserved on all available processors at least once.
> 
> > > My ideal implementation would use async call gates.  In effect,
> > > this would be the same as implementing VMS ASTs in all of FreeBSD.
> >
> > Actually, why not just have a syscall that turns on the async
> > behavior?
> 
> Libc will break.  It does not expect to have to reap completed
> system call status blocks to report completion status to the user
> program.

In the KSE world, you do not reap syscall results.  You reap runnable
threads.
Each thread that is runnable is set up by the kernel to look as though it
did a yield() on the first machine instruction after the syscall.

When you longjmp() to restart the thread, you have effectively
just returned from a perfectly normal 'read()' or whatever call.
As a thread you cannot tell if you were blocked in that call or not.
(The information may be available to you if you ask for it, but the
thread's behaviour is not different in any way.)
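
To make the "reap threads, not results" point concrete, here is a
minimal C sketch of a UTS draining a per-KSE mailbox on an upcall.  The
mailbox layout and all names (kse_mailbox, uts_thread, mbox_post,
uts_reap) are assumptions made for illustration; the actual mailbox
format is not specified in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-visible thread record kept by the UTS. */
struct uts_thread {
    int                id;
    int                runnable;   /* 1 once it may be resumed */
    struct uts_thread *next;       /* link on a completed/run list */
};

/* Hypothetical mailbox whose address is handed over at KSE creation. */
struct kse_mailbox {
    struct uts_thread *completed;  /* filled in by the kernel */
};

/* Kernel side (simulated): report a thread whose syscall finished. */
static void
mbox_post(struct kse_mailbox *mb, struct uts_thread *t)
{
    assert(mb != NULL && t != NULL);
    t->next = mb->completed;
    mb->completed = t;
}

/*
 * UTS side, run on an upcall: move every reported thread to the run
 * queue.  Threads are reaped, not syscall results; each thread already
 * holds its return value as if the call had returned normally.
 * Returns the number of threads reaped.
 */
static int
uts_reap(struct kse_mailbox *mb, struct uts_thread **runq)
{
    struct uts_thread *t;
    int n = 0;

    while ((t = mb->completed) != NULL) {
        mb->completed = t->next;
        t->runnable = 1;
        t->next = *runq;
        *runq = t;
        n++;
    }
    return (n);
}
```

The UTS would then pick a thread from the run queue and resume it; from
the thread's point of view, its read() or similar call simply returned.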


> >
> > Well if we have an implementation where the implementators are
> > unwilling or incapable (because of time constraints, or getting
> > hit by a bus, etc) of doing the more optimized version then what's
> > the point besideds getting more IO concurrancy?  I don't know, it
> > just that if someone has a terrific idea that seems to have astounding
> > complexity and they don't feel like they want to or can take the
> > final step with it, then it really should not be considered.

There are two main reasons for doing the KSE work:
1/ IO concurrency.  Even with one KSE and one KSEG and one processor
you can still have multithreading, with several IO operations outstanding
at one time, and still do some processing as well.

You could also implement IO concurrency in 'non threaded' programming
models using KSEs and IO stubs.

2/ Increased processor concurrency: being able to run multiple threads
in one process context, concurrently on different processors.


> 
> The point of threads was to reduce context switch overhead, and
> to increase the useful work that actually gets done in any given
> time period, as opposed to spending cycles on system overhead or
> spinning waiting for a call to complete when you have other, better
> work to do.

which is what we are trying to do.

> 
> Somewhere along the way, it became corrupted into a tool to allow
> people without very much clue to write programs one-per-connection,
> instead of building finite state automata, and that corruption has
> proceeded, until now it's a tool to get SMP scalability.

"let me introduce you...,
Mr. Foot, Mr. Bullet   ...    Mr. Bullet  Mr. Foot"

Just because this is true doesn't mean that we should give them tools
to do useful threading.

> 
> > btw, I've read some on scheduler activations, where some references
> > on async call gates?
> 
> You're talking to the originator of the idea.  See the -arch archives.

As far as I can see, the difference from what we are suggesting
is that in your async call infrastructure, the syscall
that has been blocked returns through the same code that it would have
returned through had it not blocked, and that the library must
detect this and jump to the UTS in the 'blocked' case to schedule
another thread.


-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v



From owner-freebsd-arch  Sat Apr 28  4: 7: 7 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29])
	by hub.freebsd.org (Postfix) with SMTP id 21F4A37B424
	for <arch@FreeBSD.ORG>; Sat, 28 Apr 2001 04:07:04 -0700 (PDT)
	(envelope-from julian@elischer.org)
Received: (qmail 20972 invoked by uid 666); 28 Apr 2001 11:10:18 -0000
Received: from i179-143.nv.iinet.net.au (HELO elischer.org) (203.59.179.143)
  by mail.m.iinet.net.au with SMTP; 28 Apr 2001 11:10:18 -0000
Message-ID: <3AEAA418.1EE1A1AF@elischer.org>
Date: Sat, 28 Apr 2001 04:06:00 -0700
From: Julian Elischer <julian@elischer.org>
X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386)
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Terry Lambert <tlambert@primenet.com>,
	Alfred Perlstein <bright@wintelcom.net>, arch@FreeBSD.ORG,
	terry@lambert.org
Subject: Re: KSE threading support (first parts)
References: <200104280337.UAA29358@usr08.primenet.com> <3AEA837D.C2AE5E8D@elischer.org>
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Julian Elischer wrote:
> 
> Terry, what you describe here is so similar to what we are planning
> on doing that the differences could be called "Implementation details"


[...]

> >
> > Somewhere along the way, it became corrupted into a tool to allow
> > people without very much clue to write programs one-per-connection,
> > instead of building finite state automata, and that corruption has
> > proceeded, until now it's a tool to get SMP scalability.
> 
> "let me introduce you...,
> Mr. Foot, Mr. Bullet   ...    Mr. Bullet  Mr. Foot"
> 
> Just because this is true doesn't mean that we should give them tools
> to do useful threading.

s/should/shouldn't/


-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000-2001
---> X_.---._/  
            v



From owner-freebsd-arch  Sat Apr 28  6: 9:16 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3])
	by hub.freebsd.org (Postfix) with ESMTP id BEEF637B423
	for <Arch@FreeBSD.ORG>; Sat, 28 Apr 2001 06:09:12 -0700 (PDT)
	(envelope-from eischen@vigrid.com)
Received: (from eischen@localhost)
	by pcnet1.pcnet.com (8.8.7/PCNet) id JAA03253;
	Sat, 28 Apr 2001 09:08:32 -0400 (EDT)
Date: Sat, 28 Apr 2001 09:08:31 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
To: Julian Elischer <julian@elischer.org>
Cc: Nate Williams <nate@yogotech.com>,
	Matt Dillon <dillon@earth.backplane.com>, Arch@FreeBSD.ORG
Subject: Re: KSE threading support (first parts)
In-Reply-To: <3AEA5845.D3377794@elischer.org>
Message-ID: <Pine.SUN.3.91.1010428084746.546A-100000@pcnet1.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, 27 Apr 2001, Julian Elischer wrote:
> Nate Williams wrote:
> > 
> > > > >     Well, that's complete bullshit.  KSE's are extremely short-running
> > > > >     affairs in kernel mode, especially when you consider the most likely
> > > > >     asynchronizing case (a simple blocking situation that will most commonly
> > > > >     be in a read() or write()).
> > > >
> > > > Not necessarily.  My experience with developing and running applications
> > > > on Solaris says that having multiple KSE's/process is a *huge* win.
> > >
> > > You do know that the proposed implementation isn't quite like
> > > Solaris (KSEs don't get their own quantum).  You better holler
> > > if you want it ;-)
> > 
> > I'm not sure how much a difference that makes, but to be honest, I
> > haven't thought about the consequences of it much. :(
> > 
> > Nate
> 
> If you implementN LWPs as  N KSEGs with a KSE each, they do get 
> their own quanta so  it can be arranged to do it either way.

As long as I am allowed to implement it this way in libpthread then
I don't really have a problem.

-- 
Dan Eischen



From owner-freebsd-arch  Sat Apr 28 13:38:40 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
	by hub.freebsd.org (Postfix) with ESMTP
	id 44E8F37B424; Sat, 28 Apr 2001 13:38:36 -0700 (PDT)
	(envelope-from phk@critter.freebsd.dk)
Received: from critter (localhost [127.0.0.1])
	by critter.freebsd.dk (8.11.3/8.11.3) with ESMTP id f3SKcPU29884;
	Sat, 28 Apr 2001 22:38:25 +0200 (CEST)
	(envelope-from phk@critter.freebsd.dk)
To: Robert Watson <rwatson@FreeBSD.ORG>
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: jailNG 
In-Reply-To: Your message of "Mon, 23 Apr 2001 14:29:22 EDT."
             <Pine.NEB.3.96L.1010423141823.91472L-100000@fledge.watson.org> 
Date: Sat, 28 Apr 2001 22:38:25 +0200
Message-ID: <29882.988490305@critter>
From: Poul-Henning Kamp <phk@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


I'm not uninterested in jails, but I have no time (and no contracts
to give me time) for it at present.

In general I think jail is in much more capable hands with you anyway :-)

Poul-Henning

In message <Pine.NEB.3.96L.1010423141823.91472L-100000@fledge.watson.org>,
Robert Watson writes:
>
>This weekend I was spending some time tweaking the jail(8) code to improve
>it's SMPng-happiness as well as manageability.  Unfortunately, I ended up
>rewriting it in the process :-).  I changed the model somewhat so that
>jails are now persistently configred, joined, et al, and broke out the
>chroot() from the creation/joining process, as with increased namespaces
>(such as System V IPC) creating a nice clean failure was increasingly
>difficult.  Aspects of individual jails may now be managed using sysctl's,
>which appears to work reasonably well.  Clearly there's a lot of work left
>to do, but I'd appreciate comments if people are interested:
>
>  http://www.watson.org/~robert/jailng/
>
>Simple example:
>
>dev# ./jailctl 
>usage:
>  jailctl create [jailname]
>  jailctl destroy [jailname]
>  jailctl join [jailname] [-c chrootpath] [path] [cmd] [args...]
>dev# ./jailctl create test
>dev# sysctl -a | grep jail
>jail.instance.test.sysvipc_permitted: 0
>jail.instance.test.set_hostname_permitted: 1
>jail.instance.test.socket_ipv4_permitted: 1
>jail.instance.test.socket_unix_permitted: 1
>jail.instance.test.socket_route_permitted: 1
>jail.instance.test.socket_other_permitted: 0
>jail.instance.test.ipv4addr: 0
>dev# ./jailctl join test -c /tmp /bin/sh
># ps ax
>  PID  TT  STAT      TIME COMMAND
>  907  d0  DWJ    0:00.02 /bin/sh
>  908  d0  RW+J   0:00.00 ps ax
># exit
>dev# ./jailctl destroy test
>dev# 
>
>I also have a jailinit(8) in the works which would allow improved
>startup/shutdown in the style of init(8) (sans the whole sigchild thing).
>Another feature I'd like to add is a jail signal call that allows a signal
>to be delivered to all processes inside a jail from outside, allowing an
>easier forceable shutdown.
>
>Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
>robert@fledge.watson.org      NAI Labs, Safeport Network Services
>
>
>

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.



From owner-freebsd-arch  Sat Apr 28 16:50:21 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from sasami.jurai.net (sasami.jurai.net [64.0.106.45])
	by hub.freebsd.org (Postfix) with ESMTP
	id DC53B37B42C; Sat, 28 Apr 2001 16:50:18 -0700 (PDT)
	(envelope-from scanner@jurai.net)
Received: from localhost (scanner@localhost)
	by sasami.jurai.net (8.9.3/8.8.7) with ESMTP id TAA85008;
	Sat, 28 Apr 2001 19:49:59 -0400 (EDT)
Date: Sat, 28 Apr 2001 19:49:59 -0400 (EDT)
From: <scanner@jurai.net>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: Robert Watson <rwatson@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
Subject: Re: jailNG 
In-Reply-To: <29882.988490305@critter>
Message-ID: <Pine.BSF.4.21.0104281944550.84976-100000@sasami.jurai.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


It is my understanding from the OpenRoot project that jail currently does
not allow ICMP to work inside a jail? If this is so, this seriously
damages services that need Path MTU-D such as SMTP and HTTP. Surely this
is not the case? Can someone enlighten me on this?

Thanks

=============================================================================
-Chris Watson         (316) 326-3862 | FreeBSD Consultant, FreeBSD Geek 
Work:              scanner@jurai.net | Open Systems Inc., Wellington, Kansas
Home:  scanner@deceptively.shady.org | http://open-systems.net
=============================================================================
WINDOWS: "Where do you want to go today?"
LINUX: "Where do you want to go tomorrow?"
BSD: "Are you guys coming or what?"
=============================================================================
irc.openprojects.net #FreeBSD -Join the revolution!
ICQ: 20016186




From owner-freebsd-arch  Sat Apr 28 16:54: 2 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id BA14437B423
	for <freebsd-arch@FreeBSD.ORG>; Sat, 28 Apr 2001 16:54:00 -0700 (PDT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.11.3/8.11.3) with SMTP id f3SNsHf06257;
	Sat, 28 Apr 2001 19:54:17 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Sat, 28 Apr 2001 19:54:17 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.ORG>
X-Sender: robert@fledge.watson.org
To: scanner@jurai.net
Cc: Poul-Henning Kamp <phk@critter.freebsd.dk>,
	freebsd-arch@FreeBSD.ORG
Subject: Re: jailNG 
In-Reply-To: <Pine.BSF.4.21.0104281944550.84976-100000@sasami.jurai.net>
Message-ID: <Pine.NEB.3.96L.1010428195253.89482E-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sat, 28 Apr 2001 scanner@jurai.net wrote:

> It is my understanding from the OpenRoot project that jail currently
> does not allow ICMP to work inside a jail? If this is so, this seriously
> damages services that need Path MTU-D such as SMTP and HTTP. Surely this
> is not the case? Can someone enlighten me on this. 

The jail() code doesn't allow user applications to open raw sockets
permitting direct use of ICMP by user processes, but all of the normal use
of ICMP by the network stack directly is uninhibited.  This means that
things like PMTU discovery work just fine, but applications such as ping
do not work in jail().  It's possible to imagine modifications to the raw
socket behavior that might permit use of it from within jail(), but
there's a whole can of worms there that we're not willing to spend too
much time on at this point.
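
For anyone who wants to check this from inside a jail, a small probe
along the following lines should do.  It only uses the standard
socket(2) call; the function name probe_raw_icmp is made up for this
example, and which errno a jailed process gets back is not stated here,
so the sketch just returns whatever the kernel reports.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Try to open a raw ICMP socket, as ping(8) does.  Returns 0 on success
 * or the errno value from socket(2) on failure.
 */
static int
probe_raw_icmp(void)
{
    int s;

    s = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
    if (s == -1) {
        assert(errno > 0);   /* socket(2) sets errno on failure */
        return (errno);
    }
    close(s);
    return (0);
}
```

Note that the same call also fails for an unprivileged process outside
a jail, since raw sockets normally require root, so a non-zero result by
itself does not prove jail confinement.  Ordinary TCP or UDP sockets are
unaffected, which is why PMTU discovery keeps working.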

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services




From owner-freebsd-arch  Sat Apr 28 17: 2:32 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from sasami.jurai.net (sasami.jurai.net [64.0.106.45])
	by hub.freebsd.org (Postfix) with ESMTP
	id 623E937B423; Sat, 28 Apr 2001 17:02:24 -0700 (PDT)
	(envelope-from scanner@jurai.net)
Received: from localhost (scanner@localhost)
	by sasami.jurai.net (8.9.3/8.8.7) with ESMTP id UAA85197;
	Sat, 28 Apr 2001 20:02:23 -0400 (EDT)
Date: Sat, 28 Apr 2001 20:02:23 -0400 (EDT)
From: <scanner@jurai.net>
To: Robert Watson <rwatson@FreeBSD.ORG>
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: jailNG 
In-Reply-To: <Pine.NEB.3.96L.1010428195253.89482E-100000@fledge.watson.org>
Message-ID: <Pine.BSF.4.21.0104281956430.85066-100000@sasami.jurai.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sat, 28 Apr 2001, Robert Watson wrote:

> The jail() code doesn't allow user applications to open raw sockets
> permitting direct use of ICMP by user processes, but all of the normal use
> of ICMP by the network stack directly is uninhibited.  This means that
> things like PMTU discovery work just fine, but applications such as ping
> do not work in jail().  It's possible to imagine modifications to the raw
> socket behavior that might permit use of it from within jail(), but
> there's a whole can of worms there that we're not willing to spend too
> much time on at this point.

Ok. I wasn't sure. I couldn't believe it would block
ICMP. I knew there had to be some logic behind its behaviour. I actually
like the current way then. I see jail as a virtual hosting env. more than
anything else. Thanks for the explanation.

=============================================================================
-Chris Watson         (316) 326-3862 | FreeBSD Consultant, FreeBSD Geek 
Work:              scanner@jurai.net | Open Systems Inc., Wellington, Kansas
Home:  scanner@deceptively.shady.org | http://open-systems.net
=============================================================================
WINDOWS: "Where do you want to go today?"
LINUX: "Where do you want to go tomorrow?"
BSD: "Are you guys coming or what?"
=============================================================================
irc.openprojects.net #FreeBSD -Join the revolution!
ICQ: 20016186




From owner-freebsd-arch  Sat Apr 28 17:30:46 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id 013C337B422
	for <freebsd-arch@FreeBSD.ORG>; Sat, 28 Apr 2001 17:30:44 -0700 (PDT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.11.3/8.11.3) with SMTP id f3T0VDf06567;
	Sat, 28 Apr 2001 20:31:13 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Sat, 28 Apr 2001 20:31:13 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.ORG>
X-Sender: robert@fledge.watson.org
To: scanner@jurai.net
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: jailNG 
In-Reply-To: <Pine.BSF.4.21.0104281956430.85066-100000@sasami.jurai.net>
Message-ID: <Pine.NEB.3.96L.1010428202642.6533A-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


On Sat, 28 Apr 2001 scanner@jurai.net wrote:

> On Sat, 28 Apr 2001, Robert Watson wrote:
> 
> > The jail() code doesn't allow user applications to open raw sockets
> > permitting direct use of ICMP by user processes, but all of the normal use
> > of ICMP by the network stack directly is uninhibited.  This means that
> > things like PMTU discovery work just fine, but applications such as ping
> > do not work in jail().  It's possible to imagine modifications to the raw
> > socket behavior that might permit use of it from within jail(), but
> > there's a whole can of worms there that we're not willing to spend too
> > much time on at this point.
> 
> Ok. I wasn't sure. I couldn't believe it would block ICMP. I knew there
> had to be some logic behind its behaviour. I actually like the current way
> then. I see jail as a virtual hosting env. more than anything else.
> Thanks for the explanation.

Yeah -- there are three basic functions of jail():

1) Cause the suser() call to succeed only in a specifically designated set
   of cases (passing the PRISON_ROOT flag to suser_xxx() permits the call
   to succeed in jail()).

2) Institute a set of simple mandatory inter-process restrictions that
   prevent certain inter-process operations (such as debugging,
   signalling, et al.) from being applied from within the jail() to
   processes outside it.

3) Rewrite or block certain socket operations so that listen and connect
   operations in the IP space use the IP designated for the jail(), and so
   that access to some other protocols (such as IPv6) can be disabled.
   This in effect provides a simple form of poly-instantiation for the
   localhost address (by substituting the jail IP for 127.0.0.1), but also
   has some other side effects that I'm looking at ways to remedy in the
   future (hopefully without virtualizing the entire stack, which would
   work, but has a lot of downsides).
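
The address handling in item 3 can be pictured with a small userland
model.  This is only a sketch: toy_jail_fixup_addr and the TOY_ constants
are hypothetical names, addresses are kept in host byte order for
readability, and returning EINVAL for a foreign address is an assumption
rather than a statement about the kernel code.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_INADDR_ANY		UINT32_C(0x00000000)	/* 0.0.0.0 */
#define TOY_INADDR_LOOPBACK	UINT32_C(0x7f000001)	/* 127.0.0.1 */

/*
 * Model of the rewrite applied to a listen/connect address requested by
 * a jailed process.  The wildcard and localhost addresses are replaced
 * by the jail's own IP; the jail IP itself passes through unchanged;
 * anything else is refused.
 */
int
toy_jail_fixup_addr(uint32_t jail_ip, uint32_t *addr)
{
	if (*addr == TOY_INADDR_ANY || *addr == TOY_INADDR_LOOPBACK) {
		*addr = jail_ip;	/* substitute the jail's address */
		return (0);
	}
	if (*addr == jail_ip)
		return (0);		/* already the jail's address */
	return (EINVAL);		/* assumed error for other addresses */
}
```

With a jail IP of 192.0.2.5, a request for 127.0.0.1 comes back as
192.0.2.5, which is the "poly-instantiation" of localhost described above.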

The access to ICMP is a property of (1), not of (3).  Jail's impact on the
IP stack happens almost exclusively as part of the socket implementation,
and doesn't get down into the network layer much.
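
A toy model of the check behind (1) shows why raw ICMP sockets fail in a
jail: the caller asks for superuser privilege without the jail exemption,
so the request is refused.  The structures and names below (toy_proc,
toy_suser, TOY_PRISON_ROOT) are stand-ins invented for this sketch and do
not reproduce the kernel's suser_xxx() signature.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_PRISON_ROOT	0x1	/* caller allows this check inside a jail */

struct toy_prison {
	int id;
};

struct toy_proc {
	unsigned int uid;
	struct toy_prison *prison;	/* NULL when the process is not jailed */
};

/*
 * Model of a jail-aware superuser check.  Returns 0 when privilege is
 * granted and EPERM otherwise.
 */
int
toy_suser(const struct toy_proc *p, int flag)
{
	if (p->uid != 0)
		return (EPERM);		/* not root at all */
	if (p->prison != NULL && !(flag & TOY_PRISON_ROOT))
		return (EPERM);		/* root, but jailed and not exempted */
	return (0);
}
```

In this model a raw-socket open corresponds to toy_suser(p, 0), which
fails for jailed root, whereas an operation deemed safe inside a jail
would pass TOY_PRISON_ROOT and succeed.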

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services

