From owner-freebsd-arch Mon Apr 23 8:27:53 2001 Delivered-To: freebsd-arch@freebsd.org Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247]) by hub.freebsd.org (Postfix) with ESMTP id B10DB37B423 for ; Mon, 23 Apr 2001 08:27:46 -0700 (PDT) (envelope-from tanimura@r.dl.itc.u-tokyo.ac.jp) Received: (from uucp@localhost) by rina.r.dl.itc.u-tokyo.ac.jp (8.11.3+3.4W/3.7W-rina.r-20010412) with UUCP id f3NFPAV75414 ; Tue, 24 Apr 2001 00:25:10 +0900 (JST) Received: (from root@localhost) by sohgo.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (8.11.3+3.4W/3.7W) with UUCP id f3NFHCU01590 ; Tue, 24 Apr 2001 00:17:12 +0900 (JST) Received: from bunko.nkth.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1]) by bunko.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (8.11.1+3.4W/3.7W) with ESMTP id f3NEsnN26340 ; Mon, 23 Apr 2001 23:54:50 +0900 (JST) Message-Id: <200104231454.f3NEsnN26340@bunko.carrots.uucp.r.dl.itc.u-tokyo.ac.jp> Date: Mon, 23 Apr 2001 23:54:49 +0900 From: Seigo Tanimura To: Matt Dillon Cc: Seigo Tanimura , arch@FreeBSD.ORG, bde@zeta.org.au Subject: Re: Mmap(2) should start just below stack (was: Re: Bumping up {MAX,DFL}*SIZ in i386) In-Reply-To: <200104070026.f370QfY50900@earth.backplane.com> References: <200103191056.f2JAuox00630@rina.r.dl.itc.u-tokyo.ac.jp> <200103230517.f2N5HXx08605@rina.r.dl.itc.u-tokyo.ac.jp> <200104050506.f3556Xw28400@rina.r.dl.itc.u-tokyo.ac.jp> <200104070026.f370QfY50900@earth.backplane.com> User-Agent: Wanderlust/2.4.1 (Stand By Me) SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) MULE XEmacs/21.1 (patch 14) (Cuyahoga Valley) (i386--freebsd) Organization: Digital Library Research Division, Information Techinology Centre, The University of Tokyo MIME-Version: 1.0 (generated by SEMI 1.13.7 - "Awazu") Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

On Fri, 6 Apr 2001 17:26:41 -0700 (PDT), Matt Dillon said:

dillon> :|   Process Stack    |
dillon> :+--------------------+ down to 3GB - max of RLIMIT_STACK
dillon> :|Reserved for Process|
dillon> :|       Stack        |
dillon> :+--------------------+ 3GB - max of RLIMIT_STACK
dillon> :|  Mmap(2)ed space   |
dillon> :| (mmap(2), dynamic  |  This may be fragmented.
dillon> :|  linker, shared    |
dillon> :|   objects, etc)    |
dillon> :+--------------------+ down to end of bss + max of RLIMIT_DATA
dillon> :|    Mmap(2) Heap    |
dillon> :+--------------------+ end of bss + max of RLIMIT_DATA
dillon> :|Reserved for Malloc |
dillon> :+--------------------+ up to end of bss + max of RLIMIT_DATA (break)
dillon> :|   Malloc(3) Heap   |

dillon>     suid-root programs often adjust resources upwards in order to avoid
dillon>     potential root compromises due to allocation failures at just the
dillon>     wrong time (coupled with a badly written program).  Most commonly this
dillon>     means RLIMIT_DATA will be increased and, of course, many programs will
dillon>     also increase RLIMIT_DATA.  However, the same problem with suid-root
dillon>     programs exists for RLIMIT_STACK as well.
dillon>
dillon>     So using a RLIMIT_STACK based solution instead of RLIMIT_DATA only
dillon>     partially solves the problem.
dillon>
dillon>     We also have a similar issue with fork().  Process A fork()'s and
dillon>     adjusts the resource limits for the child process downward or upward.

At that point, we have to reserve an address region of some size, either below the max of RLIMIT_STACK or above the max of RLIMIT_DATA. Since the reserved address space is likely to go unused in many cases, it should be kept small. While the data handled by a process can grow to gigabytes, the stack a process consumes rarely exceeds a few tens of megabytes (unless you are trying to solve a problem that cannot be solved by a Turing machine :). Hence we should reserve a region down to 3GB - MAXSSIZ (the lowest address the stack can ever reach, since MAXSSIZ is the ceiling for RLIMIT_STACK) and possibly an additional safety zone below the stack.
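As a rough illustration of the arithmetic behind this proposal (not code from the thread), the C sketch below computes where the mmap(2) area would top out and how much room it gets. It assumes a 3GB top of user space, a 64MB MAXSSIZ and a 32MB safety zone; the EX_* names and values are hypothetical stand-ins, not FreeBSD's vmparam.h definitions.

```c
#include <stdint.h>

/* Illustrative assumptions only, not FreeBSD's actual constants. */
#define EX_USER_TOP         0xC0000000ULL          /* 3GB user/kernel split */
#define EX_MAXSSIZ          (64ULL * 1024 * 1024)  /* ceiling for RLIMIT_STACK */
#define EX_SAFETY_ZONE_SIZE (32ULL * 1024 * 1024)  /* guard gap below the stack */

/*
 * Highest address the mmap(2) area may use under the proposal.  It depends
 * only on the MAXSSIZ ceiling, not on the current RLIMIT_STACK, so later
 * raising RLIMIT_STACK (up to MAXSSIZ) cannot run into existing mappings.
 */
uint64_t ex_mmap_top(void)
{
    return EX_USER_TOP - (EX_MAXSSIZ + EX_SAFETY_ZONE_SIZE);
}

/*
 * Bytes left for mmap(2) between the malloc(3) reservation
 * (end of bss + max of RLIMIT_DATA) and ex_mmap_top().
 * Returns 0 if the two reservations would overlap.
 */
uint64_t ex_mmap_space(uint64_t bss_end, uint64_t rlimit_data_max)
{
    uint64_t top = ex_mmap_top();
    uint64_t bottom = bss_end + rlimit_data_max;

    return top > bottom ? top - bottom : 0;
}
```

With these assumed values the mmap(2) area tops out at 0xBA000000; a hypothetical program whose bss ends at 0x08100000 with a 512MB RLIMIT_DATA maximum would still have a little over 2.3GB for mappings.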
Then a user process vm space looks something like this:

|   Process Stack    |
+--------------------+ down to 3GB - max of RLIMIT_STACK
|Reserved for Process|
|       Stack        |
+--------------------+ 3GB - max of RLIMIT_STACK
|Reserved for growth |
|  of RLIMIT_STACK   |
+--------------------+ 3GB - MAXSSIZ
|  Safety zone for   |
|  buffer overflow   |
+--------------------+ 3GB - (MAXSSIZ + SAFETY_ZONE_SIZE)
|  Mmap(2)ed space   |
| (mmap(2), dynamic  |  This may be fragmented.
|  linker, shared    |
|   objects, etc)    |
+--------------------+ down to end of bss + max of RLIMIT_DATA
|    Mmap(2) Heap    |
+--------------------+ end of bss + max of RLIMIT_DATA
|Reserved for Malloc |
+--------------------+ up to end of bss + max of RLIMIT_DATA (break)
|   Malloc(3) Heap   |

It should be enough for SAFETY_ZONE_SIZE to be about 16-32MB. Since MAXSSIZ + SAFETY_ZONE_SIZE is much smaller than MAXDSIZ, reserving the stack space and the safety zone is less likely than the current layout to get in the way of reserving a large mmap(2) region.

-- 
Seigo Tanimura

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Mon Apr 23 11:29: 6 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id A585837B422 for ; Mon, 23 Apr 2001 11:28:59 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.11.3/8.11.3) with SMTP id f3NITMf02680 for ; Mon, 23 Apr 2001 14:29:23 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Mon, 23 Apr 2001 14:29:22 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: freebsd-arch@FreeBSD.org Subject: jailNG Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

This weekend I was spending some time tweaking the jail(8) code to
improve its SMPng-happiness as well as its manageability. Unfortunately, I ended up rewriting it in the process :-). I changed the model somewhat so that jails are now persistently configured, joined, et al., and broke the chroot() out of the creation/joining process, since with more namespaces (such as System V IPC) creating a nice clean failure was increasingly difficult. Aspects of individual jails may now be managed using sysctls, which appears to work reasonably well. Clearly there's a lot of work left to do, but I'd appreciate comments if people are interested:

http://www.watson.org/~robert/jailng/

Simple example:

dev# ./jailctl
usage: jailctl create [jailname]
       jailctl destroy [jailname]
       jailctl join [jailname] [-c chrootpath] [path] [cmd] [args...]
dev# ./jailctl create test
dev# sysctl -a | grep jail
jail.instance.test.sysvipc_permitted: 0
jail.instance.test.set_hostname_permitted: 1
jail.instance.test.socket_ipv4_permitted: 1
jail.instance.test.socket_unix_permitted: 1
jail.instance.test.socket_route_permitted: 1
jail.instance.test.socket_other_permitted: 0
jail.instance.test.ipv4addr: 0
dev# ./jailctl join test -c /tmp /bin/sh
# ps ax
  PID  TT  STAT      TIME COMMAND
  907  d0  DWJ    0:00.02 /bin/sh
  908  d0  RW+J   0:00.00 ps ax
# exit
dev# ./jailctl destroy test
dev#

I also have a jailinit(8) in the works which would allow improved startup/shutdown in the style of init(8) (sans the whole SIGCHLD thing). Another feature I'd like to add is a jail signal call that delivers a signal to all processes inside a jail from outside, allowing an easier forcible shutdown.
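To make the per-jail sysctl knobs above more concrete, here is a minimal C sketch of how a permission check might consult them when a jailed process calls socket(2). The struct ex_jail_policy and ex_jail_socket_permitted() names are hypothetical and not part of the jailNG code; the fields simply mirror the jail.instance.test.* values shown in the example session.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical per-jail policy mirroring the jail.instance.<name>.* sysctls. */
struct ex_jail_policy {
    const char *name;
    bool sysvipc_permitted;
    bool set_hostname_permitted;
    bool socket_ipv4_permitted;
    bool socket_unix_permitted;
    bool socket_route_permitted;
    bool socket_other_permitted;
};

/*
 * Decide whether a process in jail `jp` may create a socket in `domain`.
 * Domains without a dedicated knob fall back to socket_other_permitted.
 */
bool ex_jail_socket_permitted(const struct ex_jail_policy *jp, int domain)
{
    switch (domain) {
    case AF_INET:
        return jp->socket_ipv4_permitted;
    case AF_UNIX:
        return jp->socket_unix_permitted;
    case AF_ROUTE:
        return jp->socket_route_permitted;
    default:
        return jp->socket_other_permitted;
    }
}
```

With the "test" jail's values from the session (IPv4, UNIX and routing sockets on, "other" off), an AF_INET6 socket would be refused because it falls through to socket_other_permitted.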
Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services

From owner-freebsd-arch Mon Apr 23 13:25:22 2001 Delivered-To: freebsd-arch@freebsd.org Received: from cicero1.cybercity.dk (cicero1.cybercity.dk [212.242.40.4]) by hub.freebsd.org (Postfix) with ESMTP id DF2B337B628 for ; Mon, 23 Apr 2001 13:25:14 -0700 (PDT) (envelope-from hroi@asdf.dk) Received: from usr00.cybercity.dk (usr00.cybercity.dk [212.242.40.34]) by cicero1.cybercity.dk (Postfix) with ESMTP id 62AF315FC93 for ; Mon, 23 Apr 2001 22:25:12 +0200 (CEST) Received: from asdf.dk (port18.ds1-noe.adsl.cybercity.dk [212.242.52.19]) by usr00.cybercity.dk (8.9.3/8.9.3) with ESMTP id WAA61826 for ; Mon, 23 Apr 2001 22:25:45 +0200 (CEST) (envelope-from hroi@asdf.dk) Message-ID: <3AE48FFB.69A6142E@asdf.dk> Date: Mon, 23 Apr 2001 22:26:35 +0200 From: Hroi Sigurdsson Organization: Expert Knob Twiddlers X-Mailer: Mozilla 4.76 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 Cc: freebsd-arch@FreeBSD.ORG Subject: Re: jailNG References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

Robert Watson wrote:

> http://www.watson.org/~robert/jailng/

Very nice! What about the possibility of setting a non-overridable "nice" value on jails or maybe rlimit?
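One hypothetical reading of a "non-overridable nice" value: the jail carries a nice floor, and a process inside may lower its own priority further but never raise it above the floor. The C sketch below illustrates only that clamping rule under this assumption; ex_jail_effective_nice() is an invented name, not a proposed kernel interface, while PRIO_MIN and PRIO_MAX come from <sys/resource.h>.

```c
#include <sys/resource.h>

/*
 * Hypothetical: compute the nice value actually applied to a jailed process.
 * A request below the jail's floor is raised to the floor, and the result is
 * kept within the system-wide PRIO_MIN..PRIO_MAX range.
 */
int ex_jail_effective_nice(int requested, int jail_floor)
{
    int n = requested < jail_floor ? jail_floor : requested;

    if (n < PRIO_MIN)
        n = PRIO_MIN;
    if (n > PRIO_MAX)
        n = PRIO_MAX;
    return n;
}
```

Under this rule, a process in a jail with floor 5 that asks for nice -10 ends up at 5, while a request for 10 is honoured unchanged.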
-- 
Hroi Sigurdsson

From owner-freebsd-arch Mon Apr 23 16:44:35 2001 Delivered-To: freebsd-arch@freebsd.org Received: from CPE-61-9-164-106.vic.bigpond.net.au (CPE-61-9-138-241.vic.bigpond.net.au [61.9.138.241]) by hub.freebsd.org (Postfix) with ESMTP id 20C3437B422 for ; Mon, 23 Apr 2001 16:44:32 -0700 (PDT) (envelope-from darrenr@reed.wattle.id.au) Received: (from root@localhost) by CPE-61-9-164-106.vic.bigpond.net.au (8.11.0/8.11.0) id f3NNiU624919 for ; Tue, 24 Apr 2001 09:44:30 +1000 (EST) From: Darren Reed Message-Id: <200104232344.JAA10103@avalon.reed.wattle.id.au> Subject: User-defined bit in sysctl flags ? To: arch@freebsd.org Date: Tue, 24 Apr 2001 09:44:05 +1000 (EST) X-Mailer: ELM [version 2.4ME+ PL37 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

What do people think about having a range of bits in oid_kind that are not used by FreeBSD but are reserved for ``private'' sysctl handlers? e.g.

#define	CTLFLAG_PRIVATE	0x000ffff0

The idea is that you can then do this:

#define	SYSCTL_IPF(parent, nbr, name, access, ptr, val, descr) \
	SYSCTL_OID(parent, nbr, name, CTLTYPE_INT|access, \
	    ptr, val, sysctl_ipf_int, "I", descr)

SYSCTL_IPF(_net_inet_ipf, OID_AUTO, fr_tcpidletimeout, CTLFLAG_RW|CTL_PRIV,
    &fr_tcpidletimeout, 0, "");

and have CTL_PRIV be a bit which sysctl_ipf_int understands, without having to worry about the value of CTL_PRIV ever being afflicted with double use by a FreeBSD flag, because CTL_PRIV is part of CTLFLAG_PRIVATE.

Any objections to committing it to -current in the next week or so?
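A small C sketch of why a reserved mask is useful: as long as handler-private bits stay inside CTLFLAG_PRIVATE and the system's own bits stay outside it, the two can be checked for overlap at compile time. The CTLFLAG_PRIVATE value is the one proposed above; the EX_* system-flag values and EX_CTL_PRIV are illustrative assumptions, not copied from <sys/sysctl.h>.

```c
#include <stdint.h>

/* Private range proposed in the message. */
#define CTLFLAG_PRIVATE 0x000ffff0u

/* Illustrative stand-ins for system-owned bits (assumed values):
 * a type field in the low nibble, access flags in the high bits. */
#define EX_CTLTYPE_MASK 0x0000000fu
#define EX_CTLFLAG_RD   0x80000000u
#define EX_CTLFLAG_WR   0x40000000u
#define EX_CTLFLAG_RW   (EX_CTLFLAG_RD | EX_CTLFLAG_WR)

/* A handler-private bit, in the spirit of CTL_PRIV for sysctl_ipf_int. */
#define EX_CTL_PRIV     0x00000010u

/* Compile-time guarantees: the private bit lies inside the reserved range,
 * and the reserved range does not overlap the system bits above. */
_Static_assert((EX_CTL_PRIV & ~CTLFLAG_PRIVATE) == 0,
    "private bit outside CTLFLAG_PRIVATE");
_Static_assert((CTLFLAG_PRIVATE & (EX_CTLTYPE_MASK | EX_CTLFLAG_RW)) == 0,
    "CTLFLAG_PRIVATE overlaps system bits");

/* What a private handler would inspect: only bits in the private range. */
uint32_t ex_private_bits(uint32_t oid_kind)
{
    return oid_kind & CTLFLAG_PRIVATE;
}
```

A handler can then test `ex_private_bits(kind) & EX_CTL_PRIV` without caring which type or access bits the system later assigns, provided those stay outside the mask.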
Darren

From owner-freebsd-arch Mon Apr 23 17: 0:36 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id 9E5D637B440 for ; Mon, 23 Apr 2001 17:00:33 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.11.3/8.11.3) with SMTP id f3O00lf06647; Mon, 23 Apr 2001 20:00:47 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Mon, 23 Apr 2001 20:00:47 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Hroi Sigurdsson Cc: freebsd-arch@FreeBSD.ORG Subject: Re: jailNG In-Reply-To: <3AE48FFB.69A6142E@asdf.dk> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

On Mon, 23 Apr 2001, Hroi Sigurdsson wrote:

> Robert Watson wrote:
>
> > http://www.watson.org/~robert/jailng/
>
> Very nice! What about the possibility of setting a non-overridable
> "nice" value on jails or maybe rlimit?

One issue that does need to be addressed in the new code is a problem inherited from the old code: a number of services are addressed on the global scope rather than the jail scope, including resource limits/accounting. One challenge in the jail implementation is finding a way to do this such that the jail code remains (relatively) cleanly abstracted from the remainder of the system. This is generally true of a number of namespace-based services, including System V IPC. I've toyed with a number of ideas, including a p->p_namespace, but haven't reached any firm conclusions yet, especially regarding situations where multiple mechanisms (not just jail()) might be involved in namespace management. In the meantime, I'll continue my general cleanup of the authorization code.
Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services

From owner-freebsd-arch Tue Apr 24 11:54:56 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 6D46A37B423 for ; Tue, 24 Apr 2001 11:54:54 -0700 (PDT) (envelope-from tlambert@usr08.primenet.com) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id LAA28456; Tue, 24 Apr 2001 11:54:49 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp05.primenet.com, id smtpdAAAY9aOI3; Tue Apr 24 11:54:43 2001 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id LAA02463; Tue, 24 Apr 2001 11:55:18 -0700 (MST) From: Terry Lambert Message-Id: <200104241855.LAA02463@usr08.primenet.com> Subject: vm/swap_pager.c swap_pager_swap_init() To: arch@freebsd.org Date: Tue, 24 Apr 2001 18:55:14 +0000 (GMT) Cc: terry@lambert.org X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

It seems to me that the pages calculation in this function is wrong:

	/*
	 * Initialize our zone.  Right now I'm just guessing on the number
	 * we need based on the number of pages in the system.  Each swblock
	 * can hold 16 pages, so this is probably overkill.
	 */
	n = cnt.v_page_count * 2;

In particular, for a 4G system, it seems that this should be more bounded, e.g.:

	/*
	 * Provide backing store for only 2*physical memory limit.
	 * This approximately halves the amount of memory otherwise
	 * required in a 4G system, relative to the previous 'n'
	 * calculation.  It could probably be reduced by half again.
	 */
	n = cnt.v_page_count * 2;
	n = min(n, 128*1024);	/* (4G / PAGE_SIZE) / 16 * 2 */

I realize that this changes for the Alpha and IA64, and should be more general, but I haven't found the address space limitation defined anywhere.

Comments?

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-arch Wed Apr 25 8:38:27 2001 Delivered-To: freebsd-arch@freebsd.org Received: from ringworld.nanolink.com (ringworld.nanolink.com [195.24.48.13]) by hub.freebsd.org (Postfix) with SMTP id B0C8537B422 for ; Wed, 25 Apr 2001 08:38:24 -0700 (PDT) (envelope-from roam@orbitel.bg) Received: (qmail 56829 invoked by uid 1000); 25 Apr 2001 15:36:40 -0000 Date: Wed, 25 Apr 2001 18:36:40 +0300 From: Peter Pentchev To: arch@FreeBSD.org Subject: gid_t vs. plain int Message-ID: <20010425183640.C54687@ringworld.oblivion.bg> Mail-Followup-To: arch@FreeBSD.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

Hi,

OK. I've (kinda) had enough.

Is there a reason that struct group in does not define 'gr_gid'
as a gid_t value, but as a plain int?  This makes all kinds of things
go berserk with gcc -Wall -W, and causes dozens of (totally unneeded)
casts.

Is there some standard that says pw_gid is gid_t, but gr_gid is int?
If not, would anyone be interested in patches (yes, I'm prepared to sweep
the whole source tree), making gr_gid a gid_t?

G'luck,
Peter

-- 
This sentence no verb.
From owner-freebsd-arch Wed Apr 25 8:46:29 2001 Delivered-To: freebsd-arch@freebsd.org Received: from ringworld.nanolink.com (ringworld.nanolink.com [195.24.48.13]) by hub.freebsd.org (Postfix) with SMTP id 6FAD537B424 for ; Wed, 25 Apr 2001 08:46:26 -0700 (PDT) (envelope-from roam@orbitel.bg) Received: (qmail 57036 invoked by uid 1000); 25 Apr 2001 15:44:43 -0000 Date: Wed, 25 Apr 2001 18:44:43 +0300 From: Peter Pentchev To: arch@FreeBSD.org Subject: Re: gid_t vs. plain int Message-ID: <20010425184443.D54687@ringworld.oblivion.bg> Mail-Followup-To: arch@FreeBSD.org References: <20010425183640.C54687@ringworld.oblivion.bg> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010425183640.C54687@ringworld.oblivion.bg>; from roam@orbitel.bg on Wed, Apr 25, 2001 at 06:36:40PM +0300 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

On Wed, Apr 25, 2001 at 06:36:40PM +0300, Peter Pentchev wrote:
> Hi,
>
> OK. I've (kinda) had enough.
>
> Is there a reason that struct group in does not define 'gr_gid'

of course that should read , not .

> as a gid_t value, but as a plain int?  This makes all kinds of things
> go berserk with gcc -Wall -W, and causes dozens of (totally unneeded)
> casts.
>
> Is there some standard that says pw_gid is gid_t, but gr_gid is int?
> If not, would anyone be interested in patches (yes, I'm prepared to sweep
> the whole source tree), making gr_gid a gid_t?

G'luck,
Peter

-- 
This sentence contains exactly threee erors.
From owner-freebsd-arch Wed Apr 25 9:53:52 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id DF41A37B423 for ; Wed, 25 Apr 2001 09:53:49 -0700 (PDT) (envelope-from bright@fw.wintelcom.net) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f3PGrl824270; Wed, 25 Apr 2001 09:53:47 -0700 (PDT) Date: Wed, 25 Apr 2001 09:53:47 -0700 From: Alfred Perlstein To: Peter Pentchev Cc: arch@FreeBSD.ORG Subject: Re: gid_t vs. plain int Message-ID: <20010425095347.I1790@fw.wintelcom.net> References: <20010425183640.C54687@ringworld.oblivion.bg> <20010425184443.D54687@ringworld.oblivion.bg> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010425184443.D54687@ringworld.oblivion.bg>; from roam@orbitel.bg on Wed, Apr 25, 2001 at 06:44:43PM +0300 X-all-your-base: are belong to us. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

* Peter Pentchev [010425 08:46] wrote:
> On Wed, Apr 25, 2001 at 06:36:40PM +0300, Peter Pentchev wrote:
> > Hi,
> >
> > OK. I've (kinda) had enough.
> >
> > Is there a reason that struct group in does not define 'gr_gid'
>
> of course that should read , not .
>
> > as a gid_t value, but as a plain int?  This makes all kinds of things
> > go berserk with gcc -Wall -W, and causes dozens of (totally unneeded)
> > casts.
> >
> > Is there some standard that says pw_gid is gid_t, but gr_gid is int?
> > If not, would anyone be interested in patches (yes, I'm prepared to sweep
> > the whole source tree), making gr_gid a gid_t?

It looks like a worthy task, I would ask Bruce and Wollman about it
before taking it on if it looks like a lot of work just to make sure
it's the right thing.
-- 
-Alfred Perlstein - [alfred@freebsd.org]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/

From owner-freebsd-arch Wed Apr 25 10:48:59 2001 Delivered-To: freebsd-arch@freebsd.org Received: from magellan.palisadesys.com (magellan.palisadesys.com [192.188.162.211]) by hub.freebsd.org (Postfix) with ESMTP id 6636937B423 for ; Wed, 25 Apr 2001 10:48:56 -0700 (PDT) (envelope-from ghelmer@palisadesys.com) Received: from CAPELLA (capella.palisadesys.com [192.188.162.112]) (authenticated (0 bits)) by magellan.palisadesys.com (8.11.2/8.11.2) with ESMTP id f3PHmnZ28561 (using TLSv1/SSLv3 with cipher RC4-MD5 (128 bits) verified NO); Wed, 25 Apr 2001 12:48:49 -0500 From: "Guy Helmer" To: "Alfred Perlstein" , "Peter Pentchev" Cc: Subject: RE: gid_t vs. plain int Date: Wed, 25 Apr 2001 12:49:26 -0500 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) In-Reply-To: <20010425095347.I1790@fw.wintelcom.net> X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 Importance: Normal Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

On Wednesday, April 25, 2001 11:54 AM Alfred Perlstein wrote:
> * Peter Pentchev [010425 08:46] wrote:
> > On Wed, Apr 25, 2001 at 06:36:40PM +0300, Peter Pentchev wrote:
> > > Hi,
> > >
> > > OK. I've (kinda) had enough.
> > >
> > > Is there a reason that struct group in does not define 'gr_gid'
> >
> > of course that should read , not .
> >
> > > as a gid_t value, but as a plain int?  This makes all kinds of things
> > > go berserk with gcc -Wall -W, and causes dozens of (totally unneeded)
> > > casts.
> > >
> > > Is there some standard that says pw_gid is gid_t, but gr_gid is int?
> > > If not, would anyone be interested in patches (yes, I'm prepared to sweep
> > > the whole source tree), making gr_gid a gid_t?
>
> It looks like a worthy task, I would ask Bruce and Wollman about it
> before taking it on if it looks like a lot of work just to make sure
> it's the right thing.

PR 22210 addresses this issue and includes a fix, along with comments from Garrett Wollman about his anticipated fix. However, the PR is over six months old :-)

Guy

Guy Helmer, Ph.D.                        http://www.palisadesys.com/~ghelmer/
Sr. Software Engineer, Palisade Systems  ghelmer@palisadesys.com

From owner-freebsd-arch Wed Apr 25 11: 0:28 2001 Delivered-To: freebsd-arch@freebsd.org Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29]) by hub.freebsd.org (Postfix) with SMTP id 7C88B37B424 for ; Wed, 25 Apr 2001 11:00:05 -0700 (PDT) (envelope-from julian@elischer.org) Received: (qmail 6093 invoked by uid 666); 25 Apr 2001 18:03:09 -0000 Received: from i181-032.nv.iinet.net.au (HELO elischer.org) (203.59.181.32) by mail.m.iinet.net.au with SMTP; 25 Apr 2001 18:03:09 -0000 Message-ID: <3AE71067.FF4BD029@elischer.org> Date: Wed, 25 Apr 2001 10:59:03 -0700 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Arch@freebsd.org, alfred@freebsd.org, Robert Watson , Daniel Eischen Subject: KSE threading support (first parts) Content-Type: multipart/mixed; boundary="------------56834A9EA7789B526697FC9C" Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG

This is a multi-part message in MIME format.
--------------56834A9EA7789B526697FC9C
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit

After discussing this with Jason Evans before he took his new position, and having looked at his patches from December and my similar patches from January, here is a 'merged' patch.

It breaks the proc structure into 4 parts:

proc... owns all 'total process' resources (e.g. address space, limits, files).
kseg... KSE 'group'. Anything to do with working out the quanta to be given to the threads (KSEs). A scheduling abstraction.
kse.... Actual schedulable entity for a processor (if the KSEG has a quantum for it).
ksec... Where a thread stores its context when it is blocked, so that the KSE can return either to the user or to another unblocked KSEC and continue using its quanta.

This compiles cleanly and SHOULD run (it did run in an earlier incarnation). It is by no means final, but rather designed to give us a starting point in discussions.

In this view, KSEGs are on the run queue, and when they get some quanta the KSEs hanging off them are run. If 2 KSEs are running, the KSEG's quanta are exhausted at twice the rate. Each KSE has a very strong affinity for one processor, and KSECs have a weak affinity for a KSE. If a KSE runs out of work but has time, it will 'poach' a KSEC from another KSE in the same KSEG list.

In this patch the linkages are not set up at all. All that is done is that the structures are defined and used instead of a monolithic 'proc' struct. The new structures are 'included' in the proc structure to maintain compatibility and to allow code to be changed slowly.

What really needs to be done is for everyone who is interested to go over the rather arbitrary allocation of fields to structures that I did and suggest changes.

Also, I've punted on most things to do with signals, as we haven't really discussed how we want signals to be handled in a KSE world. Can each KSEG or KSE get individual signals? Do we need to define a special 'signal' KSE?
If so is that all it does? What happens to the 'u-area'? how do we define a "cur-kse" similar to curproc? (do we need one?) presently the processor state is stored all over the place when a process is suspended.. This needs to be brought together so it can be put into the KSEC. Who understands that stuff? Some of the next steps would be: 1/ figure out what we want for signals etc.. 2/ get the contexts actually stored in the KSEC structure when a proces is suspended. (instead of some strange pcb in funny memory near the u area) 3/ Set up the linkages between these structures, and 4/ start using 'kse' instead of 'proc' in a bunch of places and using the linkages to find the appropriate other structures when needed. 5/ Add code to make new KSEs so that the 1:1 Mapping is no longer true. 6/ Add syscalls to start making KSEs other than the one that is built into the process. 7/ start making upcalls -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000-2001 ---> X_.---._/ v --------------56834A9EA7789B526697FC9C Content-Type: text/plain; charset=iso-8859-2; name="proc.4-26.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="proc.4-26.diff" Index: kern/kern_fork.c =================================================================== RCS file: /unused/cvs/freebsd/src/sys/kern/kern_fork.c,v retrieving revision 1.110 diff -u -r1.110 kern_fork.c --- kern/kern_fork.c 2001/03/28 11:52:53 1.110 +++ kern/kern_fork.c 2001/04/25 17:11:22 @@ -390,6 +390,24 @@ (unsigned) ((caddr_t)&p2->p_endcopy - (caddr_t)&p2->p_startcopy)); PROC_UNLOCK(p1); + bzero(&p2->p_kse.ke_startzero, + (unsigned) ((caddr_t)&p2->p_kse.ke_endzero + - (caddr_t)&p2->p_kse.ke_startzero)); + bcopy(&p1->p_kse.ke_startcopy, &p2->p_kse.ke_startcopy, + (unsigned) ((caddr_t)&p2->p_kse.ke_endcopy + - (caddr_t)&p2->p_kse.ke_startcopy)); + + bzero(&p2->p_ksec.kc_startzero, + (unsigned) ((caddr_t)&p2->p_ksec.kc_endzero + - (caddr_t)&p2->p_ksec.kc_startzero)); + 
bcopy(&p1->p_ksec.kc_startcopy, &p2->p_ksec.kc_startcopy, + (unsigned) ((caddr_t)&p2->p_ksec.kc_endcopy + - (caddr_t)&p2->p_ksec.kc_startcopy)); + + bcopy(&p1->p_kseg.kg_startcopy, &p2->p_kseg.kg_startcopy, + (unsigned) ((caddr_t)&p2->p_kseg.kg_endcopy + - (caddr_t)&p2->p_kseg.kg_startcopy)); + mtx_init(&p2->p_mtx, "process lock", MTX_DEF); PROC_LOCK(p2); Index: sys/proc.h =================================================================== RCS file: /unused/cvs/freebsd/src/sys/sys/proc.h,v retrieving revision 1.160 diff -u -r1.160 proc.h --- sys/proc.h 2001/04/20 22:34:48 1.160 +++ sys/proc.h 2001/04/25 17:20:51 @@ -147,17 +147,200 @@ * either lock is sufficient for read access, but both locks must be held * for write access. */ + struct ithd; struct nlminfo; +/* + * Here we define the four structures used for process information. + * The first is the ksec. It stands for "Kernel Schedulabale Entity Context". + * This structure contains all the information as to where a thread of + * execution was when it was suspended, why it was suspended, and anything else + * that will be needed to restart it when it is rescheduled. Always + * associated with a KSE, but can be reassigned to an equivalent KSE for + * load balancing. + */ +struct ksec; + +/* + * The second structure is the Kernel Schedulable Entity. (KSE) + * As long as this is scheduled, it will continue to run any KSECs that + * are assigned to it until either it runs out of KSECs or CPU. + * It runs on one CPU and is assigned a quantum of time. When a KSEC is + * blocked, The KSE continues to run and will search for another KSEC + * in a runnable state amongst those it has. It May decide to return to user + * mode with a new 'empty' KSEC if there are no runnable KSECs. + * KSEs are associated with a KSE for cache reasons, but a sheduled KSE with + * no runnable KSECs will try take a KSEC from a sibling KSE before + * surrendering its quantum. 
+ */ +struct kse; + +/* + * The KSEG is allocated resources across a number of CPUs. + * (Including a number of CPUxQUANTA. It parcels these QUANTA up among + * Its KSEs, each of which should be running in a different CPU. + * Priority and total avaliable sheduled quanta are properties of a KSEG. + * Multiple KSEGs in a single process compete against each other + * for total quanta in the same way that a forked child competes against + * it's parent process. + */ +struct kseg; + +/* + * A process is the owner of all system resources allocated to a task. + * All KSEGs under one process see, and have the same access to, these + * resources (e.g. files, memory, sockets, permissions). A process may + * compete for CPU cycles on the same basis as a forked process cluster + * by spawning several KSEGs. + */ +struct proc; + +/*************** + * In pictures: + With a single run queue used by all processors: + + RUNQ: --->KSEG---KSEG--... SLEEPQ:[]---KSEC---KSEC---KSEC + | []---KSEC + KSE---KSEC--KSEC--KSEC [] + | []---KSEC---KSEC + KSE--KSEC--KSEC + + (processors run KSEs from the head KSEG until they are exhausted or + the KSEG exhausts its quantum) + +With PER-CPU run queues: +it may be easier to put the KSEs on the run queues directly +They would be given priorities calculated from the KSEG. + + * + *****************/ + +/* + * Kernel runnable context. This is what is put to sleep and reactivated. + * (Kernel Schedulable Entity Context) + * The first KSE available in the correct group will run this context. + * If several are available, use the one on the same CPU as last time. + */ +struct ksec { + /*** New fields for KSE linkage ***/ + /* While it is possible to find the proc via the kse->kseg->proc + * it is directly held here for efficiency (etc.) + */ + struct proc *kc_proc; /* Associated process. */ + struct kseg *kc_kseg; /* Associated KSEG. */ + struct kse *kc_kse; /* Associated KSE. 
*/ + + TAILQ_ENTRY(ksec) kc_ksegq; /* All ksecs in this kseg */ + TAILQ_ENTRY(ksec) kc_slpqk; /* (j) Sleep/run queue. */ + + /* the fields below will mutate into those above */ + TAILQ_ENTRY(proc) kc_procq; /* (j) Run/mutex queue. */ + TAILQ_ENTRY(proc) kc_slpq; /* (j) Sleep queue. */ + /* The following fields are all zeroed upon creation in fork. */ +#define kc_startzero kc_dupfd + int kc_flag; /* (c) P_* flags. */ + int kc_sflag; /* (j) PS_* flags. */ + int kc_stat; /* (j) S* process status. */ + int kc_dupfd; /* (c) ret value from fdopen. XXX */ + void *kc_wchan; /* (j) Sleep address. */ + const char *kc_wmesg; /* (j) Reason for sleep. */ + u_char kc_lastcpu; /* (j) Last cpu we were on. */ + short kc_locks; /* (*) DEBUG: lockmgr count of locks */ + u_int kc_stops; /* (c) Procfs event bitmask. */ + u_int kc_stype; /* (c) Procfs stop event type. */ + char kc_step; /* (c) Procfs stop *once* flag. */ + u_char kc_pfsflags; /* (c) Procfs flags. */ + struct klist kc_klist; /* (c) Knotes attached to this proc. */ + struct mtx *kc_blocked; /* (j Mutex process is blocked on. */ + const char *kc_mtxname; /* (j) Name of mutex blocked on. */ + LIST_HEAD(, mtx) kc_contested; /* (j) Contested locks. */ + /* End area that is zeroed on creation. */ + /* The following fields are all copied upon creation in fork. */ + struct lock_list_entry *kc_sleeplocks; /* (k) Held sleep locks. */ + register_t kc_retval[2]; /* (k) Syscall aux returns. */ +#define kc_endzero kc_slpcallout +#define kc_startcopy kc_endzero + struct callout kc_slpcallout;/* (h) Callout for sleep. */ + struct mdproc kc_md; /* (k) Any machine-dependent fields. */ + /* eventually struct mdksec.... */ + /* End area that is copied on creation. */ +#define kc_endcopy kc_addr + struct user *kc_addr; /* (k) Kernel virtual addr of u-area (CPU). */ + struct pasleep kc_asleep; /* (k) Used by asleep()/await(). */ +}; + +/* + * The schedulable entity that can be given a context to run. + * A process may have several of these. 
Probably one per processor + * but possibly a few more. In this universe they are grouped + * with a KSEG that contains the priority and niceness + * for the group. + */ +struct kse { + struct proc *ke_proc; /* Associated process. */ + struct kseg *ke_kseg; /* Associated KSEG. */ + TAILQ_ENTRY(kse) ke_kseq; /* Queue of KSEs in ke_kseg. */ + struct ksec *ke_ksec; /* Associated KSEC, if running. */ + TAILQ_HEAD(ke_ksec_hd, ksec); /* Runnable KSECs waiting on this KSE */ + struct pstats *ke_stats; /* (bk) Accounting/statistics (CPU). */ +/* The following fields are all zeroed upon creation in fork. */ +#define ke_startzero ke_estcpu + int ke_flag; /* (c) P_* flags. */ + int ke_sflag; /* (j) PS_* flags. */ + int ke_stat; /* (j) S* process status. */ + u_int ke_estcpu; /* (j) Time averaged value of ke_cpticks. */ + int ke_cpticks; /* (j) Ticks of cpu time. */ + fixpt_t ke_pctcpu; /* (j) %cpu during p_swtime. */ + u_int64_t ke_uu; /* (j) Previous user time in microsec. */ + u_int64_t ke_su; /* (j) Previous system time in microsec. */ + u_int64_t ke_iu; /* (j) Previous interrupt time in microsec. */ + u_int64_t ke_uticks; /* (j) Statclock hits in user mode. */ + u_int64_t ke_sticks; /* (j) Statclock hits in system mode. */ + u_int64_t ke_iticks; /* (j) Statclock hits processing intr. */ + u_int ke_slptime; /* (j) Time since last blocked. */ + u_char ke_oncpu; /* (j) Which cpu we are on. */ + char ke_rqindex; /* (j) Run queue index. */ + int ke_intr_nesting_level; /* (n) Interrupt recursion. */ +/* End area that is zeroed on creation. */ +/* The following fields are all copied upon creation in fork. */ +#define ke_endzero ke_priority +#define ke_startcopy ke_endzero + u_char ke_priority; /* (j) Process priority. */ + u_char ke_usrpri; /* (j) User priority based on p_cpu and p_nice. */ +/* End area that is copied on creation. */ +#define ke_endcopy ke_ithd + struct ithd *ke_ithd; /* (b) For interrupt threads only. */ +}; +/* + * Kernel-scheduled entity group (KSEG).
The scheduler considers each KSEG to + * be an indivisible unit from a time-sharing perspective, though each KSEG may + * contain multiple KSEs. + */ +struct kseg { + struct proc *kg_proc; /* Process that contains this KSEG. */ + TAILQ_ENTRY(kseg) kg_ksegq; /* Queue of KSEGs in kg_proc. */ + TAILQ_HEAD(kg_kse_hd, kse); /* Queue of KSEs in this KSEG. */ + TAILQ_HEAD(kg_ksec_hd, ksec); /* Queue of KSECs in this KSEG. */ +/* The following fields are all copied upon creation in fork. */ +#define kg_startcopy kg_itcallout + struct callout kg_itcallout; /* (h) Interval timer callout. */ + struct priority kg_pri; /* (j) Process priority. */ + char kg_nice; /* (j?/k?) Process "nice" value. */ + struct rtprio kg_rtprio; /* (j) Realtime priority. */ +/* End area that is copied on creation. */ +#define kg_endcopy kg_dummy + int kg_dummy; +}; + struct proc { - TAILQ_ENTRY(proc) p_procq; /* (j) Run/mutex queue. */ - TAILQ_ENTRY(proc) p_slpq; /* (j) Sleep queue. */ LIST_ENTRY(proc) p_list; /* (d) List of all processes. */ /* substructures: */ + TAILQ_HEAD(p_ksegq, kseg); /* Queue of KSEGs. */ struct pcred *p_cred; /* (c + k) Process owner's identity. */ struct filedesc *p_fd; /* (b) Ptr to open files structure. */ + /* accumulated stats for all owned KSEs? */ struct pstats *p_stats; /* (b) Accounting/statistics (CPU). */ struct plimit *p_limit; /* (m) Process limits. */ struct vm_object *p_upages_obj;/* (a) Upages object. */ @@ -168,7 +351,61 @@ #define p_ucred p_cred->pc_ucred #define p_rlimit p_limit->pl_rlimit - +/* + * Compatibility defines for while we are using a + * single one in the proc struct during development. 
+ */ + struct kseg p_kseg; +#define p_itcallout p_kseg.kg_itcallout +#define p_pri p_kseg.kg_pri +#define p_nice p_kseg.kg_nice +#define p_rtprio p_kseg.kg_rtprio + + struct kse p_kse; +#define p_stats p_kse.ke_stats +#define p_estcpu p_kse.ke_estcpu +#define p_cpticks p_kse.ke_cpticks +#define p_pctcpu p_kse.ke_pctcpu +#define p_uu p_kse.ke_uu +#define p_su p_kse.ke_su +#define p_iu p_kse.ke_iu +#define p_uticks p_kse.ke_uticks +#define p_sticks p_kse.ke_sticks +#define p_iticks p_kse.ke_iticks +#define p_slptime p_kse.ke_slptime +#define p_oncpu p_kse.ke_oncpu +#define p_rqindex p_kse.ke_rqindex +#define p_usrpri p_kse.ke_usrpri +#define p_ithd p_kse.ke_ithd +#define p_intr_nesting_level p_kse.ke_intr_nesting_level + + struct ksec p_ksec; +#define p_procq p_ksec.kc_procq +#define p_slpq p_ksec.kc_slpq +#define p_dupfd p_ksec.kc_dupfd +#define p_wchan p_ksec.kc_wchan +#define p_wmesg p_ksec.kc_wmesg +#define p_lastcpu p_ksec.kc_lastcpu +#define p_locks p_ksec.kc_locks +#define p_stops p_ksec.kc_stops +#define p_stype p_ksec.kc_stype +#define p_retval p_ksec.kc_retval +#define p_step p_ksec.kc_step +#define p_pfsflags p_ksec.kc_pfsflags +#define p_klist p_ksec.kc_klist +#define p_blocked p_ksec.kc_blocked +#define p_mtxname p_ksec.kc_mtxname +#define p_contested p_ksec.kc_contested +#define p_sleeplocks p_ksec.kc_sleeplocks +#define p_slpcallout p_ksec.kc_slpcallout +#define p_md p_ksec.kc_md +#define p_asleep p_ksec.kc_asleep + + + /* + * The following don't make too much sense.. + * See the kc_ or ke_ versions of the same flags + */ int p_flag; /* (c) P_* flags. */ int p_sflag; /* (j) PS_* flags. */ int p_stat; /* (j) S* process status. */ @@ -183,80 +420,47 @@ /* The following fields are all zeroed upon creation in fork. */ #define p_startzero p_oppid - pid_t p_oppid; /* (c + e) Save parent pid during ptrace. XXX */ - int p_dupfd; /* (c) Sideways ret value from fdopen. XXX */ + pid_t p_oppid; /* (c + e) Save ppid in ptrace. 
XXX */ struct vmspace *p_vmspace; /* (b) Address space. */ /* scheduling */ - u_int p_estcpu; /* (j) Time averaged value of p_cpticks. */ - int p_cpticks; /* (j) Ticks of cpu time. */ - fixpt_t p_pctcpu; /* (j) %cpu during p_swtime. */ - struct callout p_slpcallout; /* (h) Callout for sleep. */ - void *p_wchan; /* (j) Sleep address. */ - const char *p_wmesg; /* (j) Reason for sleep. */ - u_int p_swtime; /* (j) Time swapped in or out. */ - u_int p_slptime; /* (j) Time since last blocked. */ + u_int p_swtime; /* (j) Time swapped in or out. */ - struct callout p_itcallout; /* (h) Interval timer callout. */ struct itimerval p_realtimer; /* (h?/k?) Alarm timer. */ - u_int64_t p_runtime; /* (j) Real time in microsec. */ - u_int64_t p_uu; /* (j) Previous user time in microsec. */ - u_int64_t p_su; /* (j) Previous system time in microsec. */ - u_int64_t p_iu; /* (j) Previous interrupt time in microsec. */ - u_int64_t p_uticks; /* (j) Statclock hits in user mode. */ - u_int64_t p_sticks; /* (j) Statclock hits in system mode. */ - u_int64_t p_iticks; /* (j) Statclock hits processing intr. */ + u_int64_t p_runtime; /* (j) Real time in microsec. */ int p_traceflag; /* (j?) Kernel trace points. */ struct vnode *p_tracep; /* (j?) Trace to vnode. */ - sigset_t p_siglist; /* (c) Signals arrived but not delivered. */ + sigset_t p_siglist; /* (c) Sigs arrived, not delivered. */ struct vnode *p_textvp; /* (b) Vnode of executable. */ struct mtx p_mtx; /* (k) Lock for this struct. */ u_int p_spinlocks; /* (k) Count of held spin locks. */ - char p_lock; /* (c) Process lock (prevent swap) count. */ - u_char p_oncpu; /* (j) Which cpu we are on. */ - u_char p_lastcpu; /* (j) Last cpu we were on. */ - char p_rqindex; /* (j) Run queue index. */ - - short p_locks; /* (*) DEBUG: lockmgr count of held locks */ - u_int p_stops; /* (c) Procfs event bitmask. */ - u_int p_stype; /* (c) Procfs stop event type. */ - char p_step; /* (c) Procfs stop *once* flag. 
*/ - u_char p_pfsflags; /* (c) Procfs flags. */ - char p_pad3[2]; /* Alignment. */ - register_t p_retval[2]; /* (k) Syscall aux returns. */ + char p_lock; /* (c) Process (prevent swap) count. */ + char p_pad3[3]; /* Alignment. */ struct sigiolst p_sigiolst; /* (c) List of sigio sources. */ int p_sigparent; /* (c) Signal to parent on exit. */ - sigset_t p_oldsigmask; /* (c) Saved mask from before sigpause. */ + sigset_t p_oldsigmask; /* (c) Saved mask from pre sigpause. */ int p_sig; /* (n) For core dump/debugger XXX. */ u_long p_code; /* (n) For core dump/debugger XXX. */ - struct klist p_klist; /* (c) Knotes attached to this process. */ - struct lock_list_entry *p_sleeplocks; /* (k) Held sleep locks. */ - struct mtx *p_blocked; /* (j) Mutex process is blocked on. */ - const char *p_mtxname; /* (j) Name of mutex blocked on. */ - LIST_HEAD(, mtx) p_contested; /* (j) Contested locks. */ struct nlminfo *p_nlminfo; /* (?) only used by/for lockd */ void *p_aioinfo; /* (c) ASYNC I/O info. */ - struct ithd *p_ithd; /* (b) For interrupt threads only. */ - int p_intr_nesting_level; /* (k) Interrupt recursion. */ /* End area that is zeroed on creation. */ -#define p_endzero p_startcopy - /* The following fields are all copied upon creation in fork. */ #define p_startcopy p_sigmask +#define p_endzero p_startcopy + /* We haven't defined how KSEs do signals yet */ sigset_t p_sigmask; /* (c) Current signal mask. */ stack_t p_sigstk; /* (c) Stack pointer and on-stack flag. */ int p_magic; /* (b) Magic number. */ - struct priority p_pri; /* (j) Process priority. */ - char p_nice; /* (j?/k?) Process "nice" value. */ char p_comm[MAXCOMLEN + 1]; /* (b) Process name. */ + int p_kse_enabled; /* (b) 0, unless using KSEs this proc. */ struct pgrp *p_pgrp; /* (e?/c?) Pointer to process group. */ struct sysentvec *p_sysent; /* (b) System call dispatch information. */ @@ -266,7 +470,6 @@ #define p_endcopy p_addr struct user *p_addr; /* (k) Kernel virtual addr of u-area (CPU). 
*/ - struct mdproc p_md; /* (k) Any machine-dependent fields. */ u_short p_xstat; /* (c) Exit status for wait; also stop sig. */ u_short p_acflag; /* (c) Accounting flags. */ @@ -274,7 +477,6 @@ struct proc *p_peers; /* (c) */ struct proc *p_leader; /* (c) */ - struct pasleep p_asleep; /* (k) Used by asleep()/await(). */ void *p_emuldata; /* (c) Emulator state data. */ }; @@ -293,9 +495,10 @@ #define SMTX 7 /* Blocked on a mutex. */ /* These flags are kept in p_flag. */ +/* In a KSE world some go to a KSEC or a KSE (*)*/ #define P_ADVLOCK 0x00001 /* Process may hold a POSIX advisory lock. */ #define P_CONTROLT 0x00002 /* Has a controlling terminal. */ -#define P_KTHREAD 0x00004 /* Kernel thread. */ +#define P_KTHREAD 0x00004 /* Kernel thread. (*)*/ #define P_NOLOAD 0x00008 /* Ignore during load avg calculations. */ #define P_PPWAIT 0x00010 /* Parent is waiting for child to exec/exit. */ #define P_SELECT 0x00040 /* Selecting; wakeup/waiting danger. */ @@ -305,6 +508,7 @@ #define P_WAITED 0x01000 /* Debugging process has waited for child. */ #define P_WEXIT 0x02000 /* Working on exiting. */ #define P_EXEC 0x04000 /* Process called exec. */ +#define P_KSES 0x08000 /* Process is using KSEs. */ /* Should be moved to machine-dependent areas. 
*/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Apr 25 11: 9:47 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id D2C3937B422; Wed, 25 Apr 2001 11:09:41 -0700 (PDT) (envelope-from bright@fw.wintelcom.net) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f3PI9ev26785; Wed, 25 Apr 2001 11:09:40 -0700 (PDT) Date: Wed, 25 Apr 2001 11:09:40 -0700 From: Alfred Perlstein To: Julian Elischer Cc: Arch@FreeBSD.ORG, Robert Watson , Daniel Eischen Subject: Re: KSE threading support (first parts) Message-ID: <20010425110940.L1790@fw.wintelcom.net> References: <3AE71067.FF4BD029@elischer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3AE71067.FF4BD029@elischer.org>; from julian@elischer.org on Wed, Apr 25, 2001 at 10:59:03AM -0700 X-all-your-base: are belong to us. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Julian Elischer [010425 11:00] wrote: > After discussing this with Jason Evans before he took his new position, > and having looked at his patches from December, and my similar patches from > January, here is a 'merged' patch. > > It breaks the proc structure into 4 parts. > > proc... owns all 'total process' resources. (e.g. address space, > limits, files) > kseg... KSE 'group'. Anything to do with working out the quanta to > be given to the threads (KSEs). A scheduling abstraction. > kse.... Actual schedulable entity for a processor > (if the KSEG has a quantum for it) > ksec... Where a thread stores its context when it is blocked > so that the kse can return to either the user, or another > unblocked kse to continue using its quanta.
> > This compiles cleanly and SHOULD run (it did run in an > earlier incarnation). It is by no means final, but rather > designed to give us a starting point in discussions. > > In this view, KSEGs are on the run queue and when they get some > quanta the KSEs hanging off them are run. > If 2 KSEs are running, the KSEG's quanta are exhausted at twice > the rate. > Each KSE has a very strong affinity for one processor > and KSECs have a weak affinity for a KSE. If a KSE runs out > of work but has time, it will 'poach' a KSEC from another KSE in the > same KSEG list. > > In this patch the linkages are not set up at all. > All that is done is that the structures are > defined and used instead of a monolithic 'proc' struct. > The new structures are 'included' in the proc structure > to maintain compatibility and to allow code to be changed slowly. > > What really needs to be done is for everyone who is interested to go over > the rather arbitrary allocation of fields to structures that > I did and make suggested changes. > > Also I've punted on most things to do with signals as we haven't > really discussed how we want signals to be handled in a KSE world.. > (can each KSEG or KSE get individual signals? do we need to > define a special 'signal' KSE? If so, is that all it does? > > What happens to the 'u-area'? It makes sense that it stays, except for struct pcb. Honestly, swapping out the pcbs could be left as something to re-optimize later; they can take a significant amount of space, but nowadays it's not that big of a deal. > how do we define a "cur-kse" similar to curproc? > (do we need one?) yes. > presently the processor state is stored all over the place > when a process is suspended.. > This needs to be brought together so it can be put into the KSEC. > Who understands that stuff? That's your job. Refer to Jason Evans if he's available. You should also ask John Baldwin about proc locking, as this stuff is definitely going to require locking in order to function properly.
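Julian's scheduling description quoted above (quanta held by the KSEG, a strong CPU affinity for each KSE, a weak KSE affinity for each KSEC, and 'poaching') can be sketched in user-space C. This is only an illustration under assumptions: the fixed-size KSE array, the singly linked per-KSE queues, charging one quantum per dispatch, and the names ksec_pop() and kse_next_context() are all hypothetical, since the patch deliberately sets up no linkages yet.

```c
#include <assert.h>
#include <stddef.h>

#define KSEG_MAX_KSE 4              /* hypothetical fixed limit */

struct ksec {                       /* a runnable (unblocked) context */
    int          kc_id;
    struct ksec *kc_next;           /* next context queued on the same KSE */
};

struct kse {
    int          ke_cpu;            /* strong affinity: its processor */
    struct ksec *ke_runq;           /* weak affinity: contexts parked here */
};

struct kseg {
    int        kg_quanta;           /* quanta left for the whole group */
    int        kg_nkse;
    struct kse kg_kse[KSEG_MAX_KSE];
};

static struct ksec *ksec_pop(struct ksec **head)
{
    struct ksec *kc = *head;

    if (kc != NULL)
        *head = kc->kc_next;
    return kc;
}

/*
 * Choose the next context for 'self': its own queue first, otherwise
 * poach one from a sibling KSE in the same KSEG.  Every dispatch is
 * charged to the shared group budget, so two busy KSEs drain it at
 * twice the rate of one.  Returns NULL when the group is out of quanta
 * or has nothing runnable.
 */
static struct ksec *kse_next_context(struct kseg *kg, struct kse *self)
{
    struct ksec *kc;
    int i;

    assert(self >= kg->kg_kse && self < kg->kg_kse + kg->kg_nkse);
    if (kg->kg_quanta <= 0)
        return NULL;
    kc = ksec_pop(&self->ke_runq);
    for (i = 0; kc == NULL && i < kg->kg_nkse; i++)
        if (&kg->kg_kse[i] != self)
            kc = ksec_pop(&kg->kg_kse[i].ke_runq);
    if (kc != NULL)
        kg->kg_quanta--;
    return kc;
}
```

With contexts a and b queued on KSE 0 and nothing queued on KSE 1, a call for KSE 1 poaches a, the next call for KSE 0 takes b, and the two dispatches use up a two-quantum group budget, which is the "twice the rate" effect from the announcement.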
> Some of the next steps would be: > 1/ figure out what we want for signals etc.. AFAIK Solaris tried many different ways to propagate signals across their lwps; AFAIK they found the task so complex and so hard to get right that the latest implementation makes one lwp the signal target. Most likely, then, signals would still be in struct proc or the initial kse. > 2/ get the contexts actually stored in the KSEC structure > when a process is suspended. (instead of some strange pcb in funny memory > near the u area) huh? > 3/ Set up the linkages between these structures, and > 4/ start using 'kse' instead of 'proc' in a bunch of places > and using the linkages to find the appropriate other > structures when needed. > 5/ Add code to make new KSEs so that the 1:1 Mapping is no longer > true. > 6/ Add syscalls to start making KSEs other than the one that > is built into the process. > 7/ start making upcalls > ok, when are you going to have these done? :) One other question: have you looked at the recent lwp/kse support added to NetBSD? Is there anything to learn/avoid?
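One way to picture the single-target approach Alfred describes (signal state kept in struct proc, with one designated KSE taking delivery) is the sketch below. It is hypothetical throughout: a plain bit mask stands in for sigset_t, the p_sigtarget field and the function names do not exist in the patch, and masking is simplified to one process-wide mask.

```c
#include <assert.h>
#include <signal.h>   /* SIGINT, SIGTERM: standard C signal numbers */

/* Hypothetical bit set: bit n set means signal n. */
typedef unsigned long sigbits_t;
#define SIGBIT(sig)  ((sigbits_t)1 << (sig))
#define SIGBITS_NSIG ((int)(sizeof(sigbits_t) * 8))

struct kse {
    int ke_id;
};

struct proc {
    sigbits_t   p_siglist;    /* process-wide pending signals */
    sigbits_t   p_sigmask;    /* one process-wide mask, for simplicity */
    struct kse *p_sigtarget;  /* the single KSE that takes delivery */
    struct kse  p_kse;        /* the initial, built-in KSE */
};

static void proc_sig_init(struct proc *p)
{
    p->p_siglist = 0;
    p->p_sigmask = 0;
    p->p_sigtarget = &p->p_kse;   /* default target: the initial KSE */
}

/* Posting never has to choose among KSEs; it only marks the process. */
static void proc_post_signal(struct proc *p, int sig)
{
    assert(sig > 0 && sig < SIGBITS_NSIG);
    p->p_siglist |= SIGBIT(sig);
}

/*
 * Called as 'ke' returns to user mode.  Only the designated target
 * dequeues; the lowest-numbered pending, unmasked signal is returned,
 * or 0 if there is nothing to deliver through this KSE.
 */
static int kse_take_signal(struct proc *p, const struct kse *ke)
{
    int sig;

    if (ke != p->p_sigtarget)
        return 0;
    for (sig = 1; sig < SIGBITS_NSIG; sig++) {
        if ((p->p_siglist & SIGBIT(sig)) && !(p->p_sigmask & SIGBIT(sig))) {
            p->p_siglist &= ~SIGBIT(sig);
            return sig;
        }
    }
    return 0;
}
```

The attraction is that posting a signal never has to pick among KSEs; the cost is that delivery waits for the one target KSE to head back to user mode.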
-Alfred From owner-freebsd-arch Wed Apr 25 11:37:39 2001 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id DD85137B422; Wed, 25 Apr 2001 11:37:33 -0700 (PDT) (envelope-from eischen@vigrid.com) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id OAA24589; Wed, 25 Apr 2001 14:36:53 -0400 (EDT) Date: Wed, 25 Apr 2001 14:36:53 -0400 (EDT) From: Daniel Eischen To: Julian Elischer Cc: Arch@freebsd.org, alfred@freebsd.org, Robert Watson Subject: Re: KSE threading support (first parts) In-Reply-To: <3AE71067.FF4BD029@elischer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 25 Apr 2001, Julian Elischer wrote: > In this view, KSEGs are on the run queue and when they get some > quanta the KSEs hanging off them are run. > If 2 KSEs are running, the KSEG's quanta are exhausted at twice > the rate. Don't we eventually want per-CPU run queues? How, then, do multiple KSEs hanging off a KSEG get scheduled if the quanta are in the KSEG? Round robin? > Each KSE has a very strong affinity for one processor > and KSECs have a weak affinity for a KSE. If a KSE runs out > of work but has time, it will 'poach' a KSEC from another KSE in the > same KSEG list. Again, if KSEs can have a strong affinity for 1 processor and there can be multiple KSEs hanging off a KSEG, then how do you schedule these KSEs when we have per-CPU run queues? It makes scheduling these KSEs more difficult than it needs to be. I still don't see the need to have multiple KSEs within a KSEG ;-) > In this patch the linkages are not set up at all. > All that is done is that the structures are > defined and used instead of a monolithic 'proc' struct.
> The new structures are 'included' in the proc structure > to maintain compatibility and to allow code to be changed slowly. > > What really needs to be done is for everyone who is interested to go over > the rather arbitrary allocation of fields to structures that > I did and make suggested changes. > > Also I've punted on most things to do with signals as we haven't > really discussed how we want signals to be handled in a KSE world.. > (can each KSEG or KSE get individual signals? do we need to > define a special 'signal' KSE? If so, is that all it does? Signals should be sent (via an upcall) to the first available KSE to return to userland (return from syscall, after preemption, etc.). The userland thread scheduler will pick a thread to receive the signal. If the thread is running or in one of the scheduling queues for the current KSEG, it will be able to handle it without any other assistance from the kernel. If the thread is running or in one of the scheduling queues for another KSEG, the scheduler will mark the signal pending in the target thread and "signal" the appropriate KSEG with help from the kernel (one of the new user<->kernel interfaces or syscalls). (We may have to replace "KSEG" in the above with "KSE".) It might be nice to have a general way of sending messages between KSEGs (KSEs?). > What happens to the 'u-area'? > > how do we define a "cur-kse" similar to curproc? > (do we need one?) > presently the processor state is stored all over the place > when a process is suspended.. > This needs to be brought together so it can be put into the KSEC. > Who understands that stuff? > > Some of the next steps would be: > 1/ figure out what we want for signals etc.. Ask me for help in this area. I know what the userland scheduler has to do when dispatching signals to threads. > 2/ get the contexts actually stored in the KSEC structure > when a process is suspended.
(instead of some strange pcb in funny memory > near the u area) > 3/ Set up the linkages between these structures, and > 4/ start using 'kse' instead of 'proc' in a bunch of places > and using the linkages to find the appropriate other > structures when needed. > 5/ Add code to make new KSEs so that the 1:1 Mapping is no longer > true. > 6/ Add syscalls to start making KSEs other than the one that > is built into the process. > 7/ start making upcalls Can't we start with 7? ;-) -- Dan Eischen From owner-freebsd-arch Wed Apr 25 13: 5: 6 2001 Delivered-To: freebsd-arch@freebsd.org Received: from gw.nectar.com (gw.nectar.com [208.42.49.153]) by hub.freebsd.org (Postfix) with ESMTP id 59D8837B422 for ; Wed, 25 Apr 2001 13:05:03 -0700 (PDT) (envelope-from nectar@nectar.com) Received: from hamlet.nectar.com (hamlet.nectar.com [10.0.1.102]) by gw.nectar.com (Postfix) with ESMTP id B1311194C7; Wed, 25 Apr 2001 15:05:02 -0500 (CDT) Received: (from nectar@localhost) by hamlet.nectar.com (8.11.3/8.9.3) id f3PK52F02351; Wed, 25 Apr 2001 15:05:02 -0500 (CDT) (envelope-from nectar@spawn.nectar.com) Date: Wed, 25 Apr 2001 15:05:02 -0500 From: "Jacques A. Vidrine" To: Peter Pentchev Cc: arch@FreeBSD.org Subject: Re: gid_t vs. plain int Message-ID: <20010425150502.B2200@hamlet.nectar.com> References: <20010425183640.C54687@ringworld.oblivion.bg> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010425183640.C54687@ringworld.oblivion.bg>; from roam@orbitel.bg on Wed, Apr 25, 2001 at 06:36:40PM +0300 X-Url: http://www.nectar.com/ Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, Apr 25, 2001 at 06:36:40PM +0300, Peter Pentchev wrote: > Hi, > > OK. I've (kinda) had enough.
> > Is there a reason that struct group in does not define 'gr_gid' > as a gid_t value, but as a plain int? This makes all kinds of things > go berserk with gcc -Wall -W, and causes dozens of (totally unneeded) > casts. > > Is there some standard that says pw_gid is gid_t, but gr_gid is int? > If not, would anyone be interested in patches (yes, I'm prepared to sweep > the whole source tree), making gr_gid a gid_t? ISO/IEC 9945-1: 1996 (POSIX 1003.1) says that a group structure `includes the members': char *gr_name: The name of the group; gid_t gr_gid: The numerical group ID; char **gr_mem: A null-terminated vector of pointers to the individual member names. Also, the getgr* functions which take a group number argument have prototypes with `gid_t'. I say go ahead and clean it up. Cheers, -- Jacques Vidrine / n@nectar.com / jvidrine@verio.net / nectar@FreeBSD.org From owner-freebsd-arch Wed Apr 25 13:26: 7 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id E706F37B424 for ; Wed, 25 Apr 2001 13:26:02 -0700 (PDT) (envelope-from arr@watson.org) Received: from localhost (arr@localhost) by fledge.watson.org (8.11.3/8.11.3) with SMTP id f3PKQWu41141 for ; Wed, 25 Apr 2001 16:26:33 -0400 (EDT) (envelope-from arr@watson.org) Date: Wed, 25 Apr 2001 16:26:32 -0400 (EDT) From: "Andrew R. Reiter" To: freebsd-arch@FreeBSD.org Subject: libevent & fbsd Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Hey, I'm doing some -audit work and I'm about to start writing some patches to change a lot of the select(2) calls to use k{queue,event}(). Niels Provos has started writing (or has semi-finished a release, afaik) of his libevent code.
Basically it's a general interface to multiple types of event handling code (select(), poll(), kqueue/event) on file descriptors. I am interested in using something like this, possibly, instead of hacking through some of the code and moving kqueue/event into them. I am wondering if anyone has spoken to Niels about perhaps getting it into our tree? or if anyone has any thoughts on this at all? :-) The url for the code is: http://www.monkey.org/~provos/libevent/ Thanks, Andrew *-------------................................................. | Andrew R. Reiter | arr@fledge.watson.org | "It requires a very unusual mind | to undertake the analysis of the obvious" -- A.N. Whitehead From owner-freebsd-arch Wed Apr 25 13:49: 9 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 0CAFA37B422 for ; Wed, 25 Apr 2001 13:49:07 -0700 (PDT) (envelope-from bright@fw.wintelcom.net) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f3PKn2301895; Wed, 25 Apr 2001 13:49:02 -0700 (PDT) Date: Wed, 25 Apr 2001 13:49:02 -0700 From: Alfred Perlstein To: "Andrew R. Reiter" Cc: freebsd-arch@FreeBSD.ORG, provos@openbsd.org Subject: Re: libevent & fbsd Message-ID: <20010425134902.S1790@fw.wintelcom.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from arr@watson.org on Wed, Apr 25, 2001 at 04:26:32PM -0400 X-all-your-base: are belong to us. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG cc'd Niels Provos. * Andrew R. Reiter [010425 13:26] wrote: > Hey, > > I'm doing some -audit work and I'm about to start writing some patches to > change a lot of the select(2) calls to use k{queue,event}().
Niels Provos > has started writing (or has semi-finished a release, afaik) of his > libevent code. Basically it's a general interface to multiple types of > event handling code (select(), poll(), kqueue/event) on file descriptors. > I am interested in using something like this, possibly, instead of hacking > through some of the code and moving kqueue/event into them. > > I am wondering if anyone has spoken to Niels about perhaps getting it into > our tree? or if anyone has any thoughts on this at all? :-) > > The url for the code is: > > http://www.monkey.org/~provos/libevent/ Niels, please excuse me if I'm jumping to conclusions here. I realize that the library seems to be in very early beta form (ver 0.3); however, it looks like libevent's model is not complex enough to support efficient use of kqueue. This is because EV_ONESHOT is always OR'd into the event flags, especially when EV_READ is called for. What you really want to do is provide a way to keep a generic list of "constantly polled fd" within your library. The idea is for instance you have an application (take IRCd for instance) where you have several thousand clients, it'd be much more optimal to register the read event once (~EV_ONESHOT) then have the application call back when it's no longer interested in the read events. The same could be said for EV_WRITE events: for streaming applications you don't want EV_ONESHOT, because as soon as the event fires you're most likely going to blast the pipe full and then request notification when more space is available. The only time you'd want the event cleared is when you're out of data and the socket isn't full. Perhaps for EV_WRITE/EV_READ a hints-based mechanism could be used to specify whether interest will most likely remain for the event asked for. -- -Alfred Perlstein - [alfred@freebsd.org] Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.
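To put a rough number on Alfred's point, the toy model below counts how many filter registrations an IRCd-style server would submit with one-shot versus persistent read interest. It is a sketch under stated assumptions, not libevent or kqueue code: the enum and function names are hypothetical, each connection is assumed to stay interesting until it closes, and it counts filter changes rather than system calls (kevent(2) can batch several changes into one call). It relies on one documented kqueue behaviour: closing a descriptor removes the events attached to it.

```c
#include <assert.h>

/*
 * Toy model, not kqueue code: count the filter registrations (EV_ADD
 * requests) a server hands to kevent() while each of 'clients'
 * connections delivers 'reads' readable events and is then closed.
 */
enum reg_mode {
    REG_ONESHOT,     /* EV_ADD | EV_ONESHOT: filter dropped each time it fires */
    REG_PERSISTENT   /* EV_ADD only: filter stays until the fd is closed */
};

static unsigned long count_filter_changes(enum reg_mode mode,
                                          unsigned long clients,
                                          unsigned long reads)
{
    unsigned long changes = 0, c, r;

    for (c = 0; c < clients; c++) {
        int armed = 1;                  /* registered when the client connects */

        changes++;
        for (r = 0; r < reads; r++) {
            assert(armed);              /* an event only fires if armed */
            if (mode == REG_ONESHOT)
                armed = 0;              /* the kernel removed the filter */
            if (!armed && r + 1 < reads) {
                armed = 1;              /* still interested: register again */
                changes++;
            }
        }
        /* close(2) discards any remaining filter; no explicit EV_DELETE */
    }
    return changes;
}
```

With 5000 clients delivering 100 reads each, the one-shot model submits 500,000 registrations against 5,000 for the persistent one; a per-event hint, as Alfred suggests, would let the library choose the cheaper mode when interest is likely to remain.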
From owner-freebsd-arch Wed Apr 25 14:14:29 2001 Delivered-To: freebsd-arch@freebsd.org Received: from citi.umich.edu (citi.umich.edu [141.211.92.141]) by hub.freebsd.org (Postfix) with ESMTP id 84C3437B43F for ; Wed, 25 Apr 2001 14:14:27 -0700 (PDT) (envelope-from provos@citi.umich.edu) Received: from citi.umich.edu (ssh-mapper.citi.umich.edu [141.211.92.147]) by citi.umich.edu (Postfix) with ESMTP id 05BC4207C1; Wed, 25 Apr 2001 17:14:27 -0400 (EDT) Subject: Re: libevent & fbsd From: Niels Provos In-Reply-To: Alfred Perlstein, Wed, 25 Apr 2001 13:49:02 PDT To: Alfred Perlstein Cc: "Andrew R. Reiter" , freebsd-arch@FreeBSD.ORG Date: Wed, 25 Apr 2001 17:14:26 -0400 Message-Id: <20010425211427.05BC4207C1@citi.umich.edu> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message <20010425134902.S1790@fw.wintelcom.net>, Alfred Perlstein writes: >What you really want to do is provide a way to keep a generic list >of "constantly polled fd" within your library. The idea is for >instance you have an application (take IRCd for instance) where >you have several thousand clients, it'd be much more optimal to >register the read event once (~EV_ONESHOT) then have the application >call back when it's no longer interested in the read events. I am aware of this problem. My goal was to create a very easy to use and intuitive API that would abstract away complexities that people are experiencing with asynchronous I/O. If you have an idea to extend the API in a simple way that would make better use of the capabilities of kqueue-like systems, please let me know. I know of people who use libevent in a commercial environment, and they are very happy with it. Greetings, Niels.
From owner-freebsd-arch Wed Apr 25 15:25:58 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id 55C2437B423 for ; Wed, 25 Apr 2001 15:25:55 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id IAA20732; Thu, 26 Apr 2001 08:25:49 +1000 Date: Thu, 26 Apr 2001 08:24:48 +1000 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Peter Pentchev Cc: arch@FreeBSD.ORG Subject: Re: gid_t vs. plain int In-Reply-To: <20010425183640.C54687@ringworld.oblivion.bg> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 25 Apr 2001, Peter Pentchev wrote: > Is there a reason that struct group in does not define 'gr_gid' > as a gid_t value, but as a plain int? This makes all kinds of things Historical reasons, and because wollman still hasn't committed his header cleanups which fix this and many other related problems. > go berserk with gcc -Wall -W, and causes dozens of (totally unneeded) > casts. The casts might be needed to support K&R compilers on systems with sizeof(gid_t) < sizeof(int), but mostly make things worse by hiding bugs. > Is there some standard that says pw_gid is gid_t, but gr_gid is int? POSIX.1-1990 says that both are gid_t. BTW, the kernel still uses int for gids in many places, e.g., kern/syscalls.master says that chown(2) takes an "int gid" arg. This depends on various type puns to work. Similarly for many other syscall args.
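Peter's complaint and Bruce's remark that the casts "mostly make things worse by hiding bugs" can be illustrated with two hypothetical struct definitions. The names group_int and group_posix are invented for the example (neither is the real declaration in <grp.h>), and the sketch assumes gid_t is an unsigned type, as it is on FreeBSD.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* gid_t */

/* Hypothetical stand-ins for the two declarations being discussed. */
struct group_int {                 /* the historical declaration */
    const char *gr_name;
    int         gr_gid;
};

struct group_posix {               /* what POSIX.1 specifies */
    const char *gr_name;
    gid_t       gr_gid;
};

/* With a gid_t member the comparison needs no cast and draws no warning. */
static int group_matches_posix(const struct group_posix *gr, gid_t gid)
{
    assert(gr != NULL);
    return gr->gr_gid == gid;
}

/*
 * With an int member, gcc -Wall -W warns about a signed/unsigned
 * comparison when gid_t is unsigned.  The usual cast silences the
 * warning, but it silences it for bogus values too: an int of -2
 * quietly turns into a very large gid instead of being caught.
 */
static int group_matches_int(const struct group_int *gr, gid_t gid)
{
    assert(gr != NULL);
    return (gid_t)gr->gr_gid == gid;
}
```

Declaring the member as gid_t, as Jacques and Bruce recommend, removes the need for the cast at every comparison site.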
Bruce From owner-freebsd-arch Wed Apr 25 19:51:30 2001 Delivered-To: freebsd-arch@freebsd.org Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193]) by hub.freebsd.org (Postfix) with ESMTP id D289537B422 for ; Wed, 25 Apr 2001 19:51:25 -0700 (PDT) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.9.3/8.9.3) id WAA15799; Wed, 25 Apr 2001 22:51:10 -0400 (EDT) (envelope-from wollman) Date: Wed, 25 Apr 2001 22:51:10 -0400 (EDT) From: Garrett Wollman Message-Id: <200104260251.WAA15799@khavrinen.lcs.mit.edu> To: bde@zeta.org.au Subject: Re: gid_t vs. plain int X-Newsgroups: mit.lcs.mail.freebsd-arch In-Reply-To: References: <20010425183640.C54687@ringworld.oblivion.bg> Organization: MIT Laboratory for Computer Science Cc: arch@freebsd.org Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Bruce writes: >BTW, the kernel still uses int for gids in many places, e.g., >kern/syscalls.master says that chown(2) takes an "int gid" arg. This >depends on various type puns to work. Of course it has to do that in order to allow for the possibility that gid_t might still be a short. Of course, this precludes gid_t from being something-longer-than-int, but much larger parts of the ABI would have to change at the same time in that case. -GAWollman -- Garrett A.
Wollman | O Siem / We are all family / O Siem / We're all the same wollman@lcs.mit.edu | O Siem / The fires of freedom Opinions not those of| Dance in the burning flame MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick From owner-freebsd-arch Wed Apr 25 21:27:52 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id 8002B37B423 for ; Wed, 25 Apr 2001 21:27:49 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id OAA31938; Thu, 26 Apr 2001 14:27:42 +1000 Date: Thu, 26 Apr 2001 14:26:23 +1000 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Garrett Wollman Cc: arch@freebsd.org Subject: Re: gid_t vs. plain int In-Reply-To: <200104260251.WAA15799@khavrinen.lcs.mit.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 25 Apr 2001, Garrett Wollman wrote: > Bruce writes: > > >BTW, the kernel still uses int for gids in many places, e.g., > >kern/syscalls.master says that chown(2) takes an "int gid" arg. This > >depends on various type puns to work. > > Of course it has to do that in order to allow for the possibility that > gid_t might still be a short. No. If gid_t were short, then type puns are neither necessary nor sufficient for handling it properly. Lying about the arg types in syscalls.master just makes it harder for trap.c:syscall() to convert the args. syscall() repacks the args into the syscall args structs declared in . It "just happens" that the repacking can be implemented using a simple copyin() on i386's and alphas.
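The repacking Bruce describes, including why a plain byte copy is enough for a short argument only on little-endian machines, can be sketched in C. The sketch assumes, as on i386, that each syscall argument occupies one 32-bit stack word. The struct is a hypothetical args struct for a syscall taking `(int fd, short mode)`, loosely modelled on the padded structs the generated headers use; it is not the real generated code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical args struct: each member fills one 32-bit slot, and the
   short is followed by explicit right-padding so the struct layout
   matches the array of argument words copied in from the user stack. */
struct demo_args {
    int32_t fd;
    int16_t mode;
    char    mode_pad[sizeof(uint32_t) - sizeof(int16_t)];
};

_Static_assert(sizeof(struct demo_args) == 2 * sizeof(uint32_t),
               "args struct must match the user's argument words");

/* The "simple copyin()" case: a byte copy of the words.  The short is
   only correct when its value sits in the low-addressed bytes of its
   slot, i.e. on a little-endian machine. */
static void repack_by_copy(struct demo_args *uap, const uint32_t words[2])
{
    memcpy(uap, words, sizeof(*uap));
}

/* An endian-neutral repack that converts each slot explicitly; this is
   the kind of adjustment a big-endian port would need for short args. */
static void repack_explicit(struct demo_args *uap, const uint32_t words[2])
{
    memset(uap, 0, sizeof(*uap));
    uap->fd = (int32_t)words[0];
    uap->mode = (int16_t)words[1];
}

static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

With correctly typed and padded structs, the machine-dependent code only decides between a raw copy and an explicit per-argument conversion. Declaring the argument as `int` in syscalls.master removes neither step.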
> Of course, this precludes gid_t from > being something-longer-than-int, but much larger parts of the ABI would > have to change at the same time in that case. The syscall args structs handle all cases that are likely to happen in practice, including short args on little-endian machines (minor adjustments are required for short args on big-endian machines). Short args already occur in practice for at least lchmod() (because NetBSD debogotified syscalls.master before FreeBSD obtained lchmod() from NetBSD). Bruce From owner-freebsd-arch Thu Apr 26 10:15:42 2001 Delivered-To: freebsd-arch@freebsd.org Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29]) by hub.freebsd.org (Postfix) with SMTP id 365B537B71B for ; Thu, 26 Apr 2001 10:15:33 -0700 (PDT) (envelope-from julian@elischer.org) Received: (qmail 13034 invoked by uid 666); 26 Apr 2001 17:18:42 -0000 Received: from i179-136.nv.iinet.net.au (HELO elischer.org) (203.59.179.136) by mail.m.iinet.net.au with SMTP; 26 Apr 2001 17:18:42 -0000 Message-ID: <3AE85776.92D6BD90@elischer.org> Date: Thu, 26 Apr 2001 10:14:30 -0700 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Alfred Perlstein Cc: Arch@FreeBSD.ORG, Robert Watson , Daniel Eischen , John Baldwin Subject: Re: KSE threading support (first parts) References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Alfred Perlstein wrote: > > > > Also I've punted on most things to do with signals as we haven't > > really discussed how we want signals to be handled in a KSE world.. > > (can each KSEG or KSE get individual signals? do we need to > > define a special 'signal' KSE?
If so is that all it does? > > > > What happens to the 'u-area'? > > It makes sense that it stays except for struct pcb. Honestly > swapping out the pcbs could be left as something to re-optimize > later, they can take a significant amount of space, but nowadays > it's not that big of a deal. So how much work is it to move the pcbs into the proc struct (and thus into the ksec struct)? Does anyone see any reason that that would not work? Is there anything special about having it in the u area (other than swapping)? > > > how do we define a "cur-kse" similar to curproc? > > (do we need one?) > > yes. I will look at seeing if I can do this... > > > presently the processor state is stored all over the place > > when a process is suspended.. > > This needs to be brought together so it can be put into the KSEC. > > Who understands that stuff? > > That's your job. Refer to Jason Evans if he's available. gee thanks.. I don't really have a grip on all the ways that traps etc. can need to save context. I REALLY don't get the floating point context stuff. Some state is stored on the user stack, some on the kernel stack and some in the pcb (and maybe some in the proc struct). To complicate things a little: some things such as segment registers may be "per KSE" whereas normal registers are "per KSEC". > > You should also ask John Baldwin about proc locking as this > stuff is definitely going to require locking in order to function > properly. > > > Some of the next steps would be: > > 1/ figure out what we want for signals etc.. > > Afaik Solaris tried many different ways to propagate signals across > their lwps, afaik they found the task so complex and so hard to get > right that the latest implementation makes one lwp the signal target. > > Most likely then signals would still be in struct proc or the > initial kse. I was thinking about this.. I think that signals should be delivered to the UTS and it should be up to the UTS to decide what to do about it..
In that case they would be delivered to the first available kernel->user boundary crossing for that process. > > > 2/ get the contexts actually stored in the KSEC structure > > when a process is suspended. (instead of some strange pcb in funny memory > > near the u area) > > huh? I mean that I get a headache when looking at where all the registers, segment registers etc. are stored, as it looks as if it's rather mixed up. It'd be nice if it were all in one place, and the KSEC is where that should be. > > > 3/ Set up the linkages between these structures, and > > 4/ start using 'kse' instead of 'proc' in a bunch of places > > and using the linkages to find the appropriate other > > structures when needed. > > 5/ Add code to make new KSEs so that the 1:1 Mapping is no longer > > true. > > 6/ Add syscalls to start making KSEs other than the one that > > is built into the process. > > 7/ start making upcalls > > > > ok, when are you going to have these done? :) > > One other question, have you looked at the recent lwp/kse support added > to NetBSD? Is there anything to learn/avoid? I've had only a small look so far. Sorting the wheat from the chaff is a hard task, and of course it requires understanding a lot that I'm not too solid on (e.g. UVM).
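The arrangement under discussion can be sketched in C. It has four parts:

- a per-thread KSEC that holds the saved register state, in place of the pcb near the u-area;
- a KSE for each virtual processor;
- a KSEG that owns the scheduling quanta;
- a per-CPU "current KSE" that takes the place of curproc.

All type names, fields and the fixed CPU count below are hypothetical and chosen for illustration. They are not the structures in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU_SKETCH 4            /* hypothetical CPU count */

struct kseg_sketch;              /* forward declaration */

/* Saved register state for one thread of execution; in this sketch it
   replaces the pcb that currently lives near the u-area. */
struct ksec_sketch {
    unsigned long regs[16];      /* placeholder for the full register set */
    struct ksec_sketch *next;    /* link on the owning KSEG's ready list */
};

/* A KSE is a virtual processor: it runs at most one KSEC at a time. */
struct kse_sketch {
    struct kseg_sketch *kseg;    /* group whose quanta this KSE consumes */
    struct ksec_sketch *running; /* context currently on this KSE, if any */
};

/* A KSEG is a virtual multiprocessor that owns the quanta and priority. */
struct kseg_sketch {
    int quanta_left;
    int priority;
    struct ksec_sketch *ready;   /* runnable contexts */
};

/* Per-CPU "current KSE", the analogue of curproc. */
static struct kse_sketch *curkse_sketch[NCPU_SKETCH];

static struct kseg_sketch *current_kseg(int cpu)
{
    struct kse_sketch *k = curkse_sketch[cpu];
    return k != NULL ? k->kseg : NULL;
}
```

Here a plain array indexed by CPU number stands in for a real per-CPU data area.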
> > -Alfred -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000-2001 ---> X_.---._/ v From owner-freebsd-arch Thu Apr 26 11:25:18 2001 Delivered-To: freebsd-arch@freebsd.org Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29]) by hub.freebsd.org (Postfix) with SMTP id 1097D37B424 for ; Thu, 26 Apr 2001 11:25:06 -0700 (PDT) (envelope-from julian@elischer.org) Received: (qmail 13168 invoked by uid 666); 26 Apr 2001 18:28:13 -0000 Received: from i179-136.nv.iinet.net.au (HELO elischer.org) (203.59.179.136) by mail.m.iinet.net.au with SMTP; 26 Apr 2001 18:28:13 -0000 Message-ID: <3AE867C2.3B657214@elischer.org> Date: Thu, 26 Apr 2001 11:24:02 -0700 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Daniel Eischen Cc: Arch@freebsd.org, alfred@freebsd.org, Robert Watson Subject: Re: KSE threading support (first parts) References: Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Daniel Eischen wrote: > > On Wed, 25 Apr 2001, Julian Elischer wrote: > > In this view, KSEGs are on the run queue and when they get some > > quanta the KSEs hanging off them are run. > > If 2 KSEs are running, the KSEG's quanta are exhausted at twice > > the rate. > > Don't we eventually want per-CPU run queues? Then how do > multiple KSEs hanging off a KSEG get scheduled then if the quanta > are in the KSEG? Round robin? Nominally yes we do, but I must admit that I see a single queue with multiple processors reading off it as actually being easier to use and implement, and I think it may even have some better use characteristics. I have nightmares thinking about doing it with multiple (per-processor) run queues.
Allocating quanta and priority between KSEs on different queues is much more tricky. > > > Each KSE has a very strong affinity for one processor > > and KSECs have a weak affinity for a KSE. If a KSE runs out > > of work but has time, it will 'poach' a KSEC from another KSE in the > > same KSEG list. > > Again, if KSEs can have a strong affinity for 1 processor and there > can be multiple KSEs hanging off a KSEG, then how do you schedule > these KSEs when we have per-CPU run queues? It makes scheduling > these KSEs more difficult than it needs to be. In that case the KSEs are put on the run queues when there is at least one KSEC ready to run for each of them in the KSEG. If you have 3 KSECs ready to run and 4 processors, then you put three KSEs on run queues. If you have 6 KSECs ready to run, then you put 4 on the run queues. The first 2 to complete or block will do a second KSEC if there is still some of the quantum left. The priority they are scheduled with is taken from the KSEG (probably). > > I still don't see the need to have multiple KSEs within a KSEG ;-) KSEs in the same KSEG are using the same pool of quanta to complete KSECs. SOMETHING has to hold that information. A KSE is a virtual processor, whereas a KSEG is a virtual multiprocessor. You allocate quanta to the KSEG. KSEs use these quanta. The KSEG is competing (almost) fairly with other processes in the system. If you want "system" thread scheduling, you can create more KSEGs. They compete against other processes and against each other for slices of the 'real' machine. > > > In this patch the linkages are not set up at all. > > All that is done is that the structures are > > defined and used instead of a monolithic 'proc' struct. > > The new structures are 'included' in the proc structure > > to maintain compatibility and to allow code to be changed slowly.
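Julian's run-queue rule can be written down as arithmetic. One KSE is enqueued per ready KSEC, capped at the number of KSEs, and every running KSE draws on the group's single pool of quanta. The C sketch below assumes one KSE per processor and counts quanta in clock ticks. Both are assumptions, and the names are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-group scheduling state for the sketch. */
struct kseg_sched {
    int nkse;          /* KSEs (virtual processors) in the group */
    int ready_ksecs;   /* KSECs ready to run */
    int quanta;        /* ticks left in the group's shared pool */
};

/* One KSE goes on a run queue per ready KSEC, capped at the number of
   KSEs: 3 ready with 4 KSEs -> 3; 6 ready with 4 KSEs -> 4. */
static int kses_to_enqueue(const struct kseg_sched *g)
{
    return g->ready_ksecs < g->nkse ? g->ready_ksecs : g->nkse;
}

/* Charge one clock tick for every KSE of the group that ran during it,
   so two running KSEs drain the shared pool at twice the rate.
   Returns the ticks left, never below zero. */
static int charge_tick(struct kseg_sched *g, int running)
{
    g->quanta -= running;
    if (g->quanta < 0)
        g->quanta = 0;
    return g->quanta;
}
```

Because the pool belongs to the KSEG, adding KSEs to a group lets it finish its quantum sooner, but the group cannot claim more of the machine than its quanta allow.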
> > > > What really needs to be done is for everyone who is interested to go over > > rather arbitrary allocation of fields to structures that > > I did and make suggested changes. > > > > Also I've punted on most things to do with signals as we haven't > > really discussed how we want signals to be handled in a KSE world.. > > (can each KSEG or KSE get individual signals? do we need to > > define a special 'signal' KSE? If so is that all it does? > > Signals should be sent (via an upcall) to the first available > KSE to return to userland (return from syscall, after preemption, > etc.). The userland thread scheduler will pick a thread to > receive the signal. If the thread is running or in one > of the scheduling queues for the current KSEG, it will > be able to handle it without any other assist from the kernel. > If the thread is running or in one of the scheduling queues for > another KSEG, it will mark the signal pending in the target > thread and "signal" the appropriate KSEG with help from the > kernel (one of the new user<->kernel interfaces or syscalls). OK so 'signals' and everything to do with them are "Per process". I may edit the patch to indicate this. This does imply a mutex on SMP, so that if two processors return their KSEs to userland at the same time, they don't deliver the same signal twice. Can two KSEs (on different processors) deliver DIFFERENT signals to userland at the same time? > > (We may have to replace "KSEG" in the above with "KSE") yes, you are correct.. it should read: (I think) > Signals should be sent (via an upcall) to the first available > KSE to return to userland (return from syscall, after preemption, > etc.). The userland thread scheduler will pick a thread to > receive the signal. If the thread is running or in one > of the scheduling queues for the current KSEG, it will > be able to handle it without any other assist from the kernel. Is this what you mean? This is tricky...
when a KSE returns to userland it is running NO threads. All threaded syscalls return to userland in the 'suspended' state, so that the UTS can decide what to run. All syscalls return via an upcall to the UTS (actually the original newkse() call returns infinitely many times; that is how the upcall is achieved). The return values, error returns and data movements have already been made to the appropriate memory locations. It's as if the thread did a 'yield()' immediately after returning from a normal syscall. So we can be sure that THIS KSE isn't running the interrupt thread. If the thread is, however, being run on a different KSE (regardless of whether in this KSEG or not), then the signal must be noted so that the thread can see it at some future time. If it's not running but is in another KSEG, then it's treated as if running (the signal is noted), and the UTS will make it runnable at the next opportunity that that KSEG is runnable. (If we ran the thread on this KSE regardless of the fact that it's from another KSEG, then it would be running with a priority other than what the programmer assigned it; maybe he wants lower-priority signal handling.) > If the thread is running or in one of the scheduling queues for > another KSE, it will mark the signal pending in the target > thread and "signal" the appropriate KSEG with help from the > kernel (one of the new user<->kernel interfaces or syscalls). If a KSEG is not running because it had no work, then yes, you need to wake up one of its KSEs to handle the signal. > > It might be nice to have a general way of sending messages > between KSEGs (KSEs?). Userland-to-kernel? or userland-to-userland? "kind of like a signal?" :-) > > > What happens to the 'u-area'? > > > > how do we define a "cur-kse" similar to curproc? > > (do we need one?) > > presently the processor state is stored all over the place > > when a process is suspended.. > > This needs to be brought together so it can be put into the KSEC. > > Who understands that stuff?
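The point about per-process signals is that each KSE returning to userland must take a pending signal atomically. Two KSEs on different CPUs can then each pick up a DIFFERENT signal, but never the same one. The C sketch below models that. It uses a C11 `<stdatomic.h>` compare-and-swap on a hypothetical per-process pending mask in place of the mutex mentioned above; either gives the same claim-once guarantee. Numbering signals 1 to 32 is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-process signal state: bit n set => signal n+1 pending. */
struct proc_sig_sketch {
    _Atomic uint32_t pending;
};

static void sig_init(struct proc_sig_sketch *p)
{
    atomic_init(&p->pending, 0);
}

static void post_signal(struct proc_sig_sketch *p, int sig)
{
    atomic_fetch_or(&p->pending, UINT32_C(1) << (sig - 1));
}

/* Called by each KSE on its way back to userland.  Atomically clears the
   lowest pending bit and returns its signal number, or 0 if none is
   pending.  A failed compare-and-swap reloads 'old' and retries. */
static int claim_signal(struct proc_sig_sketch *p)
{
    uint32_t old = atomic_load(&p->pending);

    while (old != 0) {
        uint32_t low = old & -old;      /* lowest set bit */
        if (atomic_compare_exchange_weak(&p->pending, &old, old & ~low)) {
            int sig = 1;
            while (low >>= 1)
                sig++;
            return sig;
        }
    }
    return 0;
}
```

Because only the winning compare-and-swap clears a bit, a signal can be handed to at most one upcall even when several KSEs cross into userland at the same moment.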
> > > > Some of the next steps would be: > > 1/ figure out what we want for signals etc.. > > Ask me for help in this area. I know what the userland scheduler > has to do when dispatching signals to threads. > > > 2/ get the contexts actually stored in the KSEC structure > > when a process is suspended. (instead of some strange pcb in funny memory > > near the u area) > > 3/ Set up the linkages between these structures, and > > 4/ start using 'kse' instead of 'proc' in a bunch of places > > and using the linkages to find the appropriate other > > structures when needed. > > 5/ Add code to make new KSEs so that the 1:1 Mapping is no longer > > true. > > 6/ Add syscalls to start making KSEs other than the one that > > is built into the process. > > 7/ start making upcalls > > Can't we start with 7 ;-) Well, really, there are 4 new syscalls, and the upcall is the multiple return of one of them. kse_id ksecreate(struct retblock *rblk, boolean newkseg); kseyield(timeout); /* never returns.. comes back as upcall when awakened */ ksewakeup(kse_id sleeper); ksefinish(); /* just never returns (unless we are last kse in which case, upcalls) */ Upcalls return with certain information in the retblock: 1/ why the KSE upcalled (a bitmap of reasons; there may have been more than one reason, e.g. 3 returned syscalls and 2 signals and a wakeup). 2/ head of linked list of completed syscall status blocks. (these should be allocated in the thread control blocks that the UTS uses and will include room for a pointer that the kernel ignores but which the UTS can use to find the start of that thread control block. Also enough information so that the kernel can store enough thread run state so that the thread can be made to look as if it has just done a 'yield()'.
(so it can be restarted in the same way that other threads can be restarted.)) > > -- > Dan Eischen -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000-2001 ---> X_.---._/ v From owner-freebsd-arch Thu Apr 26 11:50:43 2001 Delivered-To: freebsd-arch@freebsd.org Received: from meow.osd.bsdi.com (meow.osd.bsdi.com [204.216.28.88]) by hub.freebsd.org (Postfix) with ESMTP id A509737B422; Thu, 26 Apr 2001 11:50:34 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241]) by meow.osd.bsdi.com (8.11.2/8.11.2) with ESMTP id f3QInuG50061; Thu, 26 Apr 2001 11:49:56 -0700 (PDT) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <3AE85776.92D6BD90@elischer.org> Date: Thu, 26 Apr 2001 11:49:14 -0700 (PDT) From: John Baldwin To: Julian Elischer Subject: Re: KSE threading support (first parts) Cc: Daniel Eischen , Robert Watson , Arch@FreeBSD.org, Alfred Perlstein , jasone@FreeBSD.org Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On 26-Apr-01 Julian Elischer wrote: >> > how do we define a "cur-kse" similar to curproc? >> > (do we need one?) >> >> yes. > > I will look at seeing if I can do this... Trivially. Just use a per-cpu variable 'curkse' and do the equivalent of 's/curproc/PCPU_GET(curkse)/' as needed. Some other tweaks will be needed in some asm files as well, but that is easy. >> > presently the processor state is stored all over the place >> > when a process is suspended.. >> > This needs to be brought together so it can be put into the KSEC. >> > Who understands that stuff? >> >> That's your job. Refer to Jason Evans if he's available. > > gee thanks..
> I don't really have a grip on all the ways that traps etc > can need to save context.. > I REALLY don't get the floating point context stuff. > Some state is stored on the user stack, some on the kernel stack > and some in the pcb (and maybe some in the proc struct.) The pcb is used to save state while a thread is switched out. When a trap/exception/interrupt occurs, the state is saved in a stack frame in the kernel. The FP state is a little tricky because we don't want to save it and restore it at every context switch, so we use a type of lazy switching where we only save it if we are using it and only restore it if we are using it, though it is a bit more complicated than that. All of this should be per-thread instead of per-process and won't be that hard. Hardly any of this needs changing. > to complicate things a little: > Some things such as segment registers may be "per KSE" > where normal registers are "per KSEC". Stick all the registers in the same place. It doesn't hurt to duplicate the 4 seg regs in a couple of places, and the minuscule gain is hardly worth the extra effort involved. State like this really should be per-thread. >> You should also ask John Baldwin about proc locking as this >> stuff is definitely going to require locking in order to function >> properly. At first, I think, what Jason was planning on doing was just letting the lock for the process lock all the kse's, kseg's, ksec's, etc. associated with a proc as well as the proc itself. I wouldn't worry too much about this at first. >> > Some of the next steps would be: >> > 1/ figure out what we want for signals etc.. >> >> Afaik Solaris tried many different ways to propagate signals across >> their lwps, afaik they found the task so complex and so hard to get >> right that the latest implementation makes one lwp the signal target. >> >> Most likely then signals would still be in struct proc or the >> initial kse. > > I was thinking about this..
> I think that signals should be delivered to the UTS > and it should be up to the UTS to decide what to do about it.. > In that case they would be delivered to the first available > kernel->user boundary crossing for that process. Userland is not available to create signal stacks, etc. You can keep signals as a process property, and the first kse (or ksec or whatever one is a runnable thread/context) that returns to userland from an interrupt, etc. will execute ast() on the way out and post any signals. If you leave signals as being per-process, I see there being hardly any changes needed in any of the signal handling code. >> > 2/ get the contexts actually stored in the KSEC structure >> > when a process is suspended. (instead of some strange pcb in funny >> > memory >> > near the u area) >> >> huh? > > I mean that I get a headache when looking at where all the > registers, segment registers etc. are all stored as it looks as if > it's rather mixed up.. It'd be nice if it were all in one place, > and the KSEC is where that should be. The pcb should be per-thread, yes. >> > 3/ Set up the linkages between these structures, and >> > 4/ start using 'kse' instead of 'proc' in a bunch of places >> > and using the linkages to find the appropriate other >> > structures when needed. >> > 5/ Add code to make new KSEs so that the 1:1 Mapping is no longer >> > true. >> > 6/ Add syscalls to start making KSEs other than the one that >> > is built into the process. >> > 7/ start making upcalls >> > >> >> ok, when are you going to have these done? :) >> >> One other question, have you looked at the recent lwp/kse support added >> to NetBSD? Is there anything to learn/avoid? > > I've had only a small look so far > sorting the wheat from the chaff is a hard task and of course it requires > understanding a lot that I'm not too solid on. (e.g. UVM). My only concern at this point in time is that I think 5.0 is fragile enough as it is.
I'd rather that KSE not come in until 6.0-CURRENT so that 5.x has a fighting chance of being stable, but that is just my opinion. -- John Baldwin -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ From owner-freebsd-arch Thu Apr 26 12: 6:38 2001 Delivered-To: freebsd-arch@freebsd.org Received: from dragon.nuxi.com (trang.nuxi.com [209.152.133.57]) by hub.freebsd.org (Postfix) with ESMTP id A6CE737B422; Thu, 26 Apr 2001 12:06:36 -0700 (PDT) (envelope-from obrien@NUXI.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.11.3/8.11.1) id f3QJ6UJ92992; Thu, 26 Apr 2001 12:06:30 -0700 (PDT) (envelope-from obrien) Date: Thu, 26 Apr 2001 12:06:30 -0700 From: "David O'Brien" To: Julian Elischer Cc: Arch@FreeBSD.ORG, Daniel Eischen Subject: Re: KSE threading support (first parts) Message-ID: <20010426120630.A92915@dragon.nuxi.com> Reply-To: obrien@FreeBSD.ORG References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> <3AE85776.92D6BD90@elischer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3AE85776.92D6BD90@elischer.org>; from julian@elischer.org on Thu, Apr 26, 2001 at 10:14:30AM -0700 X-Operating-System: FreeBSD 5.0-CURRENT Organization: The NUXI BSD group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Uh people. We really, really NEED to agree on the design here. Jason's paper (http://people.freebsd.org/~jasone/refs/freebsd_kse/freebsd_kse.html) explains all this. Before any more work is done on KSEs I really feel people should either agree fully with the paper, or debate its contents first.
I really doubt a single person will develop KSE, so it is imperative there is a common sheet of music. -- -- David (obrien@FreeBSD.org) From owner-freebsd-arch Thu Apr 26 13:11: 8 2001 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id CED4037B422; Thu, 26 Apr 2001 13:11:02 -0700 (PDT) (envelope-from eischen@vigrid.com) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id QAA14182; Thu, 26 Apr 2001 16:10:19 -0400 (EDT) Date: Thu, 26 Apr 2001 16:10:18 -0400 (EDT) From: Daniel Eischen To: Julian Elischer Cc: Arch@freebsd.org, alfred@freebsd.org, Robert Watson Subject: Re: KSE threading support (first parts) In-Reply-To: <3AE867C2.3B657214@elischer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, 26 Apr 2001, Julian Elischer wrote: > Daniel Eischen wrote: > > I still don't see the need to have multiple KSEs within a KSEG ;-) > > KSEs in the same KSEG are using the same pool of quanta to > complete KSECs. > SOMETHING has to hold that information. A KSE is a virtual > processor where a KSEG is a virtual multiprocessor. > > You allocate quanta to the KSEG. KSEs use these quanta. > The KSEG is competing (almost) fairly with other processes > in the system. If you want "system" thread scheduling, you > can create more KSEGs. They compete against other processes and > against each other for slices of the 'real' machine. Right, like I've said before, it's easier just to combine the KSEG and KSE into one entity and forget about fair scheduling. Limit the number of "combined KSE/KSEGs" to the number of processors (scope system threads still get their own "combined KSE/KSEG" to satisfy POSIX).
The common case is a single processor system anyways, and in that case it doesn't make sense to have more than 1 KSE in a KSEG. In a multiprocessor system, if you have an extra quantum, who really cares? If someone really does care, then it can be a kernel tunable or resource limit. I just don't see any benefit for the added complexity. It certainly is easier for the UTS with a combined KSE/KSEG. > > Signals should be sent (via an upcall) to the first available > > KSE to return to userland (return from syscall, after preemption, > > etc.). The userland thread scheduler will pick a thread to > > receive the signal. If the thread is running or in one > > of the scheduling queues for the current KSEG, it will > > be able to handle it without any other assist from the kernel. > > If the thread is running or in one of the scheduling queues for > > another KSEG, it will mark the signal pending in the target > > thread and "signal" the appropriate KSEG with help from the > > kernel (one of the new user<->kernel interfaces or syscalls). > > OK so 'signals' and everything to do with them are "Per process". > I may edit the patch to indicate this. This does indicate a mutex > with SMP so that if two processors return their KSEs to userland > at the same time, they don't deliver the same signal twice. > Can two KSEs (KSEs are on different processors) deliver > DIFFERENT signals to userland at the same time? I suppose they could, as long as they are delivered via an upcall (on the special stack used for upcalls, and the running thread marked as preempted). The UTS will have to use some locking mechanisms, but it has to do that normally anyways. > > > > (We may have to replace "KSEG" in the above with "KSE") > > yes, you are correct.. it should read: (I think) > > Signals should be sent (via an upcall) to the first available > > KSE to return to userland (return from syscall, after preemption, > > etc.). 
The userland thread scheduler will pick a thread to > > receive the signal. If the thread is running or in one > > of the scheduling queues for the current KSEG, it will > > be able to handle it without any other assist from the kernel. > > Is this what you mean? For the most part. But if the target thread is running in one of the other KSEs for that KSEG, then it will still require an assist from the kernel. > This is tricky... when a KSE returns to userland it is running > NO threads. All threaded syscalls return to userland in the 'suspended' > state, so that the UTS can decide what to run. This is only when syscalls block or when you need to notify the KSE of special events (signals, interruptions from other KSEs, etc.). Normally, a syscall that doesn't block just returns without any upcall. > All syscalls return via > an upcall to the UTS (actually the original newkse() call returns > infinitely many times.. that is how the upcall is achieved). The return > values, error returns and data movements have been made to the appropriate > memory locations.. It's as if the thread did a 'yield()' immediately > after returning from a normal syscall.. > So we can be sure that THIS KSE isn't running the interrupt thread. Running threads can cause synchronous signals, so it's quite possible the running thread generated the signal. The KSE in which the thread was running would then get the notification. I don't see any problem with this as long as interrupted thread contexts are available to the UTS. > If the thread is however being run on a different KSE (regardless of > whether in this KSEG or not) then the signal must be noted so that > the thread can see it at some future time. If it's not running but in > another KSEG then it's treated as if running, (the signal noted) and the > UTS will make it runnable at the next opportunity that that KSEG > is runnable.
(If we ran the thread on this KSE regardless of the fact > that it's from another KSEG, then it will be running with a priority > other than what the programmer assigned it. (maybe he wants lower > priority signal handling)). 1996 POSIX spec says that signals should be delivered "as soon as possible". This leaves some leeway (I'll have to see if Austin changes any of this), but my approach in the current threads library is to deliver the signal right away unless the thread is in a critical region (in which case the signal is delivered when it exits the critical region). > > If the thread is running or in one of the scheduling queues for > > another KSE, it will mark the signal pending in the target > > thread and "signal" the appropriate KSEG with help from the > > kernel (one of the new user<->kernel interfaces or syscalls). > > If a KSEG is not running because it had no work, then > yes, you need to wake up one of its KSEs to handle the signal. Yeah, but I was thinking more along the lines of interrupting a currently running KSE. > > It might be nice to have a general way of sending messages > > between KSEGs (KSEs?). > > Userland-to-kernel? or userland-to-userland? > "kind of like a signal?" :-) Userland to userland with an assist from the kernel. KSE A wants to interrupt KSE B and send B an upcall message of some sort. The UTS knows what the message format is, but the kernel doesn't need to know other than possibly its message type and size. 
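Dan's rule for the current threads library is to deliver a signal at once unless the target thread is inside a critical region, and otherwise to hold it until the region is left. The C sketch below models that rule. The thread structure, the nesting counter and the `delivered` record are hypothetical. Recording the signal stands in for actually running a handler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread state kept by the userland scheduler. */
struct uthread_sketch {
    int      crit_depth;   /* > 0 while inside a critical region */
    uint32_t deferred;     /* signals held until the region is left */
    uint32_t delivered;    /* record of delivered signals, for the demo */
};

static void deliver_now(struct uthread_sketch *t, int sig)
{
    /* A real library would run the handler here; the sketch records it. */
    t->delivered |= UINT32_C(1) << (sig - 1);
}

/* Deliver "as soon as possible": immediately, unless the thread is in a
   critical region, in which case the signal is held. */
static void dispatch_signal(struct uthread_sketch *t, int sig)
{
    if (t->crit_depth > 0)
        t->deferred |= UINT32_C(1) << (sig - 1);
    else
        deliver_now(t, sig);
}

static void enter_critical(struct uthread_sketch *t)
{
    t->crit_depth++;
}

/* Leaving the outermost region delivers everything that was held. */
static void leave_critical(struct uthread_sketch *t)
{
    if (--t->crit_depth == 0) {
        for (int n = 0; n < 32; n++) {
            if (t->deferred & (UINT32_C(1) << n)) {
                t->deferred &= ~(UINT32_C(1) << n);
                deliver_now(t, n + 1);
            }
        }
    }
}
```

The depth counter lets critical regions nest. Only the outermost exit delivers the held signals, which keeps delivery as prompt as the critical-region rule allows.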
-- Dan Eischen From owner-freebsd-arch Thu Apr 26 17:15:12 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 99C2137B422; Thu, 26 Apr 2001 17:15:10 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.2/8.11.2) id f3R0FAi62512; Thu, 26 Apr 2001 17:15:10 -0700 (PDT) (envelope-from dillon) Date: Thu, 26 Apr 2001 17:15:10 -0700 (PDT) From: Matt Dillon Message-Id: <200104270015.f3R0FAi62512@earth.backplane.com> To: "David O'Brien" Cc: Julian Elischer , Arch@FreeBSD.ORG, Daniel Eischen Subject: Re: KSE threading support (first parts) References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> <3AE85776.92D6BD90@elischer.org> <20010426120630.A92915@dragon.nuxi.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :Uh people. : :We really, really NEED to agree on the design here. Jason's paper :(http://people.freebsd.org/~jasone/refs/freebsd_kse/freebsd_kse.html) :explains all this. : :Before any more work is done on KSE's I really feel people should either :agree fully with the paper, or debate its contents first. : :I really doubt a single person will develop KSE, so it is imperative :there is a common sheet of music. : :-- :-- David (obrien@FreeBSD.org) I've read it. I was under the impression from prior discussions that KSEs belonging to the same process had to be serialized... that you couldn't run them concurrently with each other. I can't imagine how we could possibly run KSEs belonging to the same process concurrently anyway. I think I prefer the original rfork()/KSE model.
-Matt From owner-freebsd-arch Thu Apr 26 22:10:36 2001 Delivered-To: freebsd-arch@freebsd.org Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29]) by hub.freebsd.org (Postfix) with SMTP id 512B037B423 for ; Thu, 26 Apr 2001 22:10:33 -0700 (PDT) (envelope-from julian@elischer.org) Received: (qmail 16501 invoked by uid 666); 27 Apr 2001 05:13:43 -0000 Received: from i177-040.nv.iinet.net.au (HELO elischer.org) (203.59.177.40) by mail.m.iinet.net.au with SMTP; 27 Apr 2001 05:13:43 -0000 Message-ID: <3AE8FF0A.AFAF3AE1@elischer.org> Date: Thu, 26 Apr 2001 22:09:30 -0700 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: obrien@FreeBSD.ORG Cc: Arch@FreeBSD.ORG, Daniel Eischen Subject: Re: KSE threading support (first parts) References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> <3AE85776.92D6BD90@elischer.org> <20010426120630.A92915@dragon.nuxi.com> Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG David O'Brien wrote: > > Uh people. > > We really, really NEED to agree on the design here. Jason's paper > (http://people.freebsd.org/~jasone/refs/freebsd_kse/freebsd_kse.html) is > explains all this. > > Before any more work is done on KSE's I really feel people should either > agree fully with the paper, or debate its contents first. > > I really doubt a single person will develop KSE, so it is imperative > there is a common sheet of music. I helped develop that paper. We are not planning on ignoring it, just clarifying it and maybe producing a new version of it. It doesn't cover all details (e.g. how signals are handled) in great depth.
> > -- > -- David (obrien@FreeBSD.org) -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000-2001 ---> X_.---._/ v From owner-freebsd-arch Thu Apr 26 22:37:51 2001 Delivered-To: freebsd-arch@freebsd.org Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29]) by hub.freebsd.org (Postfix) with SMTP id 7E30837B42C for ; Thu, 26 Apr 2001 22:37:43 -0700 (PDT) (envelope-from julian@elischer.org) Received: (qmail 16686 invoked by uid 666); 27 Apr 2001 05:40:53 -0000 Received: from i177-040.nv.iinet.net.au (HELO elischer.org) (203.59.177.40) by mail.m.iinet.net.au with SMTP; 27 Apr 2001 05:40:53 -0000 Message-ID: <3AE90567.CA50293E@elischer.org> Date: Thu, 26 Apr 2001 22:36:39 -0700 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Daniel Eischen Cc: Arch@freebsd.org, alfred@freebsd.org, Robert Watson Subject: Re: KSE threading support (first parts) References: Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Daniel Eischen wrote: > > On Thu, 26 Apr 2001, Julian Elischer wrote: > > Daniel Eischen wrote: > > > I still don't see the need to have multiple KSEs within a KSEG ;-) > > > > KSEs in the same KSEG are using the same pool of quanta to > > complete KSECs. > > SOMETHING has to hold that information. A KSE is a virtual > > processor where a KSEG is a virtual multiprocessor. > > > > You allocate quanta to the KSEG. KSEs use these quanta. > > The KSEG is competing (almost) fairly with other processes > > in the system. If you want "system" thread scheduling, you > > can create more KSEGs. They compete against other processes and > > against each other for slices of the 'real' machine.
> > Right, like I've said before, it's easier just to combine the KSEG and KSE > into one entity and forget about fair scheduling. Limit the number of > "combined KSE/KSEGs" to the number of processors (scope system threads > still get their own "combined KSE/KSEG" to satisfy POSIX). The common > case is a single processor system anyways, and in that case it doesn't > make sense to have more than 1 KSE in a KSEG. By grouping KSEs into an equivalence group, I give the system enough information to allow it to schedule resuming KSECs on an 'equivalent' but idle KSE. i.e. a syscall may be initiated on one kse and completed on another as long as they are in the same KSEG. The UTS doesn't really have to think too much about this except to know that the parallelism in a KSEG is equal to the lesser of the number of KSEs and the number of processors. (Having more KSEs than processors is completely pointless, and I happen to think it should not be allowed.) I tried to get away from having a KSEG but it ended up getting more complicated again. Most threaded applications would have one KSEG and N KSEs (N==num-processors) or ONE KSEG and one KSE (which is still useful as you have N KSECs). Few apps would have more than one KSEG and hardly any would have more than 2. > > In a multiprocessor system, if you have an extra quantum, who really > cares? If someone really does care, then it can be a kernel tunable > or resource limit. > > I just don't see any benefit for the added complexity. It certainly > is easier for the UTS with a combined KSE/KSEG. But I think it will be simpler WITH it. Treat the KSEs in a KSEG as interchangeable. Threads that go into the system on one may be reported to have completed their syscall on another.
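Julian states two rules here. First, the parallelism of a KSEG is the lesser of its KSE count and the processor count. Second, a syscall started on one KSE may be reported complete on any KSE of the same KSEG. The model below sketches both rules. `struct kse`, `struct kseg`, `kseg_parallelism`, `kseg_complete_syscall` and the fixed-size KSE array are hypothetical illustrations, not the data structures in the KSE patch.

```c
#include <stddef.h>

#define KSEG_MAX_KSE 8   /* arbitrary limit for the sketch */

/* Hypothetical model: a KSE is a virtual processor, a KSEG a virtual multiprocessor. */
struct kse {
    int id;
    int busy;            /* 1 while running a thread for the UTS */
};

struct kseg {
    struct kse kses[KSEG_MAX_KSE];
    int        nkse;
};

/* Parallelism of the group = min(number of KSEs, number of processors). */
int kseg_parallelism(const struct kseg *g, int ncpu)
{
    return g->nkse < ncpu ? g->nkse : ncpu;
}

/*
 * A blocked syscall has finished. Because the KSEs of a group are
 * interchangeable, the completion may be reported on any idle KSE,
 * not necessarily the one that entered the kernel. Returns the KSE
 * chosen, or NULL if every KSE is busy (the completion stays queued).
 */
struct kse *kseg_complete_syscall(struct kseg *g)
{
    for (int i = 0; i < g->nkse; i++) {
        if (!g->kses[i].busy) {
            g->kses[i].busy = 1;   /* upcall into the UTS on this KSE */
            return &g->kses[i];
        }
    }
    return NULL;
}
```

For example, suppose a syscall entered the kernel on KSE 0 and KSE 0 is still busy with another thread when the syscall finishes. The completion is then reported on KSE 1, which is exactly the interchangeability Julian describes.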
If the thread is running or in one > > > of the scheduling queues for the current KSEG, it will > > > be able to handle it without any other assist from the kernel. > > > If the thread is running or in one of the scheduling queues for > > > another KSEG, it will mark the signal pending in the target > > > thread and "signal" the appropriate KSEG with help from the > > > kernel (one of the new user<->kernel interfaces or syscalls). > > > > OK so 'signals' and everything to do with them are "Per process". > > I may edit the patch to indicate this. This does indicate a mutex > > with SMP so that if two processors return their KSEs to userland > > at the same time, they don't deliver the same signal twice. > > Can two KSEs (KSEs are on different processors) deliver > > DIFFERENT signals to userland at the same time? > > I suppose they could, as long as they are delivered via an > upcall (on the special stack used for upcalls, and the running > thread marked as preempted). The UTS will have to use some > locking mechanisms, but it has to do that normally anyways. > > > > > > > (We may have to replace "KSEG" in the above with "KSE") > > > > yes, you are correct.. it should read: (I think) > > > Signals should be sent (via an upcall) to the first available > > > KSE to return to userland (return from syscall, after preemption, > > > etc.). The userland thread scheduler will pick a thread to > > > receive the signal. If the thread is running or in one > > > of the scheduling queues for the current KSEG, it will > > > be able to handle it without any other assist from the kernel. > > > > Is this what you mean? > > For the most part. But if the target thread is running in one > of the other KSEs for that KSEG, then it will still require an > assist from the kernel. Why? Surely it is to be considered to be processing another signal. note teh signal, and it should pick it up when it's completed the one its doing.. > > > This is tricky... 
when a KSE returns to userland it is running > > NO threads. All threaded syscalls return to userland in the 'suspended' > > state, so that the UTS can decide what to run. > > This is only when syscalls or when you need to notify the KSE of > special events (signals, interruptions from other KSEs, etc). > Normally, a syscall that doesn't block just returns without > any upcall. Well, I don't know that this is true. The thread could starve other threads by doing only non-blocking syscalls. I had thought that all syscalls would return via upcalls to allow the UTS to decide whether to pre-empt them. > > > All syscalls return via > > an upcall to the UTS (actually the original newkse() call returns > > infinitely many times.. that is how the upcall is achieved). The return > > values, error returns and data movements have been made to the appropriate > > memory locations.. It's as if the thread did a 'yield()' immediately > > after returning from a normal syscall.. > > So we can be sure that THIS KSE isn't running the interrupt thread. > > Running threads can cause synchronous signals, so it's quite possible > the running thread generated the signal. The KSE in which the thread > was running would then get the notification. The KSE would, but when the upcall is made, the previously running thread is made to look as if it had yielded. The KSE is running but the thread is 'suspended'. > > I don't see any problem with this as long as interrupted thread > contexts are available to the UTS. > > > If the thread is however being run on a different KSE (regardless of > > whether in this KSEG or not) then the signal must be noted so that > > the thread can see it at some future time. If it's not running but in > > another KSEG then it's treated as if running, (the signal noted) and the > > UTS will make it runnable at the next opportunity that that KSEG > > is runnable.
(If we ran the thread on this KSE regardless of the fact > > that it's from another KSEG, then it will be running with a priority > > other than what the programmer assigned it. (maybe he wants lower > > priority signal handling)). > > 1996 POSIX spec says that signals should be delivered "as soon as > possible". This leaves some leeway (I'll have to see if Austin > changes any of this), but my approach in the current threads library > is to deliver the signal right away unless the thread is in a critical > region (in which case the signal is delivered when it exits the > critical region). > > > > If the thread is running or in one of the scheduling queues for > > > another KSE, it will mark the signal pending in the target > > > thread and "signal" the appropriate KSEG with help from the > > > kernel (one of the new user<->kernel interfaces or syscalls). > > > > If a KSEG is not running because it had no work, then > > yes, you need to wake up one of its KSEs to handle the signal. > > Yeah, but I was thinking more along the lines of interrupting a > currently running KSE. If you are returning from the kernel, all threads are effectively suspended and equivalent. The KSE is always 'idle' when returning to the UTS. The UTS then decides what to run next (usually the thread that was just active, but not always). > > > > It might be nice to have a general way of sending messages > > > between KSEGs (KSEs?). > > > > Userland-to-kernel? or userland-to-userland? > > "kind of like a signal?" :-) > > Userland to userland with an assist from the kernel. KSE A wants > to interrupt KSE B and send B an upcall message of some sort. > The UTS knows what the message format is, but the kernel doesn't > need to know other than possibly its message type and size. Fair enough.
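Julian says the original `newkse()` call "returns infinitely many times", and that each return is an upcall in which the previously running thread looks as if it had yielded. That control flow resembles the way `setjmp()` can return more than once. The sketch below uses standard `setjmp()`/`longjmp()` purely as a userland analogy. `uts_run()`, `kernel_finish_syscall()` and `uts_last_completed` are hypothetical names. A real upcall would be delivered by the kernel onto a dedicated upcall stack, not by `longjmp()`.

```c
#include <setjmp.h>

/* Hypothetical names throughout; setjmp()/longjmp() only model the control flow. */
static jmp_buf uts_entry;            /* stands in for the newkse() return point */
static volatile int completed_tid;   /* thread whose syscall just "completed" */
int uts_last_completed = -1;         /* what the UTS saw on its latest upcall */

/* Simulated kernel: instead of returning to the thread, re-enter the UTS. */
static void kernel_finish_syscall(int tid)
{
    completed_tid = tid;
    longjmp(uts_entry, 1);
}

/* Runs until nsyscalls upcalls have been taken; returns the upcall count. */
int uts_run(int nsyscalls)
{
    static volatile int upcalls;     /* volatile: must survive longjmp() */
    upcalls = 0;
    if (setjmp(uts_entry) != 0) {
        /* A later "return" of the entry point, i.e. an upcall. The thread
           that made the syscall is now suspended, as if it had yielded,
           and the UTS is free to pick what runs next. */
        upcalls++;
        uts_last_completed = completed_tid;
    }
    if (upcalls < nsyscalls)
        kernel_finish_syscall(upcalls);  /* thread number `upcalls` makes a syscall */
    return upcalls;
}
```

The first pass through `setjmp()` corresponds to the initial `newkse()` return. Every later return corresponds to an upcall after a syscall has completed, so `uts_run(3)` returns 3.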
> > -- > Dan Eischen -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000-2001 ---> X_.---._/ v From owner-freebsd-arch Fri Apr 27 4:55:26 2001 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id AD4FE37B422; Fri, 27 Apr 2001 04:55:20 -0700 (PDT) (envelope-from eischen@vigrid.com) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id HAA26845; Fri, 27 Apr 2001 07:54:42 -0400 (EDT) Date: Fri, 27 Apr 2001 07:54:42 -0400 (EDT) From: Daniel Eischen To: Julian Elischer Cc: Arch@freebsd.org, alfred@freebsd.org, Robert Watson Subject: Re: KSE threading support (first parts) In-Reply-To: <3AE90567.CA50293E@elischer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, 26 Apr 2001, Julian Elischer wrote: > Daniel Eischen wrote: > > Right, like I've said before, it's easier just to combine the KSEG and KSE > > into one entity and forget about fair scheduling. Limit the number of > > "combined KSE/KSEGs" to the number of processors (scope system threads > > still get their own "combined KSE/KSEG" to satisfy POSIX). The common > > case is a single processor system anyways, and in that case it doesn't > > make sense to have more than 1 KSE in a KSEG. > > By grouping KSEs into an equivalence group, I give the system enough > information to allow it to schedule resuming KSECs on an 'equivalent' > but idle KSE. i.e. a syscall may be initiated on one kse and completed on > another as long as they are in the same KSEG. I think we need (in the UTS) to have curthread->curkse. Every time the thread changes KSEs, the UTS has to change this. We'll also have curkse that will have curkse->curthread.
The curkse is the thing pointed at by the ldt (%fs). > The UTS doesn't really have to think too much about this except to > know that the parallelism in a KSEG is equal to the lesser of > the number of KSEs and the number of processors. (Having more KSEs > than processors is completely pointless, and I happen to think it > should not be allowed.) Right, I thought we all agreed that more KSEs than CPUs is pointless. The problem is how the UTS schedules threads onto a set of KSEs within the same KSEG. There will be one scheduling queue for the main process (in the UTS) and this is where we'll potentially have multiple KSEs available. Scope system threads get their own KSEG/KSE, and the remaining threads run in the main process. If KSEs have processor affinity, then the UTS should try to keep threads running on the same KSE and still load balance to ensure each thread gets its fair share of processor time. > I tried to get away from having a KSEG but it ended up getting > more complicated again. > > Most threaded applications would have one KSEG and N KSEs > (N==num-processors) > or ONE KSEG and one KSE (which is still useful as you have N KSECs). > > Few apps would have more than one KSEG and hardly any would have > more than 2. Scope system threads get their own KSEG. I'm sure there are a lot of applications out there that create more than 1 or 2 scope system threads. > > In a multiprocessor system, if you have an extra quantum, who really > > cares? If someone really does care, then it can be a kernel tunable > > or resource limit. > > > > I just don't see any benefit for the added complexity. It certainly > > is easier for the UTS with a combined KSE/KSEG. > > But I think it will be simpler WITH it. > Treat the KSEs in a KSEG as interchangeable. Threads that go into > the system on one may be reported to have completed their syscall > on another. But I don't think it _is_ easier. Whatever. I can't be too concerned with this right now.
It's more important that we get _something_ that's better than the current threads library and the NxN Linuxthreads model. It can always be improved later. > > For the most part. But if the target thread is running in one > > of the other KSEs for that KSEG, then it will still require an > > assist from the kernel. > > Why? Surely it is to be considered to be processing another signal. > Note the signal, and it should pick it up when it has completed the one > it's doing. You've got 2 KSEs (A and B) running. Another process sends a SIGUSR1 to this process. The next KSE (A) to cross the kernel->user boundary gets an upcall to notify it that it got SIGUSR1. The UTS decides that it should be handled by the thread running on the other KSE (B). > > > This is tricky... when a KSE returns to userland it is running > > > NO threads. All threaded syscalls return to userland in the 'suspended' > > > state, so that the UTS can decide what to run. > > > > This is only when syscalls or when you need to notify the KSE of > > special events (signals, interruptions from other KSEs, etc). > > Normally, a syscall that doesn't block just returns without > > any upcall. > > Well, I don't know that this is true. The thread could starve other threads > by doing only non-blocking syscalls. I had thought that all syscalls would return > via upcalls to allow the UTS to decide whether to pre-empt them. We don't need fine-grained resolution like that. And I wouldn't want to run the UTS scheduler every time that a system call was made - yech! At a minimum, you'd have to have an extra context switch for each syscall. You only need to make an upcall when the KSE is preempted to run another process (or KSE). This limits the UTS thread quantum to a multiple of the kernel's quantum. We could also continue to use a setitimer type of mechanism with a fixed interval. Scope system threads don't need to be notified of timing signals since there are no other threads to be run.
It's only the KSEs within the main process that need some sort of timing signal. > > > All syscalls return via > > > an upcall to the UTS (actually the original newkse() call returns > > > infinitely many times.. that is how the upcall is achieved). The return > > > values, error returns and data movements have been made to the appropriate > > > memory locations.. It's as if the thread did a 'yield()' immediately > > > after returning from a normal syscall.. > > > So we can be sure that THIS KSE isn't running the interrupt thread. > > > > Running threads can cause synchronous signals, so it's quite possible > > the running thread generated the signal. The KSE in which the thread > > was running would then get the notification. > > The KSE would, but when the upcall is made, the previously running thread > is made to look as if it had yielded. The KSE is running but the thread > is 'suspended'. OK, I just wanted to make sure we were on the same page. To me, 'suspended' is preempted. > > > If a KSEG is not running because it had no work, then > > > yes, you need to wake up one of its KSEs to handle the signal. > > > > Yeah, but I was thinking more along the lines of interrupting a > > currently running KSE. > > If you are returning from the kernel, all threads are effectively > suspended and equivalent. The KSE is always 'idle' when returning > to the UTS. The UTS then decides what to run next (usually the > thread that was just active, but not always). KSEs have upcalls, not KSEGs. If you return from the kernel in one KSE, that does not mean that threads are suspended and not running in one of the other KSEs within that KSEG (and you can certainly have scope system threads in their own KSEG/KSE pair that are running also). I'm thinking multiprocessor.
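Dan's SIGUSR1 example has KSE A take the upcall while the thread that should handle the signal is running on KSE B. This is where the proposed KSE-to-KSE upcall message would come in. The sketch below assumes a fixed-size mailbox per KSE. It also assumes that the kernel's only role is to copy an opaque payload tagged with a type and size and to force an upcall on the target KSE. All names (`struct kse_msg`, `struct kse_mailbox`, `kse_post_msg`, `kse_take_msg`, `KSE_MSG_SIGNAL`) are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

#define KSE_MSG_MAX     64   /* hypothetical payload limit */
#define KSE_MAILBOX_LEN 8    /* hypothetical queue depth */

/* Message types: meaningful to the UTS only. */
enum { KSE_MSG_SIGNAL = 1, KSE_MSG_WAKEUP = 2 };

/* The kernel would look only at type and size; the payload is opaque to it. */
struct kse_msg {
    int           type;
    size_t        size;
    unsigned char payload[KSE_MSG_MAX];
};

struct kse_mailbox {
    struct kse_msg msgs[KSE_MAILBOX_LEN];
    int            head, count;
    int            upcall_pending;   /* set => target KSE must take an upcall */
};

/* Sender side (KSE A). Returns 0 on success, -1 if too large or full. */
int kse_post_msg(struct kse_mailbox *mb, int type, const void *data, size_t size)
{
    if (size > KSE_MSG_MAX || mb->count == KSE_MAILBOX_LEN)
        return -1;
    struct kse_msg *m = &mb->msgs[(mb->head + mb->count) % KSE_MAILBOX_LEN];
    m->type = type;
    m->size = size;
    if (size > 0)
        memcpy(m->payload, data, size);
    mb->count++;
    mb->upcall_pending = 1;   /* the "assist from the kernel": interrupt KSE B */
    return 0;
}

/* Receiver side (KSE B's upcall handler). Returns 0 if a message was taken. */
int kse_take_msg(struct kse_mailbox *mb, struct kse_msg *out)
{
    if (mb->count == 0)
        return -1;
    *out = mb->msgs[mb->head];
    mb->head = (mb->head + 1) % KSE_MAILBOX_LEN;
    if (--mb->count == 0)
        mb->upcall_pending = 0;
    return 0;
}
```

In this model, KSE A posts a `KSE_MSG_SIGNAL` carrying the signal number to B's mailbox. B takes an upcall, reads the message, and the UTS on B decides how to deliver the signal to its running thread.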
-- Dan Eischen From owner-freebsd-arch Fri Apr 27 9:10:36 2001 Delivered-To: freebsd-arch@freebsd.org Received: from ns.yogotech.com (ns.yogotech.com [206.127.123.66]) by hub.freebsd.org (Postfix) with ESMTP id 7F99937B422; Fri, 27 Apr 2001 09:10:28 -0700 (PDT) (envelope-from nate@yogotech.com) Received: from nomad.yogotech.com (nomad.yogotech.com [206.127.123.131]) by ns.yogotech.com (8.9.3/8.9.3) with ESMTP id KAA20944; Fri, 27 Apr 2001 10:10:20 -0600 (MDT) (envelope-from nate@nomad.yogotech.com) Received: (from nate@localhost) by nomad.yogotech.com (8.8.8/8.8.8) id KAA18653; Fri, 27 Apr 2001 10:10:14 -0600 (MDT) (envelope-from nate) From: Nate Williams MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15081.39397.944224.776391@nomad.yogotech.com> Date: Fri, 27 Apr 2001 10:10:13 -0600 (MDT) To: Matt Dillon Cc: "David O'Brien" , Julian Elischer , Arch@FreeBSD.ORG, Daniel Eischen Subject: Re: KSE threading support (first parts) In-Reply-To: <200104270015.f3R0FAi62512@earth.backplane.com> References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> <3AE85776.92D6BD90@elischer.org> <20010426120630.A92915@dragon.nuxi.com> <200104270015.f3R0FAi62512@earth.backplane.com> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Reply-To: nate@yogotech.com (Nate Williams) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > :Uh people. > : > :We really, really NEED to agree on the design here. Jason's paper > :(http://people.freebsd.org/~jasone/refs/freebsd_kse/freebsd_kse.html) is > :explains all this. > : > :Before any more work is done on KSE's I really feel people should either > :agree fully with the paper, or debate its contents first.
> : > :I really doubt a single person will develop KSE, so it is imperative > :there is a common sheet of music. > : > :-- > :-- David (obrien@FreeBSD.org) > > I've read it. I was under the impression from prior discussions that > KSEs belonging to the same process had to be serialized... that you > couldn't run them concurrently with each other. What's the point of SMP then? This would give us essentially a 'single-threaded' process, since only one thread/process can be running at any one point in time. Arguably, this is still better than the current situation where if a thread blocks, the entire process blocks, but if we've got an idle CPU, why not allow another thread to run in a second KSE on the idle processor? > I can't imagine how > we could possibly run KSEs belonging to the same process concurrently > anyway. Think 'multi-threaded' applications. It's trivial to design a program where multiple threads are independent of one another. Nate From owner-freebsd-arch Fri Apr 27 10: 2:16 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id E88FA37B422; Fri, 27 Apr 2001 10:02:13 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.2/8.11.2) id f3RH1Tk05185; Fri, 27 Apr 2001 10:01:29 -0700 (PDT) (envelope-from dillon) Date: Fri, 27 Apr 2001 10:01:29 -0700 (PDT) From: Matt Dillon Message-Id: <200104271701.f3RH1Tk05185@earth.backplane.com> To: Nate Williams Cc: "David O'Brien" , Julian Elischer , Arch@FreeBSD.ORG, Daniel Eischen Subject: Re: KSE threading support (first parts) References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> <3AE85776.92D6BD90@elischer.org> <20010426120630.A92915@dragon.nuxi.com> <200104270015.f3R0FAi62512@earth.backplane.com>
<15081.39397.944224.776391@nomad.yogotech.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :... :> : :> :Before any more work is done on KSE's I really feel people should either :> :agree fully with the paper, or debate its contents first. :> : :> :I really doubt a single person will develop KSE, so it is imperative :> :there is a common sheet of music. :> : :> :-- :> :-- David (obrien@FreeBSD.org) :> :> I've read it. I was under the impression from prior discussions that :> KSEs belonging to the same process had to be serialized... that you :> couldn't run them concurrently with each other. : :What's the point of SMP then? This would give us essentially a :'single-threaded' process, since only one thread/process can be running :at any one point in time. Arguably, this is still better than the :current situation where if a thread blocks, the entire process blocks, :but if we've got an idle CPU, why not allow another thread to run in a :second KSE on the idle processor? : :> I can't imagine how :> we could possibly run KSEs belonging to the same process concurrently :> anyway. : :Think 'multi-threaded' applications. It's trivial to design a program :where multiple threads are independent of one another. : :Nate Try reading my posting again, Nate, carefully. You missed the whole thing.
-Matt From owner-freebsd-arch Fri Apr 27 10: 6: 7 2001 Delivered-To: freebsd-arch@freebsd.org Received: from ns.yogotech.com (ns.yogotech.com [206.127.123.66]) by hub.freebsd.org (Postfix) with ESMTP id E3E3537B423; Fri, 27 Apr 2001 10:06:00 -0700 (PDT) (envelope-from nate@yogotech.com) Received: from nomad.yogotech.com (nomad.yogotech.com [206.127.123.131]) by ns.yogotech.com (8.9.3/8.9.3) with ESMTP id LAA21824; Fri, 27 Apr 2001 11:05:57 -0600 (MDT) (envelope-from nate@nomad.yogotech.com) Received: (from nate@localhost) by nomad.yogotech.com (8.8.8/8.8.8) id LAA18857; Fri, 27 Apr 2001 11:05:52 -0600 (MDT) (envelope-from nate) From: Nate Williams MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15081.42735.860662.876478@nomad.yogotech.com> Date: Fri, 27 Apr 2001 11:05:51 -0600 (MDT) To: Matt Dillon Cc: Nate Williams , "David O'Brien" , Julian Elischer , Arch@FreeBSD.ORG, Daniel Eischen Subject: Re: KSE threading support (first parts) In-Reply-To: <200104271701.f3RH1Tk05185@earth.backplane.com> References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> <3AE85776.92D6BD90@elischer.org> <20010426120630.A92915@dragon.nuxi.com> <200104270015.f3R0FAi62512@earth.backplane.com> <15081.39397.944224.776391@nomad.yogotech.com> <200104271701.f3RH1Tk05185@earth.backplane.com> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Reply-To: nate@yogotech.com (Nate Williams) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > :> :Before any more work is done on KSE's I really feel people should either > :> :agree fully with the paper, or debate its contents first. > :> : > :> :I really doubt a single person will develop KSE, so it is imperative > :> :there is a common sheet of music.
> :> : > :> :-- > :> :-- David (obrien@FreeBSD.org) > :> > :> I've read it. I was under the impression from prior discussions that > :> KSEs belonging to the same process had to be serialized... that you > :> couldn't run them concurrently with each other. > : > :What's the point of SMP then? This would give us essentially a > :'single-threaded' process, since only one thread/process can be running > :at any one point in time. Arguably, this is still better than the > :current situation where if a thread blocks, the entire process blocks, > :but if we've got an idle CPU, why not allow another thread to run in a > :second KSE on the idle processor? > : > :> I can't imagine how > :> we could possibly run KSEs belonging to the same process concurrently > :> anyway. > : > :Think 'multi-threaded' applications. It's trivial to design a program > :where multiple threads are independent of one another. > : > :Nate > > Try reading my posting again, Nate, carefully. You missed the whole > thing. I read it, and this is what I hear you saying in a nutshell. KSEs belonging to the same process are serialized, and can not be run concurrently. What I'm saying: KSEs belonging to the same process can be run concurrently if we have multiple processors. Where did I miss what you were saying?
Nate From owner-freebsd-arch Fri Apr 27 10:18:16 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id D7EAD37B423; Fri, 27 Apr 2001 10:18:12 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.2/8.11.2) id f3RHHGp05457; Fri, 27 Apr 2001 10:17:16 -0700 (PDT) (envelope-from dillon) Date: Fri, 27 Apr 2001 10:17:16 -0700 (PDT) From: Matt Dillon Message-Id: <200104271717.f3RHHGp05457@earth.backplane.com> To: Nate Williams Cc: "David O'Brien" , Julian Elischer , Arch@FreeBSD.ORG, Daniel Eischen Subject: Re: KSE threading support (first parts) References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> <3AE85776.92D6BD90@elischer.org> <20010426120630.A92915@dragon.nuxi.com> <200104270015.f3R0FAi62512@earth.backplane.com> <15081.39397.944224.776391@nomad.yogotech.com> <200104271701.f3RH1Tk05185@earth.backplane.com> <15081.42735.860662.876478@nomad.yogotech.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG : :I read it, and this is what I hear you saying in a nutshell. : :KSEs belonging to the same process are serialized, and can not be run :concurrently. : :What I'm saying: : :KSEs belonging to the same process can be run concurrently if we have :multiple processors. : :Where did I miss what you were saying? : :Nate You seem to believe that not being able to run KSE's for the same process concurrently somehow kills the whole concept of SMP. Well, that's complete bullshit. KSE's are extremely short-running affairs in kernel mode, especially when you consider the most likely asynchronizing case (a simple blocking situation that will most commonly be in a read() or write()).
Serializing them within the context of a single process will actually *IMPROVE* SMP performance, not make it worse. Running multiple kernel contexts for the same process on different cpu's concurrently means that you must now lock every single aspect of the 'current process' concept, and cannot make any assumptions whatsoever in regards to accessing elements of the current process. Well, that's just plain insane. You will wind up with so many fragging locks and mutexes in the kernel that what performance gain you might have thought you could get is now completely blown away by the locking overhead. This is another aspect of the problem you run into when you start trying to preempt a process running in the kernel arbitrarily. Suddenly all the assumptions you were able to make before that resulted in optimal code paths now must be thrown out the window and replaced with a godawful number of locks to protect kernel contexts from unexpected interruptions. That's insane as well. You are introducing a 'solution' to a problem that doesn't exist and breaking any chance we have of getting a reliable kernel in anything less than a few years in the process. If we were writing a kernel completely from scratch we could probably construct it to allow these things, but trying to do it with the current base is impossible -- you will never get something reliable or efficient at the end of this road. Or perhaps I should phrase it: The only way you will get anything close to reliable will be to effectively revert the system to the days of the single giant lock, because you will need so many fragging locks to deal with the consequences you might as well have a single big giant lock.
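Matt's alternative is to serialize a process's KSEs while they are in kernel mode, so that "current process" state needs only one guard. It can be pictured as a single per-process gate that a KSE must pass before touching that state. This is a hypothetical, single-threaded model meant only to show the rule. `struct proc_gate`, `proc_kernel_enter` and `proc_kernel_exit` are invented names, not FreeBSD's `struct proc` or its locking. A real kernel would need an atomically updated owner field and a sleep queue for waiters.

```c
/* Hypothetical model of "at most one KSE per process in the kernel". */
struct proc_gate {
    int owner;       /* id of the KSE currently in the kernel, -1 if none */
    int contended;   /* times a KSE of this process had to wait */
};

void proc_gate_init(struct proc_gate *g)
{
    g->owner = -1;
    g->contended = 0;
}

/* Returns 1 if kse_id may proceed into the kernel, 0 if it must wait. */
int proc_kernel_enter(struct proc_gate *g, int kse_id)
{
    if (g->owner == -1 || g->owner == kse_id) {
        g->owner = kse_id;
        return 1;
    }
    g->contended++;
    return 0;
}

/* Leaving the kernel lets the next KSE of this process in. */
void proc_kernel_exit(struct proc_gate *g, int kse_id)
{
    if (g->owner == kse_id)
        g->owner = -1;
}
```

The model captures both sides of the argument. For Matt, the gate is the only thing protecting per-process state, so no finer-grained locking is needed. For Nate, each increment of `contended` on an SMP machine is a processor that could have been running another thread of the same process in the kernel.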
-Matt From owner-freebsd-arch Fri Apr 27 12:12: 2 2001 Delivered-To: freebsd-arch@freebsd.org Received: from ns.yogotech.com (ns.yogotech.com [206.127.123.66]) by hub.freebsd.org (Postfix) with ESMTP id 1F85A37B424; Fri, 27 Apr 2001 12:10:01 -0700 (PDT) (envelope-from nate@yogotech.com) Received: from nomad.yogotech.com (nomad.yogotech.com [206.127.123.131]) by ns.yogotech.com (8.9.3/8.9.3) with ESMTP id NAA23837; Fri, 27 Apr 2001 13:09:48 -0600 (MDT) (envelope-from nate@nomad.yogotech.com) Received: (from nate@localhost) by nomad.yogotech.com (8.8.8/8.8.8) id NAA19281; Fri, 27 Apr 2001 13:09:46 -0600 (MDT) (envelope-from nate) From: Nate Williams MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15081.50170.297579.938254@nomad.yogotech.com> Date: Fri, 27 Apr 2001 13:09:46 -0600 (MDT) To: Matt Dillon Cc: Nate Williams , "David O'Brien" , Julian Elischer , Arch@FreeBSD.ORG, Daniel Eischen Subject: Re: KSE threading support (first parts) In-Reply-To: <200104271717.f3RHHGp05457@earth.backplane.com> References: <3AE71067.FF4BD029@elischer.org> <20010425110940.L1790@fw.wintelcom.net> <3AE85776.92D6BD90@elischer.org> <20010426120630.A92915@dragon.nuxi.com> <200104270015.f3R0FAi62512@earth.backplane.com> <15081.39397.944224.776391@nomad.yogotech.com> <200104271701.f3RH1Tk05185@earth.backplane.com> <15081.42735.860662.876478@nomad.yogotech.com> <200104271717.f3RHHGp05457@earth.backplane.com> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Reply-To: nate@yogotech.com (Nate Williams) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > :I read it, and this is what I hear you saying in a nutshell. > : > :KSEs belonging to the same process are serialized, and can not be run > :concurrently.
> : > :What I'm saying: > : > :KSEs belonging to the same process can be run concurrently if we have > :multiple processors. > : > :Where did I miss what you were saying? > : > :Nate > > You seem to believe that not being able to run KSE's for the same > process concurrently somehow kills the whole concept of SMP. No, it kills one of the biggest reasons for supporting KSE. Without concurrent KSEs, a single process can only take advantage of a single processor. > Well, that's complete bullshit. KSE's are extremely short-running > affairs in kernel mode, especially when you consider the most likely > asynchronizing case (a simple blocking situation that will most commonly > be in a read() or write()). Not necessarily. My experience with developing and running applications on Solaris says that having multiple KSE's/process is a *huge* win. > Serializing them within the context of a > single process will > actually *IMPROVE* SMP performance, not make it worse. Why? > Running multiple kernel contexts for the same process on different > cpu's concurrently means that you must now lock every single aspect > of the 'current process' concept Which has to be done anyway, since the processors will be running multiple processes in any case, and a process may migrate to a different processor depending on process load. Affinity is a goal, but there's no guarantee that a process will *always* execute on the same processor. In essence, you're limiting the design of a threaded program to serialized processes, which is completely bogus. > Well, that's just plain insane. You will wind up with so many fragging > locks and mutexes in the kernel that what performance gain you might > have thought you could get is now completely blown away by the locking > overhead. See above. This has to be done in any case, and is done now. The problem is no more difficult with the addition of KSE's, and serializing them removes one of the single biggest advantages of using KSE's.
Out of curiosity, have you read the KSE papers at all? They are able to deal with concurrency without all of the complexity you imply must exist. > This is another aspect of the problem you run into when you start > trying to preempt a process running in the kernel arbitrarily. Suddenly > all the assumptions you were able to make before that resulted in > optimal code paths now must be thrown out the window and replaced with > a godawful number of locks to protect kernel contexts from unexpected > interruptions. *sarcasm on* Heck, then we should just throw out KSE's, since they are way too complex, and just stick with the current 'BGL' model, right? *sarcasm off* It doesn't come for free. There is no way to make progress without some additional complexity. The question we must ask is whether the complexity we add buys us anything. I believe it does, as do many other people. Certainly Solaris's ability to scale shows that there is something to be said for having a pre-emptive kernel. > That's insane as well. You are introducing a 'solution' to a > problem that doesn't exist Matt, honestly, there's no reason to change the existing FreeBSD model at all if we're running on a single processor. It's not broken in any way. However, the current model does not scale with multiple processors. One of the stated goals of the later releases of FreeBSD is to create an OS that scales better on multiple processors, so the current 'model' is not adequate. It's a solution to a new problem, one that *does* exist in BSD if we accept the fact that we want to run better on multiple processors. Hence the KSE model, which is one of many solutions to the scaling problem, and the one that was judged to be a good solution. Another 'goal' is the ability to write threaded programs that run efficiently on both UP and SMP hardware. KSE's can help with this, but a 'serialized KSE' model won't allow an I/O-intensive application to benefit from adding multiple CPU's.
An example of such an application is one that does the following (UDP packets were used in this example, for streaming...):

1) One thread is in kernel context in select(), waiting for packets, which are thrown onto a queue back in userland, and the thread returns to kernel land.
2) Another thread sorts these packets into two classes and puts them onto two different queues:
   a) Data packets
   b) Query packets
3) The data queue is read by another thread, which writes the packets out to disk.
4) The query packets are processed by another thread, which reads the information off the disk (the data may be old or new, so there is some contention between threads 3 and 4) and puts it onto the 'send' queue.
5) A final thread reads information from the send queue and sends it out to the requestors as bandwidth is available.

Not only is this example not made up, it's very similar to a project I completed over 2 years ago. It's a bit more complicated than this, but you get the general picture. Not only did this application scale well (on Solaris), it also had very few bottlenecks, since we were able to minimize thread contention with some clever data structures. In our case, the number of packets sent/received was the biggest bottleneck, so the limit wasn't one of hardware (in terms of I/O bandwidth), but CPU processing of the packets. Adding more CPU's to the mix allowed us to create an application that ran faster by throwing more CPU at it (if CPU was a bottleneck). If CPU wasn't a bottleneck, then the application had no scaling issues on modern hardware. > If we were writing a kernel completely from scratch we could probably > construct it to allow these things, but trying to do it with the current > base is impossible -- you will never get something reliable or efficient > at the end of this road. I believe that in the end, many parts of the system will be re-written, or at least revamped to support multi-tasking to some degree.
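Nate's stages 1 to 4 form a classic pipeline of threads connected by queues. The following is a minimal sketch under stated assumptions. It is not his code:

- Integers stand in for UDP packets. Even ids play "data" and odd ids play "query".
- The caller plays the select() reader.
- Disk I/O and the stage 5 sender are omitted.
- `struct queue`, `q_put()`, `q_get()` and `run_pipeline()` are hypothetical names.

The thread is arguing over whether stages like these, each spending time in the kernel, can make progress on different CPUs at the same moment. The sketch only shows the structure.

```c
#include <pthread.h>
#include <stdlib.h>

/* Unbounded FIFO of ints guarded by a mutex and a condition variable. */
struct node { int val; struct node *next; };
struct queue {
    pthread_mutex_t mu;
    pthread_cond_t nonempty;
    struct node *head, *tail;
};

static void q_init(struct queue *q)
{
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->tail = NULL;
}

static void q_put(struct queue *q, int v)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->val = v;
    n->next = NULL;
    pthread_mutex_lock(&q->mu);
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->mu);
}

/* Blocks until an item is available, like a stage waiting on its input. */
static int q_get(struct queue *q)
{
    pthread_mutex_lock(&q->mu);
    while (q->head == NULL)
        pthread_cond_wait(&q->nonempty, &q->mu);
    struct node *n = q->head;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->mu);
    int v = n->val;
    free(n);
    return v;
}

#define END (-1)   /* end-of-stream marker; real packet ids are >= 0 */

static struct queue inq, dataq, queryq;
static long data_sum;     /* stage 3 result (stands in for disk writes) */
static long query_count;  /* stage 4 result (stands in for queued replies) */

/* Stage 2: classify packets; here even ids are "data", odd ids "query". */
static void *classifier(void *arg)
{
    (void)arg;
    for (int v; (v = q_get(&inq)) != END; )
        q_put(v % 2 == 0 ? &dataq : &queryq, v);
    q_put(&dataq, END);
    q_put(&queryq, END);
    return NULL;
}

static void *data_writer(void *arg)      /* stage 3 */
{
    (void)arg;
    for (int v; (v = q_get(&dataq)) != END; )
        data_sum += v;
    return NULL;
}

static void *query_handler(void *arg)    /* stage 4 */
{
    (void)arg;
    while (q_get(&queryq) != END)
        query_count++;
    return NULL;
}

/* Stage 1 runs in the caller: feed packet ids 0..n-1, then END.
 * Intended to be called once, since the queues are initialized here. */
static void run_pipeline(int n)
{
    pthread_t t[3];
    q_init(&inq);
    q_init(&dataq);
    q_init(&queryq);
    pthread_create(&t[0], NULL, classifier, NULL);
    pthread_create(&t[1], NULL, data_writer, NULL);
    pthread_create(&t[2], NULL, query_handler, NULL);
    for (int i = 0; i < n; i++)
        q_put(&inq, i);
    q_put(&inq, END);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
}
```

Calling `run_pipeline(10)` leaves `data_sum` at 20 (0+2+4+6+8) and `query_count` at 5.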
Even with serialized KSE's, there's still an issue of pre-emption, since multiple processes may be accessing the same data structures (on different CPU's). > Or perhaps I should phrase it: The only way > you will get anything close to reliable will be to effectively revert > the system to the days of the single giant lock, because you will need > so many fraggin locks to deal with the consequences you might as well > have a single big giant lock. I'm not so naive as to suggest that it's going to be simple. If it were going to be a simple task, it would have been done already. However, just because it's difficult and time-consuming doesn't mean it's not worthwhile. Nate From owner-freebsd-arch Fri Apr 27 12:44: 8 2001 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id 7B34937B424; Fri, 27 Apr 2001 12:44:04 -0700 (PDT) (envelope-from eischen@vigrid.com) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id PAA16949; Fri, 27 Apr 2001 15:43:25 -0400 (EDT) Date: Fri, 27 Apr 2001 15:43:20 -0400 (EDT) From: Daniel Eischen To: "Daniel C. Sobral" Cc: Julian Elischer , Arch@FreeBSD.ORG, alfred@FreeBSD.ORG, Robert Watson Subject: Re: KSE threading support (first parts) In-Reply-To: <3AE9B93C.E8060911@newsguy.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Fri, 27 Apr 2001, Daniel C. Sobral wrote: > Daniel Eischen wrote: > > > > Right, like I've said before, it's easier just to combine the KSEG and KSE > > into one entity and forget about fair scheduling. Limit the number of > > "combined KSE/KSEGs" to the number of processors (scope system threads > > still get their own "combined KSE/KSEG" to satisfy POSIX).
The common > > case is a single processor system anyways, and in that case it doesn't > > make sense to have more than 1 KSE in a KSEG. > > First and foremost, you must preserve the current behavior (because > that's what I want, that's what we have atm, and it's there in POSIX), > where a process quanta is a process quanta and that's that, no matter > how many threads the process has. You get that if you don't use pthread_setconcurrency() and don't create any system scope threads. This means you get 1 KSE in 1 KSEG. > If you happen to want the "system" scope, which is also in POSIX, then > each thread has a quanta of its own. > > Process scope: one KSEG, N KSE. Not quite. 1 KSEG, 1 KSE by default. You have to set the concurrency level to get more than 1 KSE. When you use pthread_setconcurrency() under Solaris, you get an LWP for each concurrency level that Solaris grants to you. Each LWP under Solaris gets its own quantum. With the proposed implementation for FreeBSD, pthread_setconcurrency would give you multiple KSEs (limited to the number of CPUs) within the same KSEG, but these KSEs wouldn't give you additional quantum. I'd rather see us emulate Solaris if possible, but whatever. > System scope: N KSEG, 1 KSE per KSEG. Right. And for Process _and_ System scope: S+1 KSEG, S+1 KSE, where S = number of system scope threads. > > In a multiprocessor system, if you have an extra quantum, who really > > cares? If someone really does care, then it can be a kernel tunable > > or resource limit. > > > > I just don't see any benefit for the added complexity. It certainly > > is easier for the UTS with a combined KSE/KSEG. > > Yes, but I want to see you do process scope and system scope without it. Read what I've written again. You don't need more than 1 KSE within the _same_ KSEG to do this. You can have as many process scope threads as you want with one KSE in the main process' KSEG.
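From an application's point of view, Dan's cases come down to two standard POSIX calls. `pthread_attr_setscope()` chooses the contention scope per thread. `pthread_setconcurrency()` is only a hint. The following is a small sketch under stated assumptions. The mapping onto KSEGs and KSEs is Dan's description of the proposed design, not something these APIs expose. `spawn_with_scope()`, `hint_concurrency()` and `ksegs_needed()` are hypothetical helpers, and the last one just encodes his S+1 rule. Some systems reject one of the two scopes with `ENOTSUP`.

```c
#define _XOPEN_SOURCE 700   /* exposes pthread_setconcurrency() */
#include <pthread.h>

static void *noop(void *arg) { return arg; }

/* Create and join one thread with the requested contention scope
 * (PTHREAD_SCOPE_SYSTEM or PTHREAD_SCOPE_PROCESS). Returns 0 on
 * success or an error number from the failing call. */
static int spawn_with_scope(int scope)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err = pthread_attr_init(&attr);
    if (err != 0)
        return err;
    err = pthread_attr_setscope(&attr, scope);
    if (err == 0)
        err = pthread_create(&tid, &attr, noop, NULL);
    if (err == 0)
        pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return err;
}

/* Concurrency hint: under the proposal this would ask for more KSEs in
 * the process's KSEG; 0 means "let the implementation decide". */
static int hint_concurrency(int level)
{
    return pthread_setconcurrency(level);
}

/* Dan's rule for a process mixing both scopes: one KSEG for all the
 * process-scope threads plus one KSEG/KSE pair per system-scope thread. */
static int ksegs_needed(int system_scope_threads)
{
    return system_scope_threads + 1;
}
```

For example, a process with three system-scope threads would, by that rule, use `ksegs_needed(3) == 4` KSEGs.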
If you want to additionally create system scope threads, then no problem; each system scope thread gets its own KSEG/KSE pair. -- Dan Eischen From owner-freebsd-arch Fri Apr 27 12:50:55 2001 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id B5EC737B423 for ; Fri, 27 Apr 2001 12:50:53 -0700 (PDT) (envelope-from eischen@vigrid.com) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id PAA18118; Fri, 27 Apr 2001 15:50:13 -0400 (EDT) Date: Fri, 27 Apr 2001 15:50:11 -0400 (EDT) From: Daniel Eischen To: Nate Williams Cc: Matt Dillon , Julian Elischer , Arch@FreeBSD.ORG Subject: Re: KSE threading support (first parts) In-Reply-To: <15081.50170.297579.938254@nomad.yogotech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Fri, 27 Apr 2001, Nate Williams wrote: > > Well, that's complete bullshit. KSE's are extremely short-running > > affairs in kernel mode, especially when you consider the most likely > > asynchronizing case (a simple blocking situation that will most commonly > > be in a read() or write()). > > Not necessarily. My experience with developing and running applications > on Solaris says that having multiple KSE's/process is a *huge* win. You do know that the proposed implementation isn't quite like Solaris (KSEs don't get their own quantum).
You better holler if you want it ;-) -- Dan Eischen From owner-freebsd-arch Fri Apr 27 12:59:26 2001 Delivered-To: freebsd-arch@freebsd.org Received: from ns.yogotech.com (ns.yogotech.com [206.127.123.66]) by hub.freebsd.org (Postfix) with ESMTP id C556737B422 for ; Fri, 27 Apr 2001 12:59:23 -0700 (PDT) (envelope-from nate@yogotech.com) Received: from nomad.yogotech.com (nomad.yogotech.com [206.127.123.131]) by ns.yogotech.com (8.9.3/8.9.3) with ESMTP id NAA24634; Fri, 27 Apr 2001 13:58:54 -0600 (MDT) (envelope-from nate@nomad.yogotech.com) Received: (from nate@localhost) by nomad.yogotech.com (8.8.8/8.8.8) id NAA19479; Fri, 27 Apr 2001 13:58:53 -0600 (MDT) (envelope-from nate) From: Nate Williams MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15081.53117.150505.145701@nomad.yogotech.com> Date: Fri, 27 Apr 2001 13:58:53 -0600 (MDT) To: Daniel Eischen Cc: Nate Williams , Matt Dillon , Julian Elischer , Arch@FreeBSD.ORG Subject: Re: KSE threading support (first parts) In-Reply-To: References: <15081.50170.297579.938254@nomad.yogotech.com> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Reply-To: nate@yogotech.com (Nate Williams) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > > Well, that's complete bullshit. KSE's are extremely short-running > > > affairs in kernel mode, especially when you consider the most likely > > > asynchronizing case (a simple blocking situation that will most commonly > > > be in a read() or write()). > > > > Not necessarily. My experience with developing and running applications > > on Solaris says that having multiple KSE's/process is a *huge* win. > > > > You do know that the proposed implementation isn't quite like > > Solaris (KSEs don't get their own quantum).
You better holler > if you want it ;-) I'm not sure how much of a difference that makes, but to be honest, I haven't thought about the consequences of it much. :( Nate From owner-freebsd-arch Fri Apr 27 13: 8:35 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 58ECC37B422 for ; Fri, 27 Apr 2001 13:08:33 -0700 (PDT) (envelope-from bright@fw.wintelcom.net) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f3RK8Qd06823; Fri, 27 Apr 2001 13:08:26 -0700 (PDT) Date: Fri, 27 Apr 2001 13:08:26 -0700 From: Alfred Perlstein To: Daniel Eischen Cc: Nate Williams , Matt Dillon , Julian Elischer , Arch@FreeBSD.ORG Subject: Re: KSE threading support (first parts) Message-ID: <20010427130826.G18676@fw.wintelcom.net> References: <15081.50170.297579.938254@nomad.yogotech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from eischen@vigrid.com on Fri, Apr 27, 2001 at 03:50:11PM -0400 X-all-your-base: are belong to us. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Daniel Eischen [010427 12:50] wrote: > On Fri, 27 Apr 2001, Nate Williams wrote: > > > Well, that's complete bullshit. KSE's are extremely short-running > > > affairs in kernel mode, especially when you consider the most likely > > > asynchronizing case (a simple blocking situation that will most commonly > > > be in a read() or write()). > > > > Not necessarily. My experience with developing and running applications > > on Solaris says that having multiple KSE's/process is a *huge* win. > > You do know that the proposed implementation isn't quite like > Solaris (KSEs don't get their own quantum).
You better holler > if you want it ;-) There are two things on the issue that I'd like to bring up. The concepts are cool; however, the implementation you guys are discussing really hurts my head, not in a bad way, but conceptually the concepts look quite daunting. Kudos if you guys get it done though! Being able to have threads used in a "this application wants to utilize _all_ available system resources" meaning if you have more than one processor, I want to see mysql, apache, whatever using it (by default!). If your model doesn't include this, then please don't bother continuing; the stability issues versus the gain don't work for me at all. Sorry, correctness is sort of out of style nowadays especially since every other OS allows this and touts the performance gains of their systems. -- -Alfred Perlstein - [alfred@freebsd.org] Represent yourself, show up at BABUG http://www.babug.org/ From owner-freebsd-arch Fri Apr 27 13:14:12 2001 Delivered-To: freebsd-arch@freebsd.org Received: from ns.yogotech.com (ns.yogotech.com [206.127.123.66]) by hub.freebsd.org (Postfix) with ESMTP id 325A837B423 for ; Fri, 27 Apr 2001 13:14:00 -0700 (PDT) (envelope-from nate@yogotech.com) Received: from nomad.yogotech.com (nomad.yogotech.com [206.127.123.131]) by ns.yogotech.com (8.9.3/8.9.3) with ESMTP id OAA24832; Fri, 27 Apr 2001 14:10:40 -0600 (MDT) (envelope-from nate@nomad.yogotech.com) Received: (from nate@localhost) by nomad.yogotech.com (8.8.8/8.8.8) id OAA19524; Fri, 27 Apr 2001 14:10:38 -0600 (MDT) (envelope-from nate) From: Nate Williams MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15081.53821.755743.746621@nomad.yogotech.com> Date: Fri, 27 Apr 2001 14:10:37 -0600 (MDT) To: Alfred Perlstein Cc: Daniel Eischen , Nate Williams , Matt Dillon , Julian Elischer , Arch@FreeBSD.ORG Subject: Re: KSE threading support
(first parts) In-Reply-To: <20010427130826.G18676@fw.wintelcom.net> References: <15081.50170.297579.938254@nomad.yogotech.com> <20010427130826.G18676@fw.wintelcom.net> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Reply-To: nate@yogotech.com (Nate Williams) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > > > Well, that's complete bullshit. KSE's are extremely short-running > > > > affairs in kernel mode, especially when you consider the most likely > > > > asynchronizing case (a simple blocking situation that will most commonly > > > > be in a read() or write()). > > > > > > Not necessarily. My experience with developing and running applications > > > on Solaris says that having multiple KSE's/process is a *huge* win. > > > > You do know that the proposed implementation isn't quite like > > Solaris (KSEs don't get their own quantum). You better holler > > if you want it ;-) > > There are two things on the issue that I'd like to bring up. > > The concepts are cool, however the implementation you guys are > discussing really hurts my head, not in a bad way, but conceptually > the concepts look quite daunting. Kudos if you guys get it done > though! > > Being able to have threads used in a "this application wants to > utilize _all_ available system resources" meaning if you have > more than one processor, I want to see mysql, apache, whatever > using it (by default!). If your model doesn't include this then > please don't bother continuing, the stability issues versus the > gain don't work for me at all. Having 'serialized' KSE's (which Matt wants) means that an application will be *UNABLE* to use all of the system resources, because only one thread in a threaded application (apache, mysql, etc.) is allowed to run at one time, no matter how many CPU's are there.
Nate From owner-freebsd-arch Fri Apr 27 13:35:19 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 8602A37B422 for ; Fri, 27 Apr 2001 13:35:16 -0700 (PDT) (envelope-from bright@fw.wintelcom.net) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f3RKYXo07478; Fri, 27 Apr 2001 13:34:33 -0700 (PDT) Date: Fri, 27 Apr 2001 13:34:33 -0700 From: Alfred Perlstein To: Nate Williams Cc: Daniel Eischen , Matt Dillon , Julian Elischer , Arch@FreeBSD.ORG Subject: Re: KSE threading support (first parts) Message-ID: <20010427133433.H18676@fw.wintelcom.net> References: <15081.50170.297579.938254@nomad.yogotech.com> <20010427130826.G18676@fw.wintelcom.net> <15081.53821.755743.746621@nomad.yogotech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <15081.53821.755743.746621@nomad.yogotech.com>; from nate@yogotech.com on Fri, Apr 27, 2001 at 02:10:37PM -0600 X-all-your-base: are belong to us. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Nate Williams [010427 13:14] wrote: > > > > > Well, that's complete bullshit. KSE's are extremely short-running > > > > > affairs in kernel mode, especially when you consider the most likely > > > > > asynchronizing case (a simple blocking situation that will most commonly > > > > > be in a read() or write()). > > > > > > > > Not necessarily. My experience with developing and running applications > > > > on Solaris says that having multiple KSE's/process is a *huge* win. > > > > > > You do know that the proposed implementation isn't quite like > > > Solaris (KSEs don't get their own quantum). You better holler > > > if you want it ;-) > > > > There are two things on the issue that I'd like to bring up.
> > > > The concepts are cool, however the implementation you guys are > > discussing really hurts my head, not in a bad way, but conceptually > > the concepts look quite daunting. Kudos if you guys get it done > > though! > > > > Being able to have threads used in a "this application wants to > > utilize _all_ available system resources" meaning if you have > > more than one processor, I want to see mysql, apache, whatever > > using it (by default!). If your model doesn't include this then > > please don't bother continuing, the stability issues versus the > > gain don't work for me at all. > > Having 'serialized' KSE's (which Matt wants) means that an application > will be *UNABLE* to use all of the system resources, because only one > thread in a threaded application (apache, mysql, etc.) is allowed to run > at one time, no matter how many CPU's are there. That doesn't seem to be what Daniel is saying; he is saying that the default will be like this, but that applications or the startup code will have the choice. However, if that's true, then we might as well scrap the project; it just brings the complexity out of userland and into the kernel. Sure, we can schedule IO better, but then we might as well cop out and use aio and some special signal system for handling faults back into the uts. It's just a lot simpler to go with rfork threads or a simpler model than all this complexity just to satisfy Terry's view of who should get what quantum. Honestly, if you ask anyone, they expect to be able to cheat with threads the same way they cheat by using multiple processes to gain additional CPU. -- -Alfred Perlstein - [alfred@freebsd.org] Daemon News Magazine in your snail-mail!
http://magazine.daemonnews.org/ From owner-freebsd-arch Fri Apr 27 15:56: 0 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp10.phx.gblx.net (smtp10.phx.gblx.net [206.165.6.140]) by hub.freebsd.org (Postfix) with ESMTP id 4EA9B37B422 for ; Fri, 27 Apr 2001 15:55:55 -0700 (PDT) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp10.phx.gblx.net (8.9.3/8.9.3) id PAA99626; Fri, 27 Apr 2001 15:55:54 -0700 Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp10.phx.gblx.net, id smtpd0vTNEa; Fri Apr 27 15:55:45 2001 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id QAA13810; Fri, 27 Apr 2001 16:06:01 -0700 (MST) From: Terry Lambert Message-Id: <200104272306.QAA13810@usr02.primenet.com> Subject: Re: KSE threading support (first parts) To: arch@freebsd.org, terry@lambert.org Date: Fri, 27 Apr 2001 23:06:01 +0000 (GMT) X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Alfred Perlstein wrote: ] It doesn't seem like that's what Daniel is saying, which is that ] the default will be like this, but that applications or the startup ] code will have the choice. ] ] However, if that's true, then we might as well scrap the project, it ] just brings the complexity out of userland and into the kernel, ] sure we can schedule IO better, but then we might as well cop out ] and use aio and some special signal system for handling faults back ] into the uts. It's just a lot simpler to go with rfork threads or ] a simpler model than all this complexity just to satisfy Terry's ] view of who should get what quantum.
Honestly, if you ask anyone, ] they expect to be able to cheat with threads the same way they ] cheat by using multiple processes to gain additional CPU. I personally do not give a flying " " if processes cheat in order to compete unfairly for quantum; that's an administrative issue, and I think FreeBSD already has far too many arbitrary limits to "protect" the user from doing useful work^W^W^Whurting themselves. My main emphasis has always been:

o Minimize context switch overhead
o Reduce the overall scheduler complexity; in particular, IMO, it is nearly impossible to get thread group affinity to work correctly by hacking up a scheduler, without resulting in starvation for other processes, so The Scheduler Is Not The Place To Do Affinity.
o I would like to see SMP scalability

The approach which Julian touts tends to fail to achieve SMP scaling, unless you explicitly ask for it. In Julian's model, there is a KSEG per CPU on which you want to be able to run, and all KSEs live in the context of a KSEG. In the Julian and Archie model, KSEs do _not_ move between KSEGs, without an Act Of God. I think this model is wrong. I would completely discard the concept of a KSEG, entirely. In place of that "easy CPU affinity" model, which is what it is intended to guarantee, I would create per-CPU run queues. I would control affinity and load balancing through the decision to migrate or to not migrate a particular KSE between CPUs. In other words, the complexity of the model which Jason Evans arrived at from the Big Threads Design Meeting in Foster City that about 120 of us attended, is optimal for achieving my design goals, without going to async call gates. IMO, going to async call gates could result in as much as an additional 25% improvement in performance.
Unfortunately, it would also mean a lot of extra overhead to ensure binary backward compatibility, since it would put all of the standard POSIX semantics into the libraries, and the default behaviour would be completely asynchronous. Binary compatibility would mean using a different INT than INT 0x80 for system calls, and putting backward compatibility cruft into a module to support any binary which some moron decided to link static because they thought that linking it static makes it somehow safer from single file damage failure than using libc.so and ld.so. My ideal implementation would use async call gates. In effect, this would be the same as implementing VMS ASTs in all of FreeBSD. That all said, the current project, as it was envisioned by Jason Evans, does not have the limitations which you and Nate fear, unless you cop out on the implementation, and do what Julian and Archie wanted with KSEG non-migration of KSEs (and the concomitant single scheduler run queue for all CPUs), or if you take Matt's approach, and serialize execution everywhere, not just within a particular KSEG (Matt's approach prevents the need for a single process to be able to exist on a run queue as more than one entry instance, which makes some things easier). I would call both the Matt approach, and the Julian/Archie approach "overly conservative"; they are both in excess of 12 years behind the state of the art. But I would also level the same criticism at using the 6-year-old technology of scheduler activations to avoid going to true async call gates. In any case, you and Nate are getting upset at shortcuts that people want to take in implementation, not at the design itself. Cut it out. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
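Terry's alternative drops KSEGs and uses per-CPU run queues. Affinity and load balancing then become nothing more than the decision whether to migrate a KSE. A small single-threaded model can illustrate this. Everything in it is hypothetical. `struct runq`, `rq_push()`, `rq_pop()`, `pick_next()` and `NCPU` are invented names. The idle-CPU policy (steal one KSE from the busiest other CPU) is one simple choice, not Terry's specification. A real scheduler would also need locking on each queue.

```c
#define NCPU 4
#define QCAP 64

/* Hypothetical per-CPU run queue: a fixed-size ring of KSE ids.
 * No locking; this is a single-threaded model of the policy only. */
struct runq {
    int kse[QCAP];
    int head, len;
};

static struct runq rq[NCPU];

static int rq_push(int cpu, int kse)
{
    struct runq *q = &rq[cpu];
    if (q->len == QCAP)
        return -1;
    q->kse[(q->head + q->len) % QCAP] = kse;
    q->len++;
    return 0;
}

static int rq_pop(int cpu)
{
    struct runq *q = &rq[cpu];
    if (q->len == 0)
        return -1;
    int kse = q->kse[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    return kse;
}

/* Choose the next KSE for `cpu`. Affinity is the default: run from the
 * local queue. Only when it is empty do we migrate one KSE from the
 * most heavily loaded other CPU. Returns -1 if nothing is runnable. */
static int pick_next(int cpu)
{
    int kse = rq_pop(cpu);
    if (kse >= 0)
        return kse;
    int busiest = -1;
    for (int c = 0; c < NCPU; c++)
        if (c != cpu && rq[c].len > 0 &&
            (busiest < 0 || rq[c].len > rq[busiest].len))
            busiest = c;
    return busiest < 0 ? -1 : rq_pop(busiest);
}
```

Suppose KSEs 1, 2 and 3 are queued on CPU 0 and KSE 10 on CPU 1. CPU 1 first runs 10 from its own queue, and only then migrates KSE 1 from CPU 0.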
From owner-freebsd-arch Fri Apr 27 16: 6:12 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id E34A437B422 for ; Fri, 27 Apr 2001 16:06:08 -0700 (PDT) (envelope-from bright@fw.wintelcom.net) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f3RN67x11637; Fri, 27 Apr 2001 16:06:07 -0700 (PDT) Date: Fri, 27 Apr 2001 16:06:07 -0700 From: Alfred Perlstein To: Terry Lambert Cc: arch@FreeBSD.ORG, terry@lambert.org Subject: Re: KSE threading support (first parts) Message-ID: <20010427160607.M18676@fw.wintelcom.net> References: <200104272306.QAA13810@usr02.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200104272306.QAA13810@usr02.primenet.com>; from tlambert@primenet.com on Fri, Apr 27, 2001 at 11:06:01PM +0000 X-all-your-base: are belong to us. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Terry Lambert [010427 15:56] wrote: > > In other words, the complexity of the model which Jason Evans > arrived at from the Big Threads Design Meeting in Foster City > that about 120 of us attended, is optimal for achieving my > design goals, without going to async call gates. The way I envision async call gates is something like this: each syscall could borrow a spare pcb and do an rfork back into the user application using the borrowed pcb, allowing the syscall to proceed as a separately scheduled kernel thread; upon return, it would somehow notify the process of completion. > IMO, going to async call gates could result in as much as an > additional 25% improvement in performance.
Unfortunately, it > would also mean a lot of extra overhead to ensure binary > backward compatibility, since it would put all of the standard > POSIX semantics into the libraries, and the default behaviour > would be completely asynchronous. Binary compatibility would > mean using a different INT than INT 0x80 for system calls, and > putting backward compatibility cruft into a module to support > any binary which some moron decided to link static because they > thought that linking it static makes it somehow safer from single > file damage failure than using libc.so and ld.so. > > My ideal implementation would use async call gates. In effect, > this would be the same as implementing VMS ASTs in all of FreeBSD. Actually, why not just have a syscall that turns on the async behavior? > That all said, the current project, as it was envisioned by Jason > Evans, does not have the limitations which you and Nate fear, > unless you cop out on the implementation, and do what Julian and > Archie wanted with KSEG non-migration of KSEs (and the concomitant > single scheduler run queue for all CPUs), or if you take Matt's > approach, and serialize execution everywhere, not just within a > particular KSEG (Matt's approach prevents the need for a single > process to be able to exist on a run queue as more than one entry > instance, which makes some things easier). > > I would call both the Matt approach, and the Julian/Archie approach > "overly conservative"; they are both in excess of 12 years behind > the state of the art. But I would also level the same criticism > at using the 6-year-old technology of scheduler activations to > avoid going to true async call gates. > > > In any case, you and Nate are getting upset at shortcuts that > people want to take in implementation, not at the design itself. > > Cut it out.
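Alfred's idea of an async call gate has a familiar userland analogue. The syscall continues as a separate kernel thread and later notifies the process. In userland, the equivalent is to hand the blocking operation to another thread and collect a completion record afterwards. What follows is only that analogue. It assumes POSIX threads and uses hypothetical names (`async_call()`, `await_completion()`, `struct completion`). It is not a kernel call gate. It also spends a whole thread on each call, where Alfred's idea would borrow a spare pcb inside the kernel.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical completion record, in the spirit of an AST or aio control
 * block: the caller owns it and collects the result when convenient. */
struct completion {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int done;
    long result;
};

struct call {
    long (*fn)(long);       /* the possibly blocking operation */
    long arg;
    struct completion *c;
};

/* Runs on its own thread, so the caller never blocks inside fn(). */
static void *call_gate(void *p)
{
    struct call *call = p;
    long r = call->fn(call->arg);
    pthread_mutex_lock(&call->c->mu);
    call->c->result = r;
    call->c->done = 1;                 /* post the completion */
    pthread_cond_broadcast(&call->c->cv);
    pthread_mutex_unlock(&call->c->mu);
    free(call);
    return NULL;
}

/* Start fn(arg) asynchronously; returns 0 if it was started. */
static int async_call(long (*fn)(long), long arg, struct completion *c)
{
    pthread_t tid;
    struct call *call = malloc(sizeof *call);
    if (call == NULL)
        return -1;
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->cv, NULL);
    c->done = 0;
    call->fn = fn;
    call->arg = arg;
    call->c = c;
    if (pthread_create(&tid, NULL, call_gate, call) != 0) {
        free(call);
        return -1;
    }
    pthread_detach(tid);
    return 0;
}

/* Wait for the completion and return the operation's result. */
static long await_completion(struct completion *c)
{
    pthread_mutex_lock(&c->mu);
    while (!c->done)
        pthread_cond_wait(&c->cv, &c->mu);
    long r = c->result;
    pthread_mutex_unlock(&c->mu);
    return r;
}
```

The caller is free to do other work between `async_call()` and `await_completion()`. That gap is the whole point of the asynchronous interface.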
Well if we have an implementation where the implementors are unwilling or incapable (because of time constraints, or getting hit by a bus, etc.) of doing the more optimized version, then what's the point besides getting more IO concurrency? I don't know, it's just that if someone has a terrific idea that seems to have astounding complexity and they don't feel like they want to or can take the final step with it, then it really should not be considered. btw, I've read some on scheduler activations; where are some references on async call gates? -- -Alfred Perlstein - [alfred@freebsd.org] Represent yourself, show up at BABUG http://www.babug.org/ From owner-freebsd-arch Fri Apr 27 18:28:09 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mail.viasoft.com.cn (unknown [61.153.1.177]) by hub.freebsd.org (Postfix) with ESMTP id 777DC37B422; Fri, 27 Apr 2001 18:28:01 -0700 (PDT) (envelope-from bsddiy@163.net) Received: from xyf ([192.168.1.204]) by mail.viasoft.com.cn (8.9.3/8.9.3) with ESMTP id JAA02080; Sat, 28 Apr 2001 09:24:43 +0800 Message-ID: <001e01c0cf82$ac300460$cc01a8c0@xyf> From: "David Xu" To: "Matt Dillon" , "Nate Williams" Cc: "David O'Brien" , "Julian Elischer" , , "Daniel Eischen" References: <3AE71067.FF4BD029@elischer.org><20010425110940.L1790@fw.wintelcom.net><3AE85776.92D6BD90@elischer.org><20010426120630.A92915@dragon.nuxi.com><200104270015.f3R0FAi62512@earth.backplane.com><15081.39397.944224.776391@nomad.yogotech.com><200104271701.f3RH1Tk05185@earth.backplane.com> <15081.42735.860662.876478@nomad.yogotech.com> <200104271717.f3RHHGp05457@earth.backplane.com> Subject: Re: KSE threading support (first parts) Date: Sat, 28 Apr 2001 09:28:31 +0800 X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2615.200 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence:
bulk X-Loop: FreeBSD.ORG ----- Original Message ----- From: Matt Dillon To: Nate Williams Cc: David O'Brien ; Julian Elischer ; ; Daniel Eischen Sent: Saturday, April 28, 2001 1:17 AM Subject: Re: KSE threading support (first parts) > > You seem to believe that not being able to run KSE's for the same > process concurrently somehow kills the whole concept of SMP. > > Well, that's complete bullshit. KSE's are extremely short-running > affairs in kernel mode, especially when you consider the most likely > asynchronizing case (a simple blocking situation that will most commonly > be in a read() or write()). Serializing them within the context of a > single process will actually *IMPROVE* SMP performance, not make it worse. No, most multi-threaded programs use threads to improve concurrent I/O. Think about MySQL: why is our version slower than the Linux version? Because we serialize read()/write() in a multi-threaded program; this is our failure. > Running multiple kernel contexts for the same process on different > cpu's concurrently means that you must now lock every single aspect > of the 'current process' concept, and cannot make any assumptions > whatsoever in regards to accessing elements of the current process. > Well, that's just plain insane. You will wind up with so many fragging > locks and mutexes in the kernel that what performance gain you might > have thought you could get is now completely blown away by the locking > overhead. > Yes, it must be done! I believe FreeBSD-current already makes the proc structure and many other resources support concurrent access. I don't think many new lock-downs will need to be made. A KSE is a scheduler unit, just like a proc on the run queue; I don't see much difference in this concept. > This is another aspect of the problem you run into when you start > trying to preempt a process running in the kernel arbitrarily.
Suddenly > all the assumptions you were able to make before that resulted in > optimal code paths now must be thrown out the window and replaced with > a godawful number of locks to protect kernel contexts from unexpected > interruptions. That's insane as well. You are introducing a 'solution' > to a problem that doesn't exist and breaking any chance we have of > getting a reliable kernel in anything less than a few years in the process. > > If we were writing a kernel completely from scratch we could probably > construct it to allow these things, but trying to do it with the current > base is impossible -- you will never get something reliable or efficient > at the end of this road. Or perhaps I should phrase it: The only way > you will get anything close to reliable will be to effectively revert > the system to the days of the single giant lock, because you will need > so many fraggin locks to deal with the consequences you might as well > have a single big giant lock. > > -Matt A well-designed multi-threaded program can cleanly dispatch its internal tasks to different threads, so it will avoid contention on its internal resources. BGL is a joke and bogus for SMP; don't talk about it.
Regards, David Xu From owner-freebsd-arch Fri Apr 27 18:37:07 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mail.viasoft.com.cn (unknown [61.153.1.177]) by hub.freebsd.org (Postfix) with ESMTP id 0599537B422 for ; Fri, 27 Apr 2001 18:37:02 -0700 (PDT) (envelope-from bsddiy@163.net) Received: from xyf ([192.168.1.204]) by mail.viasoft.com.cn (8.9.3/8.9.3) with ESMTP id JAA02128; Sat, 28 Apr 2001 09:34:37 +0800 Message-ID: <002c01c0cf83$f3d594a0$cc01a8c0@xyf> From: "David Xu" To: "Alfred Perlstein" , "Nate Williams" Cc: "Daniel Eischen" , "Matt Dillon" , "Julian Elischer" , References: <15081.50170.297579.938254@nomad.yogotech.com> <20010427130826.G18676@fw.wintelcom.net> <15081.53821.755743.746621@nomad.yogotech.com> <20010427133433.H18676@fw.wintelcom.net> Subject: Re: KSE threading support (first parts) Date: Sat, 28 Apr 2001 09:38:29 +0800 X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2615.200 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG ----- Original Message ----- From: Alfred Perlstein To: Nate Williams Cc: Daniel Eischen ; Matt Dillon ; Julian Elischer ; Sent: Saturday, April 28, 2001 4:34 AM Subject: Re: KSE threading support (first parts) > * Nate Williams [010427 13:14] wrote: > > > > > > Well, that's complete bullshit. KSE's are extremely short-running > > > > > > affairs in kernel mode, especially when you consider the most likely > > > > > > asynchronizing case (a simple blocking situation that will most commonly > > > > > > be in a read() or write()). > > > > > > > > > > Not necessarily. My experience with developing and running applications > > > > > on Solaris says that having multiple KSE's/process is a *huge* win.
> > > > > > > > You do know that the proposed implementation isn't quite like > > > > Solaris (KSEs don't get their own quantum). You better holler > > > > if you want it ;-) > > > > > > There's two things on the issue that I'd like to bring up. > > > > > > The concepts are cool, however the implementation you guys are > > > discussing really hurt my head, not in a bad way, but conceptually > > > the concepts look quite daunting. Kudos if you guys get it done > > > though! > > > > > > Being able to have threads used in a "this application wants to > > > utilize _all_ available system resources" meaning if you have > > > more than one processor, I want to see mysql, apache, whatever > > > using it (by default!). If your model doesn't include this then > > > please don't bother continuing, the stability issues versus the > > > gain don't work for me at all. > > > > Having 'serialized' KSE's (which Matt wants) means that an application > > will be *UNABLE* to use all of the system resources, because only one > > thread in a threaded application (apache, mysql, etc..) is allowed to run > > at one time, no matter how many CPU's are there. > > It doesn't seem like that's what Daniel is saying, which is that > the default will be like this, but that applications or the startup > code will have the choice. > > However, if that's true then we might as well scrap the project, it > just brings the complexity out of userland and into the kernel, > sure we can schedule IO better, but then we might as well cop out > and use aio and some special signal system for handling faults back > into the uts. It's just a lot simpler to go with rfork threads or > a simpler model than all this complexity just to satisfy Terry's > view of who should get what quantum. Honestly if you ask anyone > they expect to be able to cheat with threads the same way they > cheat by using multiple processes to gain additional CPU.
> So you ignore the NxN and MxN thread model discussion, and follow the fucking Linux thread model design. Maybe I cannot expect such an advanced feature in a free OS. Sigh, where is Jason Evans? We need your help. David Xu From owner-freebsd-arch Fri Apr 27 18:37:24 2001 Delivered-To: freebsd-arch@freebsd.org Received: from dragon.nuxi.com (trang.nuxi.com [209.152.133.57]) by hub.freebsd.org (Postfix) with ESMTP id 1E71D37B422 for ; Fri, 27 Apr 2001 18:37:23 -0700 (PDT) (envelope-from obrien@NUXI.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.11.3/8.11.1) id f3S1as627839; Fri, 27 Apr 2001 18:36:54 -0700 (PDT) (envelope-from obrien) Date: Fri, 27 Apr 2001 18:36:53 -0700 From: "David O'Brien" To: David Xu Cc: Arch@FreeBSD.ORG Subject: Re: KSE threading support (first parts) Message-ID: <20010427183653.A23824@dragon.nuxi.com> Reply-To: obrien@FreeBSD.ORG References: <3AE71067.FF4BD029@elischer.org><20010425110940.L1790@fw.wintelcom.net><3AE85776.92D6BD90@elischer.org><20010426120630.A92915@dragon.nuxi.com><200104270015.f3R0FAi62512@earth.backplane.com><15081.39397.944224.776391@nomad.yogotech.com><200104271701.f3RH1Tk05185@earth.backplane.com> <15081.42735.860662.876478@nomad.yogotech.com> <200104271717.f3RHHGp05457@earth.backplane.com> <001e01c0cf82$ac300460$cc01a8c0@xyf> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <001e01c0cf82$ac300460$cc01a8c0@xyf>; from bsddiy@163.net on Sat, Apr 28, 2001 at 09:28:31AM +0800 X-Operating-System: FreeBSD 5.0-CURRENT Organization: The NUXI BSD group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sat, Apr 28, 2001 at 09:28:31AM +0800, David Xu wrote: > BGL is a joke and bogus for SMP; don't talk about it.
*sigh*. It is certainly [one of] the slowest implementations. But it isn't a total joke. If your process mix is, say, userland computationally bound (e.g. statistical simulations), FreeBSD 4's BGL is fine. T D.XU DON'T TALK ABSOLUTES K PLZ THNX From owner-freebsd-arch Fri Apr 27 19:21:20 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id E521B37B422 for ; Fri, 27 Apr 2001 19:21:18 -0700 (PDT) (envelope-from bright@fw.wintelcom.net) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id f3S2L5q16113; Fri, 27 Apr 2001 19:21:05 -0700 (PDT) Date: Fri, 27 Apr 2001 19:21:05 -0700 From: Alfred Perlstein To: David Xu Cc: Nate Williams , Daniel Eischen , Matt Dillon , Julian Elischer , Arch@FreeBSD.ORG Subject: Re: KSE threading support (first parts) Message-ID: <20010427192105.R18676@fw.wintelcom.net> References: <15081.50170.297579.938254@nomad.yogotech.com> <20010427130826.G18676@fw.wintelcom.net> <15081.53821.755743.746621@nomad.yogotech.com> <20010427133433.H18676@fw.wintelcom.net> <002c01c0cf83$f3d594a0$cc01a8c0@xyf> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <002c01c0cf83$f3d594a0$cc01a8c0@xyf>; from bsddiy@163.net on Sat, Apr 28, 2001 at 09:38:29AM +0800 X-all-your-base: are belong to us. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * David Xu [010427 18:36] wrote: > > So you ignore the NxN and MxN thread model discussion, and follow > the fucking Linux thread model design. Maybe I cannot expect such > an advanced feature in a free OS. > Sigh, where is Jason Evans? We need your help. David, I'd appreciate you attempting to get a clue before responding to any more of my email.
-- -Alfred Perlstein - [alfred@freebsd.org] Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom. From owner-freebsd-arch Fri Apr 27 20:37:35 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id E75EC37B423 for ; Fri, 27 Apr 2001 20:37:22 -0700 (PDT) (envelope-from tlambert@usr08.primenet.com) Received: (from daemon@localhost) by smtp02.primenet.com (8.9.3/8.9.3) id UAA16292; Fri, 27 Apr 2001 20:29:59 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp02.primenet.com, id smtpdAAApKayZF; Fri Apr 27 20:29:53 2001 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id UAA29358; Fri, 27 Apr 2001 20:37:51 -0700 (MST) From: Terry Lambert Message-Id: <200104280337.UAA29358@usr08.primenet.com> Subject: Re: KSE threading support (first parts) To: bright@wintelcom.net (Alfred Perlstein) Date: Sat, 28 Apr 2001 03:37:50 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), arch@FreeBSD.ORG, terry@lambert.org In-Reply-To: <20010427160607.M18676@fw.wintelcom.net> from "Alfred Perlstein" at Apr 27, 2001 04:06:07 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > The way I envision async call gates is something like each syscall > could borrow a spare pcb and do an rfork back into the user > application using the borrowed pcb and allowing the syscall to > proceed as scheduled as another kernel thread, upon return it would > somehow notify the process of completion. Close.
Effectively, it uses the minimal amount of call context it can get away with, and points the VM space and other stuff back to the process control block, which is shared among all system calls. Some calls, which will never block, immediately return without grabbing a context... they just return the status into the per-call status block in user space, as if they had completed asynchronously. Other calls, which may block, run to the point where they would block, and allocate a context then, and return. If they don't end up blocking, they return like a non-blocking call. Calls which will always block, or which may block, and get to the point where a return would be too complicated, allocate a context and return. The context is used by the kernel to continue processing. It contains the address of the user space status block, as well as a copy of the stack of the returning program (think of the one that continues as a "setjmp", with the one doing the return as getting a longjmp, where the code it would have run is skipped). The final part is that the context runs to completion at the user space boundary; since the call has already returned, it does not return to user space; instead, it stops at the user/kernel boundary, after copying out the completion status into the user space status block. The status block is a simplified version of the aioread/aiowrite status block. A program can just use these calls directly. They can also set a flag to make the call synchronous (as in an aiowait). Finally, a user space threads scheduler can use completion notifications to make scheduling decisions. For SMP, you can state that you have the ability to return into user space (e.g. similar to vfork/sfork) multiple times. Each of these represents a "scheduler reservation", where you reserve the right to compete for a quantum.
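The per-call status block idea can be sketched in user space, under loudly stated assumptions: there is no kernel here, the caller passes in whether the call "would block", and "running a context to completion" is simulated by an explicit drain function. All names (`call_status`, `gate_call`, `gate_run_pending`, `gate_wait`, `demo_op`) are hypothetical and are not part of any FreeBSD interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-call status block in user space, loosely modelled on
 * the aioread/aiowrite status block mentioned in the post. */
struct call_status {
    int  done;    /* set to 1 once completion status has been copied out */
    long result;  /* the call's return value */
    int  error;   /* errno-style error code, 0 on success */
};

/* Minimal "context" kept only for calls that would block. */
struct call_ctx {
    long (*finish)(long);     /* work that completes the call later */
    long arg;
    struct call_status *sb;   /* where completion status is copied out */
};

#define MAX_PENDING 16
static struct call_ctx pending[MAX_PENDING];
static int npending;

/* Example operation used to exercise the gate (hypothetical). */
long demo_op(long x) { return x * x; }

/* Issue a call. A call that will not block completes at once into *sb;
 * a call that would block gets a context and returns immediately with
 * sb->done still 0. */
int gate_call(long (*fn)(long), long arg, int would_block,
              struct call_status *sb)
{
    assert(fn != NULL && sb != NULL);
    sb->done = 0;
    sb->result = 0;
    sb->error = 0;
    if (!would_block) {
        sb->result = fn(arg);
        sb->done = 1;
        return 0;
    }
    if (npending == MAX_PENDING) {   /* no context available */
        sb->result = -1;
        sb->error = EAGAIN;
        sb->done = 1;
        return -1;
    }
    pending[npending].finish = fn;
    pending[npending].arg = arg;
    pending[npending].sb = sb;
    npending++;
    return 0;
}

/* Stand-in for the kernel running each context to completion and
 * copying status out at the user/kernel boundary. Returns the number
 * of contexts completed. */
int gate_run_pending(void)
{
    int n = npending;
    for (int i = 0; i < n; i++) {
        pending[i].sb->result = pending[i].finish(pending[i].arg);
        pending[i].sb->done = 1;
    }
    npending = 0;
    return n;
}

/* Analogue of the "make it synchronous" flag (compare aiowait):
 * drive pending contexts until this status block is complete. */
long gate_wait(struct call_status *sb)
{
    while (!sb->done) {
        if (gate_run_pending() == 0) {   /* sb was never submitted */
            sb->error = EINVAL;
            return -1;
        }
    }
    return sb->result;
}
```

A non-blocking `gate_call(demo_op, 3, 0, &sb)` fills `sb` immediately; with `would_block` set, `sb.done` stays 0 until `gate_run_pending()` or `gate_wait()` runs the saved context. The fixed-size pending array is only a simplification standing in for contexts the kernel would allocate on demand.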
You can also easily implement negaffinity for up to 32 processors with three 32-bit unsigned ints in the process block: just don't reserve on a processor where the bit is already set, until you have reserved on all available processors at least once. > > My ideal implementation would use async call gates. In effect, > > this would be the same as implementing VMS ASTs in all of FreeBSD. > > Actually, why not just have a syscall that turns on the async > behavior? Libc will break. It does not expect to have to reap completed system call status blocks to report completion status to the user program. > > In any case, you and Nate are getting upset at shortcuts that > > people want to take in implementation, not at the design itself. > > > > Cut it out. > > Well if we have an implementation where the implementors are > unwilling or incapable (because of time constraints, or getting > hit by a bus, etc.) of doing the more optimized version, then what's > the point besides getting more IO concurrency? I don't know, it's > just that if someone has a terrific idea that seems to have astounding > complexity and they don't feel like they want to or can take the > final step with it, then it really should not be considered. The point of threads was to reduce context switch overhead, and to increase the useful work that actually gets done in any given time period, as opposed to spending cycles on system overhead or spinning waiting for a call to complete when you have other, better work to do. Somewhere along the way, it became corrupted into a tool to allow people without very much clue to write programs one-per-connection, instead of building finite state automata, and that corruption has proceeded, until now it's a tool to get SMP scalability. > btw, I've read some on scheduler activations; where are some references > on async call gates? You're talking to the originator of the idea. See the -arch archives.
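One possible reading of the negaffinity rule, sketched in C under assumptions: the post mentions three 32-bit words but only spells out the role of a reservation bitmask, so this hypothetical `struct negaff` keeps just an online-CPU mask and a per-pass reservation mask; the third word is not reconstructed. The hypothetical `negaff_pick` returns the lowest-numbered CPU not yet reserved in the current pass and starts a new pass once every online CPU has been used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process negaffinity state for up to 32 CPUs. */
struct negaff {
    uint32_t online;    /* CPUs that exist and may be used */
    uint32_t reserved;  /* CPUs already reserved on during this pass */
};

/* Choose the CPU for the next scheduler reservation, or -1 if no CPU
 * is online. Never reuses a CPU whose bit is set until every online
 * CPU has been reserved on once; then the pass is reset. */
int negaff_pick(struct negaff *n)
{
    assert(n != NULL);
    if (n->online == 0)
        return -1;
    uint32_t avail = n->online & ~n->reserved;
    if (avail == 0) {              /* all online CPUs used this pass */
        n->reserved = 0;
        avail = n->online;
    }
    for (int cpu = 0; cpu < 32; cpu++) {
        uint32_t bit = UINT32_C(1) << cpu;
        if (avail & bit) {
            n->reserved |= bit;
            return cpu;
        }
    }
    return -1;                     /* not reached: avail is non-zero */
}
```

With CPUs 0, 1 and 3 online (`online = 0x0B`), successive calls return 0, 1, 3 and then 0 again, so a process spreads its reservations across processors before doubling up on any one.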
Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-arch Fri Apr 27 22:43:37 2001 Delivered-To: freebsd-arch@freebsd.org Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29]) by hub.freebsd.org (Postfix) with SMTP id 990D937B422 for ; Fri, 27 Apr 2001 22:43:33 -0700 (PDT) (envelope-from julian@elischer.org) Received: (qmail 20288 invoked by uid 666); 28 Apr 2001 05:46:46 -0000 Received: from i194-025.nv.iinet.net.au (HELO elischer.org) (203.59.194.25) by mail.m.iinet.net.au with SMTP; 28 Apr 2001 05:46:46 -0000 Message-ID: <3AEA5845.D3377794@elischer.org> Date: Fri, 27 Apr 2001 22:42:29 -0700 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Nate Williams Cc: Daniel Eischen , Matt Dillon , Arch@FreeBSD.ORG Subject: Re: KSE threading support (first parts) References: <15081.50170.297579.938254@nomad.yogotech.com> <15081.53117.150505.145701@nomad.yogotech.com> Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Nate Williams wrote: > > > > > Well, that's complete bullshit. KSE's are extremely short-running > > > > affairs in kernel mode, especially when you consider the most likely > > > > asynchronizing case (a simple blocking situation that will most commonly > > > > be in a read() or write()). > > > > > > Not necessarily. My experience with developing and running applications > > > on Solaris says that having multiple KSE's/process is a *huge* win. > > > > You do know that the proposed implementation isn't quite like > > Solaris (KSEs don't get their own quantum).
You better holler > > if you want it ;-) > > I'm not sure how much of a difference that makes, but to be honest, I > haven't thought about the consequences of it much. :( > > Nate If you implement N LWPs as N KSEGs with one KSE each, they do get their own quanta, so it can be arranged either way. -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000-2001 ---> X_.---._/ v From owner-freebsd-arch Fri Apr 27 23:06:54 2001 Delivered-To: freebsd-arch@freebsd.org Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29]) by hub.freebsd.org (Postfix) with SMTP id 35AEA37B423 for ; Fri, 27 Apr 2001 23:06:48 -0700 (PDT) (envelope-from julian@elischer.org) Received: (qmail 20337 invoked by uid 666); 28 Apr 2001 06:10:00 -0000 Received: from i194-025.nv.iinet.net.au (HELO elischer.org) (203.59.194.25) by mail.m.iinet.net.au with SMTP; 28 Apr 2001 06:10:00 -0000 Message-ID: <3AEA5DB7.C9209955@elischer.org> Date: Fri, 27 Apr 2001 23:05:43 -0700 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Alfred Perlstein Cc: Terry Lambert , arch@FreeBSD.ORG, terry@lambert.org Subject: Re: KSE threading support (first parts) References: <200104272306.QAA13810@usr02.primenet.com> <20010427160607.M18676@fw.wintelcom.net> Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Alfred Perlstein wrote: > > * Terry Lambert [010427 15:56] wrote: > > > > In other words, the complexity of the model which Jason Evans > > arrived at from the Big Threads Design Meeting in Foster City > > that about 120 of us attended, is optimal for achieving my > > design goals, without going to async call gates.
> > The way I envision async call gates is something like each syscall > could borrow a spare pcb and do an rfork back into the user > application using the borrowed pcb and allowing the syscall to > proceed as scheduled as another kernel thread, upon return it would > somehow notify the process of completion. Well, if that's async call gates, then we are doing async call gates. It describes how a KSE system works. Here's how I see it. KSE does syscall. KSE blocks in syscall. KSE saves state into a KSEC. KSE returns with an upcall to the UTS. UTS schedules another thread. [time passes] Quantum is exhausted. KSE is pre-empted. Current user stack is munged to look like it did a yield() (unless in critical region). [time passes] Interrupt signals completion of original IO (interrupt level part). Associated KSEC is hung off the KSEG as 'runnable' (and KSEG put on run queue if not already there). [time passes] At next kernel scheduling event where that KSEG is made 'current', KSEC is loaded, syscall is completed and results written back to process space. User stack is munged to look like the process did a yield() just after returning from syscall. After all runnable KSECs have been run to this state, an upcall is made to UTS reporting all threads that completed syscalls since last report. UTS schedules either suspended original thread or pre-empted thread. > > > IMO, going to async call gates could result in as much as an > > additional 25% improvement in performance. Unfortunately, it > > would also mean a lot of extra overhead to ensure binary > > backward compatibility, since it would put all of the standard > > POSIX semantics into the libraries, and the default behaviour > > would be completely asynchronous.
Binary compatibility would > > mean using a different INT than INT 0x80 for system calls, and > > putting backward compatibility cruft into a module to support > > any binary which some moron decided to link static because they > > thought that linking it static makes it somehow safer from single > > file damage failure than using libc.so and ld.so. > > > > My ideal implementation would use async call gates. In effect, > > this would be the same as implementing VMS ASTs in all of FreeBSD. > > Actually, why not just have a syscall that turns on the async > behavior? Basically, I don't see the problem; that's basically what we are doing. > > > That all said, the current project, as it was envisioned by Jason > > Evans, does not have the limitations which you and Nate fear, > > unless you cop out on the implementation, and do what Julian and > > Archie wanted with KSEG non-migration of KSEs (and the concomitant > > single scheduler run queue for all CPUs), or if you take Matt's > > approach, and serialize execution everywhere, not just within a > > particular KSEG (Matt's approach prevents the need for a single > > process to be able to exist on a run queue as more than one entry > > instance, which makes some things easier). > > > > I would call both the Matt approach, and the Julian/Archie approach > > "overly conservative"; they are both in excess of 12 years behind > > the state of the art. But I would also level the same criticism > > at using the 6-year-old technology of scheduler activations to > > avoid going to true async call gates. Define an async call gate and show how it differs from what we are suggesting. We are suggesting that all blocking syscalls be made async. We are also suggesting a reporting mechanism by which completed syscalls are reported. > > > > > > In any case, you and Nate are getting upset at shortcuts that > > people want to take in implementation, not at the design itself. > > > > Cut it out.
> > Well if we have an implementation where the implementors are > unwilling or incapable (because of time constraints, or getting > hit by a bus, etc.) of doing the more optimized version, then what's > the point besides getting more IO concurrency? I don't know, it's > just that if someone has a terrific idea that seems to have astounding > complexity and they don't feel like they want to or can take the > final step with it, then it really should not be considered. Alfred, if you think this is astoundingly complex, then it shows you need to read it again. I think it's very simple. Blocking syscalls are allowed to return to user space via an upcall. Completed (previously blocked) syscalls are reported on scheduler events and possibly at other opportune times (all of which would be some sort of kernel boundary crossing event). The rest of it is housekeeping to allow this to happen safely, fairly, and concurrently on multiple processors. > > btw, I've read some on scheduler activations; where are some references > on async call gates?
> > -- > -Alfred Perlstein - [alfred@freebsd.org] > Represent yourself, show up at BABUG http://www.babug.org/ -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000-2001 ---> X_.---._/ v From owner-freebsd-arch Sat Apr 28 01:48:04 2001 Delivered-To: freebsd-arch@freebsd.org Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29]) by hub.freebsd.org (Postfix) with SMTP id A00E537B43C for ; Sat, 28 Apr 2001 01:47:57 -0700 (PDT) (envelope-from julian@elischer.org) Received: (qmail 20651 invoked by uid 666); 28 Apr 2001 08:51:10 -0000 Received: from i179-143.nv.iinet.net.au (HELO elischer.org) (203.59.179.143) by mail.m.iinet.net.au with SMTP; 28 Apr 2001 08:51:10 -0000 Message-ID: <3AEA837D.C2AE5E8D@elischer.org> Date: Sat, 28 Apr 2001 01:46:53 -0700 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Terry Lambert Cc: Alfred Perlstein , arch@FreeBSD.ORG, terry@lambert.org Subject: Re: KSE threading support (first parts) References: <200104280337.UAA29358@usr08.primenet.com> Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Terry, what you describe here is so similar to what we are planning on doing that the differences could be called "Implementation details". The only difference is that your syscall returns to where it came from when it would have blocked, whereas in the scheme I'm championing, the particular thread is left suspended and control returns to the UTS via an upcall.
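Julian's earlier step-by-step description (the syscall blocks and its state is saved in a KSEC; I/O completion hangs the KSEC off the KSEG as runnable; at the next scheduling event every runnable KSEC is finished and one upcall reports all completed threads) can be modelled as plain list bookkeeping. This is a single-threaded illustration only: the struct layouts and the functions `kse_block`, `kse_io_done` and `kseg_run` are hypothetical, upcalls and pre-emption are not modelled, and no locking is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical saved context of one blocked syscall. */
struct ksec {
    int   tid;           /* user thread that made the call */
    long  result;        /* result produced when the I/O completes */
    long *uret;          /* where the result lands in "process space" */
    struct ksec *next;
};

/* Hypothetical KSE group: one wait list and one runnable list. */
struct kseg {
    struct ksec *waiting;   /* blocked, I/O still outstanding */
    struct ksec *runnable;  /* I/O done, syscall not yet finished */
    int on_runq;            /* KSEG is queued for the scheduler */
};

/* The syscall blocks: save its state in a KSEC and hang it on the wait
 * list. (The real KSE would then upcall into the UTS; not modelled.) */
void kse_block(struct kseg *g, struct ksec *c, int tid, long *uret)
{
    assert(g != NULL && c != NULL && uret != NULL);
    c->tid = tid;
    c->result = 0;
    c->uret = uret;
    c->next = g->waiting;
    g->waiting = c;
}

/* Interrupt-level completion: move the KSEC to the runnable list and
 * put the KSEG on the run queue. Returns -1 if no such blocked thread. */
int kse_io_done(struct kseg *g, int tid, long result)
{
    for (struct ksec **pp = &g->waiting; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->tid == tid) {
            struct ksec *c = *pp;
            *pp = c->next;            /* unlink from the wait list */
            c->result = result;
            c->next = g->runnable;
            g->runnable = c;
            g->on_runq = 1;
            return 0;
        }
    }
    return -1;
}

/* Next scheduling event for this KSEG: finish every runnable KSEC,
 * copy results out, and collect thread ids for a single report to the
 * UTS. Returns how many threads were reported. */
int kseg_run(struct kseg *g, int *report, int max)
{
    int n = 0;
    while (g->runnable != NULL && n < max) {
        struct ksec *c = g->runnable;
        g->runnable = c->next;
        *c->uret = c->result;         /* syscall completes */
        report[n++] = c->tid;
    }
    if (g->runnable == NULL)
        g->on_runq = 0;
    return n;
}
```

The report array plays the part of the single upcall listing "all threads that completed syscalls since last report"; threads whose I/O is still outstanding simply stay on the wait list.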
Terry Lambert wrote: > > > The way I envision async call gates is something like each syscall > > could borrow a spare pcb and do an rfork back into the user > > application using the borrowed pcb and allowing the syscall to > > proceed as scheduled as another kernel thread, upon return it would > > somehow notify the process of completion. > > Close. Effectively, it uses the minimal amount of call context > it can get away with, and points the VM space and other stuff > back to the process control block, which is shared among all > system calls. > > > The context is used by the kernel to continue processing. It > contains the address of the user space status block, as well as > a copy of the stack of the returning program (think of the one > that continues as a "setjmp", with the one doing the return as > getting a longjmp, where the code it would have run is skipped). In the proposed scheme, the original context is saved in exactly the same way it would be if the process blocked, but instead of scheduling a new process to run, the same process continues to run, in the same KSE, having done a longjmp() to a saved context that simply returns to the UTS. The original context is saved into a KSEC structure and it is hung on the appropriate sleep/wait queue. The returning new context is very small (a couple of entries on a small kernel stack). It doesn't need to know anything about the syscall that just blocked (there may not even have been one). When the KSE was created, one of the arguments was the address of a mailbox used by that KSE to communicate with the UTS. The status of the blocked syscall/thread will be available to the UTS via that mailbox. > > The final part is that the context runs to completion at the > user space boundary; since the call has already returned, it does > not return to user space; instead, it stops at the user/kernel > boundary, after copying out the completion status into the > user space status block. Ditto.
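The "longjmp() to a saved context that simply returns to the UTS", together with the mailbox registered when the KSE is created, can be illustrated in user space with setjmp/longjmp. This is an analogy under stated assumptions, not the kernel mechanism: `kse_mailbox`, `fake_syscall` and `uts_run` are hypothetical names, and unlike a real KSEC the blocked call's own frames are simply discarded here (actually resuming that thread later would need its own stack, e.g. via ucontext).

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical mailbox the UTS hands over when the KSE is created. */
struct kse_mailbox {
    int blocked_tid;   /* thread whose call would have blocked, or -1 */
    int upcalls;       /* number of "upcalls" delivered so far */
};

static jmp_buf uts_ctx;               /* saved context that returns to the UTS */
static struct kse_mailbox *mbox;

/* Stand-in for a syscall. If it would block, post the thread id to the
 * mailbox and longjmp() to the UTS instead of returning to the caller. */
static long fake_syscall(int tid, int would_block, long value)
{
    if (would_block) {
        mbox->blocked_tid = tid;
        mbox->upcalls++;
        longjmp(uts_ctx, 1);
    }
    return value;
}

/* Minimal UTS step: run one thread's call. Returns -1 if the call
 * completed normally (result in *out), otherwise the id of the thread
 * that blocked, as read back from the mailbox. */
int uts_run(struct kse_mailbox *mb, int tid, int would_block, long *out)
{
    assert(mb != NULL && out != NULL);
    mbox = mb;
    mb->blocked_tid = -1;
    if (setjmp(uts_ctx) != 0)
        return mb->blocked_tid;       /* arrived here via the "upcall" */
    *out = fake_syscall(tid, would_block, 7);
    return -1;
}
```

When the call does not block, the caller sees an ordinary return value; when it would block, control lands back in the UTS with only the mailbox telling it which thread is now waiting, which is the property Julian describes for the small returning context.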
> > The status block is a simplified version of the aioread/aiowrite > status block. > > A program can just use these calls directly. They can also set a > flag to make the call synchronous (as in an aiowait). Finally, a > user space threads scheduler can use completion notifications to > make scheduling decisions. In our case, the act of creating a KSE enables 'KSE mode', in which syscalls always look to the thread concerned as if they completed normally, though there may have been intervening upcalls to the UTS between the time the syscall was dispatched and the time it was completed. Libc MAY not need the syscall stubs changed at all. > > For SMP, you can state that you have the ability to return into > user space (e.g. similar to vfork/sfork) multiple times. Each > of these represents a "scheduler reservation", where you reserve > the right to compete for a quantum. > > You can also easily implement negaffinity for up to 32 processors > with three 32-bit unsigned ints in the process block: just don't > reserve on a processor where the bit is already set, until you > have reserved on all available processors at least once. > > > > My ideal implementation would use async call gates. In effect, > > > this would be the same as implementing VMS ASTs in all of FreeBSD. > > > > Actually, why not just have a syscall that turns on the async > > behavior? > > Libc will break. It does not expect to have to reap completed > system call status blocks to report completion status to the user > program. In the KSE world, you do not reap syscall results. You reap runnable threads. Each thread that is runnable is set up by the kernel to look as though it did a yield() on the first machine instruction after the syscall. When you longjmp() to restart the thread, you have effectively just returned from a perfectly normal read() or whatever call. As a thread, you cannot tell whether you were blocked in that call or not.
(The information may be available to you if you ask for it, but the thread's behaviour is not different in any way.) > > > > Well if we have an implementation where the implementators are > > unwilling or incapable (because of time constraints, or getting > > hit by a bus, etc) of doing the more optimized version then what's > > the point besideds getting more IO concurrancy? I don't know, it > > just that if someone has a terrific idea that seems to have astounding > > complexity and they don't feel like they want to or can take the > > final step with it, then it really should not be considered. There are two main reasons for doing the KSE work: 1/ IO concurrency. Even with one KSE, one KSEG and one processor you can still have multithreading, with several IO operations outstanding at one time, and still do some processing as well. You could also implement IO concurrency in 'non-threaded' programming models using KSEs and IO stubs. 2/ Increased processor concurrency: being able to run multiple threads of one process context concurrently on different processors. > > The point of threads was to reduce context switch overhead, and > to increase the useful work that actually gets done in any given > time period, as opposed to spending cycles on system overhead or > spinning waiting for a call to complete when you have other, better > work to do. Which is what we are trying to do. > > Somewhere along the way, it became corrupted into a tool to allow > people without very much clue to write programs one-per-connection, > instead of building finite state automata, and that corruption has > proceeded, until now it's a tool to get SMP scalability. "let me introduce you..., Mr. Foot, Mr. Bullet ... Mr. Bullet Mr. Foot" Just because this is true doesn't mean that we should give them tools to do useful threading. > > > btw, I've read some on scheduler activations, where some references > > on async call gates? > > You're talking to the originator of the idea.
See the -arch archives. As far as I can see, the difference from what we are suggesting is that in your async call infrastructure, the syscall that has been blocked returns through the same code that it would have returned through had it not blocked, and the library must detect this and jump to the UTS in the 'blocked' case to schedule another thread. -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000-2001 ---> X_.---._/ v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Apr 28 4: 7: 7 2001 Delivered-To: freebsd-arch@freebsd.org Received: from filk.iinet.net.au (syncopation-dns.iinet.net.au [203.59.24.29]) by hub.freebsd.org (Postfix) with SMTP id 21F4A37B424 for ; Sat, 28 Apr 2001 04:07:04 -0700 (PDT) (envelope-from julian@elischer.org) Received: (qmail 20972 invoked by uid 666); 28 Apr 2001 11:10:18 -0000 Received: from i179-143.nv.iinet.net.au (HELO elischer.org) (203.59.179.143) by mail.m.iinet.net.au with SMTP; 28 Apr 2001 11:10:18 -0000 Message-ID: <3AEAA418.1EE1A1AF@elischer.org> Date: Sat, 28 Apr 2001 04:06:00 -0700 From: Julian Elischer X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 5.0-CURRENT i386) X-Accept-Language: en, hu MIME-Version: 1.0 To: Terry Lambert , Alfred Perlstein , arch@FreeBSD.ORG, terry@lambert.org Subject: Re: KSE threading support (first parts) References: <200104280337.UAA29358@usr08.primenet.com> <3AEA837D.C2AE5E8D@elischer.org> Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Julian Elischer wrote: > > Terry, what you describe here is so similar to what we are planning > on doing that the differences could be called "Implementation details" [...]
> > > > Somewhere along the way, it became corrupted into a tool to allow > > people without very much clue to write programs one-per-connection, > > instead of building finite state automata, and that corruption has > > proceeded, until now it's a tool to get SMP scalability. > > "let me introduce you..., > Mr. Foot, Mr. Bullet ... Mr. Bullet Mr. Foot" > > Just because this is true doesn't mean that we should give them tools > to do useful threading. s/should/shouldn't/ -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000-2001 ---> X_.---._/ v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Apr 28 6: 9:16 2001 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id BEEF637B423 for ; Sat, 28 Apr 2001 06:09:12 -0700 (PDT) (envelope-from eischen@vigrid.com) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id JAA03253; Sat, 28 Apr 2001 09:08:32 -0400 (EDT) Date: Sat, 28 Apr 2001 09:08:31 -0400 (EDT) From: Daniel Eischen To: Julian Elischer Cc: Nate Williams , Matt Dillon , Arch@FreeBSD.ORG Subject: Re: KSE threading support (first parts) In-Reply-To: <3AEA5845.D3377794@elischer.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Fri, 27 Apr 2001, Julian Elischer wrote: > Nate Williams wrote: > > > > > > > Well, that's complete bullshit. KSE's are extremely short-running > > > > > affairs in kernel mode, especially when you consider the most likely > > > > > asynchronizing case (a simple blocking situation that will most commonly > > > > > be in a read() or write()). > > > > > > > > Not necessarily. My experience with developing and running applications > > > > on Solaris says that having multiple KSE's/process is a *huge* win. 
> > > > > > You do know that the proposed implementation isn't quite like > > > Solaris (KSEs don't get their own quantum). You better holler > > > if you want it ;-) > > > > I'm not sure how much a difference that makes, but to be honest, I > > haven't thought about the consequences of it much. :( > > > > Nate > > If you implementN LWPs as N KSEGs with a KSE each, they do get > their own quanta so it can be arranged to do it either way. As long as I am allowed to implement it this way in libpthread then I don't really have a problem. -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Apr 28 13:38:40 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by hub.freebsd.org (Postfix) with ESMTP id 44E8F37B424; Sat, 28 Apr 2001 13:38:36 -0700 (PDT) (envelope-from phk@critter.freebsd.dk) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.3/8.11.3) with ESMTP id f3SKcPU29884; Sat, 28 Apr 2001 22:38:25 +0200 (CEST) (envelope-from phk@critter.freebsd.dk) To: Robert Watson Cc: freebsd-arch@FreeBSD.ORG Subject: Re: jailNG In-Reply-To: Your message of "Mon, 23 Apr 2001 14:29:22 EDT." Date: Sat, 28 Apr 2001 22:38:25 +0200 Message-ID: <29882.988490305@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG I'm not uninterested in jails, but I have no time (and no contracts to give me time) for it at present. In general I think jail is in much more capable hands with you anyway :-) Poul-Henning In message , Robert Watson writes: > >This weekend I was spending some time tweaking the jail(8) code to improve >it's SMPng-happiness as well as manageability. Unfortunately, I ended up >rewriting it in the process :-).
I changed the model somewhat so that >jails are now persistently configred, joined, et al, and broke out the >chroot() from the creation/joining process, as with increased namespaces >(such as System V IPC) creating a nice clean failure was increasingly >difficult. Aspects of individual jails may now be managed using sysctl's, >which appears to work reasonably well. Clearly there's a lot of work left >to do, but I'd appreciate comments if people are interested: > > http://www.watson.org/~robert/jailng/ > >Simple example: > >dev# ./jailctl >usage: > jailctl create [jailname] > jailctl destroy [jailname] > jailctl join [jailname] [-c chrootpath] [path] [cmd] [args...] >dev# ./jailctl create test >dev# sysctl -a | grep jail >jail.instance.test.sysvipc_permitted: 0 >jail.instance.test.set_hostname_permitted: 1 >jail.instance.test.socket_ipv4_permitted: 1 >jail.instance.test.socket_unix_permitted: 1 >jail.instance.test.socket_route_permitted: 1 >jail.instance.test.socket_other_permitted: 0 >jail.instance.test.ipv4addr: 0 >dev# ./jailctl join test -c /tmp /bin/sh ># ps ax > PID TT STAT TIME COMMAND > 907 d0 DWJ 0:00.02 /bin/sh > 908 d0 RW+J 0:00.00 ps ax ># exit >dev# ./jailctl destroy test >dev# > >I also have a jailinit(8) in the works which would allow improved >startup/shutdown in the style of init(8) (sans the whole sigchild thing). >Another feature I'd like to add is a jail signal call that allows a signal >to be delivered to all processes inside a jail from outside, allowing an >easier forceable shutdown. > >Robert N M Watson FreeBSD Core Team, TrustedBSD Project >robert@fledge.watson.org NAI Labs, Safeport Network Services > > >To Unsubscribe: send mail to majordomo@FreeBSD.org >with "unsubscribe freebsd-arch" in the body of the message > -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Apr 28 16:50:21 2001 Delivered-To: freebsd-arch@freebsd.org Received: from sasami.jurai.net (sasami.jurai.net [64.0.106.45]) by hub.freebsd.org (Postfix) with ESMTP id DC53B37B42C; Sat, 28 Apr 2001 16:50:18 -0700 (PDT) (envelope-from scanner@jurai.net) Received: from localhost (scanner@localhost) by sasami.jurai.net (8.9.3/8.8.7) with ESMTP id TAA85008; Sat, 28 Apr 2001 19:49:59 -0400 (EDT) Date: Sat, 28 Apr 2001 19:49:59 -0400 (EDT) From: To: Poul-Henning Kamp Cc: Robert Watson , freebsd-arch@FreeBSD.ORG Subject: Re: jailNG In-Reply-To: <29882.988490305@critter> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG It is my understanding from the OpenRoot project that jail currently does not allow ICMP to work inside a jail? If this is so, this seriously damages services that need Path MTU-D such as SMTP and HTTP. Surely this is not the case? Can someone enlighten me on this. Thanks ============================================================================= -Chris Watson (316) 326-3862 | FreeBSD Consultant, FreeBSD Geek Work: scanner@jurai.net | Open Systems Inc., Wellington, Kansas Home: scanner@deceptively.shady.org | http://open-systems.net ============================================================================= WINDOWS: "Where do you want to go today?" LINUX: "Where do you want to go tomorrow?" BSD: "Are you guys coming or what?" ============================================================================= irc.openprojects.net #FreeBSD -Join the revolution! 
ICQ: 20016186 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Apr 28 16:54: 2 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id BA14437B423 for ; Sat, 28 Apr 2001 16:54:00 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.11.3/8.11.3) with SMTP id f3SNsHf06257; Sat, 28 Apr 2001 19:54:17 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Sat, 28 Apr 2001 19:54:17 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: scanner@jurai.net Cc: Poul-Henning Kamp , freebsd-arch@FreeBSD.ORG Subject: Re: jailNG In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sat, 28 Apr 2001 scanner@jurai.net wrote: > It is my understanding from the OpenRoot project that jail currently > does not allow ICMP to work inside a jail? If this is so, this seriously > damages services that need Path MTU-D such as SMTP and HTTP. Surely this > is not the case? Can someone enlighten me on this. The jail() code doesn't allow user applications to open raw sockets permitting direct use of ICMP by user processes, but all of the normal use of ICMP by the network stack directly is uninhibited. This means that things like PMTU discovery work just fine, but applications such as ping do not work in jail(). It's possible to imagine modifications to the raw socket behavior that might permit use of it from within jail(), but there's a whole can of worms there that we're not willing to spend too much time on at this point. 
Robert N M Watson FreeBSD Core Team, TrustedBSD Project robert@fledge.watson.org NAI Labs, Safeport Network Services To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Apr 28 17: 2:32 2001 Delivered-To: freebsd-arch@freebsd.org Received: from sasami.jurai.net (sasami.jurai.net [64.0.106.45]) by hub.freebsd.org (Postfix) with ESMTP id 623E937B423; Sat, 28 Apr 2001 17:02:24 -0700 (PDT) (envelope-from scanner@jurai.net) Received: from localhost (scanner@localhost) by sasami.jurai.net (8.9.3/8.8.7) with ESMTP id UAA85197; Sat, 28 Apr 2001 20:02:23 -0400 (EDT) Date: Sat, 28 Apr 2001 20:02:23 -0400 (EDT) From: To: Robert Watson Cc: freebsd-arch@FreeBSD.ORG Subject: Re: jailNG In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sat, 28 Apr 2001, Robert Watson wrote: > The jail() code doesn't allow user applications to open raw sockets > permitting direct use of ICMP by user processes, but all of the normal use > of ICMP by the network stack directly is uninhibited. This means that > things like PMTU discovery work just fine, but applications such as ping > do not work in jail(). It's possible to imagine modifications to the raw > socket behavior that might permit use of it from within jail(), but > there's a whole can of worms there that we're not willing to spend too > much time on at this point. Ok. I wasn't sure. I couldn't believe it would block ICMP; I figured there was a logic to its behaviour. I actually like the current way, then. I see jail as a virtual hosting environment more than anything else. Thanks for the explanation.
============================================================================= -Chris Watson (316) 326-3862 | FreeBSD Consultant, FreeBSD Geek Work: scanner@jurai.net | Open Systems Inc., Wellington, Kansas Home: scanner@deceptively.shady.org | http://open-systems.net ============================================================================= WINDOWS: "Where do you want to go today?" LINUX: "Where do you want to go tomorrow?" BSD: "Are you guys coming or what?" ============================================================================= irc.openprojects.net #FreeBSD -Join the revolution! ICQ: 20016186 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Apr 28 17:30:46 2001 Delivered-To: freebsd-arch@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id 013C337B422 for ; Sat, 28 Apr 2001 17:30:44 -0700 (PDT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.11.3/8.11.3) with SMTP id f3T0VDf06567; Sat, 28 Apr 2001 20:31:13 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Sat, 28 Apr 2001 20:31:13 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: scanner@jurai.net Cc: freebsd-arch@FreeBSD.ORG Subject: Re: jailNG In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sat, 28 Apr 2001 scanner@jurai.net wrote: > On Sat, 28 Apr 2001, Robert Watson wrote: > > > The jail() code doesn't allow user applications to open raw sockets > > permitting direct use of ICMP by user processes, but all of the normal use > > of ICMP by the network stack directly is uninhibited. This means that > > things like PMTU discovery work just fine, but applications such as ping > > do not work in jail(). 
It's possible to imagine modifications to the raw > > socket behavior that might permit use of it from within jail(), but > > there's a whole can of worms there that we're not willing to spend too > > much time on at this point. > > Ok. I wasn't sure. I couldnt believe it would block ICMP. I knew there > was a logical system with its behaviour. I actually like the current way > then. I see jail as a virtual hosting env. more then anything else. > Thanks for the explanation. Yeah -- there are three basic functions of jail(): 1) Cause the suser() call to work only in a specifically designated set of cases (using the PRISON_ROOT flags to suser_xxx() permits the call to succeed in jail()). 2) Institute a set of simple mandatory inter-process restrictions preventing a set of inter-process operations from taking place (such as debugging, signalling, et al from within the jail to outside the jail()). 3) Rewrite or block certain socket operations so that listen and connect operations in the IP space use the IP designated for the jail(), and so that access to some other protocols (such as IPv6) can be disabled. This in effect provides a simple form of poly-instantiation for the localhost address (by substituting the jail IP for 127.0.0.1), but also has some other side effects that I'm looking at ways to remedy in the future (hopefully without virtualizing the entire stack, which would work but has a lot of downsides). The access to ICMP is a property of (1), not of (3). Jail's impact on the IP stack happens almost exclusively as part of the socket implementation, and doesn't get down into the network layer much. Robert N M Watson FreeBSD Core Team, TrustedBSD Project robert@fledge.watson.org NAI Labs, Safeport Network Services To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message