From owner-freebsd-hackers Sat Jul 29 19:16:18 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.11/8.6.6) id TAA10035 for hackers-outgoing; Sat, 29 Jul 1995 19:16:18 -0700
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by freefall.cdrom.com (8.6.11/8.6.6) with SMTP id TAA10029 for ; Sat, 29 Jul 1995 19:16:15 -0700
Received: by cs.weber.edu (4.1/SMI-4.1.1) id AA10428; Sat, 29 Jul 95 20:08:37 MDT
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9507300208.AA10428@cs.weber.edu>
Subject: Re: pthreads
To: julian@ref.tfs.com (Julian Elischer)
Date: Sat, 29 Jul 95 20:08:27 MDT
Cc: bakul@netcom.com, freebsd-hackers@freebsd.org
In-Reply-To: <199507300010.RAA07970@ref.tfs.com> from "Julian Elischer" at Jul 29, 95 05:10:07 pm
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@freebsd.org
Precedence: bulk

> Kirk McKusick and co. had a discussion on this topic
> when I did the BSD4.4 course at UCB..
> they were of the opinion that with recent changes to the
> efficiency of forking, the answer was to create the new
> 'rfork' call, where a forking process can decide what resources
> it wants to share with its child..
> options include:
> text space, data space, stacks, file descriptor tables, etc.

Sequent has a call named "sfork", which I implemented in the UnixWare
kernel using a proc structure change so that the inheritance flag can
be set.  The point is to inherit the per-process open file table
across a fork.  There was code posted here to do that using a new
sfork system call; a sketch of what such an interface might look like
appears below.

Other than global context data (which is separable anyway in an
application written to run in a threaded environment), there is no
reason to run in a threaded environment that supplies effectively
nothing more than system call contexts, unless you like statically
allocating your own limited stacks at thread start.

Stack sharing is a dumb idea; if everything you mentioned were shared,
then what you've invented is vfork without calling exec.  That's
already been invented.

> using this approach, how do you tell two processes that are sharing
> all resources from threads?

If you can't tell two processes from two threads, then that is
*exactly* the supporting argument *against* using threads instead of
simply using processes.  The implementation difference is that the
pointer to your global data needs to point explicitly to shared
memory (instead of implicitly): a tradeoff between "thread_create"
startup code complication and "shmget" startup code complication.
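To make that tradeoff concrete, here is a minimal sketch of the
shmget side (illustrative only, not the code that was posted): two
processes sharing a counter through an explicitly mapped segment,
where a threaded program would have used a plain global.  Error
checking is omitted for brevity.

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	/* Explicitly create and map the shared segment; a threaded
	 * program gets the equivalent sharing implicitly.
	 */
	int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
	int *counter = (int *)shmat(shmid, NULL, 0);

	*counter = 0;
	if (fork() == 0) {
		*counter = 42;	/* child writes through the shared page */
		_exit(0);
	}
	wait(NULL);
	printf("parent sees %d\n", *counter);	/* prints 42 */

	shmdt(counter);
	shmctl(shmid, IPC_RMID, NULL);
	return 0;
}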
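And the rfork/sfork style interface mentioned above might look
roughly like this.  The flag names are hypothetical, loosely modeled
on Plan 9's rfork(); the point is just that sfork falls out as one
combination of inheritance flags.

/*
 * Hypothetical interface; none of these flags exist in the tree.
 */
#define RFPROC	0x0001	/* create a new process */
#define RFMEM	0x0002	/* share the address space with the child */
#define RFFDG	0x0004	/* copy, rather than share, the fd table */

extern int rfork(int flags);

int
sfork(void)
{
	/* New process, copied address space; the open file table
	 * is shared because RFFDG is omitted.
	 */
	return rfork(RFPROC);
}

Passing RFPROC | RFMEM instead would give you the shared address
space case; add shared stacks and you are back at the
vfork-without-exec objection above.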
The other difference, which shows up only in an SMP environment, is
that you can more easily schedule a process independently than a
thread, because of the mutex complications involved.

The point of LWP on SunOS was to cause a process to consume as much
of its scheduling quantum as it possibly could.  Thus it puts off the
process context switch overhead for as long as possible, and that
overhead is worth avoiding, especially since *that* is where you lose
cache locality.

The kernel thread implementation buys you minimal benefit in terms of
TLB flushing over a full context switch (assuming the threads are
otherwise incapable of differentiating themselves).  You still eat
the register set flushing (on SPARC), the stack switch, the L1 cache
invalidation, etc., etc.  A minimal benefit for the added cost, and
one that doesn't require that particular implementation to achieve.

Where are kernel threads good?

1)	As contexts for kernel level tasks and daemons; LFS's cleaner
	and the standard update daemon could benefit.  So could the
	implementation of external pagers and CPU emulation for
	binary compatibility.

2)	To avoid crossing protection domains for system level
	daemons; this is a minimal benefit, since things like nfsd
	and biod have implemented this with alternate technology.

3)	For SMP scalability... *but* only when combined with some
	form of cooperative scheduling: user space sync-to-async
	conversion with thread set internal scheduling.

The real benefit is, and continues to be, avoidance of context switch
overhead.  The use of kernel threads to allow the user space threads
to be scheduled on multiple CPU resources is *not* beneficial unless
it is so combined; otherwise, you might as well be using separate
processes for all the good it will do you.

Really, a general async mechanism would be the best "next step", with
support for turning *any* potentially blocking call into a call queue
entry plus a context switch.  This is relatively easy to implement at
the system call and libc level.
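To give the flavor of it, a libc level sketch might look like the
following.  Every name in it is invented for illustration, and where
a real implementation would hand the queue to the kernel or an async
context and switch the caller away, this toy just drains the queue
synchronously in the wait routine.

#include <sys/types.h>
#include <unistd.h>

struct async_call {
	ssize_t	(*fn)(int, void *, size_t);	/* the blocking call */
	int	fd;
	void	*buf;
	size_t	len;
	ssize_t	result;
	int	done;
};

static struct async_call *queue[32];
static int qhead, qtail;

/* queue the call and return immediately with a handle */
struct async_call *
async_read(struct async_call *c, int fd, void *buf, size_t len)
{
	c->fn = read;
	c->fd = fd;
	c->buf = buf;
	c->len = len;
	c->done = 0;
	queue[qtail++ % 32] = c;
	return c;
}

/* stand-in for "context switch until it completes" */
ssize_t
async_wait(struct async_call *c)
{
	while (!c->done) {
		struct async_call *q = queue[qhead++ % 32];
		q->result = q->fn(q->fd, q->buf, q->len);
		q->done = 1;
	}
	return c->result;
}

/* usage:
 *	struct async_call c;
 *	async_read(&c, 0, buf, sizeof buf);
 *	... do other work ...
 *	n = async_wait(&c);
 */

					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.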