Date: Thu, 26 Apr 2001 17:37:46 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: mwm@mired.org (Mike Meyer)
Cc: herveyw@dynamic-cast.com (Hervey Wilson), markgiglio@yahoo.com (Mark Giglio), chat@FreeBSD.ORG
Subject: Re: hotmail converted from freeBSD
Message-ID: <200104261739.KAA24045@usr05.primenet.com>
In-Reply-To: <15080.15992.732520.463161@guru.mired.org> from "Mike Meyer" at Apr 26, 2001 10:27:52 AM

> Actually, the first one was pretty much what I was looking for - at
> least at this level. It looks like completion ports give you a way to
> tune behavior between the single thread w/select and the
> thread-per-socket model, which is an interesting concept. The real use
> is to adapt the single thread w/select model to an SMP system. The
> recommended tuning on a uniprocessor system is a single thread.
>
> The only thing that the completion port model clearly saves you
> compared to the thread-per-socket model is memory resources. It may
> save you context switches compared to some approaches, but that looks
> to be more of a problem with the underlying platform than anything
> else.
>
> It does add the problems of dealing with threading to the single
> thread w/select model. That's pretty much the cost of using more than
> one CPU, though.
>
> In theory, the Unix select & thread semantics can generate this kind
> of behavior. I'd be surprised if it actually worked that well,
> though. I'd *not* be surprised if it failed in some strange way.

The reasoning behind I/O completion ports is that they permit you to
do something on completion of I/O, such as initiate yet another I/O
(a "feedme" signal that is delivered reliably as an event, unlike a
UNIX signal, which is merely a persistent condition).

Effectively, this allows you to provide the equivalent of a
multithreaded program, without having to adopt the stupidity and
context switch overhead of most threads implementations (e.g. Linux,
SVR4, etc.). That overhead results from the inability to implement
thread group affinity in the scheduler properly without causing a
starvation deadlock for other processes whose threads are not in the
same group ("process" == "group of threads").

If you look at the FreeBSD threads design, it doesn't suffer from
these problems (barring the "KSEG == CPU affinity" silliness that
pops up in discussion on -arch occasionally).
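To make the "do something on completion of I/O, such as initiate yet
another I/O" idea concrete, here is a minimal sketch in Python. It is
only an analogy, not the Windows completion port API: a shared
queue.Queue stands in for the port, and the names run_completion_pool
and start_io are hypothetical. A small pool of workers blocks on the
one queue, and handling each completion event posts the next request
for that job (the "feedme" step).

```python
import queue
import threading


def run_completion_pool(jobs, chunks_per_job, workers=2):
    """Simulate a completion port with a shared queue (an analogy only).

    Workers block on one queue of completion events; handling an event
    records it and posts the next I/O request for the same job.
    """
    completions = queue.Queue()
    results = {job: [] for job in jobs}
    lock = threading.Lock()

    def start_io(job, n):
        # Stand-in for an asynchronous read: the "device" completes at
        # once and the completion event lands on the shared queue.
        completions.put((job, n))

    def worker():
        while True:
            event = completions.get()
            if event is None:          # shutdown sentinel
                completions.task_done()
                return
            job, n = event
            with lock:
                results[job].append(n)
            if n + 1 < chunks_per_job:
                start_io(job, n + 1)   # "feedme": initiate the next I/O
            completions.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        start_io(job, 0)
    completions.join()                 # wait until no events are pending
    for _ in threads:
        completions.put(None)
    for t in threads:
        t.join()
    return results
```

Any worker may pick up any job's completion, so the worker count can be
tuned to the number of CPUs rather than to the number of connections;
with workers=1 this degenerates to the single-threaded event loop.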
This _significantly_ saves on context switch overhead; in particular,
it avoids reloading control register 3 (CR3) and TLB shootdown, which
would otherwise result in significant processing overhead, even when
switching between threads in the same group (since you can never know
what thread the scheduler is going to pick, only that you are being
preempted).

Windows also does something UNIX implementations do not, which is
have the concept of "kernel threads" which do not have a virtual
address space attached to them. It is the address space changes which
result in the need to reload CR3.

In addition, there is a limitation on the number of LDTs you can use
simultaneously. This approach permits Windows to support more LDTs
than an OS based on TSS context switching can use. Otherwise, there
is a serious limitation on the number of TSS-based processes you can
run at the same time (8192 minus the overhead for the system).

FreeBSD does not use TSS-based context switching. Linux used to; it
may or may not these days, but since they don't document the
internals very well (an artifact of not having any historical
perspective, which has evolved out of their lack of source code
control), it's not worth it for me to go digging in their source tree
for what they've been doing since last Tuesday.

When you are running a threaded Windows program, each simultaneously
running thread in user space has a kernel thread which is providing
its quantum for it; the user space thread is providing the virtual
address space pointer. In Windows, each user space thread runs in its
own virtual address space; however, this address space overlaps in a
complex way, based on order of thread creation, since the address
space is "copy on write" based on threads being created by other
threads.
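The 8192 figure mentioned above for TSS-based switching comes from
the x86 segment selector layout: a selector is 16 bits, of which 3
are not index bits (a 2-bit requested privilege level and a 1-bit
table indicator), leaving 13 bits of descriptor index. A small sketch
of the arithmetic follows; the function names and the
reserved_for_system and descriptors_per_task parameters are
hypothetical, chosen for illustration, and the values passed for them
are not taken from any particular OS.

```python
SELECTOR_BITS = 16     # an x86 segment selector is 16 bits wide
NON_INDEX_BITS = 3     # 2-bit RPL plus 1-bit table indicator (GDT/LDT)
DESCRIPTOR_BYTES = 8   # each descriptor table entry occupies 8 bytes


def max_descriptors():
    """Number of entries addressable through the 13-bit selector index."""
    return 2 ** (SELECTOR_BITS - NON_INDEX_BITS)


def max_tss_tasks(reserved_for_system, descriptors_per_task=1):
    """Upper bound on tasks when each task needs its own GDT entries.

    Both arguments are illustrative assumptions: how many entries the
    system keeps for itself, and how many entries one task consumes
    (for example, one for a TSS and one for an LDT).
    """
    return (max_descriptors() - reserved_for_system) // descriptors_per_task
```

For example, max_descriptors() gives 8192, which at 8 bytes per entry
fills exactly the 64 KB a 16-bit table limit can describe. If one
assumes 12 reserved entries and 2 entries per task,
max_tss_tasks(12, 2) gives 4090.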
The upshot of all of this is that I/O completion handlers permit you
to pass events between user space threads without needing to marshal
data, and thus permit threaded processes to scale to an arbitrarily
large number of CPUs without having to directly address the affinity
issues that badly written threading systems (e.g. Linux, SVR4) must
address.

If you actually want to move a data object between these threads,
e.g. as in passing a connection context structure between threads,
the object represented by the data has to be explicitly reinstanced
in the target thread. This is the downside to Windows threading, and
is really a legacy issue having to do with WIN32S compatibility for
threads support in Windows 3.x, prior to Windows 95.

For more information on threading in Microsoft Windows, I suggest you
sign up as an MSDN developer, after which they will provide you with
much more documentation than they provide on their web site. If you
can get them to send it to you, you might also ask for their
threading architecture model for their Active Server Platform (at the
time I saw it, it was still named "VIPER").

You may also want to read _all_ of the documents on their web site
that discuss "rental model", "apartment model", and "freethreading
model" threaded applications, and the increasing restrictions on how
you must program for each of them. Taken together, these documents
will give you a basic overview from which you can deduce a lot of
their internal architecture.

I also suggest that you learn a little bit more about threading in
general, and about how context and task switching work in a protected
mode operating system in particular. A good reference for this is:

	Protected Mode Software Architecture
	Tom Shanley
	MindShare, Inc.
	ISBN: 0-201-55447-X

I've also seen another reference recently, which is rather obtuse,
but which I've decided I like, now that I've gotten into it:

	The Indispensable PC Hardware Book (Third Edition)
	Hans-Peter Messmer
	Addison-Wesley
	ISBN: 0-201-40399-4

FYI: In my opinion, people who use threads for turning finite state
automata into easier-to-program linear code execution are just being
incredibly intellectually lazy, and the resulting application will
run slower on everything but SMP hardware, and might even run slower
there, depending on whether their judgment in algorithms was as poor
as their judgment in programming models.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message