Date:      Thu, 26 Apr 2001 17:37:46 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        mwm@mired.org (Mike Meyer)
Cc:        herveyw@dynamic-cast.com (Hervey Wilson), markgiglio@yahoo.com (Mark Giglio), chat@FreeBSD.ORG
Subject:   Re: hotmail converted from freeBSD
Message-ID:  <200104261739.KAA24045@usr05.primenet.com>
In-Reply-To: <15080.15992.732520.463161@guru.mired.org> from "Mike Meyer" at Apr 26, 2001 10:27:52 AM

> Actually, the first one was pretty much what I was looking for - at
> least at this level. It looks like completion ports give you a way to
> tune behavior between the single thread w/select and the
> thread-per-socket model, which is an interesting concept. The real use
> is to adapt the single thread w/select model to an SMP system.  The
> recommended tuning on a uniprocessor system is a single thread.
> 
> The only thing that the completion port model clearly saves you
> compared to the thread-per-socket model is memory resources. It may
> save you context switches compared to some approaches, but that looks
> to be more of a problem with the underlying platform than anything
> else.
> 
> It does add the problems of dealing with threading to the single
> thread w/select model. That's pretty much the cost of using more than
> one CPU, though.
> 
> In theory, the Unix select & thread semantics can generate this kind
> of behavior. I'd be surprised if it actually worked that well,
> though. I'd *not* be surprised if it failed in some strange way.

The reasoning behind I/O completion ports is that they permit
you to do something on completion of I/O, such as initiate yet
another I/O (a "feedme" signal that is delivered reliably as an
event, unlike a UNIX signal, which is merely a persistent
condition).
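
As a rough illustration of the pattern described above, here is a minimal
sketch in Python.  It is not Windows code and uses no Win32 API: a plain
queue stands in for the completion port, a small fixed pool of workers
drains it, and each handler reacts to a completion by starting the next
I/O.  The name run_completion_port, the (connection, count) packet layout
and the "I/O completes immediately" shortcut are all assumptions made for
the example.

```python
import queue
import threading


def run_completion_port(requests, n_workers=2):
    """Toy completion port: `requests` maps a connection name to the number
    of follow-up I/Os it wants. Returns every completion that was handled."""
    port = queue.Queue()      # hypothetical stand-in for the completion port
    handled = []
    lock = threading.Lock()

    def worker():
        while True:
            packet = port.get()           # block until a completion arrives
            if packet is None:            # shutdown sentinel
                return
            conn, left = packet
            with lock:
                handled.append((conn, left))
            if left > 0:
                # The handler reacts to the completion by starting the next
                # I/O; here that I/O "completes" at once and is queued again.
                port.put((conn, left - 1))
            port.task_done()

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in workers:
        t.start()
    for conn, count in requests.items():
        port.put((conn, count))           # the initial I/O has completed
    port.join()                           # wait until no completion is pending
    for _ in workers:
        port.put(None)                    # one sentinel per worker
    for t in workers:
        t.join()
    return handled
```

The number of workers is independent of the number of connections, which
is the tuning knob between "one thread with select" and "one thread per
socket": with n_workers=1 the same code degenerates to a single event loop.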

Effectively, this allows you to provide the equivalent of a
multithreaded program, without having to adopt the stupidity and
context switch overhead of most threads implementations (e.g. Linux,
SVR4, etc.) which results from the inability to implement thread
group affinity in the scheduler properly, without resulting in a
starvation deadlock for other processes whose threads are not in
the same group ("process" == "group of threads").

If you look at the FreeBSD threads design, it doesn't suffer from
these problems (barring the "KSEG == CPU affinity" silliness that
pops up in discussion on -arch occasionally).

This _significantly_ saves on context switch overhead; in particular,
it avoids reloading control register 3 (CR3), and TLB shootdown, which
would otherwise result in significant processing overhead, even
when switching between threads in the same group (since you can
never know what thread the scheduler is going to pick, only that
you are being preempted).

Windows also does something UNIX implementations do not, which is
to have the concept of "kernel threads" that have no virtual address
space attached to them.  It is the address space changes which
result in the need to reload CR3.

In addition, there is a limitation on the number of LDTs you can
use simultaneously.  This approach permits Windows to support more
LDTs than an OS that uses TSS based context switching can.  Otherwise,
there is a serious limitation on the number of TSS based processes
you can run at the same time (8192 minus the overhead for the
system).
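
The 8192 figure follows from the x86 segment selector layout: a selector
is 16 bits wide, the low two bits are the requested privilege level, the
next bit picks GDT or LDT, and the remaining 13 bits index the descriptor
table.  TSS descriptors have to live in the GDT, so an OS that gives every
process its own TSS is bounded by the GDT's slots.  The arithmetic, as a
small sketch (the constant and function names are made up for the
example):

```python
SELECTOR_BITS = 16
RPL_BITS = 2       # requested privilege level, bits 0-1
TI_BITS = 1        # table indicator, bit 2 (0 = GDT, 1 = LDT)
INDEX_BITS = SELECTOR_BITS - RPL_BITS - TI_BITS   # 13 bits left for the index

# Upper bound on descriptors in one table, and so on GDT-resident TSS slots.
MAX_DESCRIPTORS = 1 << INDEX_BITS


def decode_selector(selector):
    """Split a 16-bit segment selector into its three fields."""
    return {
        "index": selector >> (RPL_BITS + TI_BITS),
        "table": "LDT" if selector & 0x4 else "GDT",
        "rpl": selector & 0x3,
    }
```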

FreeBSD does not use TSS based context switching.  Linux used to;
it may or may not, these days, but since they don't document the
internals very well (an artifact of not having any historical
perspective, which has evolved out of their lack of source code
control), it's not worth it for me to go digging in their source
tree for what they've been doing since last Tuesday.

When you are running a threaded Windows program, each simultaneously
running thread in user space has a kernel thread which is providing
its quantum for it; the user space thread is providing the virtual
address space pointer.  In Windows, each user space thread runs in
its own virtual address space; however, these address spaces overlap
in a complex way, based on order of thread creation, since the
address space is "copy on write" based on threads being created by
other threads.

The upshot of all of this is that I/O completion handlers permit
you to pass events between user space threads, without needing to
marshal data, and thus permit threaded processes to scale to an
arbitrarily large number of CPUs, without having to directly
address the affinity issues that badly written threading systems
(e.g. Linux, SVR4) must address.

If you actually want to move a data object between these threads,
e.g. as in passing a connection context structure between threads,
the object represented by the data has to be explicitly reinstanced
in the target thread.  This is the downside to Windows threading,
and is really a legacy issue having to do with WIN32S compatibility
for threads support in Windows 3.x, prior to Windows 95.


For more information on threading in Microsoft Windows, I suggest
you sign up as an MSDN developer, after which they will provide
you with much more documentation than they provide you on their
web site.  If you can get them to send it to you, you might also
ask for their threading architecture model for their Active Server
Platform (at the time I saw it, it was still named "VIPER").  You
may also want to read _all_ of the documents on their web site that
discuss "rental model", "apartment model", and "freethreading model"
threaded applications, and the increasing restrictions on how you
must program for each of them.  Taken together, these documents
will give you a basic overview from which you can deduce a lot of
their internal architecture.

I also suggest that you learn a little bit more about threading,
in general, and how context and task switching work in a protected
mode operating system, in particular.  A good reference for this is:

	Protected Mode Software Architecture
	Tom Shanley
	MindShare, Inc.
	ISBN: 0-201-55447-X

I've also seen another reference recently, which is rather obtuse,
but which I've decided I like, now that I've gotten into it:

	The Indispensable PC Hardware Book (Third Edition)
	Hans-Peter Messmer
	Addison-Wesley
	ISBN 0-201-40399-4

FYI: In my opinion, people who use threads for turning finite state
automata into easier-to-program linear code are just being
incredibly intellectually lazy, and the resulting application
will run slower on everything but SMP hardware, and might even run
slower there, depending on whether their judgment in algorithms was
as poor as their judgment in programming models.
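
For readers who have not written one, the finite state automaton style
referred to above looks roughly like the sketch below: each connection
carries its own explicit state, and a single thread advances whichever
connection has an event, so no per-connection thread or stack is needed.
The protocol (read one line, reply with it upper-cased, close), the event
names and the step function are all hypothetical, chosen only to keep the
example short.

```python
from enum import Enum, auto


class State(Enum):
    READ_REQUEST = auto()
    WRITE_REPLY = auto()
    CLOSED = auto()


class Connection:
    def __init__(self):
        self.state = State.READ_REQUEST
        self.buffer = b""     # bytes received so far
        self.reply = b""      # bytes waiting to be sent


def step(conn, event, data=b""):
    """Advance one connection by one event; return any bytes to send."""
    if conn.state is State.READ_REQUEST and event == "readable":
        conn.buffer += data
        if b"\n" in conn.buffer:
            line, _, _ = conn.buffer.partition(b"\n")
            conn.reply = line.upper() + b"\n"
            conn.state = State.WRITE_REPLY
        return b""
    if conn.state is State.WRITE_REPLY and event == "writable":
        out, conn.reply = conn.reply, b""
        conn.state = State.CLOSED
        return out
    return b""                # event does not apply in the current state
```

Two connections can be interleaved freely from one loop, since all of the
per-connection context lives in the Connection object rather than on a
thread's stack.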


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
