Date:      Fri, 19 Sep 1997 22:10:13 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        nate@mt.sri.com (Nate Williams)
Cc:        current@FreeBSD.ORG
Subject:   Re: FYI: regarding our rfork(2)
Message-ID:  <199709192210.PAA08418@usr06.primenet.com>
In-Reply-To: <199709191956.NAA20377@rocky.mt.sri.com> from "Nate Williams" at Sep 19, 97 01:56:23 pm

I'm going to reorder this so I can ask something in context...

> > The threads are a different issue.  I don't disagree with the threads
> > stacks being isolated for philosophical reasons -- however it is just
> > wrong from a compatibility standpoint.  If we had a type of thread
> > that had isolated stacks, it would be nice, but that is a different
> > exercise.
> 
> One that would be worthwhile, IMHO. :) :) :)

The idea of a separate stack address space mapping for threads has
one *major* benefit: the ability to auto-grow thread stacks without
resorting to guard pages, statistical protection, and a signal handler
that knows a lot about sigcontext.

POSIX threading, however, assumes that you preallocate a stack for
each thread (presumably out of the heap) and provide it at the time
the thread is started.  So for POSIX threading, a separate mapping
is not necessary.

Note: lack of auto-stack growth is one of the worst aspects of POSIX
threading, and, IMO, renders it very difficult to use.  On the other
hand, there's little reason not to implement the more complex
guard-page-based mechanisms for auto-stack growth.

The benefits of a shared mapping over a separate one are that thread
context switches between threads in a given process(/kernel schedulable
context/kernel thread) are lighter weight, and that auto variables may
be passed between threads.

This is a significant benefit, which I'm reluctant to give up: the
point of using threads as opposed to processes with a shared memory
segment and a shared descriptor table is reduced context switch
overhead.  It is, in fact, the *only* benefit to using threads instead
of that architecture.  Even so, there are consequences to using threads
that are frequently overlooked:

o	Given N processes on a system, and M threads of execution
	for a given service you want to run in addition to this, a
	threaded implementation competes for quantum as 1:(N+1),
	while a multiple process implementation competes for quantum
	as M:(N+M).

	On a heavily loaded system, this means that a multiple
	process implementation will get a larger share of the
	available quantum, and will thus complete sooner.

o	Solaris and SVR4 kernel threading, where M user threads are
	bound to N kernel threads, suffers the same unfair competition
	by program counters for any scheduler-allotted quantum
	whenever M > N.

	The SVR4/Solaris answer to this problem is to have the
	application run in a different scheduling class, where it
	can specify how many quantum consumers it wants to compete
	as against other processes.

	This is an unsatisfying solution, mostly because kernel
	threads block on blocking calls from user threads.  This
	means that I can only ever have N blocking calls outstanding,
	and up to (M-N) threads that are ready to run will not get
	quantum, regardless of the scheduling class used.

	It's also unsatisfying from the perspective that I have
	to give away my quantum and take a kernel context switch
	in return for the kernel allowing me to make a system call.

I think that the only way to satisfactorily address these issues is
to allow user threads to migrate between kernel schedulable entities.
The kernel schedulable entities are there both to compete for quantum
and to provide SMP scalability, such that separate user threads from
a given application can be scheduled to run concurrently on multiple
CPUs.

There are additional complex issues of "when do I give away a quantum
that the scheduler gave me, and what do I get in return" which are
best addressed with call conversion and a cooperative user space
scheduling component (think of it from the ideal that "the scheduler
gave the quantum to the application, not to the thread"... "once the
scheduler gives me a quantum, it's *my* damn quantum!").


> However, although I
> understand the reasons, I can also see where doing so makes it *much*
> more difficult to write 'correct' threaded programs, where I define
> correct as the ability to run w/out stepping on yourself in *all*
> cases.  Note, I said difficult, not impossible.

The cases where you might step on yourself are error cases, so far as
I can see.  In any case where there is an error, it doesn't matter
how the error manifests: your results are suspect.

Here's the question:

Maybe what you really mean is that a blown stack that doesn't cause an
immediate error is harder to detect than a SIGBUS?

I think we can agree that this is true.  8-).  So it may be harder to
initially develop correct code with more pages mapped.  On the other
hand, it's statistically likely that if the program goes off into the
weeds on a memory reference, it will hit an unmapped page rather than
some other thread's stack.

I think this is an acceptable risk for increased performance without
the need for address space remapping on thread context switches, and
for working around the scheduling-related overhead issues.  After all,
lower overhead is what prompted us to use threads in the first place;
we must expect to pay for the tradeoff somewhere.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


