Date: Fri, 19 Sep 1997 22:10:13 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: nate@mt.sri.com (Nate Williams)
Cc: current@FreeBSD.ORG
Subject: Re: FYI: regarding our rfork(2)
Message-ID: <199709192210.PAA08418@usr06.primenet.com>
In-Reply-To: <199709191956.NAA20377@rocky.mt.sri.com> from "Nate Williams" at Sep 19, 97 01:56:23 pm
I'm going to reorder this so I can ask something in context...

> > The threads are a different issue.  I don't disagree with the threads
> > stacks being isolated for philosophical reasons -- however it is just
> > wrong from a compatibility standpoint.  If we had a type of thread
> > that had isolated stacks, it would be nice, but that is a different
> > exercise.
>
> One that would be worthwhile, IMHO. :) :) :)

The idea of a separate stack address space mapping for threads has one *major* benefit: the ability to auto-grow thread stacks without resorting to guard pages, statistical protection, and a signal handler that knows a lot about sigcontext.

POSIX threading, however, assumes you preallocate the stack (presumably out of the heap) and then provide one for each thread at the time it is started.  So for POSIX threading, this is not necessary.

Note: lack of auto-stack growth is one of the worst aspects of POSIX threading, and, IMO, renders it very difficult to use.

On the other hand, there's not a lot of reason not to implement the more complex guard-page based mechanisms for auto-stack growth.  The benefits over a separate mapping are that thread context switches between threads in a given process (/kernel schedulable context/kernel thread) are lighter weight, and that auto variables may be passed between threads.

This is a significant benefit, which I'm reluctant to give up: the point of using threads, as opposed to processes with a shared memory segment and a shared descriptor table, is reduced context switch overhead.  It is, in fact, the *only* benefit to using threads instead of that architecture.

Even so, there are consequences to using threads that are frequently overlooked:

o  Given N processes on a system, and M threads of execution for a given service you want to run in addition to this, a threaded implementation competes for quantum as 1:(N+1), while a multiple process implementation competes for quantum as M:(N+M).  On a heavily loaded system, this means that a multiple process implementation will get a larger share of the available quantum, and will thus complete sooner.  (A quick numeric illustration follows below.)

o  Solaris and SVR4 kernel threading, where M user threads are bound to N kernel threads, suffers the same unfair competition by program counters for any scheduler-allotted quantum whenever M > N.

   The SVR4/Solaris answer to this problem is to have the application run in a different scheduling class, where it can specify the number of quantum consumers it wants to compete as against other processes.

   This is an unsatisfying solution, mostly because kernel threads block on blocking calls from user threads.  This means that I can only ever have N blocking calls outstanding, and that (M-N) threads which are ready to run will not get quantum, regardless of the scheduling class used.

   It's also unsatisfying from the perspective that I have to give away my quantum and take a kernel context switch in return for the kernel allowing me to make a system call.

I think that the only way to satisfactorily address these issues is to allow user threads to migrate between kernel schedulable entities.  The kernel schedulable entities are there both to compete for quantum and to provide SMP scalability, so that separate user threads from a given application can be scheduled to run concurrently on multiple CPUs.
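To put rough numbers on the 1:(N+1) versus M:(N+M) point above, here is a trivial C fragment that just evaluates the two ratios; the N=100 and M=10 figures are made up purely for illustration, not measurements from any system:

    /*
     * Quantum competition, illustrated: one threaded process versus
     * M cooperating processes, on a system already running N others.
     * The 100 and 10 below are arbitrary example numbers.
     */
    #include <stdio.h>

    int
    main(void)
    {
            double n = 100.0;       /* other processes on the system */
            double m = 10.0;        /* contexts the service wants */

            printf("threaded:  1/(N+1) = %4.1f%% of available quantum\n",
                100.0 * 1.0 / (n + 1.0));
            printf("processes: M/(N+M) = %4.1f%% of available quantum\n",
                100.0 * m / (n + m));
            return (0);
    }

For that load the threaded service competes for just under 1% of the available quantum, while the ten-process version competes for roughly 9%, which is the "completes sooner" effect described in the first bullet above.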
There are additional complex issues of "when do I give away a quantum that the scheduler gave me, and what do I get in return", which are best addressed with call conversion and a cooperative user space scheduling component (think of it from the ideal that "the scheduler gave the quantum to the application, not to the thread"... "once the scheduler gives me a quantum, it's *my* damn quantum!").

> However, although I
> understand the reasons, I can also see where doing so makes it *much*
> more difficult to write 'correct' threaded programs, where I define
> correct as the ability to run w/out stepping on yourself in *all*
> cases.  Note, I said difficult, not impossible.

The cases where you might step on yourself are error cases, so far as I can see.  I think that in any case where there is an error, it doesn't matter how the error exhibits itself: your results are suspect.

Here's the question: maybe what you really mean is that a blown stack that doesn't cause an immediate error is harder to detect than a SIGBUS?  I think we can agree that this is true. 8-).  So it may be harder to initially develop correct code with more pages mapped.

On the other hand, it's statistically likely that if the program goes off in the weeds for a memory reference, it will hit an unmapped page rather than some other thread's stack.  I think this is an acceptable risk in exchange for increased performance without the need for address space remapping on thread context switches, and for working around the scheduling-related overhead issues.  After all, lower overhead is what prompted us to use threads in the first place; we must expect to pay for the tradeoff somewhere.

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.