Date:      Tue, 15 Jul 2003 03:09:16 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        David Schultz <das@FreeBSD.ORG>
Cc:        freebsd-threads@FreeBSD.ORG
Subject:   Re: LinuxThreads replacement
Message-ID:  <3F13D2CC.68D9DEC9@mindspring.com>
References:  <007601c3467b$5f20e960$020aa8c0@aims.private> <004d01c348ae$583084f0$812a40c1@PETEX31> <16146.65087.69689.594109@emerger.yogotech.com> <3F13B1B4.8765B8F3@mindspring.com> <20030715082910.GA34696@HAL9000.homeunix.com>

David Schultz wrote:
> On Tue, Jul 15, 2003, Terry Lambert wrote:
> > Yes, this is somewhat mitigated by the fact that it's easier to write
> > threads code than an FSA, such that a lesser coder is still able to
> > be productive.  As a class, it's a tool I would lump in with things
> > like "perl".
> 
> Actually, event-based programming is usually easier, since it does
> not require synchronization.  A number of people, myself included,
> think that threads are overused, and often used incorrectly.  But
> as Nate pointed out, threads are useful for many purposes, the
> most fundamental of which are SMP scalability and reduced latency.

I really dislike the TLS model that attributes global data in
order to turn it into per-thread data instead.  I realize that
this probably has to go forward, since OpenGL relies on the
model, but it's just the most recent in a long line of really
evil things.  An FSA wouldn't have this problem, because the
data would already be compartmentalized into a state object
that could be replicated N times without causing problems.  You
could combine that with a kqueue mechanism, and automatically
get the right context handed back to you any time there was
work to do.

The first in that long line is that POSIX signals are not
set to restart by default, as the old 4.2BSD/4.3BSD signals
were.  The 4.3BSD release was actually the first release to
even support the idea of system call restart being able to be
switched off (via siginterrupt(3), which came from DEC
Ultrix).  This means a lot of gross masking and unmasking of
signals is required, unless you wrap the entire signals
implementation in your own, establish handlers for all
signals, and, effectively, have the user space scheduler
rethrow them.  This is Very Evil(tm), because of what it has
to do to deal with implementing the POSIX-mandated per-process
signal mask; the sigaltstack glue code would likely be painful
as well.

Add to this that it is impossible to correctly implement both
PTHREAD_SCOPE_PROCESS and PTHREAD_SCOPE_SYSTEM and at the same
time build a kernel-only implementation, at least not without
rewriting your scheduler and adding implementation-specific
knowledge to it, which is exactly the wrong way to go with the
scheduler.  Among other things, it means that you will have
starvation issues when trying to enforce thread affinity on
context switches, while trying to keep thread cost down to
where threads are less expensive than processes because you
don't trash your cache lines or TLB on each switch.  Ingo
Molnar did a lot of work in the Linux scheduler to try to
address this issue for Linux.  FreeBSD doesn't really address
it fully yet.

On top of this, you need to deal with cancellation of things
that are not easily cancelled, to correctly deal with things
like the close(2) races vs. other blocking operations.  You
effectively have to cause an operation that's potentially
many function calls deep before it blocks to fail out properly
(which is what Sun does), or you have to fail the close(2) call
with the one allowable error for a non-zero reference count on
the descriptor, EINTR, which could leave you spinning, trying
to close the thing over and over in the belief that you were
being interrupted by a signal.  This assumes that you even check the
close(2) return value, which most programmers don't, and as a
result, you'll leak descriptors.  Basically, it wants the code
to be rewritten to flatten out all the call graphs OR to take
all resources up front (The Banker's Algorithm) so that the
cancellation can be handled high up.  I guess there's a third
option, which would be to block the close(2) call until the
reference count on the struct file goes from 2->1, but that's
not really practical, either.  Certainly it's not an expected
behaviour; what's expected these days is the Sun behaviour, and
that's very hard to implement without rewriting.

If you look at some of the descriptor-using functions, the ones
that try to deal with this deal with it by taking a reference;
but they don't reference it down as far as they need to; and
those that do, don't take an object lock on all the intermediate
objects to protect the reference count.  For sockets, for example,
it's not safe to dereference the f_data element and continue to
use it, unless you continue to hold a reference over the time
that you are doing the referencing.  There are some nice comments
about mutexes in socketvar.h, but no mutexes yet.

All of this complexity exists to deal with the fact that naive
threads users close descriptors out from under themselves, not
really having their heads in the right programming model, and
that malicious programmers can intentionally crash the system by
exploiting these races.

I'm sure it will all be sorted out eventually, but I'm also sure
that this is just the tip of the iceberg when it comes to the new
problems that a real threads implementation, one not based on a
user space call conversion scheduler, is going to cause.


> Also, threads are simply the natural programming model for many
> applications.  For instance, you wouldn't rewrite the FreeBSD
> kernel to use one thread per processor with an event loop and
> L4-style continuations, would you?

I might do the continuations; or actually, Mach-style activations
would be a better match.  They would also be inherently safe, in
terms of providing cancellation points.  But you'd still need to
flatten the call graph and/or prereserve your resources.  Doing
the prereserve thing might be the most expedient approach, but
Dijkstra's Banker's Algorithm has really poor overall performance,
and you'd really damage concurrency by prereserving resources that
you might only need in some obscure corner case.  Best case, you
will end up taking a lot of locks you don't need, to protect
things that don't end up being contended.

Without explicit support for cancellation in the OS primitives,
though, you'd probably want to turn your object locks into SIX
locks instead of straight mutexes, so that you could implement
read/write locks and intention-upgrade locks for things you might
need to write, and explicitly support per-structure cancellation
flags, protected by a global mutex (or a pool mutex) for writing,
and generally readable at any time because they are set and
cleared atomically.

-- Terry


