Date:      Tue, 15 Dec 1998 07:32:54 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        james@westongold.com (James Mansion)
Cc:        tlambert@primenet.com, mal@algonet.se, alk@pobox.com, peter@netplex.com.au, gpalmer@FreeBSD.ORG, marcelk@stack.nl, smp@FreeBSD.ORG
Subject:   Re: Pthreads and SMP
Message-ID:  <199812150732.AAA12768@usr06.primenet.com>
In-Reply-To: <32BABEF63EAED111B2C5204C4F4F5020183A@WGP01> from "James Mansion" at Dec 14, 98 07:27:43 pm

> So, in what way is an application built from multiple processes with
> shared state in a shared memory segment going to be better than an
> application with multiple kernel threads running in a largely shared
> address space?

I don't understand.  Why do you need to share state?

If you need to share state, then why do you need threads instead of
a select-driven finite state automaton?
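
(For illustration's sake, here is a minimal sketch of the kind of
select-driven state machine I mean -- one process, no locks, all of
the per-connection state in an ordinary array.  The handle_readable()
routine is hypothetical; it would just advance that connection's
automaton:)

/*
 * Minimal single-process select() loop; each connection's state
 * lives in a plain per-fd array, so there is nothing to contend
 * for.  Error handling mostly omitted.
 */
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAXCONN		FD_SETSIZE

struct conn_state {
	int	in_use;
	/* ... per-connection automaton state goes here ... */
} conns[MAXCONN];

/* hypothetical: advances the automaton for one connection */
void	handle_readable(int fd, struct conn_state *cs);

void
serve(int listen_fd)
{
	fd_set	rfds;
	int	fd, maxfd;

	for (;;) {
		FD_ZERO(&rfds);
		FD_SET(listen_fd, &rfds);
		maxfd = listen_fd;
		for (fd = 0; fd < MAXCONN; fd++) {
			if (conns[fd].in_use) {
				FD_SET(fd, &rfds);
				if (fd > maxfd)
					maxfd = fd;
			}
		}
		if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
			continue;	/* EINTR and friends */
		if (FD_ISSET(listen_fd, &rfds)) {
			fd = accept(listen_fd, NULL, NULL);
			if (fd >= 0 && fd < MAXCONN)
				conns[fd].in_use = 1;
			else if (fd >= 0)
				close(fd);
		}
		for (fd = 0; fd < MAXCONN; fd++)
			if (fd != listen_fd && conns[fd].in_use &&
			    FD_ISSET(fd, &rfds))
				handle_readable(fd, &conns[fd]);
	}
}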

I guess when all you have is a threads implementation, everything
looks like a thread...

For a shared context server, the viability of the server is based
on how *little* context you actually have to contend for between
processors.

One of the main tasks we had when designing the shared context
work-to-do architecture for the NWU (NetWare for UNIX) product at
Novell was the minimization of collision domains.  It was that, and
the need to contend for resources (mostly quantum), that drove the
decision to *not* use SVR4 style threads, but to use multiple
processes and place the context in a shared memory segment instead.
There were also issues that this approach could address that
threading could not; specifically, "hot engine scheduling".  In that
scheme, the streams MUX handed out work in LIFO order to the
processes that had called in for work.  The effect we were looking
for was to maximize the number of data pages already in core for a
given process (thread/engine) when work needed to be done.  The
load rarely caused CPU time to accumulate on more than 5 engines at
once, even when serving 256 client PC's simultaneously in "packet
burst" mode.
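
(To make that concrete: this is *not* the NWU code, just an
illustrative sketch of the shape of the thing.  Each engine is an
ordinary process with its own address space; only the small,
explicitly shared context lives in a SysV shared memory segment
that they all attach.  The engine_loop() routine is hypothetical:)

/*
 * Illustrative only: NENGINES worker processes ("engines"), each a
 * full process with a private address space; the one piece of
 * shared context is a small SysV shared memory segment that every
 * engine attaches.
 */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

#define NENGINES	5

struct shared_ctx {
	/* keep this small; it is the only contended state */
	volatile int	work_head;
	volatile int	work_tail;
	/* ... work-to-do queue entries ... */
};

/* hypothetical per-engine work loop */
void	engine_loop(int engine, struct shared_ctx *ctx);

int
main(void)
{
	struct shared_ctx *ctx;
	int	i, shmid;

	shmid = shmget(IPC_PRIVATE, sizeof(*ctx), IPC_CREAT | 0600);
	if (shmid < 0)
		return (1);
	ctx = (struct shared_ctx *)shmat(shmid, NULL, 0);
	if (ctx == (void *)-1)
		return (1);

	for (i = 0; i < NENGINES; i++) {
		if (fork() == 0) {
			/* engine: everything else stays in this
			 * process's own private pages */
			engine_loop(i, ctx);
			_exit(0);
		}
	}
	while (wait(NULL) > 0)
		;
	shmctl(shmid, IPC_RMID, NULL);
	return (0);
}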

> Is this simply an artifact of the heap manager being an additional
> shared data structure that is updated?

If all of the address space is shared, then an invalidation of an
area of the address space by a thread on CPU #5 forces updates on
all of the other CPU's, even though the threads running on those
CPU's may never actually touch the data.

It's more an artifact of having a common page table shared by all
of the processors, such that the same invalidations affect all of
the processors instead of just one.

I guess you could call this a heap issue, but it's really more of
a PTDE (page table directory entry) issue, if you get down to it.

> If the threads run (largely) in thread-specific heaps, and/or the
> main heap is very thread-aware (like SmartHeap for SMP, for example)
> where's the beef?

On and around cache-line boundaries, actually.  You pretty much
want them one or more cache lines apart.  On many modern Intel
processors, this is 64k -- 16 pages.  You can pretty much guarantee
this statistically by using different mappings -- different
processes -- to implement your contexts for your work-to-do
engines.
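
(If you do end up keeping per-engine state in one shared mapping
anyway, you can approximate the same effect by padding each
engine's slot out to a cache-line multiple, so that no two engines
ever write into the same line.  CACHE_LINE below is just an assumed
figure; use whatever the target processor actually has:)

/*
 * Illustrative padding trick: each engine's slot is rounded up to
 * a cache-line multiple so that stores by one engine never dirty
 * a line that another engine is using.  CACHE_LINE is an
 * assumption, not a measured value.
 */
#define CACHE_LINE	64
#define NENGINES	8

struct engine_stats {
	unsigned long	packets;
	unsigned long	bytes;
};

union padded_stats {
	struct engine_stats	s;
	char	pad[((sizeof(struct engine_stats) + CACHE_LINE - 1) /
		    CACHE_LINE) * CACHE_LINE];
};

union padded_stats	stats[NENGINES];	/* one slot per engine */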


> It seems this isn't a complaint about kernel threads per se, but about
> the negative effects of having much state that is updated by multiple
> threads and also read frequently, which is a design issue for the MT
> application.  And one that is there for an app built from multiple
> processes with explicit shared memory, too.

Yes, it's a problem when you share context between threads, no
matter how you do it.  The moral of the story is "don't do that".
If a problem is capable of being parallelized, then it makes sense
to try to compute it in parallel; if it isn't, then no amount of
adding PC's together is going to get you a virtual supercomputer.

The problem space mappable using SMP is not trivially small, but
it's a hell of a lot smaller than the problem space that can be
mapped with 32 times the computations per clock cycle on a linear
uniprocessor.

Just because you have a lot of work to do doesn't mean using threads
will make it faster.  Or SMP, for that matter.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.



