Date:      Sun, 26 Apr 1998 06:17:17 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        toshok@Hungry.COM (Christoph Toshok)
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: threads performance
Message-ID:  <199804260617.XAA27669@usr05.primenet.com>
In-Reply-To: <m23ef2vzca.fsf@terror.hungry.com> from "Christoph Toshok" at Apr 25, 98 05:51:33 am

> I'm working on japhar (the hungry java vm) and I'm primarily using
> freebsd for my work.  One of the central "features" of japhar is that
> it uses platform's thread library -- pthreads on freebsd and linux,
> cthreads on nextstep and the hurd.
> 
> On freebsd the performance is just abysmal.  Really, it's *awful*.
> Just for kicks, I ported the thread api to NSPR (Netscape's Portable
> Runtime) and the runtime for javac compiling a trivial .java file
> drops from 39 seconds to 18 seconds.


This is a long post; sorry about that.  Please read the whole thing;
I give a couple of possible causes for the problem, analysis, fixes
and workarounds, an analysis of why you are probably getting the
NSPR vs. pthreads results you are seeing, and an analysis of why
they mean that kernel threads probably wouldn't help in two of the
three possible root causes.

-- 1 --

I'll assume you are running FreeBSD 2.2.6 or -current, since the
pthreads implementation in earlier releases is not Draft 4 compliant,
and you should be running one of those.  If not, it may just be that
you are using a buggy libc_r.

-- 2 --

More likely, you are doing a compute-intensive task that doesn't
explicitly call pthread_yield(), so other threads are not getting to
run concurrently.

The pthreads implementation is a call conversion implementation.  It
takes blocking calls and converts them into non-blocking calls plus a
context switch.
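A minimal sketch of the idea, assuming POSIX read(2), fcntl(2),
poll(2) and sched_yield(2); conv_read() is a hypothetical name and
this is not libc_r's actual code.  A real user-space thread library
would put the calling thread on a wait queue and switch to another
runnable thread at the point where this sketch merely yields and polls:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical "converted" read: a blocking read() becomes a
 * non-blocking attempt plus a (simulated) context switch. */
ssize_t conv_read(int fd, void *buf, size_t len)
{
    int flags = fcntl(fd, F_GETFL);
    assert(flags != -1);

    /* Never let the kernel put the whole process to sleep; the fd is
     * left non-blocking, as a thread library would keep it. */
    if (!(flags & O_NONBLOCK))
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;                     /* data or EOF */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;                    /* a real error */

        /* Would block: this is where the scheduler would switch to
         * another thread.  The sketch just yields, then waits until
         * the fd is readable before retrying. */
        sched_yield();
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        (void)poll(&pfd, 1, -1);
    }
}
```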

One problem here is that an EWOULDBLOCK operation does not necessarily
schedule the operation to be performed in an efficient way.  This is a
kernel issue: an attempted operation should be treated as likely to be
reattempted, and normal triggering, like sequential read-ahead, should
occur as if the EWOULDBLOCK attempt and the subsequent successful call
were the same call.  In a general sense, aio is your friend, but there
are problems there, too: what should an aioread on a non-blocking fd
do?

-- 3 --

The next most likely thing is that you are doing the same broken thing
the LDAP implementors did in their code.  The problem is that it's not
obviously broken, so it's hard to steer clear of it.

What they did was use getdtablesize(2) and/or sysctl(3) to get the
maximum possible number of fd's, and then pass that as the first
argument to select.

The number was larger than FD_SETSIZE, and, as a result, select(2)
was returning "true" for fd's off in space (some of which, when
dereferenced, pointed to 0, 1, and 2 as far as the kernel could tell).
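The correct pattern is to pass select(2) the highest descriptor
actually in the set plus one, and never to put a descriptor at or
above FD_SETSIZE into an fd_set.  A minimal sketch, assuming POSIX
select(2); wait_readable() is a hypothetical helper, not code from
the LDAP sources:

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for any fd in fds[] to
 * become readable.  Returns select()'s result; *out holds the ready set. */
int wait_readable(const int *fds, int count, int timeout_ms, fd_set *out)
{
    int maxfd = -1;

    FD_ZERO(out);
    for (int i = 0; i < count; i++) {
        /* FD_SET on an fd >= FD_SETSIZE writes past the end of the set. */
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], out);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };

    /* nfds is the highest fd in the set plus one, NOT getdtablesize(). */
    return select(maxfd + 1, out, NULL, NULL, &tv);
}
```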


> Are there any plans to address the performance of threads in the
> coming weeks/months?

Kernel threads will be supported.  Let me temper your expectations
here, though.

-- kernel threads won't improve performance in most cases --

The user space threading causes I/O to be highly concurrent.  In the
general case, threads are used as I/O processing context holders.

What this means is that, for I/O event processing, kernel threads
will not be a big win for you.  Your I/O will be as overlapped as
it can be.

If you are doing compute-intensive tasks as a result of I/O, then
kernel threads will increase real concurrency; this assumes that
the compute intensity is such that a single compute task spans
more than a single quantum -- the amount of time between forcible
context switches.

You can expect a minor benefit from kernel threads on a heavily loaded
system, as well.  This is because your threaded process will compete
for quantum as (# of kernel threads) processes against the other
processes on the machine (pretty clearly: if you are a threaded
server on a loaded machine, you will be beaten out by servers built
on multiple processes instead of threads... they will get more
quantum because they have more kernel-schedulable entities).
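To put rough numbers on the quantum argument, here is a deliberately
idealized model (an assumption, not how the FreeBSD scheduler actually
computes priorities): every runnable schedulable entity gets an equal
share of quanta.  cpu_share() is a hypothetical helper:

```c
#include <assert.h>

/* Idealized round-robin model (assumption): equal quanta for every
 * runnable entity, no priorities, nobody sleeps.  Returns the fraction
 * of CPU time a server made of `mine` schedulable entities receives
 * when `others` other entities are runnable. */
double cpu_share(int mine, int others)
{
    assert(mine > 0 && others >= 0);
    return (double)mine / (double)(mine + others);
}
```

With 4 other runnable processes, a server using user-space threads
(1 schedulable entity) gets cpu_share(1, 4) = 0.2 of the CPU in this
model, while a server built from 4 processes gets cpu_share(4, 4) = 0.5.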

Where kernel threads are expected to shine is SMP, where you will be
able to run multiple threads in a single process truly concurrently,
rather than with timesliced, round-robin simulated concurrency.

This presumes we do the necessary work on scheduling and CPU
affinity at some point.


All that said, I don't think kernel threads are needed for your
application.  Here's why:


> The fact that NSPR can drop 21 seconds off the
> runtime (in this very contrived example) makes me think that there is
> a lot going on in libc_r that is suboptimal, but perhaps there is just
> no other way to implement things so they conform to the posix spec.

The fact that NSPR can drop 21 seconds off the runtime means that
threading is not your bottleneck, and that kernel threads would
probably help, but only because the code is badly behaved.

NSPR can't implement kernel services that aren't there in the base
OS.  That means that the best it can do is to build upon what's
already there.

Most likely, you either have a runaway program (because of the select()
coding error or a similar problem), OR the NSPR implementation is making
explicit yield calls that the native implementation doesn't because it
assumes a kernel implementation of pthreads.

The fix for the first is as discussed above.

The fix for the second is to look at where the yields would occur
in the NSPR-based implementation, and probably use:

#ifndef PTHREAD_KERNEL
	/* user-space threads: let other threads (and their I/O) run */
	pthread_yield();
#endif	/* PTHREAD_KERNEL */

there to keep the CPU-bound thread from interfering with I/O
concurrency.
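A compilable sketch of the same idea in a CPU-bound loop.  Assumptions:
it uses POSIX sched_yield() in place of the non-standard
pthread_yield() above so it builds on current systems (whether
sched_yield() switches between user-space threads depends on the
thread library); sum_of_squares() and YIELD_EVERY are hypothetical,
and PTHREAD_KERNEL is assumed to be defined only when threads are
kernel-scheduled:

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical tuning knob: how many iterations between yields. */
#define YIELD_EVERY 4096ULL

/* Hypothetical CPU-bound task that periodically yields when threads
 * are scheduled in user space. */
unsigned long long sum_of_squares(unsigned long long n)
{
    unsigned long long total = 0;

    for (unsigned long long i = 1; i <= n; i++) {
        total += i * i;
#ifndef PTHREAD_KERNEL
        /* User-space threads: give I/O-bound threads a chance to run. */
        if (i % YIELD_EVERY == 0)
            sched_yield();
#endif
    }
    assert(n == 0 || total > 0);
    return total;
}
```

Yielding every few thousand iterations rather than on every one keeps
the yield overhead small while still bounding how long the loop can
starve I/O-bound threads.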


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
