Date: Sun, 26 Apr 1998 06:17:17 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: toshok@Hungry.COM (Christoph Toshok)
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: threads performance
Message-ID: <199804260617.XAA27669@usr05.primenet.com>
In-Reply-To: <m23ef2vzca.fsf@terror.hungry.com> from "Christoph Toshok" at Apr 25, 98 05:51:33 am

> I'm working on japhar (the hungry java vm) and I'm primarily using
> freebsd for my work. One of the central "features" of japhar is that
> it uses platform's thread library -- pthreads on freebsd and linux,
> cthreads on nextstep and the hurd.
>
> On freebsd the performance is just abysmal. Really, it's *awful*.
> Just for kicks, I ported the thread api to NSPR (Netscape's Portable
> Runtime) and the runtime for javac compiling a trivial .java file
> drops from 39 seconds to 18 seconds.

This is a long post; sorry about that. Please read the whole thing.
I give:

- a couple of possible causes for the problem, with analysis, fixes
  and workarounds;
- an analysis of why you are probably getting the NSPR vs. pthreads
  results you are seeing;
- an analysis of why those results mean that kernel threads probably
  wouldn't help in two of the three possible root causes.

-- 1 --

I'll assume you are running FreeBSD 2.2.6 or -current. The pthreads in
earlier releases is not Draft 4 compliant, so you should be running
one of those. If you aren't, the problem may just be a buggy libc_r.

-- 2 --

More likely, you are doing a compute-intensive task that never
explicitly calls pthread_yield(), so the other threads are not getting
to run concurrently.

The pthreads implementation is a call conversion implementation. It
takes blocking calls and converts them into non-blocking calls plus a
context switch.

One problem here is that an EWOULDBLOCK operation does not necessarily
schedule the operation to be performed efficiently. This is a kernel
issue. An attempted operation should be treated as likely to be
reattempted, and normal triggering, like sequential read-ahead, should
happen as if the EWOULDBLOCK attempt and the later successful call
were the same call.

In a general sense, aio is your friend. There are problems there, too:
what should an aioread on a non-blocking fd do?

-- 3 --

The next most likely thing is that you are doing the same broken thing
the LDAP implementors did in their code.
The problem is that it's not obviously broken, so it's hard to steer
clear of it. What they did was use getdtablesize(2) and/or sysctl(3)
to get the maximum possible number of fd's, and then pass that as the
first argument to select(2). The number was larger than FD_SETSIZE. As
a result, select(2) returned "true" for fd's off in space, some of
which, as far as the kernel could tell, pointed to fd's 0, 1, and 2.

> Are there any plans to address the performance of threads in the
> coming weeks/months?

Kernel threads will be supported. Let me temper your expectations
here, though.

-- kernel threads won't improve performance in most cases --

User-space threading makes I/O highly concurrent. In the general case,
threads are used as I/O processing context holders. So for I/O event
processing, kernel threads will not be a big win for you: your I/O is
already as overlapped as it can be.

If your I/O triggers compute-intensive tasks, kernel threads will
increase real concurrency. This assumes that a single compute task
runs for longer than one quantum (the time between forcible context
switches).

You can also expect a minor benefit from kernel threads on a heavily
loaded system. Your threaded process will compete for quantum against
the other processes on the machine as (# of kernel threads) processes.
A threaded server on a loaded machine will clearly be beaten out by
servers built on multiple processes instead of threads. Those servers
get more quantum because they have more kernel schedulable entities.

Where kernel threads are expected to shine is SMP. There you will be
able to run multiple threads in a single process truly concurrently,
rather than with timesliced, round-robin simulated concurrency. This
presumes we do the necessary work on scheduling and CPU affinity at
some point.
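Point 2 above says that under a user-space implementation, a compute-bound thread that never yields can keep I/O-bound threads in the same process from being serviced promptly. Below is a minimal sketch of a CPU-bound loop that gives up the CPU periodically.

It rests on several assumptions:

- checksum() and YIELD_EVERY are hypothetical names chosen for illustration.
- PTHREAD_KERNEL is assumed to be defined only when building against a kernel threads implementation.
- POSIX sched_yield() from <sched.h> is swapped in for the non-standard pthread_yield(). Whether a given user-space thread library treats sched_yield() as a thread switch should be checked against that library's documentation.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical interval: number of bytes processed between yields. */
#define YIELD_EVERY 1024

/*
 * A CPU-bound loop (a simple multiplicative checksum) that periodically
 * gives up the CPU so that, under user-space threads, threads waiting
 * on I/O get a chance to run. PTHREAD_KERNEL is assumed to be defined
 * only for kernel-thread builds, where the kernel preempts this thread
 * at the end of its quantum anyway.
 */
unsigned long checksum(const unsigned char *data, size_t len)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum = sum * 31 + data[i];   /* unsigned arithmetic wraps; fine here */
#ifndef PTHREAD_KERNEL
        if ((i + 1) % YIELD_EVERY == 0)
            sched_yield();          /* POSIX stand-in for pthread_yield() */
#endif /* PTHREAD_KERNEL */
    }
    return sum;
}
```

With PTHREAD_KERNEL undefined, the loop yields once every YIELD_EVERY bytes. Compiling with -DPTHREAD_KERNEL removes the yield entirely, on the theory that a kernel scheduler will interleave the threads without help.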
All that said, I don't think kernel threads are needed for your
application. Here's why:

> The fact that NSPR can drop 21 seconds off the
> runtime (in this very contrived example) makes me think that there is
> a lot going on in libc_r that is suboptimal, but perhaps there is just
> no other way to implement things so they conform to the posix spec.

The fact that NSPR can drop 21 seconds off the runtime means that
threading is not your bottleneck. Kernel threads would probably help,
but only because the code is badly behaved.

NSPR can't implement kernel services that aren't there in the base
OS. The best it can do is build on what's already there.

Most likely, one of two things is happening:

- you have a run-away program, because of the select() coding error
  or a similar problem; or
- the NSPR implementation makes explicit yield calls that your native
  implementation doesn't, because the native code assumes a kernel
  implementation of pthreads.

The fix for the first is as discussed above. For the second, look at
where the yields occur in the NSPR-based implementation, and probably
use:

    #ifndef PTHREAD_KERNEL
        pthread_yield();
    #endif /* PTHREAD_KERNEL */

at those points, to keep the CPU-bound thread from interfering with
I/O concurrency.

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.