From: Jeff Roberson
To: Wes Peters
Cc: Marc Olzheim, arch@freebsd.org
Date: Sun, 30 Mar 2003 19:11:33 -0500 (EST)
Subject: Re: 1:1 Threading implementation.
In-Reply-To: <200303301247.34629.wes@softweyr.com>
Message-ID: <20030330190104.T64602-100000@mail.chesapeake.net>

On Sun, 30 Mar 2003, Wes Peters wrote:

> On Wednesday 26 March 2003 01:23, Jeff Roberson wrote:
> > On Wed, 26 Mar 2003, Marc Olzheim wrote:
> > > On Wed, Mar 26, 2003 at 03:36:57AM -0500, Jeff Roberson wrote:
> > > > First, if your application has more threads than cpus it is
> > > > written incorrectly.  People who are doing thread pools instead
> > > > of event driven IO models will encounter the same overhead with
> > > > M:N as with 1:1.  I'm not sure what applications are entirely
> > > > compute bound and have more threads than cpus.  These are the
> > > > only ones which really, theoretically, benefit.  I don't think
> > > > our threading model should be designed to optimize poorly
> > > > thought out applications.
> > >
> > > Might I suggest that there are 'nice' C++ ways of using thread
> > > classes where both the usual C++ dogmas of readability and
> > > reusability make you easily end up with more threads than cpus...
> > > I think that from a userland point of view, most programmers
> > > shouldn't have to care how many cpus the machine their code is
> > > running on has.
> >
> > Sure, but in these cases you're not likely to be using them in
> > performance critical code.  Which means you're not likely to be
> > using all of the cpu, which means you're going to have to go block
> > in the kernel anyway.  So really what we're talking about here is
> > wasted memory, not even many cpu cycles.
> >
> > I think people who actually care about performance don't want the
> > M:N overhead.  1:1 will be faster for them.
> >
> > For the rest, well, they didn't care about performance, so why
> > should we work so hard to make it marginally faster for them?
>
> If the gains are marginal it's probably not worth pursuing, especially
> not up front.  Make it work, then make it work right, then make it
> work faster, right?
>
> This does not mean your statement "if your application has more
> threads than cpus it is written incorrectly" is correct.  It is true
> only if threads are being used to accelerate the application.
> On the contrary, it is quite simple to postulate applications that
> are designed with threads to accomplish discrete tasks that must run
> in parallel and which may or may not compete with one another for
> resources (cpu, i/o, etc.) depending on the current task load.

The assertion is that performance critical, cpu bound applications
which use more than one thread per cpu are probably written
incorrectly.  I'm sort of tired of debating this; I think most people
have a reasonable handle on the issue.

I realize I didn't give enough context in my comment above.  It sounds
a bit like "64k is enough memory for everyone."  ;-)  What I'm arguing
is that for most applications, having the kernel do the context
switches isn't so expensive, and having an extra 8k of kernel memory
per thread is probably ok.  There are even ways to optimize this out
without going through all of the work of upcalls.

Cheers,
Jeff
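
P.S.  To put a rough number on "context switches aren't so expensive":
one minimal, untested way to measure it is to ping-pong a byte between
two threads over a pair of pipes, so each round trip goes through the
kernel scheduler at least twice.  The pipe overhead is included, so
treat the result as an upper bound on the switch cost.  ROUNDS and the
general shape are just a sketch, not a polished benchmark.

/*
 * Rough measure of kernel thread context switch cost: two threads
 * ping-pong one byte over two pipes.  Each round trip forces at
 * least two passes through the scheduler.  Untested sketch.
 */
#include <sys/time.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define ROUNDS  100000

static int ping[2], pong[2];    /* ping: main -> peer, pong: peer -> main */

static void *
peer(void *arg)
{
        char c;
        int i;

        for (i = 0; i < ROUNDS; i++) {
                if (read(ping[0], &c, 1) != 1 ||
                    write(pong[1], &c, 1) != 1)
                        exit(1);
        }
        return (NULL);
}

int
main(void)
{
        struct timeval start, end;
        pthread_t td;
        double usec;
        char c = 'x';
        int i;

        if (pipe(ping) == -1 || pipe(pong) == -1)
                return (1);
        if (pthread_create(&td, NULL, peer, NULL) != 0)
                return (1);

        gettimeofday(&start, NULL);
        for (i = 0; i < ROUNDS; i++) {
                if (write(ping[1], &c, 1) != 1 ||
                    read(pong[0], &c, 1) != 1)
                        return (1);
        }
        gettimeofday(&end, NULL);
        pthread_join(td, NULL);

        usec = (end.tv_sec - start.tv_sec) * 1e6 +
            (end.tv_usec - start.tv_usec);
        printf("%.2f us per round trip (>= 2 switches + pipe overhead)\n",
            usec / ROUNDS);
        return (0);
}

If that number is small next to the time each thread spends blocked,
the M:N machinery isn't buying you much.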
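
P.P.S.  On the earlier point about event driven IO models versus thread
pools: the shape I have in mind is one kqueue-driven loop per cpu, so
the number of threads tracks the number of cpus rather than the number
of clients.  Here is a minimal, untested echo server sketch; the port
number and the echo behaviour are just placeholders, and most error
handling is omitted.

/*
 * Untested sketch: single threaded kqueue echo server.  One of these
 * loops per cpu can drive many connections; threads scale with cpus,
 * not with clients.
 */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/event.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
        struct sockaddr_in sin;
        struct kevent ev;
        char buf[512];
        int kq, s, fd;
        ssize_t n;

        s = socket(AF_INET, SOCK_STREAM, 0);
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(8080);             /* placeholder port */
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(s, (struct sockaddr *)&sin, sizeof(sin));
        listen(s, 128);

        kq = kqueue();
        EV_SET(&ev, s, EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &ev, 1, NULL, 0, NULL);

        for (;;) {
                if (kevent(kq, NULL, 0, &ev, 1, NULL) != 1)
                        continue;
                if ((int)ev.ident == s) {
                        /* New connection: watch it for readability. */
                        fd = accept(s, NULL, NULL);
                        EV_SET(&ev, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
                        kevent(kq, &ev, 1, NULL, 0, NULL);
                } else {
                        /* Data or EOF on an existing connection. */
                        fd = ev.ident;
                        n = read(fd, buf, sizeof(buf));
                        if (n <= 0)
                                close(fd);      /* close drops the kevent */
                        else
                                write(fd, buf, n);
                }
        }
}

A real server would run one of these loops per cpu and spread accepted
connections across them; the point is only that each connection does
not cost a thread.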