From: Stephen Montgomery-Smith
Date: Mon, 15 Sep 2008 22:48:52 -0500
To: Steve Kargl
Cc: Martin Cracauer, freebsd-current@freebsd.org
Subject: Re: Improved multiprocessor usage on amd64
Message-ID: <48CF2CA4.1000802@math.missouri.edu>
In-Reply-To: <48CF2AEF.9070208@math.missouri.edu>

Stephen Montgomery-Smith wrote:
> Steve Kargl wrote:
>> On Mon, Sep 15, 2008 at 07:36:04PM -0500, Stephen Montgomery-Smith wrote:
>>> ...
>>> and each thread is a loop of the form
>>>
>>> while (1) {
>>>   wait until told to start;
>>>   do massive amounts of floating point arithmetic (only additions and
>>>   multiplications) on large arrays;
>>>   tell the master process that you are done;
>>> }
>>>
>>>> Do you have about as many threads as processor or more?
>>> Both ways. The time difference between the two approaches is
>>> negligible.
>>>
>>
>> Are you using ULE? With my MPI applications, if the number of
>> launched processes exceeds the number of cpus by 1, ULE falls
>> through the floor. I have a nagging feeling that there is a problem
>> with cpu affinity.
>>
>> http://lists.freebsd.org/pipermail/freebsd-current/2008-July/086917.html
>>

Let me say a little bit more. I have a gut feeling that the problem has
a lot to do with cache management. My program has each thread doing, in
effect, huge matrix multiplications, each one working on its own little
piece. If a CPU core switches from one thread to another, it then has
to flush the cache out to RAM and read a whole bunch of other RAM into
cache.

My sense is that Linux and FreeBSD both have something in their
internals that figures this out and, after a while, starts changing the
time between switches from one process to another, but that Linux
learns faster than FreeBSD.

This is all pure speculation on my part, though, because I have very
little idea how these internals work.

Stephen
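
For concreteness, the worker loop quoted above might look roughly like the following in C with POSIX threads. This is only a sketch under assumptions, not the actual program: the names `worker_t`, `worker` and `run_rounds` are hypothetical, the start/done signalling is done with two barriers (the original mechanism was never described), and a sum of squares over a private slice stands in for the real matrix arithmetic.

```c
#define _POSIX_C_SOURCE 200809L  /* expose pthread barriers under strict -std=c99/c11 */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state; the original program was not posted. */
typedef struct {
    pthread_barrier_t *start;  /* master releases the workers here */
    pthread_barrier_t *done;   /* workers report completion here */
    const int *stop;           /* set by the master before the final release */
    double *data;              /* this worker's own slice of the big array */
    size_t n;                  /* number of elements in the slice */
    double sum;                /* result of the most recent round */
} worker_t;

static void *worker(void *arg)
{
    worker_t *w = arg;
    for (;;) {
        pthread_barrier_wait(w->start);        /* wait until told to start */
        if (*w->stop)
            break;
        double acc = 0.0;
        for (size_t i = 0; i < w->n; i++)      /* additions and multiplications only */
            acc += w->data[i] * w->data[i];
        w->sum = acc;
        pthread_barrier_wait(w->done);         /* tell the master we are done */
    }
    return NULL;
}

/* Runs `rounds` rounds on `nthreads` workers, each owning `n` doubles set to 1.0.
 * Returns the sum of all per-round results, or -1.0 if allocation fails. */
double run_rounds(int nthreads, int rounds, size_t n)
{
    assert(nthreads > 0 && rounds >= 0 && n > 0);

    pthread_barrier_t start, done;
    int stop = 0;
    double total = 0.0;
    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    worker_t *w = malloc((size_t)nthreads * sizeof *w);
    double *buf = malloc((size_t)nthreads * n * sizeof *buf);
    if (!tid || !w || !buf) {
        free(tid); free(w); free(buf);
        return -1.0;
    }

    /* nthreads workers plus the master meet at each barrier. */
    pthread_barrier_init(&start, NULL, (unsigned)nthreads + 1);
    pthread_barrier_init(&done, NULL, (unsigned)nthreads + 1);

    for (size_t i = 0; i < (size_t)nthreads * n; i++)
        buf[i] = 1.0;
    for (int t = 0; t < nthreads; t++) {
        w[t] = (worker_t){ &start, &done, &stop, buf + (size_t)t * n, n, 0.0 };
        if (pthread_create(&tid[t], NULL, worker, &w[t]) != 0)
            abort();                           /* sketch: no graceful recovery */
    }

    for (int r = 0; r < rounds; r++) {
        pthread_barrier_wait(&start);          /* tell every worker to start */
        pthread_barrier_wait(&done);           /* wait until all have finished */
        for (int t = 0; t < nthreads; t++)
            total += w[t].sum;
    }

    stop = 1;                                  /* final release: workers exit */
    pthread_barrier_wait(&start);
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);

    pthread_barrier_destroy(&start);
    pthread_barrier_destroy(&done);
    free(buf); free(w); free(tid);
    return total;
}
```

Because every element is 1.0, each worker returns `n` per round, so `run_rounds(2, 3, 1000)` returns 6000.0. To look at the scheduling behaviour discussed in this thread, one would make `n` large enough that a slice no longer fits in cache and compare wall-clock time with `nthreads` equal to, and one more than, the number of cores.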