From owner-freebsd-current@FreeBSD.ORG  Thu Jul  7 15:54:52 2011
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B80D8106564A;
	Thu,  7 Jul 2011 15:54:52 +0000 (UTC) (envelope-from dudu@dudu.ro)
Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com
	[209.85.216.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 47FDB8FC15;
	Thu,  7 Jul 2011 15:54:52 +0000 (UTC)
Received: by qyk38 with SMTP id 38so728307qyk.13
	for <multiple recipients>; Thu, 07 Jul 2011 08:54:51 -0700 (PDT)
Received: by 10.224.33.82 with SMTP id g18mr744333qad.105.1310052237133; Thu,
	07 Jul 2011 08:23:57 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.229.90.195 with HTTP; Thu, 7 Jul 2011 08:23:17 -0700 (PDT)
In-Reply-To: <20110707151440.GA75537@troutmask.apl.washington.edu>
References: <20110706170132.GA68775@troutmask.apl.washington.edu>
	<5080.1309971941@critter.freebsd.dk>
	<20110706180001.GA69157@troutmask.apl.washington.edu>
	<4E14A54A.4050106@freebsd.org> <4E155FF9.5090905@FreeBSD.org>
	<20110707151440.GA75537@troutmask.apl.washington.edu>
From: Vlad Galu <dudu@dudu.ro>
Date: Thu, 7 Jul 2011 17:23:17 +0200
Message-ID: <CA+FTnKOu4CoLYM=3Ge1fofw6TcKb04ic94pQuKKaz7i96xFZZg@mail.gmail.com>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: FreeBSD Current <freebsd-current@freebsd.org>, "Hartmann,
	O." <ohartman@zedat.fu-berlin.de>,
	Nathan Whitehorn <nwhitehorn@freebsd.org>, Andriy Gapon <avg@freebsd.org>
Subject: Re: Heavy I/O blocks FreeBSD box for several seconds
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2011 15:54:52 -0000

On Thu, Jul 7, 2011 at 5:14 PM, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:

> On Thu, Jul 07, 2011 at 10:27:53AM +0300, Andriy Gapon wrote:
> > on 06/07/2011 21:11 Nathan Whitehorn said the following:
> > > On 07/06/11 13:00, Steve Kargl wrote:
> > >> AFAICT, it is a cpu affinity issue.  If I launch n+1 MPI images
> > >> on a system with n cpus/cores, then 2 (and sometimes 3) images
> > >> are stuck on a cpu and those 2 (or 3) images ping-pong on that
> > >> cpu.  I recall trying to use renice(8) to force some load
> > >> balancing, but vaguely remember that it did not help.
> > >
> > > I've seen exactly this problem with multi-threaded math libraries,
> > > as well.
> >
> > Exactly the same?  Let's see.
> >
> > > Using parallel GotoBLAS on FreeBSD gives terrible performance because
> > > the threads keep migrating between CPUs, causing frequent cache misses.
> >
> > So Steve reports that if he has Nthr > Ncpu, then some threads are
> > "over-glued" to a particular CPU, which results in sub-optimal
> > scheduling for those threads.  I have to guess that Steve would want to
> > see the threads being shuffled between CPUs to produce more even CPU load.
>
> I'm using OpenMPI.  These are N > Ncpu processes, not threads, and without
> loss of generality let N = Ncpu + 1.  It is a classic master-slave
> situation where 1 process initializes all others.  The N - 1 slave processes
> are then independent of each other.  After 20 minutes or so of number
> crunching, each slave sends a few 10s of KB of data to the master.  The
> master collects all the data, writes it to disk, and then sends the
> slaves the next set of computations to do.  The computations are nearly
> identical, so each slave finishes its task in the same amount of time.  The
> problem appears to be that 2 slaves are bound to the same cpu and the
> remaining N - 3 slaves are each bound to a specific cpu.  The N - 3 slaves
> finish their task, send data to the master, and then spin (chewing up
> nearly 100% cpu) waiting for the 2 ping-ponging slaves to finish.
> This causes a stall in the computation.  When a complete computation
> takes days to complete, these stalls become problematic.  So, yes, I
> want the processes to get more uniform access to cpus via migration
> to other cpus.  This is what 4BSD appears to do.
>
>
Spinning processes are a PITA for any scheduler; it's just that in your
case 4BSD computes quantums differently. Is there any way to make the
software sleep instead of spinning while it waits?
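
To illustrate what "sleep instead of spinning" could look like, here is a
minimal sketch. It is not Open MPI's API and not anything from your setup:
wait_with_backoff, min_sleep and max_sleep are names made up for this
example, and is_ready stands in for whatever non-blocking "has the next
message arrived?" check the application can make. Python is used only for
brevity.

```python
import threading
import time


def wait_with_backoff(is_ready, min_sleep=0.0005, max_sleep=0.05, timeout=None):
    """Wait until is_ready() returns True, sleeping between polls.

    Hypothetical helper: instead of polling in a tight loop (which keeps
    the CPU at ~100%), the caller sleeps between polls, doubling the sleep
    up to max_sleep.  Returns the number of polls that came back False.
    """
    delay = min_sleep
    deadline = None if timeout is None else time.monotonic() + timeout
    failed_polls = 0
    while not is_ready():
        failed_polls += 1
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(delay)                  # give the CPU back to the scheduler
        delay = min(delay * 2, max_sleep)  # exponential backoff, capped
    return failed_polls


if __name__ == "__main__":
    # Stand-in for "the master has sent the next work unit": an Event that
    # another thread sets after 50 ms.
    work_available = threading.Event()
    threading.Timer(0.05, work_available.set).start()
    n = wait_with_backoff(work_available.is_set, timeout=2.0)
    print("woke up after", n, "unsuccessful polls")
```

The trade-off is latency: a waiter may notice the message up to max_sleep
late, which should not matter when each work unit runs for about 20 minutes.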


> > On the other hand, you report that your threads keep being shuffled
> > between CPUs (I presume for Nthr == Ncpu case, where Nthr is a count of
> > the number-crunching threads).  And I guess that you want them to stay
> > glued to particular CPUs.
> >
> > So how is this the same problem?  In fact, it sounds like somewhat
> > opposite.
> > The only thing in common is that you both don't like how ULE works.
>
> Well, it may be similar in that N - 2 threads are bound to N - 2
> cpus, and the remaining 2 threads are ping-ponging on the last
> remaining cpu.  I suspect that GotoBLAS has a large amount of
> communication between threads, and once again the computation
> stalls waiting for the 2 threads to either finish battling for the
> 1 cpu or perhaps the process uses pthread_yield() in some clever
> way to try to get load balancing.
>
> --
> Steve
>



-- 
Good, fast & cheap. Pick any two.