From: Andriy Gapon <avg@FreeBSD.org>
Date: Thu, 07 Jul 2011 22:42:39 +0300
To: Steve Kargl
Cc: FreeBSD Current, "Hartmann, O.", Nathan Whitehorn
Subject: Re: Heavy I/O blocks FreeBSD box for several seconds
Message-ID: <4E160C2F.8020001@FreeBSD.org>
In-Reply-To: <20110707151440.GA75537@troutmask.apl.washington.edu>

on 07/07/2011 18:14 Steve Kargl said the following:
> On Thu, Jul 07, 2011 at 10:27:53AM +0300, Andriy Gapon wrote:
>> on 06/07/2011 21:11 Nathan Whitehorn said the following:
>>> On 07/06/11 13:00, Steve Kargl wrote:
>>>> AFAICT, it is a cpu affinity issue.  If I launch n+1 MPI images
>>>> on a system with n cpus/cores, then 2 (and sometimes 3) images
>>>> are stuck on a cpu and those 2 (or 3) images ping-pong on that
>>>> cpu.  I recall trying to use renice(8) to force some load
>>>> balancing, but vaguely remember that it did not help.
>>>
>>> I've seen exactly this problem with multi-threaded math libraries, as well.
>>
>> Exactly the same?  Let's see.
>>
>>> Using parallel GotoBLAS on FreeBSD gives terrible performance because the
>>> threads keep migrating between CPUs, causing frequent cache misses.
[*]-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> So Steve reports that if he has Nthr > Ncpu, then some threads are "over-glued"
>> to a particular CPU, which results in sub-optimal scheduling for those threads.
>> I have to guess that Steve would want to see the threads being shuffled between
>> CPUs to produce a more even CPU load.
>
> I'm using OpenMPI.  These are N > Ncpu processes, not threads,

I used 'thread' in the sense of a kernel thread.  It shouldn't actually matter
in this context whether it is a process or a userland thread.

> and without
> loss of generality let N = Ncpu + 1.  It is a classic master-slave
> situation where 1 process initializes all the others.  The N - 1 slave
> processes are then independent of each other.  After 20 minutes or so of
> number crunching, each slave sends a few tens of KB of data to the master.
> The master collects all the data, writes it to disk, and then sends the
> slaves the next set of computations to do.  The computations are nearly
> identical, so each slave finishes its task in the same amount of time.  The
> problem appears to be that 2 slaves are bound to the same cpu and the
> remaining N - 3 slaves are each bound to a specific cpu.  The N - 3 slaves
> finish their task, send data to the master, and then spin (chewing up
> nearly 100% cpu) waiting for the 2 ping-ponging slaves to finish.
> This causes a stall in the computation.  When a complete computation
> takes days to finish, these stalls become problematic.  So, yes, I
> want the processes to get more uniform access to the cpus via migration
> to other cpus.  This is what 4BSD appears to do.

I would imagine that periodic rebalancing would take care of this, but probably
the ULE rebalancing algorithm is not perfect.
There was a suggestion on performance@ to try a lower value for
kern.sched.steal_thresh; a value of 1 was recommended:
http://article.gmane.org/gmane.os.freebsd.performance/3459
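
For anyone who wants to experiment with that suggestion: kern.sched.steal_thresh
is an ordinary read-write sysctl, so it can be tried on a running system and,
if it helps, made persistent across reboots.  As a rough illustration (run as
root, and check the current value before changing it, since the default is
chosen at boot):

  # sysctl kern.sched.steal_thresh
  # sysctl kern.sched.steal_thresh=1
  # echo kern.sched.steal_thresh=1 >> /etc/sysctl.conf
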
>> On the other hand, you report that your threads keep being shuffled between
>> CPUs (I presume for the Nthr == Ncpu case, where Nthr is the count of the
>> number-crunching threads).  And I guess that you want them to stay glued to
>> particular CPUs.
>>
>> So how is this the same problem?  In fact, it sounds like somewhat the opposite.
>> The only thing in common is that you both don't like how ULE works.
>
> Well, it may be similar in that N - 2 threads are bound to N - 2
> cpus, and the remaining 2 threads are ping-ponging on the last

It could be, but Nathan has never said this [*], and I have also never seen
this in my very limited experiments with GotoBLAS.

> remaining cpu.  I suspect that GotoBLAS has a large amount of
> communication between threads, and once again the computation
> stalls waiting for the 2 threads to either finish battling for the
> 1 cpu, or perhaps the process uses pthread_yield() in some clever
> way to try to get load balancing.

-- 
Andriy Gapon