Message-ID: <4E163DA4.6060505@zedat.fu-berlin.de>
Date: Fri, 08 Jul 2011 01:13:40 +0200
From: "Hartmann, O."
To: Andriy Gapon
References: <20110706170132.GA68775@troutmask.apl.washington.edu> <5080.1309971941@critter.freebsd.dk> <20110706180001.GA69157@troutmask.apl.washington.edu> <4E14A54A.4050106@freebsd.org> <4E155FF9.5090905@FreeBSD.org>
In-Reply-To: <4E155FF9.5090905@FreeBSD.org>
Cc: "freebsd-performance@freebsd.org", FreeBSD Current, Nathan Whitehorn, Steve Kargl
Subject: Re: Heavy I/O blocks FreeBSD box for several seconds

On 07/07/11 09:27, Andriy Gapon wrote:
> on 06/07/2011 21:11 Nathan Whitehorn said the following:
>> On 07/06/11 13:00, Steve Kargl wrote:
>>> AFAICT, it is a cpu affinity issue. If I launch n+1 MPI images
>>> on a system with n cpus/cores, then 2 (and sometimes 3) images
>>> are stuck on a cpu and those 2 (or 3) images ping-pong on that
>>> cpu. I recall trying to use renice(8) to force some load
>>> balancing, but vaguely remember that it did not help.
>> I've seen exactly this problem with multi-threaded math libraries, as well.
> Exactly the same? Let's see.
>
>> Using parallel GotoBLAS on FreeBSD gives terrible performance because the
>> threads keep migrating between CPUs, causing frequent cache misses.
> So Steve reports that if he has Nthr > Ncpu, then some threads are "over-glued"
> to a particular CPU, which results in sub-optimal scheduling for those threads.
> I have to guess that Steve would want to see the threads being shuffled between
> CPUs to produce more even CPU load.
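
To put rough numbers on the two schedules contrasted above, here is a toy model of n+1 compute-bound images on n CPUs. It is only a sketch under assumptions that are not from the thread: every image is fully CPU-bound, all have equal priority, image i starts on CPU i % n, and a "stuck" image never migrates. The function names are made up for illustration.

```python
from fractions import Fraction


def stuck_shares(n_cpus, n_procs):
    """CPU share of each process when process i stays on CPU i % n_cpus forever."""
    per_cpu = [0] * n_cpus
    for i in range(n_procs):
        per_cpu[i % n_cpus] += 1          # count the processes glued to each CPU
    # Processes sharing a CPU split it equally between them.
    return [Fraction(1, per_cpu[i % n_cpus]) for i in range(n_procs)]


def balanced_share(n_cpus, n_procs):
    """CPU share of each process if the total load were spread perfectly evenly."""
    return min(Fraction(1), Fraction(n_cpus, n_procs))


if __name__ == "__main__":
    n = 4
    shares = stuck_shares(n, n + 1)
    print("stuck:   ", [str(s) for s in shares])
    print("slowest: ", min(shares))
    print("balanced:", balanced_share(n, n + 1))
```

With n = 4, the two images sharing a CPU get half a CPU each while the other three get a whole one; evenly shuffled, every image would get 4/5. For an MPI job that synchronizes between images, the slowest image tends to set the pace, which is presumably why the stuck case hurts. The model ignores the cost of migration itself (cache misses), which is the other half of this discussion.
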
>
> On the other hand, you report that your threads keep being shuffled between CPUs
> (I presume for Nthr == Ncpu case, where Nthr is a count of the number-crunching
> threads). And I guess that you want them to stay glued to particular CPUs.
>
> So how is this the same problem? In fact, it sounds like somewhat opposite.
> The only thing in common is that you both don't like how ULE works.
>
> ULE has many knobs to tune its behavior. Unfortunately they are not very well
> documented and there are too many of them. So, it's not easy to find which
> combination would be the best for a particular work-load. In your particular
> case you might want to try to increase value of kern.sched.affinity to increase
> affinity of threads to their CPUs.

Not everyone who uses FreeBSD is a developer or an expert, and even
people who are experts in one specific area of computer science and
engineering are not necessarily experts in the FreeBSD kernel and its
scheduling techniques. I'm sorry, but I'm not capable of tuning my
servers via a lot of undocumented knobs. I would gladly do so if there
were some kind of howto (in the Handbook, perhaps?).

>
> Also, please note that FreeBSD support in GotoBLAS is not equivalent to Linux
> support as I have pointed out before. On Linux they bind their threads to CPUs
> to avoid the situation that you describe. Apparently they didn't know how to do
> CPU-binding on FreeBSD, so this is not implemented. You may have a motivation
> to help them out with this.
>
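
On the CPU-binding point quoted above: FreeBSD does provide an interface for this, cpuset_setaffinity(2), with cpuset(1) as its command-line front end. As a minimal sketch of what "binding" means, here is the idea in Python. It assumes a Python build that exposes os.sched_setaffinity, which is the case on Linux and may not be elsewhere, hence the hasattr checks. pin_to_cpu is a hypothetical helper name, and none of this is taken from GotoBLAS.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU and return the resulting CPU set.

    Returns None when this Python build does not expose the affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu})    # pid 0 means "the calling process"
    return os.sched_getaffinity(0)    # read the mask back to confirm it took effect


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        first = min(os.sched_getaffinity(0))   # pick a CPU we are allowed to run on
        print("now restricted to:", pin_to_cpu(first))
    else:
        print("no affinity API in this Python build")
```

A number-crunching library would do the equivalent once per worker thread, giving each worker its own CPU, so the scheduler has no freedom to migrate them. That only makes sense for the Nthr == Ncpu case; with more threads than CPUs, hard pinning would reproduce the "over-glued" situation on purpose.
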