From owner-freebsd-current@FreeBSD.ORG Tue Jul 12 08:05:21 2011
Delivered-To: freebsd-current@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B5781106564A
	for ; Tue, 12 Jul 2011 08:05:21 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 056888FC14
	for ; Tue, 12 Jul 2011 08:05:20 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA07126;
	Tue, 12 Jul 2011 11:05:16 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1QgXy0-000PkH-0Y; Tue, 12 Jul 2011 11:05:16 +0300
Message-ID: <4E1C003B.4090604@FreeBSD.org>
Date: Tue, 12 Jul 2011 11:05:15 +0300
From: Andriy Gapon
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110706 Thunderbird/5.0
MIME-Version: 1.0
To: Steve Kargl
References: <20110706170132.GA68775@troutmask.apl.washington.edu>
	<5080.1309971941@critter.freebsd.dk>
	<20110706180001.GA69157@troutmask.apl.washington.edu>
	<4E14A54A.4050106@freebsd.org>
	<4E155FF9.5090905@FreeBSD.org>
	<20110707151440.GA75537@troutmask.apl.washington.edu>
	<4E160C2F.8020001@FreeBSD.org>
	<20110707200845.GA77049@troutmask.apl.washington.edu>
	<4E1B1198.6090308@FreeBSD.org>
	<20110711161654.GA97361@troutmask.apl.washington.edu>
In-Reply-To: <20110711161654.GA97361@troutmask.apl.washington.edu>
X-Enigmail-Version: 1.2pre
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-current@FreeBSD.org
Subject: Re: Heavy I/O blocks FreeBSD box for several seconds
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Tue, 12 Jul 2011 08:05:21 -0000

on 11/07/2011 19:16 Steve Kargl said the following:
> On Mon, Jul 11, 2011 at 06:07:04PM +0300, Andriy Gapon wrote:
>> But it's not clear which of the processes are slaves and which is master.
>> It's also not clear why the master takes so much CPU (on par with the
>> slaves) - from my reading of its description (by Steve) it should be
>> doing only light periodic work.
>
> These are all slave processes.  The master process was on a different
> node in the cluster.  Each process is doing the exact same computation
> with only a small change in a coordinate from (x,y,z) to (x,y+n*dy,z)
> with n = 1, 2, 3, 4.  The small change does not cause a different
> code path, so all should complete in nearly identical times.

OK, the situation is much clearer (to me) now.

>> If it does have to do CPU-heavy work, then I'd imagine that it should
>> spawn only Ncpus - 1 slaves.
>
> And if you have M users on the system?  Also note, you can get the
> exact same loading problem by launching Ncpu+1 completely independent
> cpu-bound processes.  Ncpu-1 processes will be bound to specific cpus
> and 2 processes will ping-pong on one cpu.  This ping-ponging will
> simply kill performance.

I'd still argue that if someone cares about doing some calculations as
fast as possible, then he shouldn't have more than Ncpu CPU-bound
processes.  How to achieve that is a technical/administrative issue.
But nevertheless I now see what the problem is.

I think that the best thing you can further provide (as objective
evidence for the problem at hand) is ktr(4) traces for at least the
KTR_SCHED mask.  Perhaps you even already have them from your previous
sessions with Jeff.

P.S. This is not a promise to actually debug this issue based on the
traces :-)

--
Andriy Gapon