From owner-freebsd-questions@FreeBSD.ORG  Wed Jul  6 18:00:02 2011
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3722E106564A;
	Wed,  6 Jul 2011 18:00:02 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21])
	by mx1.freebsd.org (Postfix) with ESMTP id 174C98FC1F;
	Wed,  6 Jul 2011 18:00:02 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.4/8.14.4) with ESMTP id
	p66I01CY069240; Wed, 6 Jul 2011 11:00:01 -0700 (PDT)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.4/8.14.4/Submit) id
	p66I01sb069239; Wed, 6 Jul 2011 11:00:01 -0700 (PDT)
	(envelope-from sgk)
Date: Wed, 6 Jul 2011 11:00:01 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
Message-ID: <20110706180001.GA69157@troutmask.apl.washington.edu>
References: <20110706170132.GA68775@troutmask.apl.washington.edu>
	<5080.1309971941@critter.freebsd.dk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5080.1309971941@critter.freebsd.dk>
User-Agent: Mutt/1.4.2.3i
Cc: FreeBSD Current <freebsd-current@freebsd.org>, "Hartmann,
	O." <ohartman@zedat.fu-berlin.de>,
	arrowdodger <6yearold@gmail.com>, freebsd-questions@freebsd.org
Subject: Re: Heavy I/O blocks FreeBSD box for several seconds
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2011 18:00:02 -0000

On Wed, Jul 06, 2011 at 05:05:41PM +0000, Poul-Henning Kamp wrote:
> In message <20110706170132.GA68775@troutmask.apl.washington.edu>, Steve Kargl
> writes:
> 
> >I periodically ran the same type test in the 2008 post over the
> >last three years.  Nothing has changed.  I even set up an account
> >on one node in my cluster for jeffr to use.  He was too busy to
> >investigate at that time.
> 
> Isn't this just the lemming-syncer hurling every dirty block over
> the cliff at the same time ?

I don't know the answer.  Of course, having no experience in
process scheduling, I don't understand the question either ;-)

AFAICT, it is a cpu affinity issue.  If I launch n+1 MPI images
on a system with n cpus/cores, then 2 (and sometimes 3) images
end up stuck on the same cpu and ping-pong on it.  I recall
trying to use renice(8) to force some load balancing, but
vaguely remember that it did not help.
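
To put rough numbers on that, here is a minimal sketch in Python.
It is an idealized illustration, not a measurement: it assumes every
image is CPU-bound, that a cpu is split evenly among the images
sitting on it, and that in the stuck case exactly 2 images share one
cpu.  The function names are made up for this example.

```python
from fractions import Fraction


def cpu_shares(n_cpus, placement):
    """Share of a cpu each image gets; placement[i] is the cpu of image i.

    Assumption: all images are CPU-bound and a cpu is divided evenly
    among the images placed on it.
    """
    counts = [0] * n_cpus
    for cpu in placement:
        counts[cpu] += 1
    return [Fraction(1, counts[cpu]) for cpu in placement]


def stuck_placement(n_cpus):
    """n+1 images: one per cpu, plus an extra image that never leaves cpu 0."""
    return list(range(n_cpus)) + [0]


def balanced_share(n_cpus):
    """Share per image if the scheduler spread n+1 images evenly over time."""
    return Fraction(n_cpus, n_cpus + 1)


n = 8
shares = cpu_shares(n, stuck_placement(n))
print(min(shares), balanced_share(n))  # prints: 1/2 8/9
```

If the images synchronize often, the slowest one sets the pace, so
under these assumptions the stuck case would run at roughly half
speed instead of the 8/9 an even spread would give.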

> To find out:  Run gstat and keep an eye on the leftmost column
> 
> The road map for fixing that has been known for years...

I'll keep this in mind the next time I upgrade the cluster.
It's currently running a Feb 10th vintage kernel, and is
under fairly heavy use at the moment.

-- 
Steve