From owner-freebsd-stable@FreeBSD.ORG  Thu Dec 22 16:31:09 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0B0101065691;
	Thu, 22 Dec 2011 16:31:09 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21])
	by mx1.freebsd.org (Postfix) with ESMTP id D9A478FC0C;
	Thu, 22 Dec 2011 16:31:08 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id
	pBMGV772033805; Thu, 22 Dec 2011 08:31:07 -0800 (PST)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id
	pBMGV6Qm033804; Thu, 22 Dec 2011 08:31:06 -0800 (PST)
	(envelope-from sgk)
Date: Thu, 22 Dec 2011 08:31:06 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Luigi Rizzo <rizzo@iet.unipi.it>
Message-ID: <20111222163106.GA33689@troutmask.apl.washington.edu>
References: <4EE1EAFE.3070408@m5p.com>
	<CAJ-FndBSOS3hKYqmPnVkoMhPmowBBqy9-+eJJEMTdoVjdMTEdw@mail.gmail.com>
	<20111215215554.GA87606@troutmask.apl.washington.edu>
	<CAJ-FndD0vFWUnRPxz6CTR5JBaEaY3gh9y7-Dy6Gds69_aRgfpg@mail.gmail.com>
	<20111222005250.GA23115@troutmask.apl.washington.edu>
	<20111222103145.GA42457@onelab2.iet.unipi.it>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20111222103145.GA42457@onelab2.iet.unipi.it>
User-Agent: Mutt/1.4.2.3i
Cc: Attilio Rao <attilio@freebsd.org>, Andrey Chernov <ache@nagual.pp.ru>,
	George Mitchell <george+freebsd@m5p.com>,
	Doug Barton <dougb@freebsd.org>, freebsd-stable@freebsd.org
Subject: Re: SCHED_ULE should not be the default
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 22 Dec 2011 16:31:09 -0000

On Thu, Dec 22, 2011 at 11:31:45AM +0100, Luigi Rizzo wrote:
> On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
>> 
>> I have placed several files at
>> 
>> http://troutmask.apl.washington.edu/~kargl/freebsd
>> 
>> dmesg.txt      --> dmesg for ULE kernel
>> summary        --> A summary that includes top(1) output of all runs.
>> sysctl.ule.txt --> sysctl -a for the ULE kernel
>> ktr-ule-problem-kargl.out.gz 
>> 
>> 
>> Since time is executed on the master, only the 'real' time is of
>> interest (the summary file includes user and sys times).  This
>> command is run 5 times for each N value and up to 10 times for
>> some N values with the ULE kernel.  The following table records
>> the average 'real' time, and the number in (...) is the mean
>> absolute deviation.
>> 
>> #  N         ULE             4BSD
>> # -------------------------------------
>> #  4    223.27 (0.502)   221.76 (0.551)
>> #  5    404.35 (73.82)   270.68 (0.866)
>> #  6    627.56 (173.0)   247.23 (1.442)
>> #  7    475.53 (84.07)   285.78 (1.421)
>> #  8    429.45 (134.9)   223.64 (1.316)
> 
> One explanation for ULE taking 1.5-2x as long is that the
> threads are not migrated properly, so you end up with idle cores
> and ready threads that are not running.

That's what I guessed back in 2008 when I first reported the
behavior.  

http://freebsd.monkey.org/freebsd-current/200807/msg00278.html
http://freebsd.monkey.org/freebsd-current/200807/msg00280.html

The top(1) output at the above URLs shows 10 completely independent
instances of the same numerically intensive application running
on a circa 2008 ULE kernel.  Look at the PRI column.  The high-PRI
jobs are not only pinned to a CPU, they are also running at
100% WCPU.  The low-PRI jobs seem to be pinned to a subset of the
available CPUs and simply ping-pong in and out of the same CPUs.
In this instance, there are 5 jobs competing for time on 3 CPUs.

> Also, perhaps one could build a simple test process that replicates
> this workload (so one can run it as part of regression tests):
> 	1. define a CPU-intensive function f(n) which issues no
> 	   system calls, optionally touching a lot of memory,
> 	   where n determines the number of iterations.
> 	2. by trial and error (or let the program find it),
> 	   pick a value N1 so that the minimum execution time
> 	   of f(N1) is in the 10..100ms range
> 	3. now run the function f() again from an outer loop so
> 	   that the total execution time is large (10..100s)
> 	   again with no intervening system calls.
> 	4. use an external shell script that reruns a process
> 	   when it terminates, and then run multiple instances
> 	   in parallel. Instead of the external script one could
> 	   fork new instances before terminating, but I am a bit
> 	   unclear how CPU inheritance works when a process forks.
> 	   Going through the shell possibly breaks the chain.

The tests at the above URLs do essentially what you
propose, except that in 2008 the kzk90 programs were doing
some I/O.

-- 
Steve