From: Steve Kargl
To: Luigi Rizzo
Cc: Attilio Rao, Andrey Chernov, George Mitchell, Doug Barton, freebsd-stable@freebsd.org
Date: Thu, 22 Dec 2011 08:31:06 -0800
Subject: Re: SCHED_ULE should not be the default
Message-ID: <20111222163106.GA33689@troutmask.apl.washington.edu>
In-Reply-To: <20111222103145.GA42457@onelab2.iet.unipi.it>
References: <4EE1EAFE.3070408@m5p.com> <20111215215554.GA87606@troutmask.apl.washington.edu> <20111222005250.GA23115@troutmask.apl.washington.edu> <20111222103145.GA42457@onelab2.iet.unipi.it>

On Thu, Dec 22, 2011 at 11:31:45AM +0100, Luigi Rizzo wrote:
> On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
>>
>> I have placed several files at
>>
>> http://troutmask.apl.washington.edu/~kargl/freebsd
>>
>> dmesg.txt      --> dmesg for ULE kernel
>> summary        --> A summary that includes top(1) output of all runs.
>> sysctl.ule.txt --> sysctl -a for the ULE kernel
>> ktr-ule-problem-kargl.out.gz
>>
>>
>> Since time is executed on the master, only the 'real' time is of
>> interest (the summary file includes user and sys times). This
>> command is run 5 times for each N value and up to 10 times for
>> some N values with the ULE kernel. The following table records
>> the average 'real' time and the number in (...) is the mean
>> absolute deviation.
>>
>> #  N         ULE             4BSD
>> # -------------------------------------
>> #  4    223.27 (0.502)  221.76 (0.551)
>> #  5    404.35 (73.82)  270.68 (0.866)
>> #  6    627.56 (173.0)  247.23 (1.442)
>> #  7    475.53 (84.07)  285.78 (1.421)
>> #  8    429.45 (134.9)  223.64 (1.316)
>
> One explanation for taking 1.5-2x times is that with ULE the
> threads are not migrated properly, so you end up with idle cores
> and ready threads not running

That's what I guessed back in 2008 when I first reported the
behavior.

http://freebsd.monkey.org/freebsd-current/200807/msg00278.html
http://freebsd.monkey.org/freebsd-current/200807/msg00280.html

The top(1) output at the above URL shows 10 completely independent
instances of the same numerically intensive application running
on a circa 2008 ULE kernel. Look at the PRI column. The high
PRI jobs are not only pinned to a cpu, but these are running at
100% WCPU. The low PRI jobs seem to be pinned to a subset of the
available cpus and simply ping-pong in and out of the same cpus.
In this instance, there are 5 jobs competing for time on 3 cpus.

> Also, perhaps one could build a simple test process that replicates
> this workload (so one can run it as part of regression tests):
> 1. define a CPU-intensive function f(n) which issues no
>    system calls, optionally touching
>    a lot of memory, where n determines the number of iterations.
> 2. by trial and error (or let the program find it),
>    pick a value N1 so that the minimum execution time
>    of f(N1) is in the 10..100ms range
> 3. now run the function f() again from an outer loop so
>    that the total execution time is large (10..100s)
>    again with no intervening system calls.
> 4. use an external shell script that can rerun a process
>    when it terminates, and then run multiple instances
>    in parallel. Instead of the external script one could
>    fork new instances before terminating, but i am a bit
>    unclear how CPU inheritance works when a process forks.
>    Going through the shell possibly breaks the chain.

The tests at the above URL do essentially what you propose
except in 2008 the kzk90 programs were doing some IO.

-- 
Steve
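
For illustration, steps 1-3 of the test process Luigi outlines above
could be sketched roughly as follows. This is only a sketch under
stated assumptions, not the code behind the numbers in this thread:
the function names, the arithmetic inside f(), the doubling search
for N1, and the default targets are all hypothetical choices. The
Python interpreter may also issue system calls of its own (for
example for memory management), so a C version would be closer to
the "no system calls" requirement.

```python
import math
import sys
import time


def f(n):
    """Step 1: CPU-bound kernel, n iterations of integer arithmetic, no I/O."""
    x = 1
    for _ in range(n):
        # Linear congruential update (arbitrary choice); values stay below 2**31.
        x = (x * 1103515245 + 12345) % 2147483648
    return x


def timed(n):
    """Wall-clock seconds for a single call of f(n)."""
    t0 = time.perf_counter()
    f(n)
    return time.perf_counter() - t0


def calibrate(target_s=0.010, start=1000, repeats=3):
    """Step 2: double n until the minimum of `repeats` timings reaches target_s."""
    n = start
    while True:
        best = min(timed(n) for _ in range(repeats))
        if best >= target_s:
            return n, best
        n *= 2


def run_instance(total_s=10.0, target_s=0.010):
    """Step 3: call f(N1) from an outer loop sized for about total_s seconds."""
    n1, best = calibrate(target_s)
    outer = max(1, math.ceil(total_s / best))
    t0 = time.perf_counter()
    for _ in range(outer):
        f(n1)  # the clock is read only before and after the loop
    return n1, outer, time.perf_counter() - t0


# Runs only when a total duration in seconds is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    n1, outer, real = run_instance(float(sys.argv[1]))
    print(f"N1={n1} outer={outer} real={real:.2f}s")
```

Step 4 is left to the caller: several copies of this script, each
given a duration such as 10, can be started in parallel from a shell
script that restarts a copy whenever one exits, and the 'real' times
they report can then be compared between the ULE and 4BSD kernels.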