From owner-freebsd-current@FreeBSD.ORG  Fri Jan 19 22:07:08 2007
Date: Fri, 19 Jan 2007 14:07:21 -0800 (PST)
From: Jeff Roberson <jroberson@chesapeake.net>
X-X-Sender: jroberson@10.0.0.1
To: current@freebsd.org
Message-ID: <20070119135849.D558@10.0.0.1>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: 
Subject: Improved ULE load balancing.

I'd like those of you who reported relatively poor SMP performance on ULE
to update to sched_ule.c revision 1.179.  This improved performance on my
dual Xeon to about 10% better than 4BSD running super-smack.  It is also
highly tunable.  Some options of interest:

kern.sched. :
pick_pri - On by default.  Turning this off reverts to the older
algorithm, which is now called pickidle.  pick_pri tries to always run the
highest priority threads.  pickidle only tries to balance cpu load and
doesn't take priority into consideration.

pick_pri_affinity - Number of ticks a thread must have slept before we
stop considering it to have affinity for a given cpu.

busy_thresh - Length of run queue allowed before idle cpus will try to 
steal some of our work.  This defaults to 4 but on some workloads I see 
improvement with values as low as 2.

ipi_thresh - Priorities below this generate IPIs to preempt the target 
cpu.  Can decrease latency for some workloads but at the expense of extra 
context switches and interrupt overhead.
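
To make the difference between the two choosers concrete, here is a toy
model in Python.  It is a sketch, not the kernel code: the Cpu class, the
function names, the tie-breaking and the rule "place the thread on the cpu
running the worst priority" are assumptions made for illustration.  Lower
numbers mean better priority, as in the kernel.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Toy per-cpu state; field names are hypothetical, not from sched_ule.c."""
    cpu_id: int
    load: int          # number of runnable threads queued on this cpu
    running_pri: int   # priority of the thread on cpu now (lower is better)


def pick_idle(cpus):
    """Load-only chooser: least loaded cpu wins, priority is ignored."""
    return min(cpus, key=lambda c: (c.load, c.cpu_id))


def pick_pri(cpus, last_cpu, slept_ticks, affinity_ticks):
    """Priority-aware chooser (assumed behaviour, see the note above).

    A thread that slept for fewer than affinity_ticks is treated as still
    having affinity for last_cpu and stays there.  Otherwise it goes to the
    cpu whose running thread has the worst (numerically highest) priority.
    """
    if slept_ticks < affinity_ticks:
        return cpus[last_cpu]
    return max(cpus, key=lambda c: (c.running_pri, -c.cpu_id))


def needs_ipi(thread_pri, ipi_thresh):
    """Priorities below ipi_thresh (numerically lower) trigger an IPI."""
    return thread_pri < ipi_thresh


if __name__ == "__main__":
    cpus = [
        Cpu(cpu_id=0, load=3, running_pri=120),
        Cpu(cpu_id=1, load=1, running_pri=100),
        Cpu(cpu_id=2, load=2, running_pri=200),
    ]
    print("pickidle ->", pick_idle(cpus).cpu_id)
    print("pick_pri ->", pick_pri(cpus, 0, slept_ticks=50, affinity_ticks=10).cpu_id)
```

In this model the two choosers can disagree: pickidle sends the thread to
the shortest queue, while pick_pri sends it wherever it displaces the least
important running thread, unless it slept so briefly that it keeps its old
cpu.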

The default configuration was fastest on most workloads on my 8way
opteron and 2x xeon (+2xHTT).  I tested parallel compiles and super-smack
with select-key.smack doing different workloads on both machines and with
different numbers of processors enabled on the 8way opteron.  The opteron
in 8way mode shows about 300% speedup compared to 4BSD on super-smack.
Compile times are nearly identical across all schedulers and platforms.  I
get a more modest 5-10% speedup on super-smack on my xeon, depending on
the configuration.

Please report back your findings.  Hopefully with the tunables present I 
can experiment and get the settings right for a wide array of machines.

Thanks,
Jeff

---------- Forwarded message ----------
Date: Fri, 19 Jan 2007 21:56:08 +0000 (UTC)
From: Jeff Roberson <jeff@FreeBSD.org>
To: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org
Subject: cvs commit: src/sys/kern sched_ule.c

jeff        2007-01-19 21:56:08 UTC

   FreeBSD src repository

   Modified files:
     sys/kern             sched_ule.c
   Log:
   Major revamp of ULE's cpu load balancing:
    - Switch back to direct modification of remote CPU run queues.  The
      indirect scheme added a lot of complexity with questionable gain.  It's
      easy enough to reimplement if it's shown to help on huge machines.
    - Re-implement the old tdq_transfer() call as tdq_pickidle().  Change
      sched_add() so we have selectable cpu choosers and simplify the logic
      a bit here.
    - Implement tdq_pickpri() as the new default cpu chooser.  This algorithm
      is similar to Solaris's in that it tries to always run the threads with
      the best priorities.  It is actually slightly more complex than
      Solaris's algorithm because we also tend to favor the local cpu over
      other cpus, which improves latency and also potentially enables
      cache sharing between the waking thread and the woken thread.
    - Add a bunch of tunables that can be used to measure effects of different
      load balancing strategies.  Most of these will go away once the
      algorithm is more definite.
    - Add a new mechanism to steal threads from busy cpus when we idle.  This
      is enabled with kern.sched.steal_busy and kern.sched.busy_thresh.  The
      threshold is the required length of a tdq's run queue before another
      cpu will be able to steal runnable threads.  This prevents most queue
      imbalances that contribute to long latencies.

   Revision  Changes    Path
   1.179     +293 -240  src/sys/kern/sched_ule.c
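
The steal-on-idle rule from the last log item can be sketched as a toy
model in Python.  This is an illustration under stated assumptions, not the
code in sched_ule.c: run queues are plain lists, the idle cpu takes a single
thread from the longest qualifying queue, and a queue qualifies once its
length is at least busy_thresh.  The log does not say which victim is chosen
or whether the comparison is inclusive, so both are guesses.

```python
def steal_when_idle(run_queues, idle_cpu, busy_thresh=4, steal_busy=True):
    """Move one thread from a busy cpu to idle_cpu.

    run_queues maps cpu id -> list of runnable thread names (a hypothetical
    layout).  Returns the donor cpu id, or None if nothing was stolen.
    """
    if not steal_busy or run_queues[idle_cpu]:
        return None  # stealing disabled, or this cpu is not actually idle
    # Only queues that reached the threshold may be raided.
    donors = [cpu for cpu, queue in run_queues.items()
              if cpu != idle_cpu and len(queue) >= busy_thresh]
    if not donors:
        return None
    # Assumption: prefer the longest queue among the candidates.
    donor = max(donors, key=lambda cpu: len(run_queues[cpu]))
    run_queues[idle_cpu].append(run_queues[donor].pop())
    return donor


if __name__ == "__main__":
    queues = {0: [], 1: ["a", "b", "c"], 2: ["d", "e", "f", "g"]}
    print("donor:", steal_when_idle(queues, idle_cpu=0))
    print("queues:", queues)
```

With the default threshold of 4, a queue of three threads is left alone in
this model.  Lowering busy_thresh to 2 makes the same queue eligible, which
is the kind of trade-off the tunable is meant to expose.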