From owner-freebsd-current@FreeBSD.ORG Tue Jul 15 17:59:45 2008
Date: Tue, 15 Jul 2008 10:59:44 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: freebsd-current@freebsd.org
Message-ID: <20080715175944.GA80901@troutmask.apl.washington.edu>
Subject: ULE scheduling oddity

It appears that the ULE scheduler is not providing a fair slice to
running processes.  I have a dual-CPU, quad-core Opteron-based system
(8 cores total):

node21:kargl[229] uname -a
FreeBSD node21.cimu.org 8.0-CURRENT FreeBSD 8.0-CURRENT #3: Wed Jun 4 16:22:49 PDT 2008 kargl@node10.cimu.org:src/sys/HPC amd64

If I start exactly 8 processes, each gets 100% WCPU according to top.
If I add two additional processes, then I observe

last pid:  3874;  load averages:  9.99,  9.76,  9.43   up 0+19:54:44  10:51:18
41 processes:  11 running, 30 sleeping
CPU:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
Mem: 5706M Active, 8816K Inact, 169M Wired, 84K Cache, 108M Buf, 25G Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
 3836 kargl       1 118    0   577M   572M CPU7   7   6:37 100.00% kzk90
 3839 kargl       1 118    0   577M   572M CPU2   2   6:36 100.00% kzk90
 3849 kargl       1 118    0   577M   572M CPU3   3   6:33 100.00% kzk90
 3852 kargl       1 118    0   577M   572M CPU0   0   6:25 100.00% kzk90
 3864 kargl       1 118    0   577M   572M RUN    1   6:24 100.00% kzk90
 3858 kargl       1 112    0   577M   572M RUN    5   4:10  78.47% kzk90
 3855 kargl       1 110    0   577M   572M CPU5   5   4:29  67.97% kzk90
 3842 kargl       1 110    0   577M   572M CPU4   4   4:24  66.70% kzk90
 3846 kargl       1 107    0   577M   572M RUN    6   3:22  53.96% kzk90
 3861 kargl       1 107    0   577M   572M CPU6   6   3:15  53.37% kzk90

I would have expected to see a more evenly distributed WCPU of around
80% for each process (10 CPU-bound processes sharing 8 cores should
average 8/10 = 80% each).  So, do I need to tune one or more of the
following sysctl values?  Is this a side effect of CPU affinity being
a tad too aggressive?

node21:kargl[231] sysctl -a | grep sched | more
kern.sched.preemption: 1
kern.sched.steal_thresh: 3
kern.sched.steal_idle: 1
kern.sched.steal_htt: 1
kern.sched.balance_interval: 133
kern.sched.balance: 1
kern.sched.affinity: 1
kern.sched.idlespinthresh: 4
kern.sched.idlespins: 10000
kern.sched.static_boost: 160
kern.sched.preempt_thresh: 64
kern.sched.interact: 30
kern.sched.slice: 13
kern.sched.name: ULE

-- 
Steve
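
P.S.  If tuning does turn out to be the answer, a minimal sketch of what
I would try first is below; the specific values are guesses on my part,
not known-good settings, and I have not verified that either change
actually evens out the WCPU numbers.

# Shrink the short-term affinity window to 0 so threads migrate freely,
# to test whether aggressive affinity is what starves the last two jobs.
sysctl kern.sched.affinity=0
# Lower the load threshold at which an idle core steals a thread, so
# idle cores pick up runnable work sooner (guessed value).
sysctl kern.sched.steal_thresh=2

Both knobs appear in the sysctl listing above and can be changed at
runtime as root.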