From owner-freebsd-current@FreeBSD.ORG  Fri Jan  5 02:31:10 2007
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: current@freebsd.org
Delivered-To: freebsd-current@FreeBSD.ORG
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by hub.freebsd.org (Postfix) with ESMTP id 3816316A407;
	Fri,  5 Jan 2007 02:31:07 +0000 (UTC)
	(envelope-from davidxu@freebsd.org)
Message-ID: <459DB871.1050109@freebsd.org>
Date: Fri, 05 Jan 2007 10:31:13 +0800
From: David Xu <davidxu@freebsd.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.13) Gecko/20061204
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Jeff Roberson <jroberson@chesapeake.net>
References: <20070104005625.D1508@10.0.0.1>
	<459CCBA1.40305@freebsd.org>	<b1fa29170701041131o287ee0cflda63eae9528f68dc@mail.gmail.com>	<200701050749.49058.davidxu@freebsd.org>
	<20070104155755.Y552@10.0.0.1> <20070104163449.D552@10.0.0.1>
In-Reply-To: <20070104163449.D552@10.0.0.1>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Kip Macy <kip.macy@gmail.com>, current@freebsd.org
Subject: Re: ULE 2.0
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 05 Jan 2007 02:31:10 -0000

Jeff Roberson wrote:


> I just looked it up.  The solaris algorithm prevents direct preemption 
> if the priority is not in the kernel or realtime priority range.  
> Otherwise it simply sets NEEDRESCHED.  When assigning a thread to a cpu 
> it checks to see if the last cpu it executed on is running a higher 
> priority thread (lower for solaris where higher is better) and if so 
> puts it there. Otherwise it finds the cpu which is executing the lowest 
> priority thread and assigns it there.
> 
> This basically balances the cpus by priority rather than by load. 
> Balancing by priority indirectly balances by load because threads which 
> have slept for a long time will eventually have a higher priority.  This 
> is pretty smart, although it ends up inspecting per-cpu state of remote 
> cpus on almost every wakeup.  I would expect this to perform badly but 
> perhaps the improvement in user-space speed outweighs the kernel time.
> 
The lookup does not always happen, there is a per-cpu rechoose value, it
might be some ticks, a thread slept long enough (exceed the interval
ticks) will end up looking for a cpu which has lower priority can run
it, if it can not find one, it will stay at a cpu which has lowest 
priority but closest to the thread's last cpu, this is done by looking
a cpu from leaf to root in a cpu topology tree. if the resumed thread
is still within the interval, it will stay at its last cpu.
The alogrithm bets that after some ticks, a sleeping thread's cache is 
trashed by other threads, so it should look for a cpu can run it
immediately, but still closest to its last cpu. if a cpu is selected,
it will look for its next cpu which is neighbour in a circle link list,
check its runqueue length at the priority, and may use the next cpu
rather than the cpu returned by above algorithm, the circle link list
might link cpus within same locality together, e.g dual-core cpus
might be a candidate. there is another path which makes a chip balance, 
e.g it tries to balance two chips which are multiple-cores, and make 
them have same load ratio. I am not Solaris expert, so above may not be
very accurate. :-)

> I'll investigate this furher.  One slight problem with this approach for 
> ULE is that sitting on the run-queue for a long period of time improves 
> your position on the run-queue but not directly your priority.  I think 
> I can solve this though by resetting the priority as soon as you run 
> which will take into account the ticks that you were waiting.

Another problem in linux and ULE scheduler is they are less history
friendly within past load, it only looks the thread itself's run time
and sleep time, and don't know the system load, and calculate its 
priority based on this, under heavy load, it could be very inaccuracy,
maybe you still need some anti-unfairness code, for example a thread
stayed on a runqueue too long, ULE boosts its priority a bit, though I
don't know how to do it in cheapest way.

> Anyway, thanks for the pointer Xu.  I may hack this up and compare it to 
> ULE's current balancing.

Yesterday, I have tested super-smack benchmark on 2-cpu machine, ULE
decreased performance about 40%, this might be a regression though.

> 
> Cheers,
> Jeff
> 

Regards,
David Xu