From owner-freebsd-current@FreeBSD.ORG Fri Jan 5 02:31:10 2007 Return-Path: X-Original-To: current@freebsd.org Delivered-To: freebsd-current@FreeBSD.ORG Received: from [127.0.0.1] (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 3816316A407; Fri, 5 Jan 2007 02:31:07 +0000 (UTC) (envelope-from davidxu@freebsd.org) Message-ID: <459DB871.1050109@freebsd.org> Date: Fri, 05 Jan 2007 10:31:13 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.13) Gecko/20061204 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Jeff Roberson References: <20070104005625.D1508@10.0.0.1> <459CCBA1.40305@freebsd.org> <200701050749.49058.davidxu@freebsd.org> <20070104155755.Y552@10.0.0.1> <20070104163449.D552@10.0.0.1> In-Reply-To: <20070104163449.D552@10.0.0.1> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: Kip Macy , current@freebsd.org Subject: Re: ULE 2.0 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jan 2007 02:31:10 -0000 Jeff Roberson wrote: > I just looked it up. The solaris algorithm prevents direct preemption > if the priority is not in the kernel or realtime priority range. > Otherwise it simply sets NEEDRESCHED. When assigning a thread to a cpu > it checks to see if the last cpu it executed on is running a higher > priority thread (lower for solaris where higher is better) and if so > puts it there. Otherwise it finds the cpu which is executing the lowest > priority thread and assigns it there. > > This basically balances the cpus by priority rather than by load. > Balancing by priority indirectly balances by load because threads which > have slept for a long time will eventually have a higher priority. This > is pretty smart, although it ends up inspecting per-cpu state of remote > cpus on almost every wakeup. I would expect this to perform badly but > perhaps the improvement in user-space speed outweighs the kernel time. > The lookup does not always happen, there is a per-cpu rechoose value, it might be some ticks, a thread slept long enough (exceed the interval ticks) will end up looking for a cpu which has lower priority can run it, if it can not find one, it will stay at a cpu which has lowest priority but closest to the thread's last cpu, this is done by looking a cpu from leaf to root in a cpu topology tree. if the resumed thread is still within the interval, it will stay at its last cpu. The alogrithm bets that after some ticks, a sleeping thread's cache is trashed by other threads, so it should look for a cpu can run it immediately, but still closest to its last cpu. if a cpu is selected, it will look for its next cpu which is neighbour in a circle link list, check its runqueue length at the priority, and may use the next cpu rather than the cpu returned by above algorithm, the circle link list might link cpus within same locality together, e.g dual-core cpus might be a candidate. there is another path which makes a chip balance, e.g it tries to balance two chips which are multiple-cores, and make them have same load ratio. I am not Solaris expert, so above may not be very accurate. :-) > I'll investigate this furher. One slight problem with this approach for > ULE is that sitting on the run-queue for a long period of time improves > your position on the run-queue but not directly your priority. I think > I can solve this though by resetting the priority as soon as you run > which will take into account the ticks that you were waiting. Another problem in linux and ULE scheduler is they are less history friendly within past load, it only looks the thread itself's run time and sleep time, and don't know the system load, and calculate its priority based on this, under heavy load, it could be very inaccuracy, maybe you still need some anti-unfairness code, for example a thread stayed on a runqueue too long, ULE boosts its priority a bit, though I don't know how to do it in cheapest way. > Anyway, thanks for the pointer Xu. I may hack this up and compare it to > ULE's current balancing. Yesterday, I have tested super-smack benchmark on 2-cpu machine, ULE decreased performance about 40%, this might be a regression though. > > Cheers, > Jeff > Regards, David Xu