From owner-freebsd-hackers@FreeBSD.ORG Mon Feb 13 20:47:29 2012
Date: Mon, 13 Feb 2012 10:23:36 -1000 (HST)
From: Jeff Roberson <jroberson@jroberson.net>
To: Alexander Motin
Cc: freebsd-hackers@FreeBSD.org, Florian Smeets, Andriy Gapon
In-Reply-To: <4F396B24.5090602@FreeBSD.org>
Subject: Re: [RFT][patch] Scheduling for HTT and not only

On Mon, 13 Feb 2012, Alexander Motin wrote:

> On 02/11/12 16:21, Alexander Motin wrote:
>> I've heavily rewritten the patch already, so at least some of the
>> ideas are already addressed. :) At this moment I am mostly satisfied
>> with the results, and after final tests today I'll probably publish a
>> new version.
>
> It took more time, but finally I think I've put the pieces together:
> http://people.freebsd.org/~mav/sched.htt23.patch

I need some time to read and digest this.  However, at first glance, a
global pickcpu lock will not be acceptable.  It is better to make an
occasionally imperfect decision than to cause contention too often.

> The patch is more complicated than the previous one, both logically
> and computationally, but with growing CPU power and complexity I think
> we can possibly spend some more time deciding how to spend time. :)

It is probably worth more cycles, but we need to evaluate this much
more complex algorithm carefully to make sure that each of these new
features provides an advantage.

> The patch formalizes several ideas of the previous code about how to
> select a CPU for running a thread, and adds some new ones.  Its main
> idea is that I've moved from comparing raw integer queue lengths to
> higher-resolution flexible values.  That additional 8-bit precision
> makes it possible to take many factors affecting performance into
> account at the same time.  Besides just choosing the best of equally
> loaded CPUs, with the new code it may even happen that, because of
> SMT, cache affinity, etc., a CPU with more threads in its queue is
> reported as less loaded, and vice versa.
>
> The new code takes the following factors into account:
> - SMT sharing penalty.
> - Cache sharing penalty.
> - Cache affinity (with separate coefficients for last-level and other
>   cache levels) to:

We already used separate affinity values for different cache levels.
Keep in mind that if something else has run on a core, the cache
affinity is lost in very short order.  Trying too hard to preserve it
beyond a few ms never seems to pan out.

>   - the other running threads of its process,

This is not really a great indicator of whether things should be
scheduled together or not.  What workload are you targeting here?

>   - the previous CPU where it was running,
>   - the current CPU (usually the one it was called from).

These two were also already used.
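As a concrete illustration of the fixed-point idea and the factors
listed above, here is a minimal, self-contained sketch of how
fractional penalties and bonuses could fold into one comparable load
value.  It is not code from the patch; every name, field, and weight in
it is hypothetical.

/*
 * Sketch only: replace raw queue-length comparison with a fixed-point
 * score.  All names and weights here are invented for illustration.
 */
#include <stdio.h>

#define LOAD_SHIFT      8               /* 8 fractional bits */
#define LOAD_SCALE      (1 << LOAD_SHIFT)

struct cpu_guess {
        int     nthreads;       /* runnable threads queued on this CPU */
        int     smt_busy;       /* a sibling hardware thread is busy */
        int     cache_shared;   /* LLC shared with a busy core */
        int     was_prev_cpu;   /* the thread last ran here */
};

/* Hypothetical weights, imagined as sysctl-tunable knobs. */
static int smt_penalty = LOAD_SCALE / 2;        /* +0.5 queue slots */
static int cache_penalty = LOAD_SCALE / 4;      /* +0.25 queue slots */
static int affinity_bonus = LOAD_SCALE / 8;     /* -0.125 queue slots */

/*
 * Effective load in fixed point.  Because the penalties and bonuses
 * are fractional, a CPU with more queued threads can still come out
 * "less loaded" than a shorter queue that would share SMT or cache.
 */
static int
effective_load(const struct cpu_guess *c)
{

        return ((c->nthreads << LOAD_SHIFT) +
            (c->smt_busy ? smt_penalty : 0) +
            (c->cache_shared ? cache_penalty : 0) -
            (c->was_prev_cpu ? affinity_bonus : 0));
}

int
main(void)
{
        /* Equal queue lengths; penalties and affinity break the tie. */
        struct cpu_guess a = { 1, 1, 1, 0 };    /* 256 + 128 + 64 = 448 */
        struct cpu_guess b = { 1, 0, 0, 1 };    /* 256 - 32 = 224 */

        printf("a=%d b=%d: pick %s\n", effective_load(&a),
            effective_load(&b),
            effective_load(&a) < effective_load(&b) ? "a" : "b");
        return (0);
}

With equal queue lengths, the SMT and cache penalties tip the choice
toward the CPU that keeps the thread alone on its core.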
Additionally, from the patch:

+        * Hide part of the current thread
+        * load, hoping it or the scheduled
+        * one complete soon.
+        * XXX: We need more stats for this.

I had something like this before.  Unfortunately, interactive tasks
are allowed fairly aggressive bursts of CPU to account for things like
Xorg and web browsers.  Also, I tried this for ithreads, but they can
be very expensive in some workloads, so other CPUs will idle as you try
to schedule behind an ithread.

> All of these factors are configurable via sysctls, but I think
> reasonable defaults should fit most cases.
>
> Also, compared to the previous patch, I've resurrected the optimized
> shortcut in CPU selection for the SMT case.  Unlike the original code,
> which had problems with this, I've added a check of the other logical
> cores' load that should make it safe and still very fast when there
> are fewer running threads than physical cores.
>
> I've tested it on Core i7 and Atom systems, but it would be more
> interesting to test it on a multi-socket system with properly detected
> topology, to check the benefits from affinity.
>
> At this moment the main issue I see is that this patch affects only
> the moment when a thread starts.  If a thread runs continuously, it
> will stay where it was, even if, due to a change in the situation,
> that is no longer very effective (it causes SMT sharing, etc.).  I
> haven't looked much at the periodic load balancer yet, but probably it
> could also be improved somehow.
>
> What is your opinion: is it too over-engineered, or is it the right
> way to go?

I think it's a little too much change all at once.  I also believe that
the changes that try very hard to preserve affinity likely help a much
smaller number of cases than they hurt.  I would prefer that you do one
piece at a time and validate each step.  There are a lot of good ideas
in here, but good ideas don't always turn into results.

Thanks,
Jeff

> --
> Alexander Motin
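For illustration, the SMT shortcut mav describes above (take a fast
path only while running threads are fewer than physical cores, after
checking the load of the other logical cores) might take roughly the
following shape.  This is a guess at the shape of the check, not code
from the patch; the topology structure and all names are invented.

#include <stdio.h>

struct topo_guess {
        int     ncores;         /* physical cores */
        int     *core_load;     /* running threads per physical core */
};

/*
 * Return an entirely idle physical core, or -1 when the caller must
 * fall back to the full (slower) search.
 */
static int
smt_shortcut(const struct topo_guess *t, int nrunning)
{
        int i;

        /* Fast path only while threads are fewer than physical cores. */
        if (nrunning >= t->ncores)
                return (-1);
        /* Check the other logical cores' load: a core with zero load
           has no busy sibling, so there is no SMT sharing at all. */
        for (i = 0; i < t->ncores; i++)
                if (t->core_load[i] == 0)
                        return (i);
        return (-1);
}

int
main(void)
{
        int load[4] = { 1, 0, 2, 0 };
        struct topo_guess t = { 4, load };

        printf("picked core %d\n", smt_shortcut(&t, 3));
        return (0);
}

With three running threads on four cores, the shortcut hands back an
idle core immediately instead of scoring every CPU.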