From owner-freebsd-hackers@FreeBSD.ORG Sat Mar 3 12:54:22 2012
From: Alexander Motin
Date: Sat, 03 Mar 2012 14:54:17 +0200
To: Adrian Chadd
Cc: freebsd-hackers@freebsd.org, George Mitchell
Subject: Re: [RFT][patch] Scheduling for HTT and not only
Message-ID: <4F521479.30704@FreeBSD.org>
In-Reply-To: <4F51E07C.4020706@FreeBSD.org>

On 03/03/12 11:12, Alexander Motin wrote:
> On 03/03/12 10:59, Adrian Chadd wrote:
>> Right. Is this written up in a PR somewhere explaining the problem in
>> as much depth as you just have?
>
> I have no idea. I am new to this area and haven't looked at PRs yet.
>
>> And thanks for this, it's great to see some further explanation of the
>> current issues the scheduler faces.
>
> By the way, I've just reproduced the problem with compilation. On a
> dual-core system, net/mpd5 compilation in one stream takes 17 seconds,
> but with two low-priority, non-interactive, CPU-burning threads running
> it takes 127 seconds. I'll try to analyze it more now. I have a feeling
> that there could be more factors causing priority violations than the
> ones I've described below.

On closer look my test turned out to be not so clean, but much more
interesting instead. Because of the NFS use, there are not just context
switches between make, cc and as (which are possibly optimized a bit
now), but also many short sleeps, during which the background processes
get to run. As a result, at some moments I see wonderful traces like
this for cc:

  wait on runq for 81ms, run for 37us, wait on NFS for 202us,
  wait on runq for 92ms, run for 30us, wait on NFS for 245us,
  wait on runq for 53ms, run for 142us, ...

That is about 0.05% CPU time for a process that is supposed to be
CPU-bound. And while a process with such a small run/sleep ratio could
in principle qualify as interactive, with such small absolute sleep
times it will need ages to compensate for the 5 seconds of "batch" run
history recorded before. (I sketch the arithmetic behind both points
after the quoted explanation below.)

>> On 2 March 2012 23:40, Alexander Motin wrote:
>>> On 03/03/12 05:24, Adrian Chadd wrote:
>>>>
>>>> mav@, can you please take a look at George's traces and see if
>>>> there's anything obviously silly going on?
>>>> He's reporting that your ULE work hasn't improved his (very)
>>>> degenerate case.
>>>
>>> As far as I can see, my patch has nothing to do with the problem. My
>>> patch improves SMP load balancing, while in this case the problem is
>>> different. In some cases, when not all CPUs are busy, my patch could
>>> mask the problem by using more CPUs, but not in this case, when dnets
>>> consumes all available CPUs.
>>>
>>> I still don't feel very comfortable with the ULE math, but as I
>>> understand it, in both illustrated cases there is a conflict between
>>> the clearly CPU-bound dnets threads, which consume all available CPU
>>> and never do voluntary context switches, and the other, more or less
>>> interactive threads. If the other threads were detected as
>>> "interactive" in ULE terms, they would preempt the dnets threads and
>>> everything would be fine. But "batch" (in ULE terms) threads never
>>> preempt each other, switching context only about 10 times per second,
>>> as hardcoded in the sched_slice variable. A kernel build by definition
>>> consumes too much CPU time to be marked "interactive". The
>>> exo-helper-1 thread in interact.out could potentially be marked
>>> "interactive", but once it has consumed some CPU and become "batch",
>>> it is difficult for it to get back: waiting on a runq is not counted
>>> as sleep, and each time it gets to run it has some new work to do, so
>>> it remains "batch". Maybe if CPU time accounting were more precise it
>>> would work better (by accounting for those short periods when threads
>>> really do sleep voluntarily), but not with the present sampled logic
>>> with 1ms granularity. As a result, while the dnets threads each time
>>> consume their full 100ms time slices, the other threads are starving,
>>> getting to run only about 10 times per second and voluntarily
>>> switching out after just a few milliseconds.
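To make the first point concrete, here is a small standalone program
that just redoes the arithmetic for one cc cycle from the trace above
and for the slice figure. The ~100ms batch slice and the idea that a
CPU hog yields only when its slice expires are taken from the
explanation above; treat this as illustration, not as measurements
from a particular kernel.

/*
 * Back-of-the-envelope only: one cc cycle from the trace above, plus
 * an assumed ~100 ms batch time slice.  Not kernel code.
 */
#include <stdio.h>

int
main(void)
{
	double runq_wait_us = 81000.0;	/* cc waiting on the run queue */
	double run_us       = 37.0;	/* cc actually running */
	double nfs_wait_us  = 202.0;	/* cc's short voluntary NFS sleep */
	double slice_us     = 100000.0;	/* assumed batch time slice */
	double cycle_us     = runq_wait_us + run_us + nfs_wait_us;

	printf("cycle length: %.1f ms\n", cycle_us / 1000.0);
	printf("cc CPU share: %.3f%%\n", 100.0 * run_us / cycle_us);

	/*
	 * A batch hog that never sleeps yields only when its slice
	 * expires, so anything queued behind it waits on the order of
	 * one slice per hog ahead of it.
	 */
	printf("batch slice: %.0f ms -> about %.0f forced switches per second\n",
	    slice_us / 1000.0, 1000000.0 / slice_us);
	return (0);
}

For the first cycle this gives about 0.046% CPU for cc, which is where
the "about 0.05%" figure above comes from.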
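And here is a similarly rough model of why the microsecond-scale NFS
sleeps cannot pull cc back into the "interactive" class. It is only
loosely modeled on the shape of sched_interact_score() in sched_ule.c:
the 0..100 score range, the threshold of 30 and the lack of any history
capping or decay here are my simplifications and assumptions, not the
exact kernel logic.

/*
 * Simplified model of ULE-style interactivity scoring.  The real code
 * also caps and decays the recorded history, which is ignored here.
 */
#include <stdio.h>

#define INTERACT_HALF   50	/* assumed SCHED_INTERACT_HALF */
#define INTERACT_THRESH 30	/* assumed kern.sched.interact default */

/* Score in the range 0..100; lower means "more interactive". */
static int
interact_score(long run_us, long sleep_us)
{
	if (run_us > sleep_us)
		return (INTERACT_HALF +
		    (INTERACT_HALF - (int)(sleep_us * INTERACT_HALF / run_us)));
	if (sleep_us > run_us)
		return ((int)(run_us * INTERACT_HALF / sleep_us));
	return (run_us != 0 ? INTERACT_HALF : 0);
}

int
main(void)
{
	long run_us = 5 * 1000 * 1000;	/* ~5 s of recorded "batch" run time */
	long sleep_us = 0;		/* accumulated voluntary sleep */
	int wakeups;

	/* Each NFS wait adds only ~250 us of sleep to the history. */
	for (wakeups = 1; wakeups <= 100000; wakeups++) {
		sleep_us += 250;
		if (interact_score(run_us, sleep_us) < INTERACT_THRESH) {
			printf("score drops below %d only after %d wakeups "
			    "(~%ld ms of accumulated sleep)\n",
			    INTERACT_THRESH, wakeups, sleep_us / 1000);
			return (0);
		}
	}
	printf("still \"batch\" after %d wakeups\n", 100000);
	return (0);
}

With the kernel's real capping and decay of the run/sleep history the
exact numbers differ, but the shape is the same: a few hundred
microseconds of sleep per wakeup against seconds of recorded run time
keeps the score deep in "batch" territory for a very long time.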
>>>
>>>> On 2 March 2012 16:14, George Mitchell wrote:
>>>>>
>>>>> On 03/02/12 18:06, Adrian Chadd wrote:
>>>>>>
>>>>>> Hi George,
>>>>>>
>>>>>> Have you thought about providing schedgraph traces with your
>>>>>> particular workload?
>>>>>>
>>>>>> I'm sure that'll help out the scheduler hackers quite a bit.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Adrian
>>>>>
>>>>> I posted a couple back in December but I haven't created any more
>>>>> recently:
>>>>>
>>>>> http://www.m5p.com/~george/ktr-ule-problem.out
>>>>> http://www.m5p.com/~george/ktr-ule-interact.out
>>>>>
>>>>> To the best of my knowledge, no one ever examined them. -- George
>>>
>>> --
>>> Alexander Motin

-- 
Alexander Motin