From: Ivan Klymenko
To: Alexander Motin
Cc: freebsd-hackers@freebsd.org, Adrian Chadd, George Mitchell
Date: Sat, 3 Mar 2012 17:26:01 +0200
Subject: Re: [RFT][patch] Scheduling for HTT and not only
Message-ID: <20120303172601.07c9c2b5@nonamehost.>
In-Reply-To: <4F521479.30704@FreeBSD.org>
List-Id: Technical Discussions relating to FreeBSD

On Sat, 03 Mar 2012 14:54:17 +0200, Alexander Motin wrote:

> On 03/03/12 11:12, Alexander Motin wrote:
> > On 03/03/12 10:59, Adrian Chadd wrote:
> >> Right. Is this written up in a PR somewhere explaining the problem
> >> in as much depth as you just have?
> >
> > I have no idea. I am new to this area and haven't looked at the PRs
> > yet.
> >
> >> And thanks for this, it's great to see some further explanation of
> >> the current issues the scheduler faces.
> >
> > By the way, I've just reproduced the problem with compilation. On a
> > dual-core system, net/mpd5 compilation in one stream takes 17
> > seconds. But with two low-priority non-interactive CPU-burning
> > threads running it takes 127 seconds. I'll try to analyze it more
> > now. I have a feeling that there could be more factors causing
> > priority violation than I've described below.
>
> On closer inspection my test turned out to be not so clean, but much
> more interesting. Because of NFS use, there are not just context
> switches between make, cc and as, which are possibly optimized a bit
> now, but also many short sleeps during which the background process
> gets to run. As a result, at some moments I see such wonderful traces
> for cc:
>
> wait on runq for 81ms,
> run for 37us,
> wait NFS for 202us,
> wait on runq for 92ms,
> run for 30us,
> wait NFS for 245us,
> wait on runq for 53ms,
> run for 142us,
>
> That is about 0.05% CPU time for a process that is supposed to be
> CPU-bound. And while such a small run/sleep ratio could qualify the
> process as interactive, with absolute sleep times this small it will
> need ages to compensate for the 5 seconds of "batch" run history
> recorded earlier.
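The arithmetic behind the trace quoted above can be checked with a small sketch. It is only an illustration: the state labels ("runq", "run", "nfs") and the function names are made up here, and the input is just the eight-entry excerpt from the message, not the full KTR trace.

```python
# Excerpt of the cc trace from the message, as (state, microseconds).
# The state names are labels chosen for this sketch, not KTR event names.
TRACE = [
    ("runq", 81_000), ("run", 37), ("nfs", 202),
    ("runq", 92_000), ("run", 30), ("nfs", 245),
    ("runq", 53_000), ("run", 142),
]


def time_by_state(trace):
    """Sum the time spent in each state, in microseconds."""
    totals = {}
    for state, usec in trace:
        totals[state] = totals.get(state, 0) + usec
    return totals


def cpu_share(trace):
    """Fraction of wall-clock time the thread actually spent running."""
    totals = time_by_state(trace)
    return totals.get("run", 0) / sum(totals.values())


if __name__ == "__main__":
    print(time_by_state(TRACE))
    print(f"CPU share: {cpu_share(TRACE):.4%}")
```

For this excerpt alone the share comes out at roughly 0.09%, the same order of magnitude as the "about 0.05%" figure in the message, which presumably was taken over a longer stretch of the trace. Note that the thread sleeps on NFS for only 447us in total while it waits on the run queue for 226ms, which is why so little time is credited as voluntary sleep.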
>
> >> On 2 March 2012 23:40, Alexander Motin wrote:
> >>> On 03/03/12 05:24, Adrian Chadd wrote:
> >>>>
> >>>> mav@, can you please take a look at George's traces and see if
> >>>> there's anything obviously silly going on?
> >>>> He's reporting that your ULE work hasn't improved his (very)
> >>>> degenerate case.
> >>>
> >>>
> >>> As far as I can see, my patch has nothing to do with the problem.
> >>> My patch improves SMP load balancing, while in this case the
> >>> problem is different. In some cases, when not all CPUs are busy,
> >>> my patch could mask the problem by using more CPUs, but not in
> >>> this case, when dnets consumes all available CPUs.
> >>>
> >>> I still do not feel very comfortable with the ULE math, but as I
> >>> understand it, in both illustrated cases there is a conflict
> >>> between clearly CPU-bound dnets threads, which consume all
> >>> available CPU and never do voluntary context switches, and other,
> >>> more or less interactive threads. If the other threads are
> >>> detected to be "interactive" in ULE terms, they should preempt the
> >>> dnets threads and everything will be fine. But "batch" (in ULE
> >>> terms) threads never preempt each other, switching context only
> >>> about 10 times per second, as hardcoded in the sched_slice
> >>> variable. A kernel build by definition consumes too much CPU time
> >>> to be marked "interactive". The exo-helper-1 thread in
> >>> interact.out could potentially be marked "interactive", but
> >>> possibly, once it has consumed enough CPU to become "batch", it is
> >>> difficult for it to get back, as waiting in a runq is not counted
> >>> as sleep, and each time it gets to run it has some new work to do,
> >>> so it remains "batch". Maybe if CPU time accounting were more
> >>> precise it would work better (by accounting for those short
> >>> periods when threads really do sleep voluntarily), but not with the
> >>> present sampled logic with 1 ms granularity.
> >>> As a result, while the dnets threads each time consume full
> >>> 100 ms time slices, other threads are starving, getting to run
> >>> only 10 times per second and then voluntarily switching out after
> >>> just a few milliseconds.
> >>>
> >>>
> >>>> On 2 March 2012 16:14, George Mitchell wrote:
> >>>>>
> >>>>> On 03/02/12 18:06, Adrian Chadd wrote:
> >>>>>>
> >>>>>>
> >>>>>> Hi George,
> >>>>>>
> >>>>>> Have you thought about providing schedgraph traces with your
> >>>>>> particular workload?
> >>>>>>
> >>>>>> I'm sure that'll help out the scheduler hackers quite a bit.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>>
> >>>>>> Adrian
> >>>>>>
> >>>>>
> >>>>> I posted a couple back in December but I haven't created any
> >>>>> more recently:
> >>>>>
> >>>>> http://www.m5p.com/~george/ktr-ule-problem.out
> >>>>> http://www.m5p.com/~george/ktr-ule-interact.out
> >>>>>
> >>>>> To the best of my knowledge, no one ever examined them. --
> >>>>> George
> >>>
> >>> --
> >>> Alexander Motin
>
>

I have FreeBSD 10.0-CURRENT #0 r232253M.
The patch in r232454 broke my DRM.
My system is patched with http://people.freebsd.org/~kib/drm/all.13.5.patch
After building a kernel with only the r232454 patch, the Xorg log contains:
...
[ 504.865] [drm] failed to load kernel module "i915"
[ 504.865] (EE) intel(0): [drm] Failed to open DRM device for pci:0000:00:02.0: File exists
[ 504.865] (EE) intel(0): Failed to become DRM master.
[ 504.865] (**) intel(0): Depth 24, (--) framebuffer bpp 32
[ 504.865] (==) intel(0): RGB weight 888
[ 504.865] (==) intel(0): Default visual is TrueColor
[ 504.865] (**) intel(0): Option "DRI" "True"
[ 504.865] (**) intel(0): Option "TripleBuffer" "True"
[ 504.865] (II) intel(0): Integrated Graphics Chipset: Intel(R) Sandybridge Mobile (GT2)
[ 504.865] (--) intel(0): Chipset: "Sandybridge Mobile (GT2)"

and a black screen...
I do not even know why it happened ... :(
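The starvation described in the quoted explanation (batch threads taking turns in fixed 100 ms slices, with the starved thread giving up the CPU after a few milliseconds) can be put into numbers with a deliberately simplified model. This is a sketch under stated assumptions, not ULE's actual logic: it assumes strict round-robin among batch threads on one CPU with no preemption, and the 2 ms burst is a hypothetical stand-in for "a few milliseconds". The function names are invented for this illustration.

```python
def batch_share(slice_ms, burst_ms, hogs=1):
    """CPU fraction of a thread that runs burst_ms per turn while `hogs`
    CPU-bound threads each use a full slice_ms per turn (round-robin model)."""
    if slice_ms <= 0 or burst_ms <= 0 or hogs < 0:
        raise ValueError("slice_ms and burst_ms must be positive, hogs >= 0")
    round_ms = hogs * slice_ms + burst_ms  # one full rotation of the run queue
    return burst_ms / round_ms


def turns_per_second(slice_ms, burst_ms, hogs=1):
    """How often per second the short-burst thread gets onto the CPU."""
    if slice_ms <= 0 or burst_ms <= 0 or hogs < 0:
        raise ValueError("slice_ms and burst_ms must be positive, hogs >= 0")
    return 1000.0 / (hogs * slice_ms + burst_ms)


if __name__ == "__main__":
    # 100 ms slice from the message; 2 ms burst is an assumed example value.
    print(f"share: {batch_share(100, 2):.2%}")
    print(f"turns per second: {turns_per_second(100, 2):.1f}")
```

With one CPU hog per CPU, the model gives the short-burst thread just under 2% of the CPU and a little under 10 turns per second, which matches the "only 10 times per second" description in the message.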