From owner-freebsd-hackers@FreeBSD.ORG Sat Mar 3 12:54:22 2012
From: Alexander Motin
Date: Sat, 03 Mar 2012 14:54:17 +0200
To: Adrian Chadd
Cc: freebsd-hackers@freebsd.org, George Mitchell
Subject: Re: [RFT][patch] Scheduling for HTT and not only
Message-ID: <4F521479.30704@FreeBSD.org>
In-Reply-To: <4F51E07C.4020706@FreeBSD.org>

On 03/03/12 11:12, Alexander Motin wrote:
> On 03/03/12 10:59, Adrian Chadd wrote:
>> Right. Is this written up in a PR somewhere explaining the problem in
>> as much depth as you just have?
>
> I have no idea. I am new to this area and haven't looked at PRs yet.
>
>> And thanks for this, it's great to see some further explanation of the
>> current issues the scheduler faces.
>
> By the way, I've just reproduced the problem with compilation. On a
> dual-core system, net/mpd5 compilation in one stream takes 17 seconds,
> but with two low-priority, non-interactive, CPU-burning threads running
> it takes 127 seconds. I'll try to analyze it more now. I have a feeling
> that there could be more factors causing priority violations than the
> ones I've described below.

On closer look my test turned out to be not so clean, but much more
interesting instead. Because of the NFS use, there are not just context
switches between make, cc and as (which are possibly optimized a bit
now), but also many short sleeps, during which the background processes
get to run. As a result, at some moments I see wonderful traces like
this for cc:

  wait on runq for 81ms, run for 37us, wait on NFS for 202us,
  wait on runq for 92ms, run for 30us, wait on NFS for 245us,
  wait on runq for 53ms, run for 142us, ...

That is about 0.05% CPU time for a process that is supposed to be
CPU-bound. And while a process with such a small run/sleep ratio could
in principle qualify as interactive, with such small absolute sleep
times it will need ages to compensate for the 5 seconds of "batch" run
history recorded before. (I sketch the arithmetic behind both points
after the quoted explanation below.)

>> On 2 March 2012 23:40, Alexander Motin wrote:
>>> On 03/03/12 05:24, Adrian Chadd wrote:
>>>>
>>>> mav@, can you please take a look at George's traces and see if
>>>> there's anything obviously silly going on?
>>>> He's reporting that your ULE work hasn't improved his (very)
>>>> degenerate case.
>>>
>>> As far as I can see, my patch has nothing to do with the problem. My
>>> patch improves SMP load balancing, while in this case the problem is
>>> different. In some cases, when not all CPUs are busy, my patch could
>>> mask the problem by using more CPUs, but not in this case, when dnets
>>> consumes all available CPUs.
>>>
>>> I still don't feel very comfortable with the ULE math, but as I
>>> understand it, in both illustrated cases there is a conflict between
>>> the clearly CPU-bound dnets threads, which consume all available CPU
>>> and never do voluntary context switches, and the other, more or less
>>> interactive threads. If the other threads were detected as
>>> "interactive" in ULE terms, they would preempt the dnets threads and
>>> everything would be fine. But "batch" (in ULE terms) threads never
>>> preempt each other, switching context only about 10 times per second,
>>> as hardcoded in the sched_slice variable. A kernel build by definition
>>> consumes too much CPU time to be marked "interactive". The
>>> exo-helper-1 thread in interact.out could potentially be marked
>>> "interactive", but once it has consumed some CPU and become "batch",
>>> it is difficult for it to get back: waiting on a runq is not counted
>>> as sleep, and each time it gets to run it has some new work to do, so
>>> it remains "batch". Maybe if CPU time accounting were more precise it
>>> would work better (by accounting for those short periods when threads
>>> really do sleep voluntarily), but not with the present sampled logic
>>> with 1ms granularity. As a result, while the dnets threads each time
>>> consume their full 100ms time slices, the other threads are starving,
>>> getting to run only about 10 times per second and voluntarily
>>> switching out after just a few milliseconds.
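To make the first point concrete, here is a small standalone program
that just redoes the arithmetic for one cc cycle from the trace above
and for the slice figure. The ~100ms batch slice and the idea that a
CPU hog yields only when its slice expires are taken from the
explanation above; treat this as illustration, not as measurements
from a particular kernel.

/*
 * Back-of-the-envelope only: one cc cycle from the trace above, plus
 * an assumed ~100 ms batch time slice.  Not kernel code.
 */
#include <stdio.h>

int
main(void)
{
	double runq_wait_us = 81000.0;	/* cc waiting on the run queue */
	double run_us       = 37.0;	/* cc actually running */
	double nfs_wait_us  = 202.0;	/* cc's short voluntary NFS sleep */
	double slice_us     = 100000.0;	/* assumed batch time slice */
	double cycle_us     = runq_wait_us + run_us + nfs_wait_us;

	printf("cycle length: %.1f ms\n", cycle_us / 1000.0);
	printf("cc CPU share: %.3f%%\n", 100.0 * run_us / cycle_us);

	/*
	 * A batch hog that never sleeps yields only when its slice
	 * expires, so anything queued behind it waits on the order of
	 * one slice per hog ahead of it.
	 */
	printf("batch slice: %.0f ms -> about %.0f forced switches per second\n",
	    slice_us / 1000.0, 1000000.0 / slice_us);
	return (0);
}

For the first cycle this gives about 0.046% CPU for cc, which is where
the "about 0.05%" figure above comes from.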
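And here is a similarly rough model of why the microsecond-scale NFS
sleeps cannot pull cc back into the "interactive" class. It is only
loosely modeled on the shape of sched_interact_score() in sched_ule.c:
the 0..100 score range, the threshold of 30 and the lack of any history
capping or decay here are my simplifications and assumptions, not the
exact kernel logic.

/*
 * Simplified model of ULE-style interactivity scoring.  The real code
 * also caps and decays the recorded history, which is ignored here.
 */
#include <stdio.h>

#define INTERACT_HALF   50	/* assumed SCHED_INTERACT_HALF */
#define INTERACT_THRESH 30	/* assumed kern.sched.interact default */

/* Score in the range 0..100; lower means "more interactive". */
static int
interact_score(long run_us, long sleep_us)
{
	if (run_us > sleep_us)
		return (INTERACT_HALF +
		    (INTERACT_HALF - (int)(sleep_us * INTERACT_HALF / run_us)));
	if (sleep_us > run_us)
		return ((int)(run_us * INTERACT_HALF / sleep_us));
	return (run_us != 0 ? INTERACT_HALF : 0);
}

int
main(void)
{
	long run_us = 5 * 1000 * 1000;	/* ~5 s of recorded "batch" run time */
	long sleep_us = 0;		/* accumulated voluntary sleep */
	int wakeups;

	/* Each NFS wait adds only ~250 us of sleep to the history. */
	for (wakeups = 1; wakeups <= 100000; wakeups++) {
		sleep_us += 250;
		if (interact_score(run_us, sleep_us) < INTERACT_THRESH) {
			printf("score drops below %d only after %d wakeups "
			    "(~%ld ms of accumulated sleep)\n",
			    INTERACT_THRESH, wakeups, sleep_us / 1000);
			return (0);
		}
	}
	printf("still \"batch\" after %d wakeups\n", 100000);
	return (0);
}

With the kernel's real capping and decay of the run/sleep history the
exact numbers differ, but the shape is the same: a few hundred
microseconds of sleep per wakeup against seconds of recorded run time
keeps the score deep in "batch" territory for a very long time.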
>>>
>>>> On 2 March 2012 16:14, George Mitchell wrote:
>>>>>
>>>>> On 03/02/12 18:06, Adrian Chadd wrote:
>>>>>>
>>>>>> Hi George,
>>>>>>
>>>>>> Have you thought about providing schedgraph traces with your
>>>>>> particular workload?
>>>>>>
>>>>>> I'm sure that'll help out the scheduler hackers quite a bit.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Adrian
>>>>>
>>>>> I posted a couple back in December but I haven't created any more
>>>>> recently:
>>>>>
>>>>> http://www.m5p.com/~george/ktr-ule-problem.out
>>>>> http://www.m5p.com/~george/ktr-ule-interact.out
>>>>>
>>>>> To the best of my knowledge, no one ever examined them. -- George
>>>
>>> --
>>> Alexander Motin

-- 
Alexander Motin