From: Attilio Rao
Sender: asmrookie@gmail.com
To: Alexander Motin
Date: Fri, 6 Apr 2012 15:30:32 +0100
In-Reply-To: <4F7EFD42.9010507@FreeBSD.org>
References: <4F2F7B7F.40508@FreeBSD.org> <4F366E8F.9060207@FreeBSD.org>
 <4F367965.6000602@FreeBSD.org> <4F396B24.5090602@FreeBSD.org>
 <4F3978BC.6090608@FreeBSD.org> <4F3990EA.1080002@FreeBSD.org>
 <4F3C0BB9.6050101@FreeBSD.org> <4F3E807A.60103@FreeBSD.org>
 <4F3E8858.4000001@FreeBSD.org> <4F7EFD42.9010507@FreeBSD.org>
Cc: Florian Smeets, freebsd-hackers@freebsd.org, Andriy Gapon, FreeBSD current, Jeff Roberson, Arnaud Lacombe
Subject: Re: [RFT][patch] Scheduling for HTT and not only
List-Id: Technical Discussions relating to FreeBSD

On 6 April 2012 15:27, Alexander Motin wrote:
> On 04/06/12 17:13, Attilio Rao wrote:
>>
>> On 5 April 2012 19:12, Arnaud Lacombe wrote:
>>>
>>> Hi,
>>>
>>> [Sorry for the delay, I got a bit sidetrack'ed...]
>>>
>>> 2012/2/17 Alexander Motin:
>>>>
>>>> On 17.02.2012 18:53, Arnaud Lacombe wrote:
>>>>>
>>>>> On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin wrote:
>>>>>>
>>>>>> On 02/15/12 21:54, Jeff Roberson wrote:
>>>>>>>
>>>>>>> On Wed, 15 Feb 2012, Alexander Motin wrote:
>>>>>>>>
>>>>>>>> I've decided to stop those cache black magic practices and focus on
>>>>>>>> things that really exist in this world -- SMT and CPU load. I've
>>>>>>>> dropped most of cache related things from the patch and made the rest
>>>>>>>> of things more strict and predictable:
>>>>>>>> http://people.freebsd.org/~mav/sched.htt34.patch
>>>>>>>
>>>>>>> This looks great. I think there is value in considering the other
>>>>>>> approach further but I would like to do this part first. It would be
>>>>>>> nice to also add priority as a greater influence in the load balancing
>>>>>>> as well.
>>>>>>
>>>>>> I haven't got good idea yet about balancing priorities, but I've rewritten
>>>>>> balancer itself. As soon as sched_lowest() / sched_highest() are more
>>>>>> intelligent now, they allowed to remove topology traversing from the
>>>>>> balancer itself.
>>>>>> That should fix double-swapping problem, allow to keep some
>>>>>> affinity while moving threads and make balancing more fair. I did number
>>>>>> of tests running 4, 8, 9 and 16 CPU-bound threads on 8 CPUs. With 4, 8 and
>>>>>> 16 threads everything is stationary as it should. With 9 threads I see
>>>>>> regular and random load move between all 8 CPUs. Measurements on 5 minutes
>>>>>> run show deviation of only about 5 seconds. It is the same deviation as I
>>>>>> see caused by only scheduling of 16 threads on 8 cores without any
>>>>>> balancing needed at all. So I believe this code works as it should.
>>>>>>
>>>>>> Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch
>>>>>>
>>>>>> I plan this to be a final patch of this series (more to come :)) and if
>>>>>> there will be no problems or objections, I am going to commit it (except
>>>>>> some debugging KTRs) in about ten days. So now it's a good time for
>>>>>> reviews and testing. :)
>>>>>>
>>>>> is there a place where all the patches are available ?
>>>>
>>>> All my scheduler patches are cumulative, so all you need is only the last
>>>> mentioned here sched.htt40.patch.
>>>>
>>> You may want to have a look to the result I collected in the
>>> `runs/freebsd-experiments' branch of:
>>>
>>> https://github.com/lacombar/hackbench/
>>>
>>> and compare them with vanilla FreeBSD 9.0 and -CURRENT results
>>> available in `runs/freebsd'. On the dual package platform, your patch
>>> is not a definite win.
>>>
>>>> But in some cases, especially for multi-socket systems, to let it show its
>>>> best, you may want to apply additional patch from avg@ to better detect CPU
>>>> topology:
>>>>
>>>> https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd
>>>>
>>> test I conducted specifically for this patch did not showed much
>>> improvement...
>>
>> Can you please clarify on this point?
>> The test you did included cases where the topology was detected badly
>> against cases where the topology was detected correctly as a patched
>> kernel (and you still didn't see a performance improvement), in terms
>> of cache line sharing?
>
> At this moment SCHED_ULE does almost nothing in terms of cache line sharing
> affinity (though it probably worth some further experiments). What this
> patch may improve is opposite case -- reduce cache sharing pressure for
> cache-hungry applications. For example, proper cache topology detection
> (such as lack of global L3 cache, but shared L2 per pairs of cores on
> Core2Quad class CPUs) increases pbzip2 performance when number of threads is
> less then number of CPUs (i.e. when there is place for optimization).

My question was not really about your patch. I just wanted to know
whether he correctly benchmarked a case where the topology was first
detected wrongly and then recognized correctly by avg's patch, in terms
of cache level aggregation (so it was not about your patch, btw).

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein