Message-ID: <4718815C.9050102@activnetworks.com>
Date: Fri, 19 Oct 2007 12:05:16 +0200
From: Remy Nonnenmacher
To: josh.carroll@gmail.com
Cc: freebsd-stable@freebsd.org
Subject: Re: ULE vs. 4BSD in RELENG_7
References: <8cb6106e0710170911x77e72e95qb322f51d84a31813@mail.gmail.com> <8cb6106e0710180910u110a1c58tc18f36460ab74776@mail.gmail.com>
In-Reply-To: <8cb6106e0710180910u110a1c58tc18f36460ab74776@mail.gmail.com>
List-Id: Production branch of FreeBSD source code

Josh Carroll wrote:
>> I have noticed some performance discrepancies with ULE and 4BSD in
>> RELENG_7, specifically with ffmpeg. I have all the kernel debugging
>> options disabled, and as I understand it, the userland debugging is
>> all off by default in RELENG_7.
>
> Here are a couple of additional benchmarks comparing the schedulers on
> my system:
>
> make -j8 -DNOCLEAN buildkernel
> 4BSD: 3:25.56
> ULE: 3:39.20
> Difference: -6.6 %
>
> ubench (CPU):
> 4BSD: 1705258
> ULE: 1713510
> Difference: +0.48 %
>
> super-smack (select-key 10 10000):
> 4BSD: 55044.38
> ULE: 68085.21
> Difference: +23.69 %
>
> super-smack (update-select 10 10000):
> 4BSD: 16734.15
> ULE: 17631.43
> Difference: +5.36 %
>
> So at least for the MySQL super-smack benchmark (I know it's a rather
> contrived benchmark), ULE is significantly faster for select-key and a
> decent improvement for update-select. ubench is about the same, but
> building a kernel is also slower with ULE.
>

Here are some buildworld measurements (extract from a buildathon):

i386: 6.2-RELEASE
  -j4:     746.08 real  1996.38 user  468.91 sys  1535.1 RSA
  -j6:     595.31 real  1957.31 user  539.24 sys  2304.9 RSA
  -j8:     534.21 real  1957.76 user  567.06 sys  3068.5 RSA
  M1000:   492.64 real  1956.58 user  587.41 sys
  100:     526.22 real  1936.98 user  559.49 sys  3073.8 RSA
  M100:    474.26 real  1947.09 user  563.95 sys
  -j10:    550.18 real  1975.54 user  588.33 sys
  -j12:    550.23 real  1976.88 user  602.65 sys
  -j16:    559.22 real  1972.19 user  634.29 sys

i386: 7.0-current (as of 10/16/2007) - SCHED_4BSD
  -j4:    1072.64 real  2880.29 user  561.13 sys  1495.7 RSA
  -j6:     842.91 real  2813.75 user  638.91 sys  2244.8 RSA
  -j8:     758.48 real  2824.23 user  704.00 sys  2990.1 RSA
  M1000:   696.12 real  2820.53 user  706.97 sys
  100:     752.58 real  2809.97 user  685.35 sys  2993.2 RSA
  M100:    666.58 real  2804.72 user  714.01 sys
  -j10:    763.82 real  2843.44 user  743.77 sys
  -j12:    785.12 real  2845.11 user  770.31 sys
  -j16:    805.02 real  2848.06 user  819.53 sys

i386: 7.0-current (as of 10/16/2007) - SCHED_ULE
  -j4:    1047.00 real  2857.59 user  486.93 sys  1494.2 RSA
  -j6:     831.10 real  2793.94 user  524.58 sys  2242.6 RSA
  -j8:     803.34 real  2796.46 user  552.56 sys  2991.0 RSA
  M1000:   709.77 real  2793.20 user  572.27 sys
  100:     785.18 real  2765.14 user  545.57 sys  2991.4 RSA
  M100:    707.09 real  2769.88 user  572.92 sys
  -j10:    813.36 real  2808.13 user  587.51 sys
  -j12:    824.23 real  2817.60 user  618.00 sys
  -j16:    856.11 real  2847.68 user  721.97 sys

---------
Conditions:

Machine: Intel SR2520 - S5000PAL, 2xE5345 (8 cores, 2.33 GHz), 8G, 1xsata
Generic kernel; /usr/obj/* removed after each run;
Runs done after a reboot; softupdates on all fs;
RSA = openssl speed -multi rsa1024;
M1000 = noatime /usr, kern.hz=1000, tar cf /dev/null /usr/src before run.
M100 = same as M1000, with kern.hz=100
100 = generic test with kern.hz=100 (to be compared with the -j8 line).
(The M variants minimize disk contention when reading src; variants were
measured only at the core-count sweet spot, -j8 here.)

---------
Remarks:

- There seems to be a loss of efficiency in the openssl code. It is not
  scheduler related. Possibly an indication of a compiler change?

- 6.2 results are given for information only. RSA aside, neither the
  buildworld workload nor its duration/level of parallelism is directly
  comparable between 6.2 and 7.0. Also, Amdahl's law limits efficiency
  as the number of cores increases.

- ULE tends to be more efficient than 4BSD when there are available
  cores (4BSD/ULE real-time ratios of 1.0244 and 1.0142 at -j4 and -j6)
  but less efficient as load increases (0.9807 to 0.9403 from -j8 to
  -j16).

- ULE seems to be less sensitive to Hz than 4BSD (M1000/M100 real-time
  ratio of 1.0037 for ULE against 1.0443 for 4BSD). (Beware of side
  effects on time/delay bandwidth estimators at the network level.)

--
RN.
IeM
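
For anyone who wants to recheck the ratios quoted in the remarks, here is a minimal sketch in Python. It assumes the ratios are 4BSD "real" time divided by ULE "real" time (so values above 1 favour ULE), and that the Hz sensitivity is M1000 real time divided by M100 real time for each scheduler. The dictionary and function names are only illustrative; the figures are copied from the two 7.0-current tables.

```python
# "real" (wall-clock) seconds from the 7.0-current buildworld tables.
BSD4_REAL = {
    "-j4": 1072.64, "-j6": 842.91, "-j8": 758.48,
    "M1000": 696.12, "100": 752.58, "M100": 666.58,
    "-j10": 763.82, "-j12": 785.12, "-j16": 805.02,
}
ULE_REAL = {
    "-j4": 1047.00, "-j6": 831.10, "-j8": 803.34,
    "M1000": 709.77, "100": 785.18, "M100": 707.09,
    "-j10": 813.36, "-j12": 824.23, "-j16": 856.11,
}


def scheduler_ratio(run):
    """4BSD real time over ULE real time for one run; > 1 means ULE was faster."""
    return BSD4_REAL[run] / ULE_REAL[run]


def hz_ratio(real_times):
    """M1000 real time over M100 real time; closer to 1 means less Hz sensitivity."""
    return real_times["M1000"] / real_times["M100"]


if __name__ == "__main__":
    for run in BSD4_REAL:
        print(f"{run:>6}: 4BSD/ULE = {scheduler_ratio(run):.4f}")
    print(f"Hz ratio 4BSD: {hz_ratio(BSD4_REAL):.4f}")
    print(f"Hz ratio ULE:  {hz_ratio(ULE_REAL):.4f}")
```

Under these assumptions the -j4, -j6 and -j16 ratios and both Hz ratios agree with the quoted figures to within rounding. The quoted 0.9807 corresponds to the M1000 row; the plain -j8 row works out to roughly 0.944.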