Date: Thu, 27 Nov 2003 18:27:47 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Garance A Drosihn <drosih@rpi.edu>
Cc: freebsd-current@freebsd.org
Subject: Re: 40% slowdown with dynamic /bin/sh
Message-ID: <20031127161940.I77322@gamplex.bde.org>
In-Reply-To: <p06002014bbea2b21766b@[128.113.24.47]>
References: <200311251214.23290.doconnor@gsoft.com.au> <20031126052320.GH15294@wombat.localnet> <p06002014bbea2b21766b@[128.113.24.47]>

On Wed, 26 Nov 2003, Garance A Drosihn wrote:

> At 12:23 AM -0500 11/26/03, Michael Edenfield wrote:
> >
> >Just to provide some real-world numbers, here's what I got
> >out of a buildworld:
>
> I have reformatted the numbers that Michael reported,
> into the following table:
>
> >Static /bin/sh:          Dynamic /bin/sh:
> >   real 385m29.977s         real 455m44.852s  => 18.22%
> >   user 111m58.508s         user 113m17.807s  =>  1.18%
> >   sys   93m14.450s         sys  103m16.509s  => 10.76%
> >                                    user+sys  =>  5.53%

What are people doing to make buildworld so slow?  I once optimized
makeworld to take 75 minutes on a K6-233 with 64MB of RAM.  Things
have been pessimized a bit since then, but not significantly except
for the 100% slowdown of gcc (we now build large things like secure
but this is partly compensated for by not building large things like
perl).  Michael's K7-500 with 320MB (?) of RAM should be several times
faster than the K6-233, so I would be unhappy if it took more than 75
minutes but would expect it to take a bit more than 2 hours when well
configured.

> Here are some buildworld numbers of my own, from my system.
> In my case, I am running on a single Athlon MP2000, with a
> gig of memory.  It does a buildworld without paging to disk.

I have a similar configuration, except with a single Athlon XP1600
overclocked by 146/133 and I always benchmark full makeworlds.  I was
unhappy when the gcc pessimizations between gcc-2.95 and gcc-3.0
increased the makeworld time from about 24 minutes to about 33 minutes.
The time has since increased to about 38 minutes.  The latter is
cheating slightly -- I leave out the DYNAMICROOT and RESCUE mistakes
and the KERBEROS non-mistake.

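For reference, the percentage column in Michael's table is just (dynamic - static) / static. A minimal sketch in sh that reproduces the 18.22% figure for real time, assuming a POSIX awk is available; the helper names to_seconds and pct_increase are invented for this illustration and appear nowhere in the thread:

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Convert a time(1)-style string such as "385m29.977s" to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the relative increase of $2 over $1, in percent.
pct_increase() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (new - old) / old * 100 }'
}

# Real time of Michael's buildworld: static vs dynamic /bin/sh.
static=$(to_seconds 385m29.977s)
dynamic=$(to_seconds 455m44.852s)
pct_increase "$static" "$dynamic"
```

The same function applied to the gcc numbers above (`pct_increase 24 33`) prints 37.50, i.e. the step from about 24 to about 33 minutes is roughly a 37.5% increase in makeworld time.
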
> Static sh, No -j:        Dynamic sh, No -j:
>    real 84m31.366s          real 86m22.429s  =>  2.04%
>    user 50m33.013s          user 51m13.080s  =>  1.32%
>    sys  29m59.047s          sys  33m04.082s  => 10.29%
>                                   user+sys  =>  4.66%
>
> Static sh, -j2:          Dynamic sh, -j2:
>    real 92m38.656s          real 95m21.027s  =>  2.92%
>    user 51m48.970s          user 52m29.152s  =>  1.29%
>    sys  32m07.293s          sys  34m40.595s  =>  7.95%
>                                   user+sys  =>  3.84%

This also shows why -j should not be used on non-SMP machines.  Apart
from the make -j bug that causes missed opportunities to run a job,
make -j increases real and user times due to competition for resources,
so it can only possibly help on systems where unbalanced resources
(mainly slow disks) give too much idle time.

My current worst makeworld time is almost half the fastest buildworld
time in the above (2788 seconds vs 5071 seconds).  From my collection
of makeworld benchmarks:

%%%
Fastest makeworld on a Celeron 366 overclocked by 95/66 (2000/05/15):
     3309.30 real      2443.75 user       488.68 sys

Last makeworld on a Celeron 366 overclocked by 95/66 (2001/11/19):
     4219.83 real      3253.04 user       667.64 sys

Fastest makeworld on an Athlon XP1600 overclocked by 146/133 (2002/01/03):
     1390.18 real       913.56 user       232.63 sys

Last makeworld before gcc-3 on an Athlon XP1600 o/c by 143/133 (2002/05/09)
(overclocking reduced due to memory problems, and some local
memory-related optimizations turned off):
     1532.99 real      1093.08 user       293.15 sys

Early makeworld with gcc-3 on an Athlon XP1600 o/c by 143/133 (2002/05/12):
     2268.13 real      1613.25 user       313.56 sys

Fastest makeworld with gcc-3 on an Athlon XP1600 overclocked by 146/133
(maximal overclocking recovered; memory increased from 512MB to 1GB,
local memory-related optimizations turned on and tuned) (2003/03/31):
     1929.02 real      1576.67 user       205.30 sys

Last makeworld before <the default bloat became too large for me and I
started stopping it for me by putting things like NO_KERBEROS in
/etc/make.conf> on an Athlon XP1600 o/c by 143/133 (2003/04/29):
     2012.75 real      1637.59 user       225.07 sys

Makeworld with the defaults (no /etc/make.conf and no local optimizations
in the src tree; mainly no pessimizing for Athlons by optimizing for PII's,
and no building dependencies; only optimizations in the host environment
(mainly no dynamic linkage)) on an Athlon as usual (2003/05/06):

Last recorded makeworld with local source and make.conf optimizations
(mainly no dynamic linkage) on an Athlon as usual (2003/10/22):
     2225.83 real      1890.64 user       256.33 sys

Last recorded makeworld with the defaults on an Athlon as usual (2003/11/11):
     2788.41 real      2316.49 user       357.34 sys
%%%

I don't see such a large slowdown from using a dynamic /bin/sh.
Unrecorded runs of makeworld gave times like the following:

     2262 real ... with local opts including src ones and no dynamic linkage
     2290 real ... with same except for /bin/sh (only) dynamically linked

The difference may be because my /usr/bin/true and similar utilities
remain statically linked.  Fork-exec expense depends more on the exec
than the fork.  From an old benchmark for fork-exec of tiny programs:

%%%
st = statically linked
sh = dynamically linked
The numbers are the real, user and system times (using a real time(1)).

         K6-233
         ------
st-st    0.93 0.01 0.91
sh-st    1.75 0.02 0.70
st-sh    3.94 0.70 3.20
sh-sh    5.14 1.08 4.03
%%%

> Buildworld, static, with no '-j',
>       executed /bin/sh 32,308 times.
>
> Buildworld, static, with '-j2',
>       executed /bin/sh 32,802 times.

Turning on accounting must have pessimized things a bit.  I think you
are also using a pessimized kernel (with INVARIANTS and WITNESS).
makeworld times should be dominated by the gcc hog, but your sys times
are almost as large as your user times.

The small 1% pessimization for my world and Warner's world is only
small because gcc is so slow.  As John Dyson said, even macro-benchmarks
like makeworld can provide numbers that are hard to interpret.

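A fork-exec measurement of the kind tabulated above can be approximated with a loop like the following. This is only a sketch and not the original benchmark: the function name forkexec_loop, the iteration count and the choice of `/bin/sh -c :` as the tiny program are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a tiny external command N times so that
# fork+exec cost dominates the run time.  Prints the number of runs.
forkexec_loop() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        # "$@" names an external program, so each pass forks and execs.
        "$@" || return 1
        i=$((i + 1))
    done
    echo "$i"
}

# Spawn /bin/sh 100 times; ":" is the shell's no-op builtin.
forkexec_loop 100 /bin/sh -c :
```

Saved as a script and run under time(1), once against a statically linked target and once against a dynamically linked one, it gives a rough idea of how much of the cost is in the exec. The absolute numbers depend heavily on the machine and kernel configuration.
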
My system is fairly well balanced, so the idle time is fairly small,
but it is still large enough for lots of useful zeroing of pages to be
done in the idle thread.  Other measurements show that the idle thread
used to take about 60 seconds (almost 3% of the makeworld time), but I
optimized it to take about 30 seconds.  If idle zeroing is turned off,
then the real time for makeworld doesn't change much but the system
time increases by approximately the same time that the idlezero thread
took, provided there are enough idle cycles.  Dynamic linkage is quite
likely to disturb these times by requiring more zero pages.

> On all attempts, I started out by doing:
>       rm -Rf /usr/obj/usr/src/*
>       sync ; sleep 1 ; sync ; sleep 1 ; sync
>
> before doing the 'make' command.  I usually start up a 'script'

I use:

# /c async mounted
cd /c/z || exit 1
rm -rf obj/* root/*
chflags -R noschg obj root
rm -rf obj/* root/*
reboot
...
# Sometimes: export __MAKE_CONF=/etc/nonesuch
cd /wherever/src || exit 1
DESTDIR=/c/z/root \
MAKEOBJDIRPREFIX=/c/z/obj \
time -l make -s world > /tmp/world.out 2>&1

Rebooting doesn't affect the times much in relative terms (it minimizes
them, short of the optimization of prefetching /usr/src), but it reduces
the variance to less than a second provided the system is mostly idle.

> Aside:  building 5.1-"security" on this same hardware took
> the following times:
>       real    54m10.092s    [  71.03% ]
>       user    41m39.121s    [  24.40% ]
>       sys     10m20.325s    [ 210.69% ]
>
> And those times *are* with 'script' running, as well as a
> perl-script which I use to summarize "interesting" data from
> the output of a buildworld.  So, those times include extra
> overhead which is not included in the above buildworlds.
> That's from a 'make -j3', and obviously has a static /bin/sh.

Why so much faster?
Now the times are only 20% larger than mine,

> So, if you take that as the base, then the buildworld for
> 5.2-release (using *static* /bin/sh and -j2) will see the
> performance hits that I put in brackets.  That probably seems
> like a pretty horrifying hit, but remember that 5.1-release
> did *not* build /rescue at all (not for me at least :-), and
> that is probably a significant part of the increase.

Building rescue only accounts for about 2 minutes of the 86-54
difference.

> For those who think I'm spoiled by fast hardware, please note
> that all of the above has been done while doing just two
> buildworlds and one buildkernel+installkernel on my sparc64
> box (and that second buildworld is not done yet...).  So I
> certainly am interested in how freebsd runs on "slower HW"!

Single Athlon 1600-2000's are slow hardware :-).  I plan to upgrade to
an Athlon 2800 soon, but expect to be unhappy that this doesn't recover
compile-time performance lost to gcc pessimisations.  Moore's law seems
to be hitting physical limits for CPUs, so software bloat is now
outrunning hardware improvements.

Bruce