Date: Fri, 24 Jan 2003 18:19:07 -0800
From: Peter Wemm <peter@wemm.org>
To: Marcel Moolenaar <marcel@xcllnt.net>
Cc: John Baldwin <jhb@FreeBSD.org>, Nate Lawson <nate@root.org>, cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org, Attila Nagy <bra@fsn.hu>
Subject: Re: cvs commit: src/sys/i386/i386 identcpu.c initcpu.c locore.s
Message-ID: <20030125021907.51B482A89E@canning.wemm.org>
In-Reply-To: <20030125013344.GA54764@dhcp01.pn.xcllnt.net>
Marcel Moolenaar wrote:
> On Fri, Jan 24, 2003 at 05:25:27PM -0800, Peter Wemm wrote:
> > John Baldwin wrote:
> > > Maybe. Preliminary buildworld tests on 4.x seem to suggest that HTT
> > > is slower than UP, but buildworld is just one application. HTT will
> > > probably be optional on stable. On -current we will eventually use
> > > ACPI to enumerate CPU's which means that we will respect BIOS settings
> > > with regards to whether or not HTT is enabled.
> >
> > Did you remember to set machdep.cpu_idle_hlt to 1? Failing to set this
> > will really suck because the logical cores will be spinning like crazy and
> > stealing execution resources from functional tasks on the other part of the
> > cpu.
>
> What about an increase in cache misses due to a degradation of locality
> by having a larger, less coherent/dense working set?

Sure, cache etc doesn't come free. But losing up to every second pipeline
slot to the "idle" spinloop because we don't ever halt the cpu in SMP mode
isn't going to help either.

For example, with the default settings:

# tcsh ./time.sh
machdep.cpu_idle_hlt: 0
62.441u 11.219s 1:10.10 105.0% 1716+2807k 0+644io 0pf+0w
62.507u 11.304s 1:10.48 104.7% 1705+2804k 0+596io 0pf+0w
62.774u 10.689s 1:10.18 104.6% 1705+2798k 0+596io 0pf+0w
62.561u 11.314s 1:10.68 104.5% 1701+2791k 0+597io 0pf+0w

And then after changing the sysctl:

# tcsh ./time.sh
machdep.cpu_idle_hlt: 1
47.184u 8.622s 0:53.79 103.7% 1724+2830k 4+669io 0pf+0w
46.670u 9.065s 0:53.19 104.7% 1724+2814k 0+634io 0pf+0w
47.239u 8.606s 0:53.80 103.7% 1728+2812k 0+625io 0pf+0w
46.955u 8.789s 0:53.87 103.4% 1731+2821k 0+656io 0pf+0w

Personally, I think that avoiding a 32% slowdown speaks very well for
turning the halt instruction on by default in the idle loop.

This is a plain kernel build, entirely from memory.

#! /bin/tcsh
sysctl machdep.cpu_idle_hlt
make -s clean depend
time make -s
make -s clean depend
time make -s
make -s clean depend
time make -s
make -s clean depend
time make -s

COPTFLAGS has got "-O -pipe" in /etc/make.conf. Note that I'm not using
-jN. I can't test this machine without HTT enabled because it won't boot
(except in UP mode).

CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2799.70-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf27  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
...
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
 cpu1 (AP):  apic id: 1, version: 0x00050014, at 0xfee00000
 cpu2 (AP):  apic id: 6, version: 0x00050014, at 0xfee00000
 cpu3 (AP):  apic id: 7, version: 0x00050014, at 0xfee00000
 io0 (APIC): apic id: 2, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id: 3, version: 0x000f0011, at 0xfec01000
 io2 (APIC): apic id: 4, version: 0x000f0011, at 0xfec02000
 io3 (APIC): apic id: 5, version: 0x000f0011, at 0xfec03000

Note that this is a SMP P4 Xeon, not a new HTT P4.

Oh, and in case somebody asks about the -jN case..

cpu_idle_hlt=0, -j4 (default)
81.127u 15.299s 0:33.10 291.2% 1751+2771k 3+528io 0pf+0w
81.046u 15.483s 0:33.14 291.2% 1747+2773k 3+612io 0pf+0w

cpu_idle_hlt=1, -j4
76.891u 13.749s 0:31.28 289.7% 1743+2745k 3+646io 0pf+0w
76.230u 14.105s 0:31.82 283.8% 1750+2755k 3+591io 0pf+0w

Again, it is faster with a true halt rather than a spinloop.

cpu_idle_hlt=0, -j6 (default)
84.083u 15.899s 0:29.54 338.4% 1764+2791k 3+629io 0pf+0w
84.790u 15.030s 0:29.75 335.5% 1759+2782k 3+606io 0pf+0w

cpu_idle_hlt=1, -j6
81.572u 14.802s 0:29.59 325.6% 1754+2762k 3+689io 0pf+0w
82.642u 13.887s 0:29.10 331.6% 1764+2768k 3+625io 0pf+0w

Not quite as significant, but still an improvement. I didn't try any
larger -jN numbers.
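
For anyone who wants to check the arithmetic, the relative differences can be worked out from the wall-clock figures quoted above. The following is only a rough sketch, not something that was run as part of these tests: the numbers are copied by hand from the time output (1:10.10 taken as 70.10 seconds), and the names `wall` and `slowdown_pct` are made up for illustration.

```python
from statistics import mean

# Wall-clock seconds copied by hand from the time output quoted above.
# "hlt0" is machdep.cpu_idle_hlt=0 (spinloop), "hlt1" is the true halt.
wall = {
    "no -j": {"hlt0": [70.10, 70.48, 70.18, 70.68],
              "hlt1": [53.79, 53.19, 53.80, 53.87]},
    "-j4":   {"hlt0": [33.10, 33.14], "hlt1": [31.28, 31.82]},
    "-j6":   {"hlt0": [29.54, 29.75], "hlt1": [29.59, 29.10]},
}


def slowdown_pct(spin, halt):
    """Percent by which the mean spinloop time exceeds the mean halt time."""
    return (mean(spin) / mean(halt) - 1.0) * 100.0


# One percentage per build configuration.
results = {name: slowdown_pct(runs["hlt0"], runs["hlt1"])
           for name, runs in wall.items()}

for name, pct in results.items():
    print(f"{name}: {pct:.1f}% slower with the spinloop")
```

On these figures the serial build comes out roughly 31% slower by wall-clock time with the spinloop, and the gap narrows as -jN goes up and fewer logical CPUs are left idle.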
The last time I tried this on a non-HTT system, enabling the true halt
caused a slight slowdown. But the machine used a lot less power and the
room was cooler. :-]

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message