From owner-freebsd-performance@FreeBSD.ORG Mon Dec 8 09:43:07 2014
Date: Mon, 8 Dec 2014 04:43:05 -0500
Subject: HyperThreading on Intel Xeon Haswell, a benefit?
From: grarpamp
To: freebsd-hardware@freebsd.org
Cc: freebsd-performance@freebsd.org, freebsd-smp@freebsd.org, freebsd-questions@freebsd.org

HyperThreading on Intel Xeon Haswell, a benefit?

What bits of FreeBSD are aware of Intel HTT and can take proper
advantage of it, such as its thread/process schedulers
(sched-BSD/ULE/...), etc?

What system/app loads are, or are not, likely to benefit from today's
HyperThreading CPUs? Kernel (ZFS/crypto/net/...) vs. Userland (apps)?

Does anyone have performance stats for this current class of CPU
to post comparing HT (enabled and disabled) while using more than
four processes/threads in parallel?

For instance, these two Intel Xeon Haswell four core CPUs are
identical except for HT [1] (e3-1226v3 and e3-1246v3), and you
can always turn HT off for testing.
http://ark.intel.com/compare/80917,80916

There are some Core i3/i5/i7 Haswell parts with HT as well.
http://ark.intel.com/Search/Advanced?s=t&ECCMemory=true&VTD=true&AESTech=true

There don't seem to be many reviews of Xeon processors, let alone
HT. And most Unix talk of HT seems dated by at least a few years
and a couple of processor generations.

Also, was the HT cache leak security issue from a decade ago ever
fixed in hardware?
"Cache missing for fun and profit"
http://www.daemonology.net/papers/

Being unsure of the best list, please direct replies to whichever
is good. Thanks.

[1] Plus 200MHz/6% clock per core and $59/27% market price bumps,
but this thread is about whether or not there is any benefit to HT
in current Intel CPUs such as Haswell, how much of one, and where.
Once that is determined, then you can factor in other parameters
like these to see if it's an overall value.

From owner-freebsd-performance@FreeBSD.ORG Mon Dec 8 14:40:33 2014
Date: Mon, 8 Dec 2014 15:39:25 +0100
From: "O. Hartmann"
To: grarpamp
Subject: Re: HyperThreading on Intel Xeon Haswell, a benefit?
Message-ID: <20141208153925.5df90587@prometheus>
Organization: FU Berlin
Cc: freebsd-performance@freebsd.org, freebsd-smp@freebsd.org, freebsd-questions@freebsd.org, freebsd-hardware@freebsd.org

Hello.

Well, my experience is very narrow and somewhat naive, so be
warned.

From my experience, mostly compiling FreeBSD sources from scratch
(deleted /usr/obj, no sophisticated caching subsystems used), compiling
world and kernel with as many threads as possible (taking the thread
count from PARA=`sysctl -n hw.ncpu` and then passing $PARA to
"make -j${PARA} ..."), a dual core, 4-thread CPU at 3.3 GHz takes
~ 60 minutes to build world, the same as a 4-core castrated i3 with
disabled SMT. Switching off SMT on the dual core results in roughly
90 - 100 minutes compile time in my case, depending on the average
load of the box while compiling. So, for INTEGER performance, I see
some real benefits of SMT.

The picture is somewhat different for floating point performance.
Using SMT in some FPU heavy calculations on Sandy- and Ivy-Bridge CPUs
(Haswell is not available as XEON to me at this very moment), I see
only 10% to a max. of 25% benefit (roughly estimated from some crude
manually timed calculations!). There is some slight benefit, even
better with the most recent Ivy-Bridge than with Sandy-Bridge, and both
of the latter seem to be superior in that matter to some Westmere
6-Core XEONs we used a couple of years ago (this may be related to
architectural design improvements other than SMT, like the ring bus
introduced in Sandy Bridge and improved in Ivy Bridge and maybe
Haswell).

In earlier times (pre Sandy-Bridge era) there were cases where it
was beneficial to switch off SMT for heavy FPU load in some
BLAS/LAPACK based benchmark scenarios, but that knowledge dates back
years, to older P4 designs and early Core i7. I lost track of that.

To make it short: I would highly recommend using/purchasing SMT
capable CPUs since there is a benefit in performance. But in the end
the performance gain has to justify the cost of an SMT capable XEON.
As far as I know, most of the "value" XEONs do have SMT by default.

There are some disadvantages regarding the amount of memory the
kernel has to consume for each core (logical and/or physical) found,
so systems with small amounts of physical RAM (< 8 GiB) could run
into disadvantageous situations - if I'm not wrong. But for all
FreeBSD users considering ZFS for professional/semiprofessional
usage, at least 8 GiB is a must; otherwise it is ZFS that cripples
performance, not SMT.
oh

From owner-freebsd-performance@FreeBSD.ORG Mon Dec 8 16:16:01 2014
In-Reply-To: <20141208153925.5df90587@prometheus>
References: <20141208153925.5df90587@prometheus>
Date: Mon, 8 Dec 2014 08:15:58 -0800
Subject: Re: HyperThreading on Intel Xeon Haswell, a benefit?
From: Adrian Chadd
To: "O. Hartmann"
Cc: FreeBSD Mailing Lists, freebsd-hardware@freebsd.org, freebsd-smp@freebsd.org, grarpamp, FreeBSD Questions

I've done some basic experimenting with SMT on network loads.

For the most part, as long as you don't fill up one of the ports on
the execution engine that's doing SMT, you're okay.

I've found that a memcpy-heavy load (read: normal, non-zero-copy
network traffic) brings SMT threads to their knees. A pair of threads
gets as much work done in normal UDP transmit/receive as a single
non-SMT thread. It looks like it's because the ports doing memory
input/output are full and there's not really any other work that's
being done.

I think Haswell still only has one store data port per core. :(

-adrian

From owner-freebsd-performance@FreeBSD.ORG Tue Dec 9 23:05:00 2014
References: <20141208153925.5df90587@prometheus>
Date: Tue, 9 Dec 2014 18:04:58 -0500
Subject: Re: HyperThreading on
Intel Xeon Haswell, a benefit?
From: grarpamp
To: FreeBSD Mailing Lists
Cc: FreeBSD Questions, freebsd-hardware@freebsd.org

> Ohartmann:
> From my experience, mostly compiling FreeBSD sources from scratch
> ...
> a dual core, 4-thread CPU
> at 3.3 GHz takes ~ 60 minutes to build world, the same as a 4-core
> castrated i3 with disabled SMT. Switching off SMT on the dual core
> ...
> Using SMT in some FPU heavy calculations on Sandy- and Ivy-Bridge CPUs
> (Haswell is not available as XEON to me at this very moment), I see

> Adrian:
> I've done some basic experimenting with SMT on network loads.
> ...
> I've found that a memcpy heavy load (read: normal, non-zero copy

Ohartmann, Adrian... Good introductory info. What were your CPU
models / lines / sSpec numbers above? Anyone else?

Expanding...

This evaluation should not be strictly confined to Intel; after all,
AMD has CMT, which is similar to HTT (not clear whether it's on
Opteron, FX or APU lines). Though it will probably be 2016 before AMD
really capitalizes and shines on their full architecture vision. By
then Intel will just shift a few gears to match. So we should probably
stay on the subject of Intel HTT for now.
http://wccftech.com/amds-high-performance-processor-cores-coming-2015-giving-modular-architecture/
http://en.wikipedia.org/wiki/Simultaneous_multithreading
http://en.wikipedia.org/wiki/Hyper-threading
http://forums.anandtech.com/showthread.php?t=2381524

My thought is that the available evaluations of SMT are all 'old'...
discontinued processors, old compilers, old schedulers, etc, all
dating back to the Intel P4 arch. So let's bring this current in
terms of today's Intel Haswell and AMD APU/FX processors, with new
tests and community data. (Opteron is still on an even 'older'
architecture [refresh] compared to FX and APU.)
http://anandtech.com/show/8742/amd-announces-carrizo-and-carrizol-next-gen-apus-for-h1-2015
http://wccftech.com/amd-berlin-server-apu-glimpse-upcoming-kaveri-apu-4-steamroller-cores-512-gcn-sps/

From owner-freebsd-performance@FreeBSD.ORG Wed Dec 10 13:20:20 2014
References: <20141208153925.5df90587@prometheus>
From: Jia-Shiun Li
Date: Wed, 10 Dec 2014 21:19:48 +0800
Subject: Re: HyperThreading on Intel Xeon Haswell, a benefit?
To: Adrian Chadd
Cc: freebsd-smp@freebsd.org, grarpamp, "freebsd-hardware@freebsd.org", FreeBSD Mailing Lists, "O. Hartmann", FreeBSD Questions

On Tue, Dec 9, 2014 at 12:15 AM, Adrian Chadd wrote:
> I've found that a memcpy heavy load (read: normal, non-zero copy
> network traffic) brings SMT threads to their knees. A pair of threads
> gets as much work done in normal UDP transmit/receive as a single
> non-SMT thread. It looks like it's because the ports doing memory
> input/output are full and there's not really any other work that's
> being done.
>
> I think haswell still only has one store data port per core. :(

Yes, Haswell has an additional store address unit but still only one
store data unit.
http://www.tomshardware.com/reviews/core-i7-4770k-haswell-review,3521.html

But I guess they'd argue that the intent is to saturate the memory
channels with all available cores first, and that additional threads
are only a last resort. And that's probably what most schedulers do.

I benchmarked it on a 4th gen i3. Buildkernel got 5~10% benefit IIRC.
The best way to tell is still to conduct tests with your own workload.
If the claimed 5% transistor cost brings 10% benefits, that's already
a win. OTOH how much you paid for it is another story.

- Jia-Shiun.
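
As a rough way to put the wall-clock figures reported in this thread on
a common footing, two timings can be turned into a speedup factor. The
sketch below is not from any of the posters: the `speedup` helper name
is made up for illustration, and the inputs are O. Hartmann's
approximate buildworld times on the dual core part (about 60 minutes
with SMT, 90 to 100 minutes without).

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): compare two wall-clock times.
# usage: speedup MINUTES_SMT_OFF MINUTES_SMT_ON
speedup() {
    awk -v t_off="$1" -v t_on="$2" 'BEGIN {
        # Ratio of the two times, and the share of wall time saved.
        printf "%.2fx faster, %.0f%% less wall time\n",
            t_off / t_on, (t_off - t_on) * 100 / t_off
    }'
}

# Approximate buildworld times quoted earlier in the thread (minutes).
speedup 90 60
speedup 100 60
```

With those inputs the helper reports a 1.50x to 1.67x speedup, i.e. the
SMT-enabled build takes roughly 33% to 40% less wall time. The same
function can be fed timings from your own workload, which is what the
last reply recommends measuring.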