From owner-svn-src-all@FreeBSD.ORG Wed Jul 25 12:29:38 2012 Return-Path: Delivered-To: svn-src-all@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 42F03106567B; Wed, 25 Jul 2012 12:29:38 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 13CFA8FC26; Wed, 25 Jul 2012 12:29:36 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA17235; Wed, 25 Jul 2012 15:29:34 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <500FE6AE.8070706@FreeBSD.org> Date: Wed, 25 Jul 2012 15:29:34 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:13.0) Gecko/20120625 Thunderbird/13.0.1 MIME-Version: 1.0 To: Konstantin Belousov References: <201207242210.q6OMACqV079603@svn.freebsd.org> <500F9E22.4080608@FreeBSD.org> <20120725102130.GH2676@deviant.kiev.zoral.com.ua> In-Reply-To: <20120725102130.GH2676@deviant.kiev.zoral.com.ua> X-Enigmail-Version: 1.4.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Cc: svn-src-head@FreeBSD.org, Jim Harris , svn-src-all@FreeBSD.org, src-committers@FreeBSD.org Subject: Re: svn commit: r238755 - head/sys/x86/x86 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2012 12:29:38 -0000 on 25/07/2012 13:21 Konstantin Belousov said the following: > On Wed, Jul 25, 2012 at 10:20:02AM +0300, Andriy Gapon wrote: >> on 25/07/2012 01:10 Jim Harris said the following: >>> Author: jimharris >>> Date: Tue Jul 24 22:10:11 2012 >>> New Revision: 238755 >>> URL: http://svn.freebsd.org/changeset/base/238755 >>> >>> Log: >>> Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures. >>> >>> Intel Architecture Manual specifies that rdtsc instruction is not serialized, >>> so without this change, TSC synchronization test would periodically fail, >>> resulting in use of HPET timecounter instead of TSC-low. This caused >>> severe performance degradation (40-50%) when running high IO/s workloads due to >>> HPET MMIO reads and GEOM stat collection. >>> >>> Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC synchronization >>> fail approximately 20% of the time. >> >> Should rather the synchronization test be fixed if it's the culprit? > Synchronization test for what ? The synchronization test mentioned above. So, oops, very sorry - I missed the fact that the change was precisely in the test. I confused it for another place where tsc is used. Thank you for pointing this out. >> Or is this change universally good for the real uses of TSC? > > What I understood from the Intel SDM, and also from additional experiments > which Jim kindly made despite me being annoying as usual, is that 'read > memory barrier' AKA LFENCE there is used for its secondary implementation > effects, not for load/load barrier as you might assume. > > According to SDM, LFENCE fully drains execution pipeline (but comparing > with MFENCE, does not drain write buffers). The result is that RDTSC is > not started before previous instructions are finished. Yes, I am fully aware of this. > For tsc test, this means that after the change RDTSC executions are not > reordered on the single core among themself. As I understand, CPU has > no dependency noted between two reads of tsc by RDTSC, which allows > later read to give lower value of counter. This is fixed by Intel by > introduction of RDTSCP instruction, which is defined to be serialization > point, and use of which (instead of LFENCE; RDTSC sequence) also fixes > test, as confirmed by Jim. Yes. I think that previously Intel recommended to precede rdtsc with cpuid for all the same reasons. Not sure if there is any difference performance-wise comparing to lfence. Unfortunately, rdtscp is not available on all CPUs, so using it would require extra work. > In fact, I now think that we should also apply the following patch. > Otherwise, consequtive calls to e.g. binuptime(9) could return decreased > time stamps. Note that libc __vdso_gettc.c already has LFENCE nearby the > tsc reads, which was done not for this reason, but apparently needed for > the reason too. > > diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c > index 085c339..229b351 100644 > --- a/sys/x86/x86/tsc.c > +++ b/sys/x86/x86/tsc.c > @@ -594,6 +594,7 @@ static u_int > tsc_get_timecount(struct timecounter *tc __unused) > { > > + rmb(); > return (rdtsc32()); > } > This makes sense to me. We probably want correctness over performance here. [BTW, I originally thought that the change was here; brain malfunction] > @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter *tc) > { > uint32_t rv; > > + rmb(); > __asm __volatile("rdtsc; shrd %%cl, %%edx, %0" > - : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); > + : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); > return (rv); > } > It would correct here too, but not sure if it would make any difference given that some lower bits are discarded anyway. Probably depends on exact CPU. And, oh hmm, I read AMD Software Optimization Guide for AMD Family 10h Processors and they suggest using cpuid (with a note that it may be intercepted in virtualized environments) or _mfence_ in the discussed role (Appendix F of the document). Googling for 'rdtsc mfence lfence' yields some interesting results. -- Andriy Gapon